(MS113)
INTRODUCTION TO R LANGUAGE
VECTORS
1. Creating Vectors
Output:-
[1] 1 2 3 4 5
Output:-
Output:-
Explanation:
• c() is used to concatenate elements into a vector.
• numeric_vector contains numeric values.
• character_vector contains character strings.
• logical_vector contains logical values (TRUE and FALSE).
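The code for this step is not preserved in the text. A minimal sketch consistent with the explanation above might look like the following; the variable names come from the explanation, while the character and logical values are assumptions chosen for illustration.

```r
# Numeric vector (matches the printed output 1 2 3 4 5)
numeric_vector <- c(1, 2, 3, 4, 5)
numeric_vector

# Character vector (values assumed for illustration)
character_vector <- c("apple", "banana", "cherry")
character_vector

# Logical vector (values assumed for illustration)
logical_vector <- c(TRUE, FALSE, TRUE)
logical_vector
```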
# Creating a numeric vector
numbers <- c(10, 20, 30, 40, 50)
numbers
Output:-
[1] 10 20 30 40 50
Output:-
[1] 30
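The statement that produced the output 30 is missing; it is consistent with selecting the third element of numbers. The sketch below shows a few common ways to access vector elements, assuming the numbers vector defined above.

```r
numbers <- c(10, 20, 30, 40, 50)

# Third element by position (R indexing starts at 1)
numbers[3]

# Several elements at once, and everything except the first
numbers[c(1, 5)]
numbers[-1]

# Elements that satisfy a condition
numbers[numbers > 25]
```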
3. Vector Operations:
• Arithmetic operations (+, -, *, /) and related functions (sum(),
mean(), min(), max(), median(), etc.).
• sort(): Sort a vector.
• rank(): Compute the rank of elements in a vector.
• unique(): Find unique elements in a vector.
• duplicated(): Identify duplicated elements.
• diff(): Calculate differences between consecutive elements.
• cumsum(): Calculate cumulative sums.
• cumprod(): Calculate cumulative products.
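The functions listed above can be tried on a small vector. The values of v below are assumed purely for illustration.

```r
v <- c(4, 2, 9, 2, 7)   # example values, assumed for illustration

sum(v); mean(v); median(v); min(v); max(v)

sort(v)        # ascending order
rank(v)        # tied values share the average rank by default
unique(v)      # distinct values, in order of first appearance
duplicated(v)  # TRUE for each repeat of an earlier value
diff(v)        # differences between consecutive elements
cumsum(v)      # running total
cumprod(v)     # running product
```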
Vector Operations
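# Define two vectors (values inferred from the outputs shown below)
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
# Addition of vectors
sum_vector <- x + y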
sum_vector
Output:-
[1] 6 6 6 6 6
# Multiplication of vectors
product_vector <- x * y
product_vector
Output:-
[1] 5 8 9 8 5
Explanation:
• x + y performs element-wise addition, resulting in sum_vector (c(6,
6, 6, 6, 6)).
• x * y performs element-wise multiplication, resulting in
product_vector (c(5, 8, 9, 8, 5)).
Explanation:
• mean(grades) calculates the mean of the grades vector.
• sort(grades) arranges the elements of grades in ascending order,
creating sorted_grades.
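The code for this explanation is not preserved. A minimal sketch, assuming a grades vector with made-up values, could be:

```r
grades <- c(85, 92, 78, 90, 88)   # assumed values

# Arithmetic mean of the grades
mean_grade <- mean(grades)
mean_grade

# Ascending order; sort(grades, decreasing = TRUE) would reverse it
sorted_grades <- sort(grades)
sorted_grades
```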
Vectorized Functions
Explanation:
• paste() combines elements from fruits with additional text.
• factor(fruits) converts fruits into a factor, useful for categorical
data.
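The code for this explanation is also missing. The sketch below assumes a fruits vector with illustrative values and a hypothetical prefix text for paste().

```r
fruits <- c("apple", "banana", "apple", "cherry")   # assumed values

# paste() is vectorized: the text is combined with every element
fruit_labels <- paste("I like", fruits)
fruit_labels

# factor() stores the distinct values as levels
fruit_factor <- factor(fruits)
fruit_factor
levels(fruit_factor)
```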
Lists
Output:-
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
>
$name
[1] "Pushpa"
$age
[1] 23
$city
[1] "Ambala"
data_frame_list <- list(data = data.frame(x = 1:3, y = c("A", "B", "C")))
data_frame_list
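The statements that created the other lists printed in this section are not preserved. The sketch below assumes the names num_list and named_list, which appear later in the section; mixed_list is a hypothetical name, and its first element is an assumption because that part of the output is missing.

```r
# Unnamed list of numbers (elements accessed by position)
num_list <- list(1, 2, 3, 4, 5)
num_list[[2]]

# Named list (elements accessed by name with $)
named_list <- list(name = "Pushpa", age = 23, city = "Ambala")
named_list$city

# A list can hold mixed types (the first element here is assumed)
mixed_list <- list("text", 42, TRUE, c(1, 2, 3))
str(mixed_list)
```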
>
[[1]][[2]]
[1] 2
[[2]]
[[2]][[1]]
[1] "a"
[[2]][[2]]
[1] "b"
[[2]][[3]]
[1] "c"
>
num_list[[6]] <- 6
num_list
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
[[6]]
[1] 6
>
# Add a new element to a list:
named_list$gender <- "Female"
named_list
$name
[1] "Pushpa"
$age
[1] 23
$city
[1] "Ambala"
$gender
[1] "Female"
>
> named_list$gender <- NULL
> named_list
$name
[1] "Pushpa"
$age
[1] 23
$city
[1] "Ambala"
>
# Combine two lists:
$name
[1] "Pushpa"
$age
[1] 23
$city
[1] "Ambala"
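The code that combined the two lists is not preserved, and how the original lists were split is unknown. One way to obtain the output above, with hypothetical names list_a, list_b and combined_list, is:

```r
# How the original lists were split is not shown; this split is an assumption
list_a <- list(name = "Pushpa", age = 23)
list_b <- list(city = "Ambala")

# c() concatenates lists and keeps the element names
combined_list <- c(list_a, list_b)
combined_list
```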
>
squared_list
[[1]]
[1] 1
[[2]]
[1] 4
[[3]]
[1] 9
[[4]]
[1] 16
[[5]]
[1] 25
[[6]]
[1] 36
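The statement that built squared_list is missing. The output is consistent with squaring each element of a six-element num_list, for example with lapply(); this is a sketch under that assumption, not necessarily the original code.

```r
num_list <- list(1, 2, 3, 4, 5, 6)

# lapply() applies the function to every element and returns a list
squared_list <- lapply(num_list, function(x) x^2)
squared_list
```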
>
[[2]]
[1] 42
[[3]]
[1] TRUE
[[4]]
[1] 1 2 3
>
[[1]]$b
[1] 2
[[2]]
[[2]]$x
[1] "apple"
[[2]]$y
[1] "banana"
>
# Convert a list to a matrix:
matrix_from_list <- matrix(unlist(num_list), nrow = 2)
matrix_from_list
> matrix_from_list <- matrix(unlist(num_list), nrow = 2)
> matrix_from_list
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
>
Matrices
# Access the element in the first row and second column of 'matrix1'.
element <- matrix1[1, 2]
element
> element <- matrix1[1, 2]
> element
[1] 3
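The definition of matrix1 is not preserved. Any matrix with the value 3 in row 1, column 2 would give this output; the sketch below assumes a 2 x 2 matrix filled column by column.

```r
# Assumed definition: matrix() fills column by column by default
matrix1 <- matrix(1:4, nrow = 2)
matrix1

matrix1[1, 2]   # single element: row 1, column 2
matrix1[1, ]    # entire first row
matrix1[, 2]    # entire second column
```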
# Find the transpose of 'matrix2'. Store the result in a new matrix
# named 'transpose_matrix'.
transpose_matrix <- t(matrix2)
> transpose_matrix <- t(matrix2)
> transpose_matrix
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
>
# Multiply 'matrix2' by 'matrix4'.
product_matrix <- matrix2 %*% matrix4
product_matrix
> product_matrix <- matrix2 %*% matrix4
> product_matrix
[,1] [,2]
[1,] 30 66
[2,] 36 81
[3,] 42 96
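The definitions of matrix2 and matrix4 are not preserved. The printed transpose and product are consistent with both matrices being filled column by column from consecutive integers, so the following sketch, under that assumption, reproduces the outputs shown in this section.

```r
matrix2 <- matrix(1:9, nrow = 3)   # 3 x 3, filled column by column
matrix4 <- matrix(1:6, nrow = 3)   # 3 x 2, filled column by column

t(matrix2)            # rows become columns

# Matrix product: columns of matrix2 (3) must equal rows of matrix4 (3)
product_matrix <- matrix2 %*% matrix4
product_matrix
dim(product_matrix)   # 3 rows, 2 columns
```

Note that %*% is the matrix product, whereas * multiplies matrices element by element and requires equal dimensions.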
>
# Create a diagonal matrix with 1, 2, 3 on the diagonal.
diagonal_matrix <- diag(1:3)
diagonal_matrix
> diagonal_matrix <- diag(1:3)
> diagonal_matrix
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3
>
is_square <- dim(matrix2)[1] == dim(matrix2)[2]
is_square
> is_square <- dim(matrix2)[1] == dim(matrix2)[2]
> is_square
[1] TRUE
>
Factors
Factors are data objects used to categorize data and store it as levels. They can
store both strings and integers, and they are useful for columns that have a limited
number of unique values, such as "Male" and "Female" or TRUE and FALSE. They are
widely used in data analysis for statistical modeling.
# You can create a factor by using the factor() function.
color <- factor(c("red", "green", "blue", "red", "green"))
color
> color <- factor(c("red", "green", "blue", "red", "green"))
> color
[1] red green blue red green
Levels: blue green red
>
# You can use the levels() function to check the levels of a factor.
levels(color)
> levels(color)
[1] "blue" "green" "red"
# Number of levels
num_levels <- nlevels(color)
print(num_levels)
# Total number of elements
num_elements <- length(color)
print(num_elements)
> num_levels <- nlevels(color)
> print(num_levels)
[1] 3
> # Total number of elements
> num_elements <- length(color)
> print(num_elements)
[1] 5
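To see how a factor stores its data as levels, the following short sketch counts the observations per level and shows the underlying integer codes, using the same color factor as above.

```r
color <- factor(c("red", "green", "blue", "red", "green"))

table(color)        # number of observations in each level
as.integer(color)   # integer codes that index into levels(color)
```

Because the levels are sorted alphabetically (blue, green, red), "red" is stored as code 3.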
>
Data frames
The structure of a data frame can be inspected with the str() function.
A statistical summary of each column can be obtained with the summary()
function.
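The statement that created df is not preserved. A sketch consistent with the str() and summary() output in this section would be:

```r
# Values taken from the printed str() output; the construction itself is assumed
df <- data.frame(
  Name   = c("Pushpa", "neeraj", "ankit"),
  Age    = c(23, 23, 23),
  Salary = c(50000, 60000, 70000),
  stringsAsFactors = FALSE   # keeps Name as character on R versions before 4.0
)
df

str(df)       # column names, types and first values
summary(df)   # per-column summary statistics
```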
> str(df)
'data.frame':	3 obs. of  3 variables:
 $ Name  : chr  "Pushpa" "neeraj" "ankit"
 $ Age   : num  23 23 23
 $ Salary: num  50000 60000 70000
> summary(df)
     Name                Age         Salary
 Length:3           Min.   :23   Min.   :50000
 Class :character   1st Qu.:23   1st Qu.:55000
 Mode  :character   Median :23   Median :60000
                    Mean   :23   Mean   :60000
                    3rd Qu.:23   3rd Qu.:65000
                    Max.   :23   Max.   :70000
>
CHARTS AND GRAPHS IN R
Line chart
x <- 1:10
y <- c(3, 5, 6, 7, 8, 9, 8, 7, 6, 5)
plot(x, y, type = "l", xlab = "X-axis label", ylab = "Y-axis label",
     main = "MY FIRST GRAPH IN R")
Bar chart
Scatter plot
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
plot(x, y, xlab = "X-axis label", ylab = "Y-axis label",
     main = "MY SCATTER PLOT IN R")
Histogram
x <- rnorm(100)
hist(x, xlab = "X-axis label", ylab = "Y-axis label",
     main = "Histogram example")
Pie chart
# Example data
labels <- c("A", "B", "C", "D", "E")
values <- c(20, 30, 10, 15, 25)
colors <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2")
# Create a pie chart with custom colors
pie(values, labels = labels, col = colors)
Stacked bar chart
# Example data
labels <- c("A", "B", "C", "D")
values1 <- c(20, 30, 10, 15)
values2 <- c(10, 25, 15, 20)
values3 <- c(5, 15, 10, 5)
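The plotting call for the stacked bar chart is not preserved. One possible way to draw it with base R's barplot() is sketched below; the colours, legend text and axis labels are assumptions.

```r
labels  <- c("A", "B", "C", "D")
values1 <- c(20, 30, 10, 15)
values2 <- c(10, 25, 15, 20)
values3 <- c(5, 15, 10, 5)

# barplot() stacks the rows of a matrix within each bar, one bar per column
stacked <- rbind(values1, values2, values3)
barplot(stacked, names.arg = labels,
        col = c("#E69F00", "#56B4E9", "#009E73"),
        legend.text = c("Series 1", "Series 2", "Series 3"),
        xlab = "Category", ylab = "Value",
        main = "Stacked bar chart")
```

Passing beside = TRUE to barplot() would draw the series side by side instead of stacked.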
Kernel density plot in r
# Generate example data
set.seed(123)
data <- rnorm(1000)
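The plotting step is not preserved. A common base R approach, shown here as a sketch with assumed labels, is to pass the result of density() to plot().

```r
set.seed(123)
data <- rnorm(1000)

# density() computes a kernel density estimate (Gaussian kernel by default)
dens <- density(data)
plot(dens, xlab = "Value", ylab = "Density", main = "Kernel density plot")
```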
For Loop
A for loop in R is a control flow structure that allows you to execute a block of code
repeatedly for a specified number of iterations. It is particularly useful when you
need to perform a task a fixed number of times or when you want to iterate over
elements in a vector or other data structures.
• variable: This is a loop variable that takes on the values from the sequence
during each iteration.
• sequence: A sequence of values, typically a vector, over which the loop
variable will iterate.
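The general form is for (variable in sequence) { statements }. The sketch below illustrates both roles described above; the vectors and the names squares and fruits are assumed for illustration.

```r
# The loop variable 'value' takes each element of the sequence in turn
squares <- numeric(0)
for (value in c(2, 4, 6)) {
  squares <- c(squares, value^2)   # body runs once per element
}
squares

# Iterating by index with seq_along(), which also handles empty vectors safely
fruits <- c("apple", "banana", "cherry")
for (i in seq_along(fruits)) {
  cat(i, fruits[i], "\n")
}
```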
# Use a for loop to generate a sequence of numbers
for (i in 1:5) {
  cat(i, "JK ")
}
for (i in 1:5) {
  cat(i, "JK ", "\n")
}
# Create a 9x9 multiplication table
for (i in 1:9) {
  for (j in 1:9) {
    cat(i, "x", j, "=", i * j, "\t")
  }
  cat("\n")
}
Descriptive Statistics using R
Descriptive statistics is a fundamental aspect of data analysis, and R is a powerful tool for
performing such analyses. To demonstrate descriptive statistics using the mtcars dataset in R, we
will cover some of the most commonly used functions and techniques.
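Step 1 is not preserved in the text. Presumably it loads and inspects the dataset; a minimal sketch under that assumption:

```r
data(mtcars)   # built-in dataset from the datasets package
head(mtcars)   # first six rows
dim(mtcars)    # number of rows and columns
```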
Step 2: Summary Statistics
You can use the summary() function to obtain an overview of the dataset. This function provides
statistics for each variable in the dataset.
summary(mtcars)
This will display descriptive statistics for each variable, such as mean, median, quartiles, and
min/max values.
Step 3: Measures of Central Tendency
To calculate the mean, median, and mode, you can use the following functions:
### mean ###
mean(mtcars$mpg)
Output:-
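####### mode #######
# Note: base R's mode() reports an object's storage mode ("list" for a data
# frame), not the statistical mode of its values.
mode(mtcars)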
Output:-
mode_mpg <- Mode(mtcars$mpg)
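Mode() is not a base R function, so the call above relies on a user-defined helper that must be defined before it is called and whose definition is not preserved. The sketch below shows one possible definition for numeric data, together with the median mentioned in this step; it returns every value that ties for the highest frequency.

```r
# Hypothetical helper: most frequent value(s) of a numeric vector
Mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

median(mtcars$mpg)

mode_mpg <- Mode(mtcars$mpg)
mode_mpg
```

When several values share the highest count, as can happen with continuous measurements, the statistical mode is not unique and is usually less informative than the mean or median.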
###### table ########
table(mtcars$mpg)
Output:
####### test of normality ######
# qqnorm() and qqline() are part of base R (stats package); ggplot2 is not needed here
qqnorm(mtcars$mpg)
qqline(mtcars$mpg)
Output:
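A Q-Q plot is a visual check rather than a formal test. As a complement, the Shapiro-Wilk test from base R's stats package can be used; this is an addition for illustration, not part of the original steps, and the 0.05 threshold is the conventional choice.

```r
# H0: the data come from a normal distribution
normality_result <- shapiro.test(mtcars$mpg)
normality_result

# A p-value above 0.05 means normality is not rejected at the 5% level
normality_result$p.value > 0.05
```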
Various types of Distribution
Binomial Distribution
The binomial distribution model deals with finding the probability of success of an event which has only
two possible outcomes in a series of experiments.
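In R the binomial distribution is available through dbinom(), pbinom() and rbinom(). The sketch below assumes 10 trials with a success probability of 0.5, values chosen only for illustration.

```r
# Probability of exactly 3 successes in 10 trials
p_exactly_3 <- dbinom(3, size = 10, prob = 0.5)
p_exactly_3

# Probability of at most 3 successes
p_at_most_3 <- pbinom(3, size = 10, prob = 0.5)
p_at_most_3

# Simulate the number of successes in 5 repeated experiments
set.seed(1)
rbinom(5, size = 10, prob = 0.5)
```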
Normal Distribution
When data are collected at random from independent sources, their distribution is often
approximately normal. That is, plotting the value of the variable on the horizontal axis
and the count of the values on the vertical axis gives a bell-shaped curve.
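The bell shape can be reproduced with simulated data. The mean of 50 and standard deviation of 10 below are assumed values for illustration.

```r
set.seed(42)
samples <- rnorm(1000, mean = 50, sd = 10)

# Histogram on the density scale with the theoretical curve drawn on top
hist(samples, breaks = 30, freq = FALSE,
     xlab = "Value", main = "Normal sample with theoretical density")
curve(dnorm(x, mean = 50, sd = 10), add = TRUE)

# Probability of falling within one standard deviation of the mean
within_one_sd <- pnorm(60, mean = 50, sd = 10) - pnorm(40, mean = 50, sd = 10)
within_one_sd
```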
Poisson Distribution
The Poisson distribution is the probability distribution of the number of independent event
occurrences in an interval. If λ is the mean number of occurrences per interval, then the
probability of having x occurrences within a given interval is:
P(X = x) = (λ^x · e^(-λ)) / x!,  for x = 0, 1, 2, ...
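In R, Poisson probabilities are available through dpois() and ppois(). The sketch below assumes λ = 3 for illustration and checks dpois() against the standard Poisson formula λ^x · e^(-λ) / x!.

```r
lambda <- 3   # assumed mean number of occurrences per interval

# P(X = 2), computed with dpois() and directly from the formula
p_two     <- dpois(2, lambda)
p_formula <- lambda^2 * exp(-lambda) / factorial(2)
p_two
p_formula

# P(X <= 2): cumulative probability
ppois(2, lambda)
```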
Hypothesis Testing
What is a Hypothesis?
A hypothesis, in statistics, is a claim or statement about a property of a population.
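As an illustration, a claim about a population mean can be tested with a one-sample t-test. The claimed mean of 20 miles per gallon and the 5% significance level are assumptions made for this sketch, which uses the built-in mtcars data.

```r
# H0: the mean mpg equals 20; H1: the mean mpg differs from 20
test_result <- t.test(mtcars$mpg, mu = 20)
test_result

# Reject H0 only if the p-value is below the chosen significance level
reject_null <- test_result$p.value < 0.05
reject_null
```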
Types of Hypothesis :-
ANOVA
ANOVA (analysis of variance) is a statistical test that evaluates how a quantitative
dependent variable is affected by one or more categorical independent variables. In
simpler terms, it is a form of hypothesis testing that compares the variation between
group means with the variation within the groups.
TYPES OF ANOVA:
Concept:
One-Way ANOVA is used when comparing means across three or more groups that are
independent of each other. It helps determine whether there are any statistically significant
differences among the means of the groups.
Application:
Examples in business include comparing the average sales performance of different regions,
analyzing the effectiveness of various marketing strategies, or evaluating employee
performance across multiple departments.
Key Steps:
a. Formulate hypotheses.
b. Collect data from multiple groups.
c. Calculate the F-statistic.
d. Compare the calculated F-statistic with the critical value.
e. Draw conclusions based on the comparison.
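The key steps can be sketched in R with aov(). The example assumes the built-in mtcars data and compares mean mpg across the three cylinder groups; the variable names and the 5% level are choices made for illustration.

```r
# a/b. H0: mean mpg is equal in all cylinder groups; data come from mtcars
car_data <- mtcars
car_data$cyl_group <- factor(car_data$cyl)   # grouping variable with 3 levels

# c. Fit the one-way ANOVA and extract the F-statistic
anova_model <- aov(mpg ~ cyl_group, data = car_data)
summary(anova_model)
f_value <- summary(anova_model)[[1]][["F value"]][1]
p_value <- summary(anova_model)[[1]][["Pr(>F)"]][1]

# d. Critical value at the 5% level with 3 - 1 and 32 - 3 degrees of freedom
f_crit <- qf(0.95, df1 = 2, df2 = 29)

# e. Reject H0 if the F-statistic exceeds the critical value
f_value > f_crit
```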
Regression Analysis
Regression analysis is a statistical method used in various fields to examine the
relationship between one dependent variable and one or more independent
variables.
The goal is to understand how changes in the independent variables correlate
with changes in the dependent variable.
The regression model generates an equation that represents the relationship
between the variables.
The equation takes the form of Y=a+bX+e, where Y is the dependent variable, X is
the independent variable, a is the intercept, b is the slope, and e is the error term
accounting for unobserved factors affecting Y.
Regression Analysis in R :-
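The code that follows uses an object named lm_model, but the statement that fits it is not preserved. A minimal sketch, assuming a simple regression of mpg on wt from the built-in mtcars data (the original model and dataset may have been different):

```r
# Fit Y = a + bX + e with Y = mpg and X = wt (assumed variables)
lm_model <- lm(mpg ~ wt, data = mtcars)

# Full summary: coefficients, standard errors, R-squared and F-statistic
summary(lm_model)
```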
# Coefficients
coefficients <- coef(lm_model)
print(coefficients)
# Intercept
intercept <- coefficients[1]
print(intercept)
# Residuals
residuals <- resid(lm_model)
print(residuals)
# R-squared
r_squared <- summary(lm_model)$r.squared
print(r_squared)
# p-values
p_values <- summary(lm_model)$coefficients[, "Pr(>|t|)"]
print(p_values)
# F-statistic
f_statistic <- summary(lm_model)$fstatistic
print(f_statistic)
# Standard errors
standard_errors <- summary(lm_model)$coefficients[, "Std. Error"]
print(standard_errors)
Output:-
Logistic Regression
Logistic regression is a statistical technique used for predicting binary outcomes.
It is a popular method for classification problems where the dependent variable is
categorical and can take only two values, such as "yes" or "no," "success" or
"failure," or "spam" or "not spam."
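In R, a logistic regression can be fitted with glm() and family = binomial. The sketch below assumes the built-in mtcars data, predicting the binary variable am (0 = automatic, 1 = manual) from wt; the choice of variables and the 0.5 cut-off are illustrative.

```r
# Fit the model: log-odds of a manual transmission as a function of weight
logit_model <- glm(am ~ wt, data = mtcars, family = binomial)
summary(logit_model)

# Predicted probability of am = 1 for a hypothetical car with wt = 3
predicted_prob <- predict(logit_model, newdata = data.frame(wt = 3),
                          type = "response")
predicted_prob

# Convert the probability to a class label using a 0.5 cut-off
predicted_class <- ifelse(predicted_prob > 0.5, 1, 0)
predicted_class
```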
Neural Network
What is a Neural Network?
A neural network is loosely modeled on the human nervous system, which is made up of
interconnected neurons. In other words, a neural network is made up of interconnected
information processing units.
A neural network helps us to extract meaningful information and detect hidden patterns from complex data
sets.
This method was developed to solve problems that are easy for humans and difficult for machines,
for example, telling pictures of dogs and cats apart. These problems are often referred to as pattern
recognition.