HIMANSHU GOEL
2023-01-16
Table of Contents
Introduction to R
Basics of R
Numeric, integer, Character, and logical data type
Examine the characteristic of a variable
Simple Common Operators
Simple Functions
Vectors
Scalar operations
Indexing and subsetting vectors
Operations on Vectors
Array and Matrices
Matrix multiplication
Factors
Lists
User Defined Functions
Functions in R
Function with no argument and no return
Function with no argument but it returns a value
Function with arguments but no return value
Function with arguments with return value
Conditional Statements
Types of Statements
If Statement
If Else Statement
Nested If Else Statement
Loops In R
For Loop
While Loop
Repeat Loop
Data Preparation
Exploring Data in R
R Functions For understanding Data in Data Frames
Reading data
Subsetting dataframe
Data Summary
Descriptive Statistics
Tidyverse
Spreading across the tibbles
Visualise changes over time
Pivot Wider
Separating and uniting
Missing Values
Quick plots
A beautiful barplot here
A pleasant histogram here
A delightful data frame here
Date
Parse Date, Time, Months, and Year
Creating Date and Time
Time Spans and Duration
Assignment 1
Q1
Q2
Q3
Q4
Q5
Q6
Q7
Assignment 2
Q1 Define a user-defined function to check whether a word is a palindrome.
Q2 Identify whether a number is divisible by 3.
Q3 Identify all prime numbers less than 100.
Q4 Calculate the factorial of n.
Q5 Find the g.c.d. of (x, y).
Hypothesis Testing
One Sample T-Test
Paired Sample T-Test
Independent Sample T-Test
ANOVA
Simple Regression
Step-wise Regression
Multiple Regression
Introduction to R
Basics of R
Numeric, integer, Character, and logical data type
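#Create a numeric variable
i=1.5
j=4+i
ab=1.3
#data.class(i) returns the string "numeric", so k is a character variable
k=data.class(i)
data.class(k)
## [1] "character"
#Create a character variable
Country="INDIA"
Country
## [1] "INDIA"
View(Country)
#Create a logical variable (FALSE is safer than F, because F can be reassigned)
flag=FALSE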
Examine the characteristic of a variable
#Abstract class
class(i)
## [1] "numeric"
#Internal storage type
typeof(i)
## [1] "double"
class(Country)
## [1] "character"
typeof(Country)
## [1] "character"
class(flag)
## [1] "logical"
typeof(flag)
## [1] "logical"
p=1
class(p)
## [1] "numeric"
## [1] 1
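A quick way to see how class() and typeof() differ is to apply both to one value of each basic type. This is a minimal sketch; the names `values` and `info` are illustrative and not part of the notes.

```r
# One value of each basic type (illustrative)
values <- list(1.5, 2L, "INDIA", TRUE, 3+4i)

# class() gives the abstract class, typeof() the internal storage type
info <- data.frame(
  class  = sapply(values, class),
  typeof = sapply(values, typeof),
  stringsAsFactors = FALSE
)
print(info)
```

Only the plain number differs: its class is "numeric" but it is stored as "double".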
Simple Common Operators
#Multiplication
5*3
## [1] 15
#Division
4/3
## [1] 1.333333
#Exponential
4^3
## [1] 64
#Nesting Operators
(4 - 3)^2
## [1] 1
(5 - 3)*2
## [1] 4
## [1] 5 7 9 11
g=c(s,14)
g
## [1] 5 7 9 11 14
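#length of an object
length(s)
## [1] 4
#indexing beyond the length of a vector returns NA
s[5]
## [1] NA
s[11]
## [1] NA
length(i)
## [1] 1
##Infinite Value
3^1250
## [1] Inf
#^ binds tighter than unary minus, so this is -(3^1250)
-3^1250
## [1] -Inf
3/0
## [1] Inf
#list all objects in the workspace
ls()
## [1] "ab" "Country" "flag" "g" "i" "j" "k"
## [8] "p" "s"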
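The special values above can be told apart with the is.*() tests. A minimal sketch, with `x` as an illustrative vector:

```r
# Inf, -Inf, NaN and NA side by side, plus one ordinary number
x <- c(3/0, -3/0, 0/0, NA, 1)
x

is.infinite(x)  # TRUE for Inf and -Inf only
is.nan(x)       # TRUE only for NaN (0/0)
is.na(x)        # TRUE for both NA and NaN
is.finite(x)    # TRUE only for ordinary numbers
```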
Simple Functions
#natural log
log(15)
## [1] 2.70805
#exponential
exp(4)
## [1] 54.59815
# round off the value to 3 digits
round(19.5438, 3)
## [1] 19.544
round(19.5432, 1)
## [1] 19.5
ceiling(15.85432)
## [1] 16
floor(15.85432)
## [1] 15
#testing the variable type and coercion into other type----
is.logical(i)
## [1] FALSE
is.numeric(i)
## [1] TRUE
j=3+4i
data.class(j)
## [1] "complex"
is.complex(j)
## [1] TRUE
ac=as.character(i)
data.class(ac)
## [1] "character"
#a must already hold an integer value here (for example a=5L)
is.integer(a)
## [1] TRUE
j=1.5
is.integer(j)
## [1] FALSE
#as.integer() truncates 1.5 to 1
j=as.integer(j)
j
## [1] 1
is.integer(j)
## [1] TRUE
View(j)
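The rounding functions differ mainly for negative numbers and for values ending in .5. A small sketch using the same number as above; the negative copy is added only for contrast.

```r
x <- c(15.85432, -15.85432)
ceiling(x)    # smallest whole number >= x: 16 -15
floor(x)      # largest whole number <= x: 15 -16
trunc(x)      # drops the decimal part (toward zero): 15 -15
round(x, 2)   # 2 decimal places: 15.85 -15.85
signif(x, 3)  # 3 significant digits: 15.9 -15.9

# round() follows the IEC 60559 rule: an exact half goes to the even neighbour
round(c(0.5, 1.5, 2.5))
```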
#length of variable
#creating a vector
i=c(2,3,4,5)
k=c(2,"india")
i=35
#c() can combine values of any type; they are automatically coerced to one common class
g=c(i,k)
f=c(7,8,9,10, 14, 12, 13, 14)
# Operations and functions on variables
#Vectors of unequal length are recycled: here the single value in i is added to every element of f
i+f
## [1] 42 43 44 45 49 47 48 49
i-f
## [1] 28 27 26 25 21 23 22 21
max(i)
## [1] 35
prod(f)
mean(i)
## [1] 35
## [1] 35
sort.list(i)
## [1] 1
length(Country)
## [1] 1
length(flag)
## [1] 1
s=2+3i
is.complex(s)
## [1] TRUE
class(s)
## [1] "complex"
sqrt(s)
## [1] 1.674149+0.895977i
## [1] NaN
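When c() mixes types, as in k=c(2,"india") above, every element is converted to the most flexible type present. The sketch below shows that ordering; the name `mixed` is illustrative.

```r
# c() converts everything to the most flexible type present:
# logical < integer < double < complex < character
typeof(c(TRUE, 2L))     # "integer"
typeof(c(2L, 1.5))      # "double"
typeof(c(1.5, 3+4i))    # "complex"

mixed <- c(2, "india")  # the number 2 becomes the string "2"
mixed

# Converting back must be explicit; non-numeric strings become NA (with a warning)
suppressWarnings(as.numeric(mixed))
```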
Vectors
#Basic building block for data in R
#Simple R variables are actually vectors
#A vector holds values of a single class----
e=c(2,3,4,5)
I=c(2L,3L,4L,5L)
h=c("2","3", "4","5")
#== compares element by element; the double 2 equals the integer 2L
y=e==I
is.vector(i)
## [1] TRUE
is.vector(Country)
## [1] TRUE
is.vector(flag)
## [1] TRUE
View(v)
v[1]
## [1] NA
length(v)
## [1] 3
class(v)
## [1] "character"
vn=c(2,3,4,5)
class(vn)
## [1] "numeric"
length(vn)
## [1] 4
v1=c(1,2,3,4,5)
View(v1)
v1[3]
## [1] 3
sum(v1)
## [1] 15
### seq() function ###
# seq() is the most common way to generate a regular sequence.
# Its main arguments are from, to, by and length.out (length works as an abbreviation).
vs=seq(1,20, by=0.5);vs
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
## [16] 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5
## [31] 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0
#typing a function name without parentheses prints its definition
seq
## function (...)
## UseMethod("seq")
## <bytecode: 0x000000003b091538>
## <environment: namespace:base>
##Rep function##----
# v1 is repeated three times
vr=rep(v1, times=3)
# element-wise repetition
vr1= rep(v1, each=3)
vr;vr1
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
## [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
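seq() and rep() accept several other argument combinations. A short sketch of common variants, all base R:

```r
seq(1, 20, length.out = 5)         # 5 evenly spaced values: 1.00 5.75 10.50 15.25 20.00
seq(10, 1, by = -3)                # a negative step counts down: 10 7 4 1
seq_len(4)                         # 1 2 3 4; unlike 1:n, seq_len(0) is empty
seq_along(c("a", "b", "c"))        # 1 2 3, one index per element
rep(c("x", "y"), times = c(2, 3))  # "x" "x" "y" "y" "y"
rep(1:2, times = 2, each = 2)      # each first, then times: 1 1 2 2 1 1 2 2
```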
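## [1] 0 0 0 0
v5[3]=1.4;v5
## [1] 0 0 0
v6[1]=2L
v6[2]=3
#assigning a character value coerces the whole vector to character
v6[3]="India"
#vector() creates a vector of the given mode and length filled with default values
v7=vector(mode="character", length=4)
v7
## [1] "" "" "" ""
## [1] 3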
Scalar operations
#Multiplication with a constant; addition, subtraction and division
v2=v1*2;v2;length(v2)
## [1] 2 4 6 8 10
## [1] 5
va=v1+2;va
## [1] 3 4 5 6 7
print(va)
## [1] 3 4 5 6 7
vb=v1/2;vb
## [1] 0.5 1.0 1.5 2.0 2.5
v1-2
## [1] -1 0 1 2 3
v1
## [1] 1 2 3 4 5
## [1] 8
# If vector lengths differ, R recycles the shorter vector for +, -, * and /;
# it warns when the longer length is not a multiple of the shorter one
a=v1+k
a
## [1] 11 13 15 17 15 17 21 23
Indexing and subsetting vectors
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] 3
#a negative index drops that element: all items except the third
v1[-3]
## [1] 1 2 4 5 6 7 8 9 10
#all items except the third and fourth
d=v1[-3:-4]; d
## [1] 1 2 5 6 7 8 9 10
View(d)
v
h=v1[-3]
h1=h[-4];h1
## [1] 1 2 4 6 7 8 9 10
v3
## [1] 17 21 23 25
## [1] 23 25
## [1] 17 25
#random sample of 3 elements without replacement (the result varies between runs)
sample(v3, 3, replace=F)
## [1] 17 23 25
## [1] 17 21 23 25
## [1] 17
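Recycling and logical indexing are the two rules behind most of the vector arithmetic and subsetting above. A minimal sketch, redefining v1 as 1:10 and using illustrative result names:

```r
v1 <- 1:10

# The shorter vector c(0L, 100L) is recycled to length 10
recycled <- v1 + c(0L, 100L)

# A logical condition used as an index keeps the TRUE positions
evens <- v1[v1 %% 2 == 0]

# Negative indices drop positions 3 and 4
dropped <- v1[-(3:4)]

# 10 is not a multiple of 3: R still recycles but raises a warning
warned <- tryCatch(v1 + 1:3, warning = function(w) conditionMessage(w))
warned
```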
Operations on Vectors
#Multiplication of two vectors----
v4=v2*v3;v2;v3
## [1] 8 11 12 13
## [1] 17 21 23 25
r=s*k
v4;s;k
## [1] 2+3i
## [1] 2 3 4 5 6 7 10 11
r
#quotient of division (integer division)
v6=v3%/%v2
v6
## [1] 2 1 1 1
#Remainder of division
v7=v3%%v2
v7
## [1] 1 10 11 12
#exponent in vectors
v8=v5^v2
v8
#trigonometric functions work element by element
v10=sin(v1)
v10
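%/% and %% are linked: the quotient and remainder always rebuild the original value. A small check using the v3 and v2 values shown above, copied into the illustrative names `num` and `den`:

```r
num <- c(17, 21, 23, 25)  # values of v3 above
den <- c(8, 11, 12, 13)   # values of v2 above
q <- num %/% den          # integer quotient: 2 1 1 1
r <- num %% den           # remainder: 1 10 11 12
all(num == den * q + r)   # TRUE: quotient and remainder rebuild num

(-7) %/% 2                # -4: %/% rounds down, not toward zero
(-7) %% 2                 # 1: the remainder takes the sign of the divisor
```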
###LOGICAL VECTORS###----
x=c(1,2,3)
y=c(5,6,3)
x==y
## [1] FALSE FALSE TRUE
###CHARACTER VECTOR###----
##Paste Function##
#paste() takes any number of arguments, converts them to character strings
#and concatenates them element by element
vr
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
p=paste(vr,1:5, sep="")
p
## [1] "11" "22" "33" "44" "55" "11" "22" "33" "44" "55" "11" "22" "33" "44" "55"
Array and Matrices
#Create a 4 x 3 x 3 array filled with zeros
A=array(0, dim=c(4,3,3))
A
## , , 1
##
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
## [4,] 0 0 0
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
## [4,] 0 0 0
##
## , , 3
##
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
## [4,] 0 0 0
A[2,2,2]=1000
A[1,2,3]=5000
A[1,1,1]=800
A
## , , 1
##
## [,1] [,2] [,3]
## [1,] 800 0 0
## [2,] 0 0 0
## [3,] 0 0 0
## [4,] 0 0 0
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 1000 0
## [3,] 0 0 0
## [4,] 0 0 0
##
## , , 3
##
## [,1] [,2] [,3]
## [1,] 0 5000 0
## [2,] 0 0 0
## [3,] 0 0 0
## [4,] 0 0 0
dim(A)
## [1] 4 3 3
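Once an array has a few non-zero cells, which() with arr.ind = TRUE reports their positions, and leaving an index empty extracts a whole slice. A sketch rebuilding the same array A; `nz` and `layer2` are illustrative names.

```r
A <- array(0, dim = c(4, 3, 3))
A[2, 2, 2] <- 1000
A[1, 2, 3] <- 5000
A[1, 1, 1] <- 800

# Positions of the non-zero cells as (row, column, layer)
nz <- which(A != 0, arr.ind = TRUE)
nz

# Leaving the first two indices empty gives layer 2 as a 4 x 3 matrix
layer2 <- A[, , 2]
dim(layer2)
```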
#A 2-D array is a matrix----
#matrix()function
M=matrix(0, nrow=3, ncol=3)
M[1,1]=1
M[1,2]=2
M[1,3]=5;M
M[2,]=c(11,13,0)
M[3,]=c(5,4,3)
M
## [,1] [,2]
## [1,] 1 2
## [2,] 3 6
##Transpose of a Matrix----
tM1=t(M1)
tM1
## P Q R
## [1,] 1 5 9
## [2,] 2 6 10
## [3,] 1 7 11
## [4,] 2 8 12
Matrix multiplication
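#matrixcalc provides matrix.power(); install it once with install.packages("matrixcalc")
library(matrixcalc)
View(M3)
#%*% is true matrix multiplication (* would multiply element by element)
M1%*%M1
#Power of matrix
M4=matrix.power(M1,5);M4
M1[2,]
## [1] 2 0 4
M1[,3]
## [1] 4 4 8
M1[1,3]
## [1] 4
M1[2,2]
## [1] 0
#a single index treats the matrix as one vector read column by column
M1[3]
## [1] 2
a1=seq(1,4)
a2=rep(1:2,2)
a3=c(3,4,5,6)
MAT1=matrix(cbind(a1,a2,a3),nrow=3, ncol=4, byrow=TRUE);MAT1
Mat2=matrix(cbind(a1,a2,a3),nrow=3, ncol=4);Mat2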
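Because * works cell by cell while %*% is the matrix product, the two give different results for the same matrix. The sketch below assumes M1 holds the values printed for M1 in the Lists section and, as a base-R alternative to matrixcalc, computes the fifth power with Reduce(); the names `elementwise`, `product` and `power5` are illustrative.

```r
# Assumed: same values as the M1 printed in the Lists section (filled column by column)
M1 <- matrix(c(1, 2, 2, 6, 0, 5, 4, 4, 8), nrow = 3)

elementwise <- M1 * M1    # multiplies matching cells
product     <- M1 %*% M1  # row-by-column matrix product

# Base-R alternative to matrixcalc::matrix.power(M1, 5): multiply 5 copies together
power5 <- Reduce(`%*%`, replicate(5, M1, simplify = FALSE))
power5
```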
## [,1] [,2]
## [1,] "a" "c"
## [2,] "b" "d"
Factors
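# A factor stores a categorical variable: each value is one of a fixed set of categories called levels.
# In statistical data, categorical variables are used to divide the data into subgroups
U=c(0,4,1,1,2)
#4 is not among the levels 0:3, so it becomes NA
f=factor(U, levels=0:3)
#rename the levels 0, 1, 2, 3
levels(f)=c("none", "more","medium","large")
f
## [1] none <NA> more more medium
## Levels: none more medium large
#the underlying integer codes
as.integer(f)
## [1] 1 NA 2 2 3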
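The order of levels is whatever levels= specifies, and the integer codes index into that order. A minimal sketch with an illustrative `sizes` factor:

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
levels(sizes)      # the order given in levels=, not alphabetical
as.integer(sizes)  # integer codes pointing into levels(sizes): 1 3 2 1
table(sizes)       # counts per level, in level order
```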
Lists
# A list is a collection of objects of various types, including other lists
#list() function creates a list
#Double bracket notation [[ ]] extracts a single element
i=c(5L,4L)
l=1.5
k=as.integer(l)
typeof(i)
## [1] "integer"
class(i)
## [1] "integer"
j=4L
class(j)
## [1] "integer"
v=c("Cricket","Badminton","Football")
class(v)
## [1] "character"
L=list("Delhi",i,j,v,M);L
## [[1]]
## [1] "Delhi"
##
## [[2]]
## [1] 5 4
##
## [[3]]
## [1] 4
##
## [[4]]
## [1] "Cricket" "Badminton" "Football"
##
## [[5]]
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
l1=list(M1, M);l1
## [[1]]
## [,1] [,2] [,3]
## [1,] 1 6 4
## [2,] 2 0 4
## [3,] 2 5 8
##
## [[2]]
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
l2=list(L,l1);l2
## [[1]]
## [[1]][[1]]
## [1] "Delhi"
##
## [[1]][[2]]
## [1] 5 4
##
## [[1]][[3]]
## [1] 4
##
## [[1]][[4]]
## [1] "Cricket" "Badminton" "Football"
##
## [[1]][[5]]
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
##
##
## [[2]]
## [[2]][[1]]
## [,1] [,2] [,3]
## [1,] 1 6 4
## [2,] 2 0 4
## [3,] 2 5 8
##
## [[2]][[2]]
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
class(L)
## [1] "list"
length(L)
## [1] 5
#single brackets return a sub-list
L[5]
## [[1]]
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
#double brackets return the element itself (here a matrix)
L[[5]]
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
length(L[5])
## [1] 1
class(L[[4]])
## [1] "character"
length(L[[4]])
## [1] 3
str(L)
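The difference between [ ], [[ ]] and $ is easiest to see on a named list. A sketch with illustrative element names (the notes use an unnamed list L):

```r
L <- list(city = "Delhi", ids = c(5L, 4L),
          sports = c("Cricket", "Badminton", "Football"))
class(L["sports"])    # "list": [ ] keeps the list wrapper
class(L[["sports"]])  # "character": [[ ]] extracts the element
L$sports[2]           # $ is shorthand for [[ ]] with a name: "Badminton"
L[["ids"]] <- NULL    # assigning NULL removes an element
names(L)
```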
## List of 5
## $ : chr "Delhi"
## $ : int [1:2] 5 4
## $ : int 4
## $ : chr [1:3] "Cricket" "Badminton" "Football"
## $ : num [1:3, 1:3] 0 0 0 0 0 0 0 0 0
## [[1]]
## [1] "Delhi"
##
## [[2]]
## [1] 5 4
##
## [[3]]
## [1] 4
##
## $n
## [1] 2 3 4 5
##
## [[5]]
## [1] "Cricket" "Badminton" "Football"
##
## [[6]]
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
str(L1)
## List of 6
## $ : chr "Delhi"
## $ : int [1:2] 5 4
## $ : int 4
## $ n: num [1:4] 2 3 4 5
## $ : chr [1:3] "Cricket" "Badminton" "Football"
## $ : num [1:3, 1:3] 0 0 0 0 0 0 0 0 0
## List of 3
## $ : chr [1:2] "delhi" "mumbai"
## $ : int 1
## $ : num [1:4] 11 12 13 14
## List of 8
## $ : chr "Delhi"
## $ : int [1:2] 5 4
## $ : int 4
## $ : chr [1:3] "Cricket" "Badminton" "Football"
## $ : chr [1:2] "delhi" "mumbai"
## $ : int 1
## $ : num [1:4] 11 12 13 14
## $ : num [1:3, 1:3] 0 0 0 0 0 0 0 0 0
a1=c(2,7,3,8)
a2=1:8
l1=list(a1,a2)
View(l1)
#Random sample of 2 elements from the list (the result varies between runs)
sample(L1,2)
## [[1]]
## [1] "Cricket" "Badminton" "Football"
##
## [[2]]
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
sample(L1, 3, replace = FALSE)
## $n
## [1] 2 3 4 5
##
## [[2]]
## [1] 5 4
##
## [[3]]
## [1] 4
#with replace = TRUE the same element can be drawn more than once
sample(L1, 3, replace = TRUE)
## [[1]]
## [1] "Delhi"
##
## [[2]]
## [1] "Cricket" "Badminton" "Football"
##
## [[3]]
## [1] 5 4
#unlist() flattens the list into one named character vector
unlist(L1)
## n1 n2
## "Delhi" "5" "4" "4" "2" "3"
## n3 n4
## "4" "5" "Cricket" "Badminton" "Football" "0"
##
## "0" "0" "0" "0" "0" "0"
##
## "0" "0"
l1
## [[1]]
## [1] 2 7 3 8
##
## [[2]]
## [1] 1 2 3 4 5 6 7 8
lapply(l1, function(x) sqrt(x))
## [[1]]
## [1] 1.414214 2.645751 1.732051 2.828427
##
## [[2]]
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
lapply(l1, function(x) sum(x))
## [[1]]
## [1] 20
##
## [[2]]
## [1] 36
l1[1]
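lapply() always returns a list; sapply() and vapply() simplify the result to a vector when each call returns a single value. A sketch on the same l1:

```r
l1 <- list(c(2, 7, 3, 8), 1:8)
lapply(l1, sum)              # always a list: list(20, 36)
sapply(l1, sum)              # simplified to a vector: 20 36
vapply(l1, sum, numeric(1))  # like sapply, but checks that each result is one number
```

vapply() is the stricter choice inside functions, because it fails loudly if a result has an unexpected length or type.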
## [[1]]
## [1] 2 7 3 8
## [1] 20
SumF3(1900, 300)
## [1] 300
MF=function(a,b,c){
  d=a*b*c
  print("Multiplication of three variables")
  print(d)
}
MF(1,2,3)
## [1] "Multiplication of three variables"
## [1] 6
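MF() prints its result; a function can instead return it, and arguments can have default values. A sketch with a hypothetical `multiply3()` whose third argument z defaults to 1:

```r
# Hypothetical variant of MF(): returns the product instead of printing it
multiply3 <- function(x, y, z = 1) {
  x * y * z   # the last evaluated expression is the return value
}
multiply3(1, 2, 3)  # 6
multiply3(4, 5)     # z takes its default of 1: 20
```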
Conditional Statements
Types of Statements
If Statement
## [1] "The entered number is positive"
If Else Statement
#Structure: if (condition) {Statements} else {Statements}
#Example 1
m=-6
m=as.integer(m)
if(m>0){
  "The entered number is positive"
} else {
  "The entered number is not positive"
}
## [1] "The entered number is not positive"
#Example 2
n=8
n=as.integer(n)
if((n%%2)==0){
  print("The entered number is even.")
} else {
  print("The entered number is odd.")
}
## [1] "The entered number is even."
Nested If Else Statement
#Example 1
#Find the greatest number among three numbers
a=5
a=as.integer(a)
b=10
b=as.integer(b)
c=15
c=as.integer(c)
#&& is the scalar "and" intended for if conditions
if ((a>b) && (a>c)) {
  print("First number is the greatest")
} else if (b>c) {
  print("Second number is the greatest")
} else {
  print("Third number is the greatest")
}
## [1] "Third number is the greatest"
#Example 2
PMarks=85
PMarks=as.integer(PMarks)
if (PMarks<50) {
  print("Student has failed in annual examination.")
} else if ((50<=PMarks) && (PMarks<60)) {
  print("Student passed with second division.")
} else {
  print("Student passed with first division.")
}
## [1] "Student passed with first division."
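if/else tests one value at a time. For a whole vector of marks, ifelse() applies the same division rules element by element. A sketch with illustrative marks:

```r
marks <- c(45, 55, 85)
division <- ifelse(marks < 50, "failed",
                   ifelse(marks < 60, "second division", "first division"))
division
```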
Loops In R
For Loop
#Example 1
for(n in 1:5) {
  print(1:n)
}
## [1] 1
## [1] 1 2
## [1] 1 2 3
## [1] 1 2 3 4
## [1] 1 2 3 4 5
#Example 2
for(n in c(2,5,10,20,50)) {
  print(2^n)
}
## [1] 4
## [1] 32
## [1] 1024
## [1] 1048576
## [1] 1.1259e+15
While Loop
#structure: Loop variable initialization
# while(condition){Statements
#loop variables increment/ decrements}
#Example 1
i=1 #initialization of variable i
while (i<10) {
  print(4+i)
  i=i+1
}
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
a=-1
b=1
Fibbo=function(n)
{
  #the first pass reads the global a=-1 and b=1; the assignments below create local copies
  while(n>0)
  {
    s=a+b
    print(s)
    a=b
    b=s
    n=n-1
  }
}
n=10
n=as.integer(n)
Fibbo(n)
## [1] 0
## [1] 1
## [1] 1
## [1] 2
## [1] 3
## [1] 5
## [1] 8
## [1] 13
## [1] 21
## [1] 34
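Fibbo() prints each term. A variant that stores the terms in a vector is easier to reuse; this is a hypothetical `fib_seq()` using the same starting values a=-1 and b=1:

```r
# Hypothetical variant of Fibbo(): returns the first n terms as a vector
fib_seq <- function(n) {
  out <- numeric(n)
  a <- -1
  b <- 1
  for (k in seq_len(n)) {  # seq_len(0) is empty, so fib_seq(0) returns numeric(0)
    s <- a + b
    out[k] <- s
    a <- b
    b <- s
  }
  out
}
fib_seq(10)
```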
Repeat Loop
Repeat Statement
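#Structure: Loop variable initialization
#repeat {Statements; loop variable increment; if(condition) break}
#Example 1
print("Repeat Statement")
## [1] "Repeat Statement"
i=1
repeat {
  print(paste("i=", i))
  i=i+1
  if(i==10) {
    #break statement: exit the loop
    break
  }
}
i=1
while(i<10) {
  print(paste("i=", i))
  i=i+1
  #break statement: stop the loop once i reaches 5
  if(i==5) break
}
print("The loop continues to execute after j reaches 5")
## [1] "The loop continues to execute after j reaches 5"
for(j in 1:10) {
  #next statement: skip the rest of this iteration when j is 5, then carry on with j=6
  if(j==5) next
  print(paste("j=", j))
}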
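break ends a loop immediately, while next only skips the rest of the current iteration. Both can be combined in one loop; a sketch with an illustrative `kept` vector:

```r
kept <- integer(0)
for (j in 1:10) {
  if (j == 5) next   # skip j = 5, continue with j = 6
  if (j == 8) break  # leave the loop completely at j = 8
  kept <- c(kept, j)
}
kept  # 1 2 3 4 6 7
```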
Number in a list
#Example 3
#Searching number in a list
n=25
n=as.integer(n)
for (i in 1:50) {
  if (i==n) {
    d=1
    print("Search is successful")
    break
  } else {
    d=0
  }
}
if (d==0) {
  print("Number is not in the list")
}
Data Preparation
Exploring Data in R
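#Creating Data Frame
EmpNo= c(1000, 1001, 1002, 1003, 1004)
EmpName=c("Jack", "Jane", "Margaritta", "Joe", "Dave")
ProjName= c("PO1", "PO2", "PO3", "PO4", "PO5")
Employee=data.frame(EmpNo, EmpName, ProjName)
#one index in single brackets selects columns and returns a data frame
Employee[2]
## EmpName
## 1 Jack
## 2 Jane
## 3 Margaritta
## 4 Joe
## 5 Dave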
Employee[1:2]
## EmpNo EmpName
## 1 1000 Jack
## 2 1001 Jane
## 3 1002 Margaritta
## 4 1003 Joe
## 5 1004 Dave
Employee[3,]
Employee[3]
## ProjName
## 1 PO1
## 2 PO2
## 3 PO3
## 4 PO4
## 5 PO5
Employee[,3]
# Row names
row.names(Employee)=c("Employee 1", "Employee 2", "Employee 3", "Employee 4", "Employee 5")
row.names(Employee)
## [1] "Employee 1" "Employee 2" "Employee 3" "Employee 4" "Employee 5"
Employee$EmpName
Employee[c("EmpNo", "ProjName")]
## EmpNo ProjName
## Employee 1 1000 PO1
## Employee 2 1001 PO2
## Employee 3 1002 PO3
## Employee 4 1003 PO4
## Employee 5 1004 PO5
#new column
Employee$EmpExpYears =c(5, 9, 6, 12, 7)
Employee
#Descending order
Employee[order(-Employee$EmpExpYears),]
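order() returns the row positions in sorted order. The unary minus trick works only for numeric columns; for text, decreasing = TRUE is needed. A sketch rebuilding part of the Employee data frame (ProjName omitted):

```r
Employee <- data.frame(
  EmpNo = c(1000, 1001, 1002, 1003, 1004),
  EmpName = c("Jack", "Jane", "Margaritta", "Joe", "Dave"),
  EmpExpYears = c(5, 9, 6, 12, 7),
  stringsAsFactors = FALSE
)
by_exp_desc <- Employee[order(-Employee$EmpExpYears), ]

# Unary minus does not work on character columns; use decreasing = TRUE instead
by_name_desc <- Employee[order(Employee$EmpName, decreasing = TRUE), ]

by_exp_desc$EmpName
by_name_desc$EmpName
```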
#dimensions: number of rows and columns
dim(Employee)
## [1] 5 4
nrow(Employee)
## [1] 5
ncol(Employee)
## [1] 4
#5. summary() function returns result summaries for each column
summary(Employee)
## [1] 5
data()
View(JohnsonJohnson)
View(mtcars)
head(mtcars)
head(mtcars, n=8)
library(datasets)
data()
library(readxl)
data()
table1
## # A tibble: 6 x 4
## country year cases population
## <chr> <int> <int> <int>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
table2
## # A tibble: 12 x 4
## country year type count
## <chr> <int> <chr> <int>
## 1 Afghanistan 1999 cases 745
## 2 Afghanistan 1999 population 19987071
## 3 Afghanistan 2000 cases 2666
## 4 Afghanistan 2000 population 20595360
## 5 Brazil 1999 cases 37737
## 6 Brazil 1999 population 172006362
## 7 Brazil 2000 cases 80488
## 8 Brazil 2000 population 174504898
## 9 China 1999 cases 212258
## 10 China 1999 population 1272915272
## 11 China 2000 cases 213766
## 12 China 2000 population 1280428583
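#file.choose() opens a dialog to pick the Excel file interactively
df=read_excel(file.choose())
getwd()
#a relative file name is resolved against the working directory shown by getwd()
df3=read_excel("SedanCar.xlsx")
mtcars
carsdataframe=mtcars
View(carsdataframe)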
Subsetting dataframe
#Subsetting data frame
submtcars=subset(carsdataframe, hp>=100)
View(submtcars)
#To subset the data frame and display only selected columns (carb and gear)
subset(carsdataframe, hp>=100, select = c(carb,gear))
## carb gear
## Mazda RX4 4 4
## Mazda RX4 Wag 4 4
## Hornet 4 Drive 1 3
## Hornet Sportabout 2 3
## Valiant 1 3
## Duster 360 4 3
## Merc 280 4 4
## Merc 280C 4 4
## Merc 450SE 3 3
## Merc 450SL 3 3
## Merc 450SLC 3 3
## Cadillac Fleetwood 4 3
## Lincoln Continental 4 3
## Chrysler Imperial 4 3
## Dodge Challenger 2 3
## AMC Javelin 2 3
## Camaro Z28 4 3
## Pontiac Firebird 2 3
## Lotus Europa 2 5
## Ford Pantera L 4 5
## Ferrari Dino 6 5
## Maserati Bora 8 5
## Volvo 142E 2 4
subset
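subset() is a convenience wrapper; the same selection can be written with bracket indexing. One difference: subset() drops rows where the condition is NA, while [ ] keeps them as rows of NA (mtcars has no missing values, so both agree here). A sketch with illustrative names `sub1` and `sub2`:

```r
# subset() version from the text
sub1 <- subset(mtcars, hp >= 100, select = c(carb, gear))

# Equivalent bracket indexing: logical row condition, named columns
sub2 <- mtcars[mtcars$hp >= 100, c("carb", "gear")]

nrow(sub2)
all.equal(sub1, sub2)
```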
Data Summary
#Data Summary#----
summary(Employee)
## [1] 5
max(Employee[4])
## [1] 12
range(Employee[4])
## [1] 5 12
Employee[,4]
## [1] 5 9 6 12 7
mean(Employee[,4])
## [1] 7.8
IQR(Employee[,4])
## [1] 3
quantile(Employee[,4])
#sapply() applies a function to every column to obtain descriptive statistics
#(mean() returns NA, with a warning, for the character columns)
sapply(Employee, mean, na.rm=TRUE)
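Applying mean() to every column gives NA for text columns, so it helps to pick the numeric columns first. A sketch with the Employee values from above (ProjName omitted); `num_cols` and `means` are illustrative names.

```r
Employee <- data.frame(
  EmpNo = c(1000, 1001, 1002, 1003, 1004),
  EmpName = c("Jack", "Jane", "Margaritta", "Joe", "Dave"),
  EmpExpYears = c(5, 9, 6, 12, 7),
  stringsAsFactors = FALSE
)
num_cols <- sapply(Employee, is.numeric)  # TRUE for EmpNo and EmpExpYears
means <- sapply(Employee[num_cols], mean, na.rm = TRUE)
means
quantile(Employee$EmpExpYears)            # 0%, 25%, 50%, 75%, 100%
```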
C= as.data.frame (matrix(c(1:5,NA),ncol=2))
C
## V1 V2
## 1 1 4
## 2 2 5
## 3 3 NA
#na.omit() drops rows that contain NA
na.omit(C)
## V1 V2
## 1 1 4
## 2 2 5
na.exclude(C)
## V1 V2
## 1 1 4
## 2 2 5
#na.pass() returns the data unchanged
na.pass(C)
## V1 V2
## 1 1 4
## 2 2 5
## 3 3 NA
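complete.cases() and is.na() give the same information as na.omit() in a form that can be counted or used as an index. A sketch on the same C:

```r
C <- as.data.frame(matrix(c(1:5, NA), ncol = 2))
complete.cases(C)         # TRUE for rows without any NA
colSums(is.na(C))         # number of NAs per column
C[complete.cases(C), ]    # same rows as na.omit(C)
mean(C$V2)                # NA propagates
mean(C$V2, na.rm = TRUE)  # 4.5
```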
library(tidyverse)
who
## # A tibble: 7,240 x 60
## country iso2 iso3 year new_s~1 new_s~2 new_s~3 new_s~4 new_s~5 new_s~6
## <chr> <chr> <chr> <int> <int> <int> <int> <int> <int> <int>
## 1 Afghanistan AF AFG 1980 NA NA NA NA NA NA
## 2 Afghanistan AF AFG 1981 NA NA NA NA NA NA
## 3 Afghanistan AF AFG 1982 NA NA NA NA NA NA
## 4 Afghanistan AF AFG 1983 NA NA NA NA NA NA
## 5 Afghanistan AF AFG 1984 NA NA NA NA NA NA
## 6 Afghanistan AF AFG 1985 NA NA NA NA NA NA
## 7 Afghanistan AF AFG 1986 NA NA NA NA NA NA
## 8 Afghanistan AF AFG 1987 NA NA NA NA NA NA
## 9 Afghanistan AF AFG 1988 NA NA NA NA NA NA
## 10 Afghanistan AF AFG 1989 NA NA NA NA NA NA
## # ... with 7,230 more rows, 50 more variables: new_sp_m65 <int>,
## # new_sp_f014 <int>, new_sp_f1524 <int>, new_sp_f2534 <int>,
## # new_sp_f3544 <int>, new_sp_f4554 <int>, new_sp_f5564 <int>,
## # new_sp_f65 <int>, new_sn_m014 <int>, new_sn_m1524 <int>,
## # new_sn_m2534 <int>, new_sn_m3544 <int>, new_sn_m4554 <int>,
## # new_sn_m5564 <int>, new_sn_m65 <int>, new_sn_f014 <int>,
## # new_sn_f1524 <int>, new_sn_f2534 <int>, new_sn_f3544 <int>, ...
v=na.exclude(who)
#Invalid Values and outliers
# An invalid value can be NA, NaN, Inf or -Inf.
# anyNA(x) is base R; anyInvalid(x) and is.invalid(x) are not part of base R
# and must be defined, or loaded from a package, before use.
# The value of x can be a vector, matrix or array.
# anyNA() returns TRUE if the input has any NA or NaN values, and FALSE otherwise.
# It is equivalent to any(is.na(x)).
anyNA(c(-9,NaN,9))
## [1] TRUE
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 38.0 50.0 51.7 64.0 146.7
t=hist(custdata$income, xlab="Income", col = "red")
plot(density(custdata$income))
min(custdata$income)
## [1] -8700
max(custdata$income)
## [1] 615000
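Since anyInvalid() and is.invalid() are not in base R, equivalent checks can be written with is.na() and is.infinite(). The helpers below, including the 1.5 * IQR outlier rule, are hypothetical sketches rather than functions from any package; boxplot() uses a very similar rule based on fivenum() hinges, so its flagged points can differ slightly.

```r
# Hypothetical helpers (not base R): flag NA, NaN, Inf and -Inf
is_invalid  <- function(x) is.na(x) | is.infinite(x)
any_invalid <- function(x) any(is_invalid(x))

# Hypothetical helper: values outside the 1.5 * IQR fences around the quartiles
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x[!is.na(x) & (x < q[1] - fence | x > q[2] + fence)]
}

is_invalid(c(-9, NaN, 9, -Inf))
any_invalid(c(1, 2, 3))
iqr_outliers(c(1, 2, 3, 4, 100))
```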
Descriptive Statistics
# Calculate the range of engine displacement (max - min)
dis=mtcars$disp
#Apply max and min function to return the range
Range=max(dis) - min(dis);Range
## [1] 400.9
## gear
## Mazda RX4 4
## Mazda RX4 Wag 4
## Datsun 710 4
## Hornet 4 Drive 3
## Hornet Sportabout 3
## Valiant 3
factor(mtcars$gear)
## [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4
## Levels: 3 4 5
w= table(mtcars$gear)
w
##
## 3 4 5
## 15 12 5
#cbind() can be used to display the result in column format
cbind(w)
## w
## 3 15
## 4 12
## 5 5
## [1] 5
cardata=mtcars
str(cardata)
#am is coded 0 = automatic, 1 = manual
cardata$am=as.factor(cardata$am)
levels(cardata$am)=c("Automatic", "Manual")
#vs is coded 0 = V-shaped, 1 = straight (inline) engine
cardata$vs=as.factor(cardata$vs)
levels(cardata$vs)=c("V shaped", "Straight or Inline")
count=table(cardata$vs)
count
##
## V shaped Straight or Inline
## 18 14
barplot(count)
bar=barplot(count, main="Cars of Engine Type", xlab="Type of Engine", ylab="Number of Cars",
col="yellow")
#add value labels to the existing bars; cex=1 keeps the default text size
#pos=3 places each label just above the given y position (here 0, the base of the bar)
text(bar, 0, count, cex=1, pos=3)
#for horizontal bar plots the axes swap, so the count goes on the x axis
bar1=barplot(count, main="Cars of Engine Type", xlab="Number of Cars",
ylab="Type of Engine", col="yellow", horiz = TRUE)
#two-way table of transmission type by engine type
stbc=table(cardata$am, cardata$vs)
stbc
##
## V shaped Straight or Inline
## Automatic 12 7
## Manual 6 7
#side-by-side bars
barplot(stbc, main="Cars by Engine Type and Transmission Type", xlab="Engine Type and Transmission Type",
ylab="Number of cars", legend=rownames(stbc), col=c("Lightblue", "green"), beside=TRUE)
#stacked bars (the default)
barplot(stbc, main="Cars by Engine Type and Transmission Type", xlab="Engine Type and Transmission Type",
ylab="Number of cars", legend=rownames(stbc), col=c("Lightblue", "green"))
#stacked bars with the legend placed at the top right
barplot(stbc, main="Cars by Engine Type and Transmission Type", xlab="Engine Type and Transmission Type",
ylab="Number of cars", legend=rownames(stbc), col=c("Lightblue", "green"),
args.legend = list(x="topright", bty="y", inset=c(0,-0.1)))
#Histogram of frequencies
hist(cardata$mpg, main="Histogram of Miles per Gallon",
xlab="mpg", ylab="Frequency")
# For a density-scaled histogram (total bar area = 1), set probability=TRUE
hist(cardata$mpg, breaks=10, xlim=c(10,35), col="lightblue", main="Histogram of Miles per Gallon",
xlab="mpg", ylab="Density", probability= TRUE)
#overlay a smoothed density curve
lines(density(cardata$mpg))
#Box Plot
boxplot(cardata$mpg)
boxplot(cardata$mpg, main="Mileage per Gallon", ylab="Miles per Gallon", col="green")
fix(cardata)
View(cardata)
#one box per transmission type
boxplot(cardata$mpg~cardata$am, main="Mileage per Gallon", xlab="Transmission", ylab="Miles per Gallon", col="green")
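A two-way table like stbc can be extended with totals and row proportions. A sketch that rebuilds the labelled table directly with factor(..., labels =):

```r
cardata <- mtcars
cardata$am <- factor(cardata$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cardata$vs <- factor(cardata$vs, levels = c(0, 1),
                     labels = c("V shaped", "Straight or Inline"))

stbc <- table(cardata$am, cardata$vs)
addmargins(stbc)              # adds row and column totals
prop.table(stbc, margin = 1)  # share of each engine type within each transmission
```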
Tidyverse
table1
## # A tibble: 6 x 4
## country year cases population
## <chr> <int> <int> <int>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
table2
## # A tibble: 12 x 4
## country year type count
## <chr> <int> <chr> <int>
## 1 Afghanistan 1999 cases 745
## 2 Afghanistan 1999 population 19987071
## 3 Afghanistan 2000 cases 2666
## 4 Afghanistan 2000 population 20595360
## 5 Brazil 1999 cases 37737
## 6 Brazil 1999 population 172006362
## 7 Brazil 2000 cases 80488
## 8 Brazil 2000 population 174504898
## 9 China 1999 cases 212258
## 10 China 1999 population 1272915272
## 11 China 2000 cases 213766
## 12 China 2000 population 1280428583
table3
## # A tibble: 6 x 3
## country year rate
## * <chr> <int> <chr>
## 1 Afghanistan 1999 745/19987071
## 2 Afghanistan 2000 2666/20595360
## 3 Brazil 1999 37737/172006362
## 4 Brazil 2000 80488/174504898
## 5 China 1999 212258/1272915272
## 6 China 2000 213766/1280428583
# Spread across two tibbles
# cases
table4a
## # A tibble: 3 x 3
## country `1999` `2000`
## * <chr> <int> <int>
## 1 Afghanistan 745 2666
## 2 Brazil 37737 80488
## 3 China 212258 213766
# population
table4b
## # A tibble: 3 x 3
## country `1999` `2000`
## * <chr> <int> <int>
## 1 Afghanistan 19987071 20595360
## 2 Brazil 172006362 174504898
## 3 China 1272915272 1280428583
## # A tibble: 6 x 4
## country year cases population
## <chr> <int> <int> <int>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
## # A tibble: 6 x 5
## country year cases population rate
## <chr> <int> <int> <int> <dbl>
## 1 Afghanistan 1999 745 19987071 0.373
## 2 Afghanistan 2000 2666 20595360 1.29
## 3 Brazil 1999 37737 172006362 2.19
## 4 Brazil 2000 80488 174504898 4.61
## 5 China 1999 212258 1272915272 1.67
## 6 China 2000 213766 1280428583 1.67
View(who)
t=mutate(table1, rate=cases/population*10000)  # rate = cases per 10,000 people
t
## # A tibble: 6 x 5
## country year cases population rate
## <chr> <int> <int> <int> <dbl>
## 1 Afghanistan 1999 745 19987071 0.373
## 2 Afghanistan 2000 2666 20595360 1.29
## 3 Brazil 1999 37737 172006362 2.19
## 4 Brazil 2000 80488 174504898 4.61
## 5 China 1999 212258 1272915272 1.67
## 6 China 2000 213766 1280428583 1.67
mtcars
## # A tibble: 3 x 2
## country n
## <chr> <int>
## 1 Afghanistan 3411
## 2 Brazil 118225
## 3 China 426024
## # A tibble: 3 x 3
## country `1999` `2000`
## * <chr> <int> <int>
## 1 Afghanistan 745 2666
## 2 Brazil 37737 80488
## 3 China 212258 213766
## # A tibble: 6 x 3
## country year cases
## <chr> <chr> <int>
## 1 Afghanistan 1999 745
## 2 Afghanistan 2000 2666
## 3 Brazil 1999 37737
## 4 Brazil 2000 80488
## 5 China 1999 212258
## 6 China 2000 213766
table4b
## # A tibble: 3 x 3
## country `1999` `2000`
## * <chr> <int> <int>
## 1 Afghanistan 19987071 20595360
## 2 Brazil 172006362 174504898
## 3 China 1272915272 1280428583
## # A tibble: 6 x 3
## country year population
## <chr> <chr> <int>
## 1 Afghanistan 1999 19987071
## 2 Afghanistan 2000 20595360
## 3 Brazil 1999 172006362
## 4 Brazil 2000 174504898
## 5 China 1999 1272915272
## 6 China 2000 1280428583
tidy4=left_join(tidy4a, tidy4b); tidy4
## Joining, by = c("country", "year")
## # A tibble: 6 x 4
## country year cases population
## <chr> <chr> <int> <int>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
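For comparison, base R's `merge()` can perform the same left join without dplyr. This minimal sketch rebuilds `tidy4a` and `tidy4b` as plain data frames from the printed output above; `all.x = TRUE` keeps every row of the left table, which is what makes it a left join.

```r
# tidy4a and tidy4b rebuilt by hand from the printed tibbles
tidy4a <- data.frame(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 2),
  year    = rep(c("1999", "2000"), times = 3),
  cases   = c(745, 2666, 37737, 80488, 212258, 213766)
)
tidy4b <- data.frame(
  country    = rep(c("Afghanistan", "Brazil", "China"), each = 2),
  year       = rep(c("1999", "2000"), times = 3),
  population = c(19987071, 20595360, 172006362, 174504898, 1272915272, 1280428583)
)

# Join on both key columns; all.x = TRUE keeps all rows of tidy4a (left join)
tidy4 <- merge(tidy4a, tidy4b, by = c("country", "year"), all.x = TRUE)
tidy4
```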
Pivot Wider
#Joining, by = c("country", "year")
#Wider: pivot_wider() is the opposite of pivot_longer().----
#You use it when an observation is scattered across multiple rows.
table2
## # A tibble: 12 x 4
## country year type count
## <chr> <int> <chr> <int>
## 1 Afghanistan 1999 cases 745
## 2 Afghanistan 1999 population 19987071
## 3 Afghanistan 2000 cases 2666
## 4 Afghanistan 2000 population 20595360
## 5 Brazil 1999 cases 37737
## 6 Brazil 1999 population 172006362
## 7 Brazil 2000 cases 80488
## 8 Brazil 2000 population 174504898
## 9 China 1999 cases 212258
## 10 China 1999 population 1272915272
## 11 China 2000 cases 213766
## 12 China 2000 population 1280428583
table2 %>% pivot_wider(names_from = type, values_from = count)
## # A tibble: 6 x 4
## country year cases population
## <chr> <int> <int> <int>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
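Base R's `reshape()` can make the same long-to-wide move. In this sketch `table2` is rebuilt by hand from the printed values under the hypothetical name `table2_df`; with `direction = "wide"`, `timevar = "type"` plays the role of `names_from`, and the remaining `count` column supplies the values, giving columns named `count.cases` and `count.population`.

```r
# table2 rebuilt from the printed output (hypothetical name table2_df)
table2_df <- data.frame(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 4),
  year    = rep(rep(c(1999, 2000), each = 2), times = 3),
  type    = rep(c("cases", "population"), times = 6),
  count   = c(745, 19987071, 2666, 20595360,
              37737, 172006362, 80488, 174504898,
              212258, 1272915272, 213766, 1280428583)
)

# One row per country-year, one column per value of `type`
wide <- reshape(table2_df, direction = "wide",
                idvar = c("country", "year"), timevar = "type")
wide
```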
## E F G I
## A 2 4 6 9
## B 2 4 6 10
## C 3 5 7 11
## D 3 5 8 11
t=as_tibble(M1)
t
## # A tibble: 4 x 4
## E F G I
## <dbl> <dbl> <dbl> <dbl>
## 1 2 4 6 9
## 2 2 4 6 10
## 3 3 5 7 11
## 4 3 5 8 11
t %>% pivot_wider(names_from = E, values_from = F)
## # A tibble: 4 x 4
## G I `2` `3`
## <dbl> <dbl> <dbl> <dbl>
## 1 6 9 4 NA
## 2 6 10 4 NA
## 3 7 11 NA 5
## 4 8 11 NA 5
Separating and uniting
#Separate
#separate() pulls apart one column into multiple columns,
#by splitting wherever a separator character appears.
table3
## # A tibble: 6 x 3
## country year rate
## * <chr> <int> <chr>
## 1 Afghanistan 1999 745/19987071
## 2 Afghanistan 2000 2666/20595360
## 3 Brazil 1999 37737/172006362
## 4 Brazil 2000 80488/174504898
## 5 China 1999 212258/1272915272
## 6 China 2000 213766/1280428583
table3 %>% separate(rate, into = c("cases", "population"))
## # A tibble: 6 x 4
## country year cases population
## <chr> <int> <chr> <chr>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
table3 %>% separate(rate, into = c("cases", "population"), sep = "/")
## # A tibble: 6 x 4
## country year cases population
## <chr> <int> <chr> <chr>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
table3 %>% separate(rate, into = c("cases", "population"), convert = TRUE)  # convert = TRUE turns the new columns into integers
## # A tibble: 6 x 4
## country year cases population
## <chr> <int> <int> <int>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
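The same split can be sketched in base R with `strsplit()`, using the first three `rate` values printed for `table3`. Here `as.integer()` stands in for `convert = TRUE`; this is an illustrative equivalent, not how tidyr implements `separate()`.

```r
# The rate column as printed for table3 (first three rows)
rate <- c("745/19987071", "2666/20595360", "37737/172006362")

# Split each string at "/"; fixed = TRUE treats "/" literally rather than as a regex
parts <- strsplit(rate, "/", fixed = TRUE)

# Take the first and second piece of each split and convert to integer
cases      <- as.integer(vapply(parts, `[`, character(1), 1))
population <- as.integer(vapply(parts, `[`, character(1), 2))

data.frame(cases, population)
```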
table3 %>% separate(year, into = c("century", "year"), sep = 2)
## # A tibble: 6 x 4
## country century year rate
## <chr> <chr> <chr> <chr>
## 1 Afghanistan 19 99 745/19987071
## 2 Afghanistan 20 00 2666/20595360
## 3 Brazil 19 99 37737/172006362
## 4 Brazil 20 00 80488/174504898
## 5 China 19 99 212258/1272915272
## 6 China 20 00 213766/1280428583
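Splitting at a position (`sep = 2`) amounts to taking substrings of the year's text form. A base R sketch with `substr()`; like `separate()` here, it produces character columns, which is why `"00"` keeps its leading zero.

```r
year <- c(1999L, 2000L, 1999L)     # integer years, as in table3
year_chr <- as.character(year)     # work on the text form of the year

century <- substr(year_chr, 1, 2)  # characters 1-2
yy      <- substr(year_chr, 3, 4)  # characters 3-4
data.frame(century, yy)
```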
## # A tibble: 6 x 4
## country century year rate
## * <chr> <chr> <chr> <chr>
## 1 Afghanistan 19 99 745/19987071
## 2 Afghanistan 20 00 2666/20595360
## 3 Brazil 19 99 37737/172006362
## 4 Brazil 20 00 80488/174504898
## 5 China 19 99 212258/1272915272
## 6 China 20 00 213766/1280428583
table5 %>% unite(new, century, year) # by default, unite() places an underscore (_) between the values from different columns
## # A tibble: 6 x 3
## country new rate
## <chr> <chr> <chr>
## 1 Afghanistan 19_99 745/19987071
## 2 Afghanistan 20_00 2666/20595360
## 3 Brazil 19_99 37737/172006362
## 4 Brazil 20_00 80488/174504898
## 5 China 19_99 212258/1272915272
## 6 China 20_00 213766/1280428583
table5 %>% unite(new, century, year, sep = "/")
## # A tibble: 6 x 3
## country new rate
## <chr> <chr> <chr>
## 1 Afghanistan 19/99 745/19987071
## 2 Afghanistan 20/00 2666/20595360
## 3 Brazil 19/99 37737/172006362
## 4 Brazil 20/00 80488/174504898
## 5 China 19/99 212258/1272915272
## 6 China 20/00 213766/1280428583
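`unite()` is the inverse of `separate()`; on plain vectors the base R counterpart is `paste()`, where `sep` plays the same role. A short sketch:

```r
century <- c("19", "20")
yy      <- c("99", "00")

paste(century, yy, sep = "_")   # like unite()'s default separator
paste(century, yy, sep = "/")   # like unite(..., sep = "/")
paste0(century, yy)             # no separator, like unite(..., sep = "")
```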
Missing Values
#Missing values----
#Changing the representation of a dataset brings up an important subtlety of missing values.
#Surprisingly, a value can be missing in one of two possible ways:
#Explicitly, i.e. flagged with NA.
#Implicitly, i.e. simply not present in the data.
stocks = tibble( year = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
qtr = c( 1, 2, 3, 4, 2, 3, 4), return = c(1.88, 0.59, 0.35, NA,
0.92, 0.17, 2.66)
)
stocks
## # A tibble: 7 x 3
## year qtr return
## <dbl> <dbl> <dbl>
## 1 2015 1 1.88
## 2 2015 2 0.59
## 3 2015 3 0.35
## 4 2015 4 NA
## 5 2016 2 0.92
## 6 2016 3 0.17
## 7 2016 4 2.66
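# 1. The return for the fourth quarter of 2015 is explicitly missing, because its cell contains NA.
# 2. The return for the first quarter of 2016 is implicitly missing, because it simply does not appear
#    in the dataset.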
## # A tibble: 4 x 3
## qtr `2015` `2016`
## <dbl> <dbl> <dbl>
## 1 1 1.88 NA
## 2 2 0.59 0.92
## 3 3 0.35 0.17
## 4 4 NA 2.66
t = stocks %>%
  pivot_wider(names_from = year, values_from = return) %>%
  pivot_longer(
    cols = c(`2015`, `2016`),
    names_to = "year",
    values_to = "return",
    values_drop_na = TRUE  # drop rows whose value is NA, making explicit missing values implicit
  )
t
## # A tibble: 6 x 3
## qtr year return
## <dbl> <chr> <dbl>
## 1 1 2015 1.88
## 2 2 2015 0.59
## 3 2 2016 0.92
## 4 3 2015 0.35
## 5 3 2016 0.17
## 6 4 2016 2.66
stocks %>% complete(year, qtr) # complete() finds all unique combinations of the given columns and fills in explicit NAs where a combination is missing
## # A tibble: 8 x 3
## year qtr return
## <dbl> <dbl> <dbl>
## 1 2015 1 1.88
## 2 2015 2 0.59
## 3 2015 3 0.35
## 4 2015 4 NA
## 5 2016 1 NA
## 6 2016 2 0.92
## 7 2016 3 0.17
## 8 2016 4 2.66
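`complete()` can be mimicked in base R by building every year-quarter combination with `expand.grid()` and left-joining the data onto it with `merge(..., all.x = TRUE)`; combinations absent from `stocks` come back as explicit `NA`. A minimal sketch:

```r
stocks <- data.frame(
  year   = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
  qtr    = c(1, 2, 3, 4, 2, 3, 4),
  return = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66)
)

# Every combination of the observed years and quarters (2 x 4 = 8 rows)
all_combos <- expand.grid(year = unique(stocks$year), qtr = unique(stocks$qtr))

# Left join: combinations missing from stocks get return = NA
completed <- merge(all_combos, stocks, by = c("year", "qtr"), all.x = TRUE)
completed <- completed[order(completed$year, completed$qtr), ]
completed
```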
## # A tibble: 4 x 3
## person treatment response
## <chr> <dbl> <dbl>
## 1 Derrick Whitmore 1 7
## 2 <NA> 2 10
## 3 <NA> 3 9
## 4 Katherine Burke 1 4
View(who)
str(who)
## # A tibble: 56 x 2
## key n
## <chr> <int>
## 1 new_ep_f014 1032
## 2 new_ep_f1524 1021
## 3 new_ep_f2534 1021
## 4 new_ep_f3544 1021
## 5 new_ep_f4554 1017
## 6 new_ep_f5564 1017
## 7 new_ep_f65 1014
## 8 new_ep_m014 1038
## 9 new_ep_m1524 1026
## 10 new_ep_m2534 1020
## # ... with 46 more rows
who2 = who1 %>% mutate(key = stringr::str_replace(key, "newrel", "new_rel"))
str(who2)
tail(who2)
## # A tibble: 6 x 6
## country iso2 iso3 year key cases
## <chr> <chr> <chr> <int> <chr> <int>
## 1 Zimbabwe ZW ZWE 2013 new_rel_f1524 2069
## 2 Zimbabwe ZW ZWE 2013 new_rel_f2534 4649
## 3 Zimbabwe ZW ZWE 2013 new_rel_f3544 3526
## 4 Zimbabwe ZW ZWE 2013 new_rel_f4554 1453
## 5 Zimbabwe ZW ZWE 2013 new_rel_f5564 811
## 6 Zimbabwe ZW ZWE 2013 new_rel_f65 725
who3 = who2 %>% separate(key, c("new", "type", "sexage"), sep = "_")
View(who3)
who3 %>% count(new)
## # A tibble: 1 x 2
## new n
## <chr> <int>
## 1 new 76046
who4 = who3 %>% select(-new, -iso2, -iso3)
View(who4)
who5 = who4 %>% separate(sexage, c("sex", "age"), sep = 1)
View(who5)
## # A tibble: 7,240 x 60
## country iso2 iso3 year new_s~1 new_s~2 new_s~3 new_s~4 new_s~5 new_s~6
## <chr> <chr> <chr> <int> <dbl> <int> <int> <int> <int> <int>
## 1 Afghanistan AF AFG 1980 83.7 NA NA NA NA NA
## 2 Afghanistan AF AFG 1981 83.7 NA NA NA NA NA
## 3 Afghanistan AF AFG 1982 83.7 NA NA NA NA NA
## 4 Afghanistan AF AFG 1983 83.7 NA NA NA NA NA
## 5 Afghanistan AF AFG 1984 83.7 NA NA NA NA NA
## 6 Afghanistan AF AFG 1985 83.7 NA NA NA NA NA
## 7 Afghanistan AF AFG 1986 83.7 NA NA NA NA NA
## 8 Afghanistan AF AFG 1987 83.7 NA NA NA NA NA
## 9 Afghanistan AF AFG 1988 83.7 NA NA NA NA NA
## 10 Afghanistan AF AFG 1989 83.7 NA NA NA NA NA
## # ... with 7,230 more rows, 50 more variables: new_sp_m65 <int>,
## # new_sp_f014 <int>, new_sp_f1524 <int>, new_sp_f2534 <int>,
## # new_sp_f3544 <int>, new_sp_f4554 <int>, new_sp_f5564 <int>,
## # new_sp_f65 <int>, new_sn_m014 <int>, new_sn_m1524 <int>,
## # new_sn_m2534 <int>, new_sn_m3544 <int>, new_sn_m4554 <int>,
## # new_sn_m5564 <int>, new_sn_m65 <int>, new_sn_f014 <int>,
## # new_sn_f1524 <int>, new_sn_f2534 <int>, new_sn_f3544 <int>, ...
str(who5)
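The two `separate()` steps can be sketched in base R on a few sample keys taken from the key names printed above. The last line shows why the `str_replace()` step matters: a key such as `"newrel_f1524"` has no underscore after `new`, so it splits into only two pieces instead of three.

```r
keys <- c("new_sp_m014", "new_rel_f1524", "new_ep_f65")

# Split each key at "_" into new / type / sexage, one row per key
parts  <- do.call(rbind, strsplit(keys, "_", fixed = TRUE))
sexage <- parts[, 3]

who_parts <- data.frame(
  new  = parts[, 1],
  type = parts[, 2],
  sex  = substr(sexage, 1, 1),   # first character, like separate(sep = 1)
  age  = substring(sexage, 2),   # the rest of the string
  stringsAsFactors = FALSE
)
who_parts

# Without the "newrel" -> "new_rel" fix, this key yields only two pieces
lengths(strsplit("newrel_f1524", "_", fixed = TRUE))
```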
Quick plots
###Quick plots
#qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=,
#facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)
summary(mtcars)
transmission = factor(mtcars$am, levels=c(0, 1),
labels=c("Automatic", "Manual"))
# Dashboard
## A nice qplot here
qplot(factor(cyl),wt,data=mtcars,geom=c("boxplot","jitter"),
fill=factor(cyl),
main="Box plots with superimposed data",
xlab="Number of cylinders", ylab="Weight (1000 lbs)")
##
## 0 1
## 18 14
barplot(count)
## A pleasant histogram here
hist(cardata$mpg)
hist(cardata$mpg,breaks=10,col="Sky blue",
main="histogram of miles per gallon",
xlab="mpg",ylab="frequency")
## A B C
## 1 A 12.5 NA
## 2 B 9.0 3
## 3 C 16.5 2
## 4 J NA NA
## 5 E 9.0 1
## 6 <NA> 20.0 NA
## 7 M 14.5 0
## A Om C
## 1 A 12.5 NA
## Dubey B 9.0 3
## 3 C 16.5 2
## 4 J NA NA
## 5 E 9.0 1
## 6 <NA> 20.0 NA
## 7 M 14.5 0
Date
Parse Date, Time, Months, and Year
library(tidyverse)
library(lubridate)
##
## Attaching package: 'lubridate'
library(nycflights13)
today()
## [1] "2022-11-09"
mdy("November
dmy("09-Nov-2022")
## [1] "2022-11-09"
ymd(20221109)
## [1] "2022-11-09"
unclass(as.Date("2022-11-09"))
## [1] 19305
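The number 19305 is the count of days since 1 January 1970, which is how R stores a `Date` internally. A base R sketch confirming this and converting back:

```r
d <- as.Date("2022-11-09")

as.numeric(d)                           # days since 1970-01-01
as.numeric(d - as.Date("1970-01-01"))   # same count, computed as a date difference
as.Date(19305, origin = "1970-01-01")   # and back again
d + 30                                  # adding a number adds days
```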
## [1] "0s" "86400s (~1 days)" "172800s (~2 days)"
## [4] "259200s (~3 days)" "345600s (~4 days)" "432000s (~5 days)"
dweeks(3)
ddays(1)
#Periods are time spans but don't have a fixed length in seconds,
#instead they work with "human" times, like days and months.
one_pm
days(1)
minutes(2)
## [1] "2017-01-01"
days(1)
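The difference between a duration and a period only shows up when the clock does something unusual, such as a daylight-saving change. The base R sketch below (not lubridate) assumes the `America/New_York` time zone, where clocks jumped forward on 13 March 2022: adding exactly 86400 seconds (a one-day duration) lands at 01:00 the next day, while stepping one calendar day with `seq(..., by = "DSTday")` (closer to a one-day period) keeps the clock at midnight.

```r
# Assumed time zone with a DST change on 2022-03-13
start <- as.POSIXct("2022-03-13 00:00:00", tz = "America/New_York")

# Duration-like: exactly 86400 seconds later
one_day_duration <- start + 86400

# Period-like: the same clock time on the next calendar day
one_day_period <- seq(start, by = "DSTday", length.out = 2)[2]

format(one_day_duration, "%Y-%m-%d %H:%M")
format(one_day_period, "%Y-%m-%d %H:%M")
difftime(one_day_period, start, units = "hours")  # only 23 hours elapsed
```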
Assignment 1
Q1
#Q1----
v1=c(6,4,8,11,3,2,10,12,2,18)
v2=c(12, 14, 18, 11, 9,4, 2, 1, 10, 9)
#(A)
v1*8
## [1] 48 32 64 88 24 16 80 96 16 144
#(B)
v1+v2
## [1] 18 18 26 22 12 6 12 13 12 27
#(C)
log(v2)
## [1] 2.4849066 2.6390573 2.8903718 2.3978953 2.1972246 1.3862944 0.6931472
## [8] 0.0000000 2.3025851 2.1972246
#(D)
v1[-7]
## [1] 6 4 8 11 3 2 12 2 18
#(E)
v2[2]
## [1] 14
#(F)
sample(v2,4)
## [1] 1 9 12 18
#(G)
v1[v1>4]
## [1] 6 8 11 10 12 18
Q2
#Q2----
M1=matrix(c(5,4,1,8,9,2,3,6,7),nrow=3,ncol=3,byrow = T);M1
M4=matrix(c(2,3,4,1,5,7),nrow=3,ncol=2);M4
## [,1] [,2]
## [1,] 2 1
## [2,] 3 5
## [3,] 4 7
#(A)
M1*M3
#(B)
M1%*%M3
M1%*%M4
## [,1] [,2]
## [1,] 26 32
## [2,] 51 67
## [3,] 52 82
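The entries above can be checked by hand: `%*%` needs the number of columns of the left matrix to equal the number of rows of the right one, and entry `[i, j]` is the dot product of row `i` of the left matrix with column `j` of the right. A short check using the `M1` and `M4` from Q2 (the element-wise product `M1 * M4` would fail here with a "non-conformable arrays" error, because the dimensions differ):

```r
M1 <- matrix(c(5, 4, 1, 8, 9, 2, 3, 6, 7), nrow = 3, ncol = 3, byrow = TRUE)
M4 <- matrix(c(2, 3, 4, 1, 5, 7), nrow = 3, ncol = 2)

# %*% requires ncol(M1) == nrow(M4); the result is nrow(M1) x ncol(M4)
stopifnot(ncol(M1) == nrow(M4))
P <- M1 %*% M4
dim(P)

# Entry [1, 1] by hand: row 1 of M1 dotted with column 1 of M4 (5*2 + 4*3 + 1*4)
sum(M1[1, ] * M4[, 1])
```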
M2%*%M1
M2%*%M3
M2%*%M4
## [,1] [,2]
## [1,] 135 204
## [2,] 121 166
M3%*%M1
M3%*%M4
## [,1] [,2]
## [1,] 109 158
## [2,] 59 79
## [3,] 49 77
M4%*%M2
#(C)
M1+M3
#(D)
t(M1)
#(E)
M2[1,]
## [1] 11 19 14
#(F)
M1[sample(nrow(M1),size=2),sample(ncol(M1),size=2)]  # 2 random rows and 2 random columns of M1
## [,1] [,2]
## [1,] 5 4
## [2,] 8 9
#(G)
M1[,2][M1[,2]<5]=NA ;M1  # NA (not the string "na") keeps M1 numeric
#(H)
M1[2,][M1[2,]<5]=NA ;M1
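The difference between writing the string `"na"` and the missing-value constant `NA` into a matrix is easy to miss: a matrix holds a single type, so one string turns every entry into text. A small sketch using the values of `M1`:

```r
M <- matrix(c(5, 4, 1, 8, 9, 2, 3, 6, 7), nrow = 3, byrow = TRUE)

# Assigning the string "na" coerces the whole matrix to character
M_chr <- M
M_chr[, 2][M_chr[, 2] < 5] <- "na"
typeof(M_chr)

# Assigning NA marks a genuine missing value; the matrix stays numeric
M_na <- M
M_na[M_na[, 2] < 5, 2] <- NA
typeof(M_na)
colMeans(M_na, na.rm = TRUE)   # arithmetic still works, skipping the NA
```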
Q3
#Q3----
C1=c(1,2,3,4,5) # roll number
class(C1)
## [1] "numeric"
## [1] "character"
## [1] "character"
result=c(1,1,2,1,2)
C4=factor(result,levels = 1:2)
levels(C4)=c("pass","fail") # pass fail
C4
#(A)
firstprac1=list(C1,C2,C3,C4);firstprac1
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] "a" "b" "c" "d" "e"
##
## [[3]]
## [1] "aa" "bb" "cc" "dd" "ee"
##
## [[4]]
## [1] pass pass fail pass fail
## Levels: pass fail
firstprac1=data.frame(C1,C2,C3,C4)
colnames(firstprac1)=c("rno","first name","last name","pass status") # change column names
firstprac1
## first name
## 1 a
## 2 b
## 3 c
## 4 d
## 5 e
#(D)
sample(firstprac1,2,replace = F) # on a data frame, sample() picks 2 random columns
Q4
#Q4----
C5=c(58,59,24,88,10)
C6=c(77,98,20,56,22)
enrolled=c(1,2,1,1,2)
C7=factor(enrolled,levels = 1:2)
levels(C7)=c("science","commerce")
#(A)
firstprac2=list(C5,C6,C7);firstprac2
## [[1]]
## [1] 58 59 24 88 10
##
## [[2]]
## [1] 77 98 20 56 22
##
## [[3]]
## [1] science commerce science science commerce
## Levels: science commerce
firstprac2=data.frame(C5,C6,C7);firstprac2
## C5 C6 C7
## 1 58 77 science
## 2 59 98 commerce
## 3 24 20 science
## 4 88 56 science
## 5 10 22 commerce
#(B)
Combinelist=c(firstprac1,firstprac2);Combinelist
## $rno
## [1] 1 2 3 4 5
##
## $`first name`
## [1] "a" "b" "c" "d" "e"
##
## $`last name`
## [1] "aa" "bb" "cc" "dd" "ee"
##
## $`pass status`
## [1] pass pass fail pass fail
## Levels: pass fail
##
## $mathematics
## [1] 58 59 24 88 10
##
## $english
## [1] 77 98 20 56 22
##
## $`student enrolled`
## [1] science commerce science science commerce
## Levels: science commerce
#(C)
unlist(firstprac1)
## rno1 rno2 rno3 rno4 rno5 first name1
## "1" "2" "3" "4" "5" "a"
## first name2 first name3 first name4 first name5 last name1 last name2
## "b" "c" "d" "e" "aa" "bb"
## last name3 last name4 last name5 pass status1 pass status2 pass status3
## "cc" "dd" "ee" "1" "1" "2"
## pass status4 pass status5
## "1" "2"
Q5
#Q5----
Employeeno=c(1,2,3,4,5,6,7,8,9,10)
Designation=c("Sr Manager","Sr Manager","Manager","Sr Manager","Manager","Sr Executive","Sr Executive","Manager","Sr Executive","Sr Executive")
Experience=c(15,14,12,17,13,10,9,10,7,8)
Education=c("PostGraduate","PostGraduate","PostGraduate","PostGraduate","Post Graduate","Graduate","Graduate","Graduate","PostGraduate","PostGraduate")
Induction=c("F","F","T","F","F","T","T","T","T","T")
ex251.csv=data.frame(Employeeno,Designation,Experience,Education,Induction)
ex251.csv
Training1=c(50,52,62,65,54,78,65,59,81,57)
Training2=c(77,NA,NA,78,NA,88,72,NA,90,62)
Training3=c(65,55,67,69,NA,89,75,67,91,63)
Training4=c(55,57,NA,NA,71,79,72,63,NA,61)
ex252.csv=data.frame(Training1,Training2,Training3,Training4,Training5)
ex252.csv
#(A)
merged=data.frame(ex251.csv,ex252.csv)
merged
#(B) #(i)
merged[,3]
## [1] 15 14 12 17 13 10 9 10 7 8
#(ii)
merged[3,]
## Employeeno Designation Experience Education Induction Training1 Training2
## 3 3 Manager 12 PostGraduate T 62 NA
## Training3 Training4 Training5
## 3 67 NA NA
#(iii) rows with fewer than 3 missing values
merged[rowSums(is.na(merged))<3,]
#(iv)
totscore=Training3+Training4+Training5
mergedf=data.frame(merged,totscore)
mergedf
#(C)
f1[9]
## [1] 1
#(D)
f2[15]
## [1] 1
Q7
#Q7----
vna=c(12, 3, NA, 6, 4, 19, NA, NA, 11,10,5, NA, NA, 18, 8)
#(A)
sum(is.na(vna))
## [1] 5
#(B)
na.omit(vna)
## [1] 12 3 6 4 19 11 10 5 18 8
## attr(,"na.action")
## [1] 3 7 8 12 13
## attr(,"class")
## [1] "omit"
mean(na.omit(vna))
## [1] 9.6
#(C)
vna[is.na(vna)]=9.6
vna
## [1] 12.0 3.0 9.6 6.0 4.0 19.0 9.6 9.6 11.0 10.0 5.0 9.6 9.6 18.0
## [15] 8.0
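Rather than hard-coding `9.6` in (C), the mean can be computed and substituted in one step, so the code keeps working if the data change. A minimal base R sketch of this mean imputation:

```r
vna <- c(12, 3, NA, 6, 4, 19, NA, NA, 11, 10, 5, NA, NA, 18, 8)

# Mean of the observed values; na.rm = TRUE skips NAs (same result as mean(na.omit(vna)))
fill_value <- mean(vna, na.rm = TRUE)

# Replace every NA with that mean
vna[is.na(vna)] <- fill_value
vna
```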
Assignment 2
Q1 Define a user defined function to check whether the word is a palindrome.
#Q1----
Pal = function(a) {
  b = stringi::stri_reverse(a)   # reverse the characters of the word
  if (a == b) {
    print("Given word is a palindrome")
  } else {
    print("Given word is not a palindrome")
  }
}
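The same check can be written without the stringi package by splitting the word into characters and reversing them with `rev()`. In this sketch the helper name `is_palindrome()` is hypothetical; unlike `Pal()` it returns `TRUE`/`FALSE` instead of printing, and it ignores upper/lower case, both of which are design choices added here.

```r
is_palindrome <- function(word) {
  word  <- tolower(word)                    # ignore upper/lower case
  chars <- strsplit(word, "")[[1]]          # split into single characters
  word == paste(rev(chars), collapse = "")  # compare with the reversed word
}

is_palindrome("Level")
is_palindrome("hello")
```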
## [1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83
## [24] 89 97
Q4 Calculate factorial of n.
n = 96
factorial = 1
if (n < 0) {
  print("factorial does not exist for negative numbers")
} else if (n == 0) {
  print("The factorial of 0 is 1")
} else {
  for (i in 1:n) {
    factorial = factorial * i
  }
  print(paste("The factorial of", n, "is", factorial))
}
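The explicit `n == 0` branch above is needed because `1:0` is `c(1, 0)` in R, so a plain `1:n` loop would multiply by 0. A sketch of a hypothetical helper `factorial_loop()` that avoids the special case with `seq_len()`, which is empty when `n` is 0; base R's own `factorial()` gives the same values (the script above reuses the name `factorial` for a variable, so `base::factorial` is named explicitly in the check).

```r
factorial_loop <- function(n) {
  stopifnot(n >= 0, n == round(n))   # only non-negative whole numbers
  result <- 1
  for (i in seq_len(n)) {            # seq_len(0) is empty, so 0! stays 1
    result <- result * i
  }
  result
}

factorial_loop(0)
factorial_loop(5)
```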
  if (x > y) {
    smaller = y
  } else {
    smaller = x
  }
  for (i in 1:smaller) {
    if ((x %% i == 0) && (y %% i == 0)) {
      hcf = i
    }
  }
  return(hcf)
}
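The loop above tries every candidate up to the smaller number. A commonly used alternative, swapped in here rather than taken from the original, is Euclid's algorithm, which repeatedly replaces the pair `(x, y)` with `(y, x %% y)` until the remainder is 0; the last non-zero value is the HCF. A sketch with the hypothetical helper name `hcf_euclid()`:

```r
hcf_euclid <- function(x, y) {
  x <- abs(x)
  y <- abs(y)
  while (y != 0) {
    r <- x %% y   # remainder of x divided by y
    x <- y
    y <- r
  }
  x               # when the remainder reaches 0, x holds the HCF
}

hcf_euclid(12, 18)
hcf_euclid(100, 75)
```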
Hypothesis Testing
One Sample T-Test
# Hypothesis - H0: mu = 155 (the average height is 155)
# Output
# One Sample T-test
# Data: Height
# t = 7.5105, df = 99, p-value = 2.641e-11
# alternative hypothesis: true mean is not equal to 155
# 95 percent confidence interval:
# 157.6696 159.5866
# sample estimates:
# mean of x
# 158.6281
# Interpretation
# The p-value of the test is 2.641e-11, which is less than the significance level of 0.05, so we reject H0.
# We conclude that the mean height is significantly different from 155 (p-value = 2.641e-11).
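The t statistic reported above is the distance between the sample mean and 155 measured in standard errors, `t = (mean(x) - mu0) / (sd(x) / sqrt(n))`, with `n - 1` degrees of freedom (99 here, so n = 100). The sketch below checks this formula against `t.test()` on a small made-up vector; the values in `heights` are hypothetical and are not the document's Height data.

```r
# Hypothetical heights; NOT the Height data used above
heights <- c(156.2, 159.8, 158.1, 161.0, 157.4, 160.3, 155.9, 158.8)
mu0 <- 155                       # value under H0
n   <- length(heights)

# t statistic by hand: distance of the sample mean from mu0 in standard-error units
t_manual <- (mean(heights) - mu0) / (sd(heights) / sqrt(n))
# two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

tt <- t.test(heights, mu = mu0)  # built-in one-sample t-test
c(manual_t = t_manual, t.test_t = unname(tt$statistic))
c(manual_p = p_manual, t.test_p = tt$p.value)
```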
Paired Sample T-Test
# Hypothesis - H0: there is no significant difference between mean serum prolactin levels before and after treatment
# Output
# Paired t-test
# data: Serum_Prolactin_mVL_before and Serum_Prolactin_mVL_after
# t = 4.5876, df = 21, p-value = 0.0001595
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# 48.45684 128.81588
# sample estimates:
# mean of the differences
# 88.63636
# Interpretation
# The p-value of the test is 0.0001595, which is less than the significance level of 0.05, so we reject H0.
# We conclude that serum levels before treatment are significantly different from serum levels after
# treatment (p-value = 0.0001595).
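A paired t-test is the same as a one-sample t-test on the within-pair differences (before minus after) against 0, which is why the output reports a mean of the differences and df = n - 1 (21 here, i.e. 22 pairs). A sketch with hypothetical numbers, not the Serum_Prolactin data:

```r
# Hypothetical before/after measurements; NOT the Serum_Prolactin data used above
before <- c(520, 610, 480, 700, 655, 590)
after  <- c(450, 530, 470, 610, 560, 540)

paired   <- t.test(before, after, paired = TRUE)  # paired t-test
one_samp <- t.test(before - after, mu = 0)        # one-sample test on the differences

c(paired = unname(paired$statistic), differences = unname(one_samp$statistic))
```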
ANOVA
# Hypothesis - H0: there is no significant difference between the mean sales of territories A, B, and C
# Output
# Rcmdr> AnovaModel.1 <- aov(Sales.data.for.territory.1 ~ Terr, data=Dataset)
# Rcmdr> summary(AnovaModel.1)
# Df Sum Sq Mean Sq F value Pr(>F)
# Terr 2 9.16 4.580 1.215 0.313
# Residuals 27 101.81 3.771
# Rcmdr> with(Dataset, numSummary(Sales.data.for.territory.1, groups=Terr, statistics=c("mean", "sd")))
# mean sd data:n
# A 3.600000 2.065591 10
# B 4.571429 1.272418 7
# C 4.846154 2.115268 13
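The F value and p-value in this table follow directly from the sums of squares: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and the p-value is the upper tail of an F distribution with (2, 27) degrees of freedom. A short check using the printed numbers:

```r
# Numbers copied from the ANOVA table above
ss_between <- 9.16;   df_between <- 2    # Terr
ss_within  <- 101.81; df_within  <- 27   # Residuals

ms_between <- ss_between / df_between    # mean square between groups
ms_within  <- ss_within / df_within      # mean square within groups (residual)
f_value    <- ms_between / ms_within     # F statistic
p_value    <- pf(f_value, df_between, df_within, lower.tail = FALSE)  # upper-tail probability

round(c(F = f_value, p = p_value), 3)
```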
# Interpretation
# As the p-value (0.313) is greater than the significance level of 0.05, we can conclude