
Introduction to R

R Basics, Data Cleaning and Preparation
Big Data Analytics and Social Simulation
SBM ITB

2019
INTRODUCTION
1. Getting to Know R
R is a language and environment for statistical computing and graphics.
R is a GNU project similar to the S language and environment, which was developed at Bell Laboratories (formerly part of AT&T, later Lucent Technologies, which merged with Alcatel) by John Chambers and colleagues.
According to John Chambers, the aim of the language can be expressed as “to turn ideas into software, quickly and faithfully”.
Useful links
Because R is free and open source, many forums, materials, and examples are shared online at no cost. The following are some places to look for interesting R analysis material:
1. github.com
2. RStudio online learning
3. Harvard online resource
4. stackoverflow.com
5. swirl
6. R for data science
Prepare yourself!
Module Contents
This module covers:
Introduction to R and RStudio
– Elements of RStudio
– Basic objects in R
– Basic commands in R
R Programming
– Basic mathematical operations
– Matrix operations
– Iteration functions
– Writing functions in R
Data Cleaning
– Importing data
– Taking a quick look at the data
– Checking object types
– Naming variables
– Checking whether any data is missing or incomplete
– Describing and visualizing data
Data Preparation
– Subsetting
– Creating new variables
– Reshaping data
MATERIAL
Introduction to R and RStudio
R is a programming language, while RStudio is an Integrated Development Environment (IDE) for R. When you open RStudio, there are several important panes:
– Console: runs commands or functions one line at a time; a command is executed when you press Enter after typing it.
– Output: shows graphical results, if a command produces any.
– Environment: lists the objects that currently exist in the working session.
– Script: unlike the console, which executes each line immediately, a script lets you write a series of commands and run all or part of them as needed.
RStudio environment
[Figure: RStudio window with the script, workspace, tab section, and console panes labeled]
Basic R Objects
As in other programming languages, the data types (classes) in R are:
– numeric (real numbers, e.g. 2.3567)
– integer (whole numbers, e.g. 4L)
– complex (complex numbers, e.g. 3i)
– logical (TRUE/FALSE)
– character (text, e.g. "hello")
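The types listed above can be checked directly with class() and typeof(). The following is a minimal sketch; the sample values and the variable names (values, classes, types) are illustrative assumptions, not part of the original slides.

```r
# Inspect the class and storage type of one sample value per atomic type.
values <- list(
  numeric   = 2.3567,
  integer   = 4L,      # the L suffix creates an integer; a bare 4 is stored as double
  complex   = 3i,
  logical   = TRUE,
  character = "hello"
)

classes <- sapply(values, class)   # class as reported to the user
types   <- sapply(values, typeof)  # internal storage type

print(classes)
print(types)
```

Note that a numeric value has class "numeric" but storage type "double", which is why typeof() and class() can disagree.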
Basic R Objects
R's basic data structures can be categorized by their dimensionality and by whether their contents are homogeneous or heterogeneous. They can be summarized as:
Dimension   Homogeneous      Heterogeneous
1d          Atomic vector    List
2d          Matrix           Data frame
nd          Array
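As a rough illustration of the table above, each structure can be built and its dimensions inspected with dim(). This is only a sketch; the object names (v, l, m, df, arr) and their contents are assumptions chosen for the example.

```r
v   <- c(1, 2, 3)                               # atomic vector: 1d, homogeneous
l   <- list(1, "a", TRUE)                       # list: 1d, heterogeneous
m   <- matrix(1:6, nrow = 2)                    # matrix: 2d, homogeneous
df  <- data.frame(id = 1:2, name = c("a", "b")) # data frame: 2d, columns may differ in type
arr <- array(1:24, dim = c(2, 3, 4))            # array: n-d, homogeneous

print(dim(v))    # vectors and lists carry no dim attribute
print(dim(m))
print(dim(df))
print(dim(arr))
```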
Data type
• Atomic data types:
– numeric (real number)
– integer
– complex
– logical (true / false)
– character
• Object classes:
Object       Same class   Dimension
Vector       yes          1d
List         no           1d
Matrix       yes          2d
Data frame   no           2d
Dealing with Vector(1)
• Construction:
> x <- c(1, 5, 4, 9, 0)
> typeof(x)
[1] "double"
> length(x)
[1] 5
> x <- c(1, 5.4, TRUE, "hello") # mixed types are coerced to character
> x
[1] "1" "5.4" "TRUE" "hello"
> typeof(x)
[1] "character"
Dealing with Vector(2)
• How to access elements of a vector?
– Using an integer vector as index
> x
[1]  0  2  4  6  8 10
> x[3] # access 3rd element
[1] 4
> x[c(2, 4)] # access 2nd and 4th element
[1] 2 6
> x[-1] # access all but 1st element
[1]  2  4  6  8 10
> x[c(2, -4)] # cannot mix positive and negative integers
Error in x[c(2, -4)] : only 0's may be mixed with negative subscripts
> x[c(2.4, 3.54)] # real numbers are truncated to integers
[1] 2 4
– Using a logical vector as index
> x
[1] -3 -1  0  3
> x[c(TRUE, FALSE, FALSE, TRUE)]
[1] -3  3
> x[x < 0] # filtering vectors based on conditions
[1] -3 -1
> x[x > 0]
[1] 3
Dealing with Vector(3)
• How to access elements of a vector?
– Using a character vector as index
> x <- c("first"=3, "second"=0, "third"=9)
> names(x)
[1] "first"  "second" "third"
> x["second"]
second
     0
> x[c("first", "third")]
first third
    3     9
• How to modify a vector in R?
> x
[1] -3 -2 -1  0  1  2
> x[2] <- 0; x # modify 2nd element
[1] -3  0 -1  0  1  2
> x[x<0] <- 5; x # modify elements less than 0
[1] 5 0 5 0 1 2
> x <- x[1:4]; x # truncate x to first 4 elements
[1] 5 0 5 0
Dealing with Vector(4)
• How to delete a vector?
> x
[1] -3 -2 -1  0  1  2
> x <- NULL
> x
NULL
> x[4]
NULL
Dealing with Matrix(1)
• Construction: column-wise
> matrix(1:9, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> # same result is obtained by providing only one dimension
> matrix(1:9, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
• Construction: row-wise
> matrix(1:9, nrow=3, byrow=TRUE) # fill matrix row-wise
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
Dealing with Matrix(2)
• How to name rows and columns?
> x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"), c("A","B","C")))
> x
  A B C
X 1 4 7
Y 2 5 8
Z 3 6 9
> colnames(x)
[1] "A" "B" "C"
> rownames(x)
[1] "X" "Y" "Z"
• Changing names:
> colnames(x) <- c("C1","C2","C3")
> rownames(x) <- c("R1","R2","R3")
> x
   C1 C2 C3
R1  1  4  7
R2  2  5  8
R3  3  6  9
Dealing with Matrix(3)
• Another way to create a matrix:
> cbind(c(1,2,3),c(4,5,6))
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> rbind(c(1,2,3),c(4,5,6))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
• Create a matrix from a vector by setting its dimension
> x <- c(1,2,3,4,5,6)
> x
[1] 1 2 3 4 5 6
> class(x)
[1] "numeric"
> dim(x) <- c(2,3)
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> class(x) # R 4.0 and later print "matrix" "array"
[1] "matrix"
Dealing with Matrix(4)
• How to access elements of a matrix?
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x[c(1,2),c(2,3)] # select rows 1 & 2 and columns 2 & 3
     [,1] [,2]
[1,]    4    7
[2,]    5    8
> x[c(3,2),] # leaving column field blank will select entire columns
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    2    5    8
> x[,] # leaving row as well as column field blank will select entire matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x[-1,] # select all rows except first
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
• Index a matrix with a single vector (elements are counted column by column)
> x
     [,1] [,2] [,3]
[1,]    4    8    3
[2,]    6    0    7
[3,]    1    2    9
> x[1:4]
[1] 4 6 1 8
> x[c(3,5,7)]
[1] 1 0 3
Dealing with Matrix(5)
• Using logical vector as index:
> x
     [,1] [,2] [,3]
[1,]    4    8    3
[2,]    6    0    7
[3,]    1    2    9
> x[c(TRUE,FALSE,TRUE),c(TRUE,TRUE,FALSE)]
     [,1] [,2]
[1,]    4    8
[2,]    1    2
> x[c(TRUE,FALSE),c(2,3)] # the 2 element logical vector is recycled to 3 element vector
     [,1] [,2]
[1,]    8    3
[2,]    2    9
> x[x>5] # select elements greater than 5
[1] 6 8 7 9
> x[x%%2 == 0] # select even elements
[1] 4 6 8 0 2
• Using character vector as index
> x
     A B C
[1,] 4 8 3
[2,] 6 0 7
[3,] 1 2 9
> x[,"A"]
[1] 4 6 1
> x[TRUE,c("A","C")]
     A C
[1,] 4 3
[2,] 6 7
[3,] 1 9
> x[2:3,c("A","C")]
     A C
[1,] 6 7
[2,] 1 9
Dealing with Matrix(6)
• How to modify a matrix in R?
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x[2,2] <- 10; x # modify a single element
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2   10    8
[3,]    3    6    9
> x[x<5] <- 0; x # modify elements less than 5
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
[3,]    0    6    9
• Transpose
> x
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
[3,]    0    6    9
> t(x) # transpose a matrix
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0   10    6
[3,]    7    8    9
Dealing with Matrix(7)
• Add a row or column using rbind() and cbind()
> cbind(x, c(1, 2, 3)) # add column
     [,1] [,2] [,3] [,4]
[1,]    0    0    7    1
[2,]    0   10    8    2
[3,]    0    6    9    3
> rbind(x,c(1,2,3)) # add row
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
[3,]    0    6    9
[4,]    1    2    3
> x <- x[1:2,]; x # remove last row
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
• The dimension of a matrix can be modified as well, using the dim() function.
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> dim(x) <- c(3,2); x # change to 3X2 matrix
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> dim(x) <- c(1,6); x # change to 1X6 matrix
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
Dealing with List(1)
• Construction:
> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
> x
$a
[1] 2.5
$b
[1] TRUE
$c
[1] 1 2 3
• Structure
> str(x)
List of 3
 $ a: num 2.5
 $ b: logi TRUE
 $ c: int [1:3] 1 2 3
• Without tag:
> x <- list(2.5,TRUE,1:3)
> x
[[1]]
[1] 2.5
[[2]]
[1] TRUE
[[3]]
[1] 1 2 3
Dealing with List(2)
• How to access components of a list?
> x <- list("name"="John","age"=19,"speaks"=c("English","French"))
> x
$name
[1] "John"
$age
[1] 19
$speaks
[1] "English" "French"
> x[c(1:2)] # index using integer vector
$name
[1] "John"
$age
[1] 19
> x[-2] # using negative integer to exclude second component
$name
[1] "John"
$speaks
[1] "English" "French"
> x[c(T,F,F)] # index using logical vector
$name
[1] "John"
> x[c("age","speaks")] # index using character vector
$age
[1] 19
$speaks
[1] "English" "French"
Dealing with List(3)
• [] and [[]]:
> x["age"]
$age
[1] 19
> typeof(x["age"]) # single [ returns a list
[1] "list"
> x[["age"]] # double [[ returns the content
[1] 19
> typeof(x[["age"]])
[1] "double"
• An alternative to [[, which is used often while accessing content of a list, is the $ operator.
• They are both the same except that $ can do partial matching on tags.
> x$name # same as x[["name"]]
[1] "John"
> x$a # partial matching, same as x$ag or x$age
[1] 19
> x[["a"]] # cannot do partial match with [[
NULL
> # indexing can be done recursively
> x$speaks[1]
[1] "English"
> x[["speaks"]][2]
[1] "French"
Dealing with List(4)
• How to modify a list in R?
> x[["name"]] <- "Clair"; x
$name
[1] "Clair"
$age
[1] 19
$speaks
[1] "English" "French"
• How to add components to a list?
> x[["married"]] <- FALSE
> x
$name
[1] "Clair"
$age
[1] 19
$speaks
[1] "English" "French"
$married
[1] FALSE
Dealing with List(5)
• How to delete components from a list?
> x[["age"]] <- NULL
> str(x)
List of 3
 $ name   : chr "Clair"
 $ speaks : chr [1:2] "English" "French"
 $ married: logi FALSE
> x$married <- NULL
> str(x)
List of 2
 $ name  : chr "Clair"
 $ speaks: chr [1:2] "English" "French"
Dealing with Data Frame(1)
• Construction:
> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"))
> str(x) # structure of x (output from R < 4.0; R 4.0 and later show Name as chr by default)
'data.frame': 2 obs. of 3 variables:
 $ SN  : int 1 2
 $ Age : num 21 15
 $ Name: Factor w/ 2 levels "Dora","John": 2 1
> names(x)
[1] "SN"   "Age"  "Name"
> ncol(x)
[1] 3
> nrow(x)
[1] 2
> length(x) # returns length of the list, same as ncol()
[1] 3
Dealing with Data Frame(2)
• How to access components of a data frame?
> x["Name"]
  Name
1 John
2 Dora
> x$Name
[1] "John" "Dora"
> x[["Name"]]
[1] "John" "Dora"
> x[[3]]
[1] "John" "Dora"
• How to modify a data frame in R?
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x[1,"Age"] <- 20; x
  SN Age Name
1  1  20 John
2  2  15 Dora
• Adding components
> rbind(x,list(1,16,"Paul"))
  SN Age Name
1  1  20 John
2  2  15 Dora
3  1  16 Paul
• Adding a column
> cbind(x,State=c("NY","FL"))
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
OR
> x$State <- c("NY","FL"); x
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
Dealing with Data Frame(3)
• Deleting components
> x$State <- NULL
> x
  SN Age Name
1  1  20 John
2  2  15 Dora
> x <- x[-1,]
> x
  SN Age Name
2  2  15 Dora
R PROGRAMMING
Arithmetic Operation in R
Operator   Description
+          Addition
-          Subtraction
*          Multiplication
/          Division
^          Exponent
%%         Modulus (remainder from division)
%/%        Integer division
> x <- 5
> y <- 16
> x+y
[1] 21
> x-y
[1] -11
> x*y
[1] 80
> y/x
[1] 3.2
> y%/%x
[1] 3
> y%%x
[1] 1
> y^x
[1] 1048576
R Relational Operators
Operator   Description
<          Less than
>          Greater than
<=         Less than or equal to
>=         Greater than or equal to
==         Equal to
!=         Not equal to
> x <- 5
> y <- 16
> x<y
[1] TRUE
> x>y
[1] FALSE
> x<=5
[1] TRUE
> y>=20
[1] FALSE
> y == 16
[1] TRUE
> x != 5
[1] FALSE
Operation on Vectors
• The above mentioned operators work on vectors.
> x <- c(2,8,3)
> y <- c(6,4,1)
> x+y
[1]  8 12  4
> x>y
[1] FALSE  TRUE  TRUE
• When the operand vectors differ in length (number of elements), the elements of the shorter one are recycled cyclically to match the length of the longer one.
> x <- c(2,1,8,3)
> y <- c(9,4)
> x+y # Element of y is recycled to 9,4,9,4
[1] 11  5 17  7
> x-1 # Scalar 1 is recycled to 1,1,1,1
[1] 1 0 7 2
> x+c(1,2,3) # recycled to 1,2,3,1; R also warns that 4 is not a multiple of 3
[1]  3  3 11  4
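The recycling rule can be explored with a short sketch like the one below. It assumes the same x as above; tryCatch() is used only to capture the warning text that R raises when the longer length is not a multiple of the shorter one, and the names full and partial are illustrative.

```r
x <- c(2, 1, 8, 3)

# Length 2 divides length 4, so recycling is silent.
full <- x + c(9, 4)

# Length 3 does not divide length 4: R still recycles but signals a warning.
# Here the warning is caught and its message is returned instead of the sum.
partial <- tryCatch(
  x + c(1, 2, 3),
  warning = function(w) conditionMessage(w)
)

print(full)
print(partial)
```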
R Logical Operators
Operator   Description
!          Logical NOT
&          Element-wise logical AND
&&         Logical AND
|          Element-wise logical OR
||         Logical OR
Operators & and | perform element-wise operations, producing a result with the length of the longer operand.
&& and || work on single values and return a single logical value. Older versions of R silently examined only the first element of longer operands; since R 4.3.0, an operand of length greater than one is an error.
Zero is considered FALSE and non-zero numbers are taken as TRUE.
> x <- c(TRUE,FALSE,0,6)
> y <- c(FALSE,TRUE,FALSE,TRUE)
> !x
[1] FALSE  TRUE  TRUE FALSE
> x&y
[1] FALSE FALSE FALSE  TRUE
> x[1]&&y[1]
[1] FALSE
> x|y
[1]  TRUE  TRUE FALSE  TRUE
> x[1]||y[1]
[1] TRUE
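A common way to combine the two families of operators is sketched below: the element-wise forms produce one result per position, any() and all() collapse those results to a single value, and && is reserved for single conditions such as those inside if(). The variable names (elementwise, any_both, all_either, n, in_range) are illustrative assumptions.

```r
x <- c(TRUE, FALSE, FALSE, TRUE)
y <- c(FALSE, TRUE, FALSE, TRUE)

elementwise <- x & y         # one result per position
any_both    <- any(x & y)    # TRUE if at least one position is TRUE in both
all_either  <- all(x | y)    # TRUE only if every position is TRUE in x or y

# && expects single values, which is what an if() condition needs.
n <- 5
if (n > 0 && n < 10) {
  in_range <- TRUE
} else {
  in_range <- FALSE
}

print(elementwise)
print(c(any_both, all_either, in_range))
```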
R Assignment Operators
Operator      Description
<-, <<-, =    Leftwards assignment
->, ->>       Rightwards assignment
> x <- 5
> x
[1] 5
> x = 9
> x
[1] 9
> 10 -> x
> x
[1] 10
Matrix Operation(1)
Scalar Multiplication
A <- matrix(c(2, 3, -2, 1, 2, 2), nrow = 3, ncol = 2)
A
## [,1] [,2]
## [1,] 2 1
## [2,] 3 2
## [3,] -2 2
c <- 3

c*A
## [,1] [,2]
## [1,] 6 3
## [2,] 9 6
## [3,] -6 6
Matrix Operation(2)
Addition And Subtraction
B <- matrix(c(1,4,-2,1,2,1),nrow = 3, ncol = 2)
B
## [,1] [,2]
## [1,] 1 1
## [2,] 4 2
## [3,] -2 1
A + B
## [,1] [,2]
## [1,] 3 2
## [2,] 7 4
## [3,] -4 3
A - B
## [,1] [,2]
## [1,] 1 0
## [2,] -1 0
## [3,] 0 1
Matrix Operation(3)
Matrix Multiplication
D <- matrix(c(2,-2,1,2,3,1),2,3)
D
## [,1] [,2] [,3]
## [1,] 2 1 3
## [2,] -2 2 1
D %*% A
## [,1] [,2]
## [1,] 1 10
## [2,] 0 4
A %*% D
## [,1] [,2] [,3]
## [1,] 2 4 7
## [2,] 2 7 11
## [3,] -8 2 -4
Matrix Operation(4)
Transpose
t(A)
## [,1] [,2] [,3]
## [1,] 2 3 -2
## [2,] 1 2 2

Diagonal Matrix
S <- matrix(c(2,3,-2,1,2,2,4,2,3),ncol = 3, nrow = 3)
S
## [,1] [,2] [,3]
## [1,] 2 1 4
## [2,] 3 2 2
## [3,] -2 2 3
diag(S)
## [1] 2 2 3
Matrix Operation(5)
Identity Matrix
I <- diag(c(1, 1, 1))
I
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
Inverse Matrix
A <- matrix(c(4,4,-2,2,6,2,2,8,4),3,3)
A
## [,1] [,2] [,3]
## [1,] 4 2 2
## [2,] 4 6 8
## [3,] -2 2 4
solve(A)
## [,1] [,2] [,3]
## [1,] 1.0 -0.5 0.5
## [2,] -4.0 2.5 -3.0
## [3,] 2.5 -1.5 2.0
# Identity Result
A %*% solve(A)
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
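Beyond computing the inverse explicitly, solve() also accepts a right-hand side and solves the linear system A x = b directly. The sketch below reuses the matrix A from this slide; the vector b is a made-up right-hand side, and all.equal() is used because floating-point results are only approximately exact.

```r
A <- matrix(c(4, 4, -2, 2, 6, 2, 2, 8, 4), 3, 3)
b <- c(2, 4, 6)            # hypothetical right-hand side

x <- solve(A, b)           # solves A %*% x = b without forming the inverse

# Check the solution and the identity A %*% solve(A) = I up to rounding error.
solution_ok <- isTRUE(all.equal(as.vector(A %*% x), b))
identity_ok <- isTRUE(all.equal(A %*% solve(A), diag(3)))

print(x)
print(c(solution_ok, identity_ok))
```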
Matrix Operation(6)
C <- matrix(c(2,1,6,1,3,4,6,4,-2),ncol = 3, nrow = 3)
C
## [,1] [,2] [,3]
## [1,] 2 1 6
## [2,] 1 3 4
## [3,] 6 4 -2
CI <- solve(C)
CI
## [,1] [,2] [,3]
## [1,] 0.2156863 -0.25490196 0.13725490
## [2,] -0.2549020 0.39215686 0.01960784
## [3,] 0.1372549 0.01960784 -0.04901961
d <- det(CI)
d
## [1] -0.009803922
LOOPING IN R
Types of Loop
Loop For(1)
Loop For(2)
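A basic for loop can be sketched as follows. This example is an illustration added for reference, not the content of the original slides; the names total, fruits, and labels are assumptions.

```r
# Sum the squares of 1 to 5 with an explicit for loop.
total <- 0
for (i in 1:5) {
  total <- total + i^2
}
print(total)

# seq_along() loops over the positions of a vector and yields an empty
# sequence for an empty vector, whereas 1:length(v) would yield c(1, 0).
fruits <- c("apple", "banana", "cherry")
labels <- character(length(fruits))
for (k in seq_along(fruits)) {
  labels[k] <- paste0(k, ": ", fruits[k])
}
print(labels)
```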
Nested Loop
H <- matrix(nrow = 30, ncol = 30)

for(i in 1:dim(H)[1]) {
for (j in 1:dim(H)[2]) {
H[i,j] = i*j
}
}

H[1:10, 1:10]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 2 3 4 5 6 7 8 9 10
## [2,] 2 4 6 8 10 12 14 16 18 20
## [3,] 3 6 9 12 15 18 21 24 27 30
## [4,] 4 8 12 16 20 24 28 32 36 40
## [5,] 5 10 15 20 25 30 35 40 45 50
## [6,] 6 12 18 24 30 36 42 48 54 60
## [7,] 7 14 21 28 35 42 49 56 63 70
## [8,] 8 16 24 32 40 48 56 64 72 80
## [9,] 9 18 27 36 45 54 63 72 81 90
## [10,] 10 20 30 40 50 60 70 80 90 100
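While and Repeat Loop
set.seed(11)
a <- rnorm(10)

# while loop: the condition is checked before each iteration
x <- 1
while (x <= length(a)) {
  print(paste0("Number ", x, " = ", a[x]))
  x <- x + 1
}

# repeat loop: runs until break is reached
i <- 1
repeat {
  print(paste0("Number ", i, " = ", a[i]))
  i <- i + 1
  if (i > length(a)) break
}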
CONDITION IN R
R if statement
if (test_expression) {
statement
}

x <- 5
if(x > 0){
print("Positive number")
}
if…else statement
if (test_expression) {
statement1
} else {
statement2
}

x <- -5
if(x >= 0){
print("Non-negative number")
} else {
print("Negative number")
}
if…else Ladder
if ( test_expression1) {
statement1
} else if ( test_expression2) {
statement2
} else if ( test_expression3) {
statement3
} else {
statement4
}
x <- 0
if (x < 0) {
print("Negative number")
} else if (x > 0) {
print("Positive number")
} else
print("Zero")
RANDOM NUMBER IN R
Uniform: [a,b]
> runif(1,0,2) # time at light; also runif(1,min=0,max=2)
[1] 1.490857
> runif(5,0,2) # time at 5 lights
[1] 0.07076444 0.01870595 0.50100158 0.61309213 0.77972391
> runif(5) # 5 random numbers in [0,1]
[1] 0.1705696 0.8001335 0.9218580 0.1200221 0.1836119
Normal: [mu,sigma]

> rnorm(1,100,16)
[1] 94.1719
> rnorm(1,mean=280,sd=10)
[1] 270.4325
Binomial: [n,p]
> n=1; p=.5 # set the probability
> rbinom(1,n,p) # different each time
[1] 1
> rbinom(10,n,p) # 10 different such numbers
[1] 0 1 1 0 1 0 1 0 1 0
> n = 10; p=.5
> rbinom(1,n,p) # 6 successes in 10 trials
[1] 6
> rbinom(5,n,p) # 5 binomial numbers
[1] 6 6 4 5 4
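Because these functions return different values on every call, results are usually made reproducible with set.seed(). The sketch below is a minimal illustration; the seed 123 and the sample sizes are arbitrary assumptions.

```r
set.seed(123)              # arbitrary seed; any fixed integer works
first <- runif(3)

set.seed(123)              # resetting the seed replays the same sequence
second <- runif(3)

same_draws <- identical(first, second)

# With many draws, the sample mean of a binomial approaches size * prob.
draws <- rbinom(1000, size = 10, prob = 0.5)
sample_mean <- mean(draws)

print(same_draws)
print(sample_mean)
```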
FUNCTION BUILDING IN R
Building Function
fahrenheit_to_celcius <- function(temp_F) {
temp_C <- (temp_F - 32) * 5 / 9
return(temp_C)
}

fahrenheit_to_celcius(212) # boiling point of water
## [1] 100
fahrenheit_to_celcius(32) # freezing point of water
## [1] 0
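Functions built this way are vectorized automatically when their body uses only vectorized arithmetic. As a sketch, the inverse conversion can be written in the same style; celsius_to_fahrenheit is a hypothetical name introduced here, and the temperatures are sample values.

```r
# Conversion from the slide, repeated so the example is self-contained.
fahrenheit_to_celcius <- function(temp_F) {
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}

# Hypothetical inverse function; the last evaluated expression is returned.
celsius_to_fahrenheit <- function(temp_C) {
  temp_C * 9 / 5 + 32
}

temps_C <- c(-40, 0, 37, 100)
temps_F <- celsius_to_fahrenheit(temps_C)       # works on a whole vector at once
round_trip <- fahrenheit_to_celcius(temps_F)    # should recover temps_C

print(temps_F)
print(round_trip)
```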
SIMPLE DATA PROCESSING
Data Cleaning(1)
This is an important first stage of working with data in R, because the data must be prepared before it can be analyzed further. The steps we need to know include:
– Importing data
– Taking a quick look at the data
– Checking object types
– Naming variables
– Checking whether any data is missing or incomplete
– Describing and visualizing data
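The steps above can be sketched on a small in-memory data frame. Everything in this example (the data frame df, its values, and the column names score and label) is made up for illustration; with real data the first step would be an import.

```r
# A tiny data frame standing in for imported data, with two missing values.
df <- data.frame(
  V1 = c(1, 2, NA, 4),
  V2 = c("a", "b", "c", NA),
  stringsAsFactors = FALSE
)

head(df, 3)                          # quick look at the first rows
str(df)                              # object and column types
names(df) <- c("score", "label")     # give the variables meaningful names

na_counts <- colSums(is.na(df))      # missing values per column
n_complete <- sum(complete.cases(df))  # rows with no missing values

print(na_counts)
print(n_complete)
summary(df$score)                    # basic description of a numeric column
```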
Data Cleaning(2)
Import data. R can import many kinds of data, including files with the extensions .csv, .xlsx, .txt, and .sav (some formats require additional packages). For now we will import a .csv file:
data <- read.csv(file = "file location", header = TRUE)

– Alternatively, use the RStudio feature File > Import Dataset, then choose the data format you want.
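To try read.csv() without an existing file, one option is to write a small CSV to a temporary location first. This is only a sketch: the file is created with tempfile(), and the column names id and value are invented for the example.

```r
# Write a small CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:3, value = c(10.5, 20, 30.25)),
  file = path,
  row.names = FALSE
)

# Import it back; header = TRUE treats the first line as column names.
imported <- read.csv(file = path, header = TRUE)
str(imported)

unlink(path)   # remove the temporary file
```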
Data Cleaning(3)
# import data from txt file into data.frame

sales<- read.csv( "C:/Users/lenovo/Desktop/Bahan R DSM JKT 2018/sales.txt")

summary(sales)
cust_id sales_total num_of_orders gender
Min. :100001 Min. : 30.02 Min. : 1.000 F:5035
1st Qu.:102501 1st Qu.: 80.29 1st Qu.: 2.000 M:4965
Median :105001 Median : 151.65 Median : 2.000
Mean :105001 Mean : 249.46 Mean : 2.428
3rd Qu.:107500 3rd Qu.: 295.50 3rd Qu.: 3.000
Max. :110000 Max. :7606.09 Max. :22.000
Data Cleaning(4)
str(sales)
'data.frame': 10000 obs. of 4 variables:
$ cust_id : int 100001 100002 100003 100004 100005 100006 100007 100008 100009 100010 ...
$ sales_total : num 800.6 217.5 74.6 498.6 723.1 ...
$ num_of_orders: int 3 3 2 3 4 2 2 2 2 2 ...
$ gender : Factor w/ 2 levels "F","M": 1 1 2 2 1 1 2 2 1 2 ...
DATA VISUALIZATION
Data Visualization(1)
Scatter plot
plot(sales$num_of_orders, sales$sales_total, main = "Number of Orders vs Sales")
Histogram
hist(sales$num_of_orders, xlab = "num_of_orders", main = "Histogram data num_of_orders", col = "red")
Data Visualization(2)
Density
plot(density(sales$sales_total), xlab = "sales_total", main = "Distribution of sales total")
Data Visualization(3)
Boxplot
# per_order is not among the four columns listed by str(sales); it must be added to sales before this call
boxplot(sales$num_of_orders, sales$per_order, main = "Boxplot of sales", xlab = "num_of_orders", ylab = "per_order")
Let’s Practice
Clear your R environment with the command rm(list = ls())
Create vector variables named x and y, each containing the numeric values 1 through 5
Create a list variable named z containing logical, numeric, character, complex, and integer objects, in that order
Create a matrix variable named w with dimensions 2 x 5 by combining the vectors x and y
Create a matrix variable named q with dimensions 5 x 2 by combining the vectors x and y
Import the data hbat.csv and draw conclusions about the data
Let’s Practice
Create the following matrices:

Then compute:

Build a matrix with diagonal values
Reference
– Grolemund, G. (2014). Hands-On Programming with R: Write Your
Own Functions and Simulations. O’Reilly Media, Inc.
– Wickham, H., & Grolemund, G. (2016). R for data science: import,
tidy, transform, visualize, and model data. O’Reilly Media, Inc.
– Wickham, H. (2014). Advanced R. Chapman & Hall/CRC The R Series.
