
Lecture 2: Getting started with R

Dr. Nguyen Quang Huy

12 July, 2023

Dr. Nguyen Quang Huy Lecture 2: Getting started with R 12 July, 2023
Basics of R - basic math operators
Let’s type on the console window:
1 + 0.001

## [1] 1.001
2*pi - 3

## [1] 3.283185
exp(1)-exp(-1)

## [1] 2.350402
log(3.2) # natural logarithm

## [1] 1.163151
log(32,2) # the binary logarithm of 32

## [1] 5
Basics of R - basic math operators
Calculate the following expressions:
(a) $\frac{1}{4-6}$
(b) $4^{2^2}$
(c) $4^{3+2}$
(d) $\frac{4+7}{12-2}$
(e) $(12-5)^{4/3}$
(f) $\frac{2^3}{3^2}$
(g) $\ln\left(\frac{2+3}{4-1}\right)$
Basics of R - variable assignment
We can use “<-” or “=” to assign a value to a variable:
x<-3.25 # variable name is x
print(x)

## [1] 3.25
x^2+exp(x)-3

## [1] 33.35284
y=2 # variable name is y
x^y

## [1] 10.5625
Variable names can contain any combination of alphanumeric characters
along with periods (.) and underscores (_). However, they CANNOT start
with a number or an underscore.
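The naming rule above can be checked programmatically. The following is a minimal sketch that relies on base R's make.names(), which returns a syntactically valid version of each name; the vector `candidates` and the variable `is_valid` are illustrative names that are not part of the lecture.

```r
# Candidate names: the first three follow the rules, the last two do not
candidates <- c("x1", "my.var", "my_var", "123x", "_x")

# make.names() repairs invalid names, so a name is valid
# when make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates
print(data.frame(name = candidates, valid = is_valid))
```

Names starting with a digit or an underscore are reported as invalid, in line with the rule stated above.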
Basics of R - removing variables

123x<-3.25
# check if there is an error

_x<-3.25
# check if there is an error

To remove a variable, we use the function rm():


x<-3.25
x

## [1] 3.25
rm(x) # check on the environment window

Basics of R - numerical variables
There are 4 main types of variables: numeric, character (string),
Date/Time and logical (which takes only TRUE or FALSE).
Numeric is similar to float or double in other languages.
x<-5
is.numeric(x) # check if x is numeric

## [1] TRUE
An integer variable is also a numeric variable, but a number typed without the suffix L is stored as a double, not as an integer.
x<-5
is.integer(x)

## [1] FALSE
To assign an integer value to the variable x, append L to the number:
x<-5L
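As a short illustration of the difference between a double and an integer, the sketch below uses base R's typeof(); the variable names `x_double` and `x_integer` are chosen only for this example.

```r
x_double  <- 5    # stored as a double
x_integer <- 5L   # stored as an integer

typeof(x_double)        # "double"
typeof(x_integer)       # "integer"
is.numeric(x_integer)   # TRUE: integers are also numeric
is.integer(x_integer)   # TRUE
```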
Basics of R - numerical variables

Operator   Description
+          Addition
-          Subtraction
*          Multiplication
/          Division
^          Exponent
exp()      Natural exponent
%%         Modulus (remainder of a division)
%/%        Integer division (integer part of the quotient)

Basics of R - numerical variables
5^2

## [1] 25
5 %/% 2

## [1] 2
5 %% 2

## [1] 1
7.5 %/% 2

## [1] 3
7.5 %% 2

## [1] 1.5
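The two operators are linked: the integer quotient times the divisor, plus the remainder, gives back the original number. The sketch below checks this for the values used above and also shows the behaviour for a negative number, which is easy to get wrong; the variable names are illustrative only.

```r
x <- 7.5
y <- 2
q <- x %/% y      # integer quotient: 3
r <- x %% y       # remainder: 1.5
q * y + r == x    # TRUE: quotient and remainder rebuild x

# With a negative dividend, %/% rounds towards minus infinity,
# so the remainder takes the sign of the divisor
(-7) %/% 2        # -4
(-7) %% 2         # 1
```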
Basics of R - numerical variables
Special numerical values in R:
z<- 1e+20 # 1e+20 means 1 x 10^20
log(z,10)

## [1] 20
t<- 1e-20 # 1e-20 means 1 x 10^(-20)
log(t,10)

## [1] -20
# Inf and -Inf
x<-1/0
y<- (-1)/0
x - 10^9 # Inf

## [1] Inf
x + y # Inf + (-Inf) is undefined, so R returns NaN

## [1] NaN
Basics of R - logical variables
Logical variables can be either TRUE or FALSE.
Numerically, TRUE is the same as 1 and FALSE is the same as 0.
Logical variables typically result from relational operations:
x<-TRUE
x

## [1] TRUE
x*2

## [1] 2
y<-FALSE
x + y

## [1] 1
Basics of R - logical variables

Operator   Description
<          Less than
>          Greater than
<=         Less than or equal to
>=         Greater than or equal to
==         Equal to
!=         Not equal to
!          Not
&          And
|          Or

Basics of R - logical variables
x<-(1>=2)
x

## [1] FALSE
y<-(x==0)
y

## [1] TRUE
x+y

## [1] 1
x&y

## [1] FALSE
x|y

## [1] TRUE
Basics of R - character variables

In any language, character variables should be used with care.


x<-"Ice cream"
is.character(x)

## [1] TRUE
Character strings are case-sensitive: uppercase “I” and lowercase “i” are different.
x == "ice cream" # lowercase "i", so the strings differ

## [1] FALSE
The function nchar(x) counts the characters in the string x:
nchar(x) # number of characters

## [1] 9
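When the case should not matter, one common approach is to convert both strings to the same case before comparing them. A minimal sketch using base R's tolower() and toupper():

```r
x <- "Ice cream"

# Convert both sides to lower case before comparing
tolower(x) == tolower("ice cream")   # TRUE
toupper(x)                           # "ICE CREAM"
```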

Basics of R - character variables

The function paste() concatenates several strings into a single string:


x<-paste("We", "are", "the champion", sep = " ")
x

## [1] "We are the champion"


The function substr() extracts a substring from a character string (here, characters 3 to 4):
substr("abcdefgh",3,4)

## [1] "cd"
The function sub() replaces only the first match it finds, whereas gsub() replaces all matches.
sub("a","1234","abcdefga") # replace the first "a" with "1234"

## [1] "1234bcdefga"

Basics of R - character variables

The function gsub() replaces all matches:


gsub("a","1234","abcdefga")

## [1] "1234bcdefg1234"
The functions grep() and grepl() search for matches of a pattern within
each element of a character vector: grep() returns the indices of the matching elements, and grepl() returns a logical vector.
grep("ab","abcdefga") # is "ab" found in the string?

## [1] 1
grepl("de","abcdefga")

## [1] TRUE
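These functions become more useful on a vector with several elements. The sketch below applies them to an illustrative vector `fruits`, which is not part of the lecture.

```r
fruits <- c("banana", "apple", "cherry")

grep("an", fruits)        # index of the elements containing "an": 1
grepl("an", fruits)       # TRUE FALSE FALSE
sub("a", "A", fruits)     # first match in each element: "bAnana" "Apple" "cherry"
gsub("a", "A", fruits)    # all matches: "bAnAnA" "Apple" "cherry"
```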

Basics of R - Date/time variables.
Dealing with dates and times can be difficult in any language.
Date variables store a date as the number of days since January 1, 1970,
while POSIXct stores a time as the number of seconds since 00:00:00 UTC on
January 1, 1970 (07:00:00 in the UTC+7 time zone used in the examples below).
date1<-as.Date("2019-12-05")
as.numeric(date1) # number of days since Jan 01, 1970

## [1] 18235
# a time stored to the second
time1<-as.POSIXct("1970-01-01 07:00:01")
as.numeric(time1) # number of seconds since 7AM Jan 01, 1970 (UTC+7)

## [1] 1
as.numeric(substr(time1,12,13)) # extract the hour from time1

## [1] 7
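The results above depend on the time zone of the machine. The sketch below, offered as one possible approach rather than the lecture's method, fixes the time zone with the `tz` argument, uses format() instead of character positions to extract the hour, and computes the number of days between two dates; the variable names are illustrative.

```r
# Fix the time zone explicitly so the result does not depend on the machine
t1 <- as.POSIXct("1970-01-01 00:00:01", tz = "UTC")
as.numeric(t1)                  # 1 second after the epoch

# format() extracts components without relying on character positions
as.numeric(format(t1, "%H"))    # hour of t1: 0

# Subtracting two dates gives the number of days between them
d1 <- as.Date("2019-12-05")
d0 <- as.Date("2019-01-01")
as.numeric(d1 - d0)             # 338
```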
Basics of R - Date/time variables.
Base R has limited support for date/time variables. In practice, we use
an add-on package, for example lubridate:
library(lubridate)
today()

## [1] "2023-07-12"
now() # current date and time in the system time zone (here UTC+7)

## [1] "2023-07-12 22:21:29 +07"


mdy("December 31st, 2021")

## [1] "2021-12-31"
dmy("31-Dec-2021")

## [1] "2021-12-31"
Basics of R - vectors (1)

A vector is a collection of elements, all of the same type.


Vectors play a crucial and helpful role in R, which is also called a vectorized
language.
There are many ways to create a vector in R:
# the most popular way to create a vector
x<-c(2,3,5,7,11,13) # c() combines (concatenates) its arguments
length(x)

## [1] 6
s<-c("R", "VBA", "Python","C++")
print(s)

## [1] "R" "VBA" "Python" "C++"
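Because all elements of a vector must have the same type, c() silently converts mixed inputs to a common type. A small sketch of this behaviour, with the illustrative names `mixed` and `numbers`:

```r
mixed <- c(1, "a", TRUE)      # numbers and logicals are converted to strings
mixed                         # "1" "a" "TRUE"
class(mixed)                  # "character"

numbers <- c(1, TRUE, FALSE)  # logicals are converted to numbers
numbers                       # 1 1 0
```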

Basics of R - vectors (2)
To access elements of a vector, we use square brackets []:
x<-c(2,3,5,7,11,13)
x[5]

## [1] 11
x[2:4]

## [1] 3 5 7
Many built-in functions in R return a numeric vector:
x<-seq(1,2,0.2) # from 1 to 2 in steps of 0.2
length(x)

## [1] 6
x<-seq(1,2,length=11) # 11 equally spaced values from 1 to 2
x

## [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
Basics of R - vectors (3)

x<-1:5
x*2 # element-wise: no explicit loop is needed, unlike in C++ or Pascal

## [1] 2 4 6 8 10
x^2

## [1] 1 4 9 16 25
Example 1: Calculate $\sum_{n=1}^{100} \frac{1}{n^2}$.
n<-100
x<-1:n
sum(1/x^2) # sum() returns the sum of all elements in a numeric vector

## [1] 1.634984
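The infinite series is known to converge to $\pi^2/6$, so the result can be checked by looking at how the partial sums approach that limit. A minimal vectorized sketch using cumsum(); the names `n_values` and `partial` are illustrative.

```r
n_values <- 1:1000000
partial  <- cumsum(1 / n_values^2)   # partial[k] is the sum of the first k terms

partial[c(100, 10000, 1000000)]      # partial sums at three values of n
pi^2 / 6 - partial[1000000]          # remaining gap to the limit
```

The first value reproduces the result for n = 100 shown above, and the gap shrinks roughly like 1/n.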

Basics of R - vectors (4)
Example 2: Calculate $\sum_{n=0}^{100} \frac{1}{n!}$.
n<-100
x<-0:n
sum(1/factorial(x)) # factorial(n) = n!

## [1] 2.718282
Example 3: Calculate $\int_0^2 e^x\,dx$ and compare the result to $e^2 - 1$. Note that
$\int_a^b f(x)\,dx$ can be approximated by

$$\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} \frac{b-a}{n} \times f\left(a + \frac{i(b-a)}{n}\right) \quad \text{where } n \text{ is large.}$$
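The approximation above can be wrapped in a small reusable function. This is a sketch under the assumption that f accepts a numeric vector; `riemann_sum` is a hypothetical helper name and not a function from base R or from the lecture.

```r
# Hypothetical helper: right-endpoint Riemann sum of f over [a, b] with n steps
riemann_sum <- function(f, a, b, n = 10000) {
  h <- (b - a) / n       # width of each sub-interval
  x <- a + (1:n) * h     # right endpoints a + i(b - a)/n for i = 1, ..., n
  h * sum(f(x))
}

approx <- riemann_sum(exp, 0, 2)
exact  <- exp(2) - 1
c(approx = approx, exact = exact, error = approx - exact)
```

Increasing n reduces the error, at the cost of a longer vector of evaluation points.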

Basics of R - vectors (5)
n<-10000
x<-seq(0+(2-0)/n,2,(2-0)/n)
(2-0)/n*sum(exp(x))

## [1] 6.389695
exp(2)-1 # Analytical solution

## [1] 6.389056
Example 4: The standard normal random variable has distribution function
$P(X \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$. Calculate $P(-0.5 < X \le 1)$.
Answer
n<-1000
x<-seq(-0.5+(1-(-0.5))/n,1,length=n)
1/sqrt(2*pi)*(1.5/n*sum(exp(-x^2/2)))

## [1] 0.5327246
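The approximation in Example 4 can be compared with R's built-in standard normal distribution function pnorm(). The sketch below repeats the Riemann sum with n = 1000 and places the two values side by side; the names `approx_prob` and `exact_prob` are illustrative.

```r
# Riemann-sum approximation with n = 1000 right endpoints on (-0.5, 1]
n <- 1000
x <- seq(-0.5 + 1.5 / n, 1, length.out = n)
approx_prob <- 1 / sqrt(2 * pi) * (1.5 / n * sum(exp(-x^2 / 2)))

# Built-in distribution function of the standard normal
exact_prob <- pnorm(1) - pnorm(-0.5)

c(approx = approx_prob, exact = exact_prob)
```

The two values agree to about three decimal places; a larger n brings the approximation closer.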
Basics of R - vectors (6)

Logical operations on vectors result in a logical vector:


x<-1:7
x<=3 # return a logical vector

## [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE


sum(x<=3) # how many values in x are at most 3

## [1] 3
any(x==1) # TRUE if any value in x equals 1

## [1] TRUE
all(x==1) # TRUE if all values in x are 1

## [1] FALSE
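Two further base R functions are often used with logical vectors: which() returns the positions of the TRUE values, and mean() returns the proportion of TRUE values, because TRUE counts as 1 and FALSE as 0. A short sketch:

```r
x <- 1:7
which(x <= 3)   # positions of the TRUE values: 1 2 3
mean(x <= 3)    # proportion of values that are at most 3, i.e. 3/7
```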
Basics of R - vectors (7)

Example 5: AirPassengers is a dataset (in the form of a vector) in R.

Is there any month in which the number of passengers is more than 650?
What is the proportion of months in which the number of passengers is
less than 300?
Answer
(sum(AirPassengers>650)>0)

## [1] FALSE
sum(AirPassengers<300)/length(AirPassengers)

## [1] 0.5694444
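The same two answers can be written more directly with any() and mean(). This is an equivalent formulation offered as a sketch, not a replacement for the answer above; `prop_below_300` is an illustrative name.

```r
any(AirPassengers > 650)      # FALSE: no month exceeds 650
max(AirPassengers)            # the largest monthly value, for reference

prop_below_300 <- mean(AirPassengers < 300)
prop_below_300                # same value as sum(...)/length(...)
```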

Basics of R - operations between vectors

Combine vectors of the same type


x<-1:3
y<-4:8
z<-9:10
c(x,y,z)

## [1] 1 2 3 4 5 6 7 8 9 10
x<-seq(1,2,0.5)
y<-c(3,4)
z<-seq(5,6,length=3)
c(x,y,z)

## [1] 1.0 1.5 2.0 3.0 4.0 5.0 5.5 6.0

Basics of R - operations between vectors

x<-"I am an actuary, "


y<-"what is your talent ?"
c(x,y)

## [1] "I am an actuary, " "what is your talent ?"


z<-paste(x,y) # z is a single string; paste() inserts a space between x and y
z

## [1] "I am an actuary,  what is your talent ?"

Basics of R - operations between vectors

There are TWO methods to extract a subvector from a vector x.

1. Using a numerical vector y of positions:
x<-c(2,3,5,7,11,13,17,19,23,29)
y<-c(10,1,4)
x[y] # x[10], x[1], x[4]

## [1] 29 2 7
y<-c(10,1,4,15) # position 15 does not exist in x, so the result is NA
x[y]

## [1] 29 2 7 NA

Basics of R - operations between vectors

2. Using a logical vector y (we must have length(y) = length(x)):


x<-c(2,3,5,7,11,13,17,19,23,29)
y<-(x %% 2 == 1)
y

## [1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
x[y]

## [1] 3 5 7 11 13 17 19 23 29
x[!y]

## [1] 2
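Two related indexing idioms are worth knowing alongside the methods above: a negative index drops elements instead of selecting them, and a logical condition can be written directly inside the brackets. A minimal sketch:

```r
x <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

x[-1]          # drop the first element
x[-c(1, 10)]   # drop the first and the last elements
x[x > 10]      # keep only the elements greater than 10
```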

Basics of R - operations between vectors

Example 6: Extract the numbers of passengers in the months of June from the
AirPassengers dataset.
y<-0:11*12+6 # using numerical vector y
AirPassengers[y]

## [1] 135 149 178 218 243 264 315 374 422 435 472 535
# using a logical vector, the same result
y<-(1:length(AirPassengers))%%12 == 6
AirPassengers[y]

## [1] 135 149 178 218 243 264 315 374 422 435 472 535
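Because AirPassengers is stored as a monthly time series, the month of each observation can also be obtained with cycle() from the stats package, which avoids computing positions by hand. This is an alternative sketch, not the method used above; `june` is an illustrative name.

```r
# cycle() gives the position of each observation within the year (1 = January)
june <- AirPassengers[cycle(AirPassengers) == 6]
june
```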

Basics of R - operations between vectors

Which month has the lowest average number of passengers?


AverageNumber<-rep(0,12) # vector of length 12 with all elements equal to 0
for (i in 1:12){ # for loops will be introduced later
  y<-0:11*12+i # positions of month i in each of the 12 years
  AverageNumber[i]<-mean(AirPassengers[y])
}
which.min(AverageNumber)

## [1] 11
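The loop can be avoided by arranging the data as a matrix with one row per year and one column per month, and then averaging each column. This is a sketch of one possible vectorized alternative; `by_year` and `monthly_avg` are illustrative names.

```r
# Arrange the 144 monthly values as a 12 x 12 matrix:
# one row per year, one column per calendar month
by_year <- matrix(as.numeric(AirPassengers), ncol = 12, byrow = TRUE)
monthly_avg <- colMeans(by_year)

which.min(monthly_avg)   # 11, i.e. November
```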

Basics of R - operations between vectors
plot(x=1:12,y=AverageNumber,col="blue",type="b")
[Figure: plot of AverageNumber (y-axis) against month 1:12 (x-axis)]
Basics of R - operations between vectors

Operations between vectors of the same length: let
$x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$; then $x \,\Delta\, y$ results in a vector $z$
where $z_i = x_i \,\Delta\, y_i$, for any operation $\Delta$.
x<-1:7
y<-7:1
x*y

## [1] 7 12 15 16 15 12 7
x<y

## [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE


x%/%y

## [1] 0 0 0 1 1 3 7

Basics of R - operations between vectors

Operation between vectors with different lengths:


x<-1:2 # length(x) = 2
y<-1:5 # length(y) = 5
x-y # result in a vector with length = max(2,5) = 5

## Warning in x - y: longer object length is not a multiple of shorter object
## length
## [1] 0 0 -2 -2 -4
R recycles x to (1, 2, 1, 2, 1) before computing x − y.
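The recycling rule can be made explicit with base R's rep_len(). The sketch below shows that no warning is issued when the longer length is a multiple of the shorter one, and that recycling by hand reproduces the result above; the variable names are illustrative.

```r
x <- 1:2
y <- 1:6

x + y                  # x is recycled to (1, 2, 1, 2, 1, 2); no warning
rep_len(x, 5)          # explicit recycling to length 5: 1 2 1 2 1
rep_len(x, 5) - 1:5    # same values as x - 1:5, without the warning
```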

Basics of R - missing data
Missing data plays a critical role in both statistics and computing.
R uses NA to denote missing data or an undefined value.
x<-c(1,2,3,NA,5,NA)
is.na(x) # result in a logical vector

## [1] FALSE FALSE FALSE TRUE FALSE TRUE


x*2 # NA * 2 = NA

## [1] 2 4 6 NA 10 NA
x>3 # NA > 3 = NA

## [1] FALSE FALSE FALSE NA TRUE NA


y<-c(NA,2,NA,4,5,6)
x*y # information is lost when missing values are not handled

## [1] NA 4 NA NA 25 NA
Basics of R - frequently used functions with vectors

length(x) : number of elements in vector x
sum(x) : sum of all elements in vector x
mean(x) : average of all elements in vector x
sd(x) : standard deviation of all elements in vector x
prod(x) : product of all elements in vector x
min(x) : minimum value of all elements in vector x
max(x) : maximum value of all elements in vector x
sort(x,decreasing = TRUE) : sorts vector x in decreasing order
sort(x,decreasing = FALSE) : sorts vector x in increasing order
sum(x) will be NA if vector x contains missing values. To ignore
the missing values, use sum(x, na.rm = TRUE) instead.
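A short illustration of the na.rm argument, reusing the vector with missing values from the missing data section:

```r
x <- c(1, 2, 3, NA, 5, NA)

sum(x)                  # NA, because x contains missing values
sum(x, na.rm = TRUE)    # 11
mean(x, na.rm = TRUE)   # 2.75, the mean of the four observed values
sum(is.na(x))           # 2 missing values
```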

Basics of R - exercises

Ex 1: From the AirPassengers dataset, calculate the proportion of months
in which the number of passengers is
Less than 100?
More than 300 but less than 400?
More than 400 but less than 450, or more than 500 but less than 550?
Ex 2: Given x = (31, 2, 11, 37, 5, 13, 17, 29, 23, 19, 3, 7), assign to the
vector y the values of x in reverse order.
Ex 3: Let x = Nile (measurements of the annual flow of the river Nile,
in $10^8\ \mathrm{m}^3$, from 1871 to 1970).
What are the smallest and the largest values of the annual flow?
In which years is the annual flow the smallest and the largest?
In which year is the annual flow the 20th largest?

Basics of R - exercises

Ex 4: Calculate $\int_0^3 x^{1.5} e^{-0.3x}\,dx$.
Ex 5: Calculate A:

Ex 6: Let x = presidents.
How many NA values are in x?
In which years are there missing observations?
