Table of Contents
1. Logical Operators and Logical Expressions
2.1 Simple If
2.2 The If-Else Statement
2.3 Multiple If Statement
2.4 Vectorised If Statement
3. For Statement
4. While-Statement
©COPYRIGHT 2022 (Ver. 1.0), ALL RIGHTS RESERVED. MANIPAL ACADEMY OF HIGHER EDUCATION 2/20
CONTROL FLOW AND USER-DEFINED FUNCTIONS IN R
Introduction
Until now, code was executed sequentially; that is, statements ran in the order in which they
were written, with no conditional or repeated execution. When the order of execution is
restricted or controlled in some way, it is called ‘Control Flow’.
x=4
x>5
Output
[1] FALSE
In the code above, x > 5 is the logical expression and the greater-than symbol (>) is the
relational operator. Since 4 is not greater than 5, the output is the logical value FALSE. Note
that logical values in R are written in capital letters. The less-than relational operator (<)
can be used in the same way: x < 5 gives the output TRUE, again in capital letters.
To check the equality between two entities, the == operator is used. Note that this has two
‘equal to’ symbols.
x=4
x == 5
Output
FALSE
In the above code, ‘=’ is the assignment operator and ‘==’ is the relational operator.
# Logical AND
l1 & l2
Output
[1] TRUE FALSE FALSE FALSE
In the code above, l1 and l2 are vectors of logical values, each element being TRUE or FALSE
(written in capital letters). The logical AND is denoted by the ampersand (&) symbol and can
be placed between two vectors, for example, l1 & l2. It combines the vectors pair by pair. The
first elements of both vectors are TRUE, so the first element of the output is TRUE. One of
the second elements is FALSE, so the second element of the output is FALSE, and so on.
This is a vectorised logical expression: the operator is applied to each pair of elements and
returns TRUE or FALSE for that pair.
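A minimal sketch of the two vectors, assuming values chosen to agree with the outputs printed in this section for &, | and ! (the values themselves are an assumption):

```r
# Assumed values: picked so that the results match the outputs shown in this section
l1 = c(TRUE, FALSE, FALSE, TRUE)
l2 = c(TRUE, TRUE, FALSE, FALSE)

and_result = l1 & l2   # element-wise AND
or_result  = l1 | l2   # element-wise OR
not_result = !l1       # element-wise NOT

print(and_result)
print(or_result)
print(not_result)
```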
1.2.2 Logical OR
The logical OR operator uses the pipe symbol ( | ). It evaluates to TRUE if at least one of its
operands is TRUE. The example below first checks whether at least one of the first elements
of l1 and l2 is TRUE. Here both are TRUE, so the first element of the output is TRUE. It then
proceeds to the next elements. For the third element, both are
FALSE. Thus, it outputs FALSE for the third element. This is again a vectorised operation,
meaning it applies to each element of the vectors, l1 and l2.
l1 | l2
Output
[1] TRUE TRUE FALSE TRUE
!(l1)
Output
[1] FALSE TRUE TRUE FALSE
Logical NOT flips the truth value of each element. In the code above, ‘!(l1)’ turns every TRUE
in l1 to FALSE and every FALSE to TRUE. The NOT operator is used often, for example, to check
that a name is not in a list.
l1 && l2
Output
Warning in l1 && l2 : 'length(x) = 4 > 1' in coercion to 'logical(1)'
Warning in l1 && l2 : 'length(x) = 4 > 1' in coercion to 'logical(1)'
[1] TRUE
Note that there is output along with a warning. The warning indicates that the length of the
input supplied is greater than one. Here it is a warning and not an error; from R 4.3.0
onwards, the same call stops with an error. The Long-form is meant for operands with exactly
one element. In the run shown, the Long-form logical AND compares only the first elements
and ignores the rest. If the first element of either l1 or l2 were FALSE, the output would be
FALSE. Changes to any element other than the first have no impact on the output.
l1 || l2
Output
Warning in l1 || l2 : 'length(x) = 4 > 1' in coercion to 'logical(1)'
Warning in l1 || l2 : 'length(x) = 4 > 1' in coercion to 'logical(1)'
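[1] TRUE
Like the Long-form AND operation, only the first elements are considered, and there is a
warning along with the output. Since the first elements of l1 and l2 are both TRUE, the result
is TRUE.
The Short-form is typically used when dealing with vectors, and the Long-form is used when
each operand is a single element, such as the condition of an if statement.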
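As an illustration of where each form fits, the sketch below uses a hypothetical vector ‘marks’: the Short-form builds one result per element, while the Long-form combines two single values in a condition.

```r
marks = c(72, 85, 40)   # hypothetical data, not from the text

# Short-form: one TRUE/FALSE per element
in_range = marks >= 50 & marks <= 100
print(in_range)

# Long-form: each side is a single value, as a condition expects
first_ok = length(marks) > 0 && marks[1] >= 50
if (first_ok) {
  print('first mark is a pass')
}
```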
names = c('Ajith', 'Priya', 'Gabriel')
'Ajith' %in% names
Output
[1] TRUE
In the above code, the vector ‘names’ has 'Ajith', 'Priya', and 'Gabriel' as its elements. The
%in% operator checks whether the string on its left matches any element of the vector on its
right. Since the string ‘Ajith’ matches one of the elements, the output is TRUE. If the spelling
or the case is changed, the output would be FALSE. The whole of “‘Ajith’ %in% names” is a
logical expression. It applies not only to vectors but also to more complicated objects such
as lists.
!('Ajith' %in% names)
Output
[1] FALSE
This checks that ‘Ajith’ is not present in the vector called ‘names’. Since the name is present,
the output is FALSE.
all(l1)
Output
[1] FALSE
Since not all the elements in l1 are TRUE, the output is FALSE.
any(l1)
Output
[1] TRUE
Since l1 contains at least one TRUE value, the output of any() is TRUE.
Output
NA
When a vector contains a missing value, R may be unable to decide the result, so it simply
prints NA. In such situations, the functions ‘isTRUE()’ and ‘isFALSE()’ are used. ‘isTRUE(x)’
returns TRUE only if x is a single logical value that is TRUE; for anything else, including NA
or a vector with more than one element, it returns FALSE and not NA. ‘isFALSE(x)’ likewise
returns TRUE only if x is a single FALSE. To test a whole vector, wrap the summary, for
example, isTRUE(all(l1)). In the code below, l1 has four elements, so the output is FALSE.
isTRUE(l1)
Output
[1] FALSE
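A small sketch of how a missing value affects these functions, using a hypothetical vector ‘v’ that contains one NA:

```r
v = c(TRUE, NA, TRUE)            # hypothetical vector with a missing value

all_v = all(v)                   # NA: the missing element could be FALSE
any_v = any(v)                   # TRUE: one TRUE is enough, whatever the NA is
safe  = isTRUE(all(v))           # FALSE: isTRUE() never returns NA
no_na = all(v, na.rm = TRUE)     # TRUE: the NA is dropped before checking

print(c(all_v, any_v, safe, no_na))
```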
x= c(1:4)
y= (x^(1/2))^2
x==y
Output
[1] TRUE FALSE FALSE TRUE
In the code above, y is obtained by taking the square root of x and squaring it again. Note the
parentheses around 1/2: without them, x^1/2 is read as (x^1)/2. The check x == y results in
FALSE for some elements, and the exact pattern can vary between platforms. Square root is
a floating-point operation. Because of round-off errors and the limited precision of the
computer, the result, when squared, does not always give back exactly the same value. The
function ‘all.equal()’ can be used in such cases.
all.equal(x,y)
Output
[1] TRUE
In the above code, x is technically not equal to y because of the finite precision arithmetic,
but the two are very close to each other. The function ‘all.equal()’ checks whether two
quantities are equal within a tolerance. The default tolerance is about 1.5e-8; a different
value can be passed through the ‘tolerance’ argument.
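When the comparison is needed as a condition, it helps to know that all.equal() returns a description of the difference, not FALSE, when the values differ. A common pattern, sketched here with illustrative values, is to wrap the call in isTRUE():

```r
x = c(1:4)
y = sqrt(x)^2                            # square root followed by squaring

close_enough = isTRUE(all.equal(x, y))   # TRUE: differences are within the tolerance
report = all.equal(1, 1.1)               # a character string describing the difference
differs = isTRUE(all.equal(1, 1.1))      # FALSE, so it is safe to use in a condition

print(close_enough)
print(report)
print(differs)
```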
2. If Statement
2.1 Simple If
Conditional execution of statements is done using the ‘if’ statement. The basic syntax is
given below:
Syntax
if (logical expression) {
  # Statements to execute if the logical expression evaluates to TRUE.
}
We start with the ‘if’ keyword (in lower case, since R is case-sensitive), followed by the
logical expression enclosed within parentheses. The body of the ‘if statement’ contains the
statements to execute if the logical condition is TRUE. The body starts and ends with curly
braces.
# if statement
x=4
if (x %% 2 == 0 ) { print('even') }
Output
[1] "even"
The code above simply prints the string “even” as 4 modulo 2 is 0 and the logical expression
is TRUE. If the code is changed as shown below no output is generated.
# if statement
x=4
if (x %% 2 != 0 ) { print('odd') }
Output
This is because the logical expression evaluates to FALSE, so the statements within the if
body are not executed. The statements can be written on the same line as the condition or on
the following lines. A single statement works without curly braces, but a body with several
statements must be enclosed within them. It is recommended to always use braces and to
write the body on the next line with an indentation so that the conditional statements are
distinguishable.
# if statement
x=4
if (x %% 2 != 0 ) {
  print('odd')
}
Syntax
if (logical expression) {
  # Statements to execute if the logical expression evaluates to TRUE.
} else {
  # Statements to execute if the logical expression evaluates to FALSE.
}
Sample code demonstrating the if-else statement is shown below:
# if-else statement
x=4
if (x %% 2 != 0 ) {
  print('odd')
} else {
  print('even')
}
Output
[1] "even"
In the above code, the logical condition evaluates to false. In this case, the statements within
the else part are executed. Hence, the output “even” is displayed.
# multiple if statement
x=0
if ( x > 0 ) {
  print('positive')
} else if ( x < 0 ) {
  print('negative')
} else {
  print('x is zero')
}
Output
[1] "x is zero"
In the above code, neither x > 0 nor x < 0 is TRUE, so the statement in the final ‘else’ is
executed. Hence, ‘x is zero’ is displayed.
The examples above use simple cases to demonstrate the concepts. For more complicated
examples, the structure remains the same.
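The same chain can be wrapped in a function so that it can be tried on several values. The function name ‘describeSign’ below is hypothetical and used only for illustration:

```r
# Hypothetical helper: classifies a single number with an if / else if / else chain
describeSign = function(x) {
  if (x > 0) {
    result = 'positive'
  } else if (x < 0) {
    result = 'negative'
  } else {
    result = 'x is zero'
  }
  return(result)
}

print(describeSign(5))
print(describeSign(-3))
print(describeSign(0))
```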
x = c(1:10)
print(x)
Output
[1] 1 2 3 4 5 6 7 8 9 10
# Vectorized if-statement
x = c(1:10)
print(x)
ifelse(x %% 2 == 0, 'even', 'odd')
Output
[1] 1 2 3 4 5 6 7 8 9 10
[1] "odd" "even" "odd" "even" "odd" "even"
[7] "odd" "even" "odd" "even"
In this case, the condition is evaluated for every element of ‘x’ in a single call, which is
called a vectorised operation. The ‘ifelse()’ function is used to check a condition on multiple
elements of a vector or vector-like object.
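Calls to ifelse() can be nested when there are more than two cases. A minimal sketch, with an illustrative vector ‘v’:

```r
v = c(-2, 0, 3)   # illustrative values

# The inner ifelse() is used only where the outer condition is FALSE
signs = ifelse(v > 0, 'positive', ifelse(v < 0, 'negative', 'zero'))
print(signs)
```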
3. For Statement
To repeat an operation a certain number of times, the for-statement can be used. For
example, to print a quantity ‘x’ ten times.
Syntax
for (value in sequence) {
  # Statements to execute for each value
}
The for-statement starts with the keyword ‘for’, followed by an expression. This determines
the number of times a particular operation must be performed. For example, ‘for (val in x)’,
means that the execution is going to take place for all the values of the vector ‘x’. The code
below demonstrates the use of ‘for’ to calculate the square of x for all elements from 1 to
10. Please note the keyword ‘in’ is not enclosed within the symbol ‘%’.
# For-statement
x = c(1:10)
for (val in x){
  y= x^2
  print(y)
}
Output
[1] 1 4 9 16 25 36 49 64 81 100
[1] 1 4 9 16 25 36 49 64 81 100
[1] 1 4 9 16 25 36 49 64 81 100
[1] 1 4 9 16 25 36 49 64 81 100
[1] 1 4 9 16 25 36 49 64 81 100
[1] 1 4 9 16 25 36 49 64 81 100
[1] 1 4 9 16 25 36 49 64 81 100
[1] 1 4 9 16 25 36 49 64 81 100
[1] 1 4 9 16 25 36 49 64 81 100
[1] 1 4 9 16 25 36 49 64 81 100
In the above output, squares of ‘x’ are printed multiple times. This is because x^2 operates
on the entire vector. The loop variable ‘val’ takes values from 1 to 10, and for each value, the
square of the entire vector x is calculated and displayed.
The code to display squares of individual elements is given below. In the first version, the
loop variable, ‘i', takes the values from 1, 2, 3 and so on, until the length of the vector x. We
access the element as x[i]. In the second version, the loop variable ‘val’ directly takes on the
values of the elements of the vector x whose squares are then printed.
# For-statement
x = c(1:10)
y = numeric(length(x))  # create y before assigning to y[i]
# version-1
for (i in 1:length(x)){
  y[i]= x[i]^2
  print(y[i])
}
# version-2
for (val in x){
  print(val^2)
}
Output (the same for each version)
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
[1] 49
[1] 64
[1] 81
[1] 100
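Since arithmetic in R is vectorised, the loop is not strictly needed for this task. The sketch below compares a loop that fills a pre-allocated vector with the vectorised form; seq_along(x) is used as an alternative to 1:length(x) that also behaves well for an empty vector. The names ‘y_loop’ and ‘y_vec’ are illustrative.

```r
x = c(1:10)

# Loop version: fill a vector that was created beforehand
y_loop = numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] = x[i]^2
}

# Vectorised version: the operator works on the whole vector at once
y_vec = x^2

print(all(y_loop == y_vec))
```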
As shown in version-2 above, a loop variable can traverse a vector of names as well.
An example is given below. The vector contains three strings. We can display the vector
using a print statement.
participants = c('Ajith','Priya','Gabriel')
print(participants)
Output
[1] "Ajith" "Priya" "Gabriel"
# For-statement
participants = c('Ajith','Priya','Gabriel')
for (names in participants) {
  print(names)
}
Output
[1] "Ajith"
[1] "Priya"
[1] "Gabriel"
In this case, the loop variable ‘names’ takes the values ‘Ajith’, ‘Priya’, and ‘Gabriel’ in
successive iterations of the loop.
4. While-Statement
The ‘for loop’ is used to execute a statement or a particular sequence of statements a
specific number of times. ‘While-statements’ are used when a sequence of statements must
be executed as long as some condition is satisfied.
For example: as long as the age is greater than 30, keep repeating this operation.
Syntax
while ( logical expression is true )
{
# Set of statements to be executed
}
The code below demonstrates the use of while. In this case, the function ‘runif()’ is used.
‘runif()’ generates random numbers from a uniform distribution, between 0 and 1 by default.
We generate a new random number in each pass of the loop and repeat the loop as long as
the newly generated value is greater than 0.3. Note that x must be initialised to a number
greater than 0.3 so that the loop is executed the first time.
# While-statement
x=1
while (x > 0.3) {
  x = runif(1)
  print(x)
}
Output
[1] 0.8326397
[1] 0.8946775
[1] 0.6325222
[1] 0.8007339
[1] 0.3405287
[1] 0.4995432
[1] 0.1090243
The output shown above will vary across executions as ‘runif()’ generates a new random
number each time. In each pass, x is assigned a new value and displayed. The condition is
then checked and, if it is TRUE, the loop is executed again. In the above output, once x takes
the value 0.1090243, which is less than 0.3, control is transferred out of the loop. A
while-statement is like a for-statement, except that the number of repetitions is decided by
a logical expression rather than fixed in advance.
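Because the values are random, a run cannot be repeated exactly unless the random number generator is seeded. The sketch below adds set.seed() and a counter with an upper limit on the number of passes; the seed value, the limit and the name ‘passes’ are arbitrary choices for illustration.

```r
set.seed(123)        # arbitrary seed so that the run can be repeated
x = 1
passes = 0
max_passes = 1000    # guard against a loop that runs for too long

while (x > 0.3 && passes < max_passes) {
  x = runif(1)
  passes = passes + 1
}

print(passes)
print(x)
```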
MDS5103/ MSBA5104 Segment 03
SIMULATION USING R IN-BUILT FUNCTIONS - TUTORIAL
Table of Contents
1. Functions for Simulating Random Experiments
Introduction
In real-world situations, the occurrence of an event cannot entirely be predicted. The outputs
can be random. R provides methods and functions to simulate such random experiments
and to generate random numbers. This topic covers the various options available in R to
perform such operations and visualise them.
# Sample space
s = c(1:6)
# Corresponding probabilities
p = (1/6) * replicate(6, 1)
The sample space, that is, the set of all possible outcomes in one run of the experiment, is
assigned to the variable ‘s’. Here, the possible outcomes of rolling a fair die once are the
whole numbers from 1 to 6. Hence, a vector with values from 1 to 6 is created and assigned
to ‘s’. The probability of 1 appearing after rolling a fair die is 1/6, and the same holds for
each of the other numbers on the die. To specify the probabilities, we could type 1/6 six
times. Instead, we use the replicate function as shown in the code above. The replicate
function has multiple purposes. The simplest one is to repeat a number or a string multiple
times in a vector. In this case, it creates a vector of six 1s, which is then multiplied by 1/6 to
obtain the probability of each number on the die. To simulate the experiment, we use the
sample() function.
Syntax
sample(x, size, replace = FALSE, prob = NULL)
In the above syntax, ‘x’ refers to the vector from which items are sampled; ‘size’ refers to
the number of items to draw, and ‘replace’ is a Boolean that indicates whether an item
selected previously is put back before the next selection. This value is relevant only if the
size is greater than one. ‘prob’ gives the probability of picking each of the items in the
sample space. The probabilities are provided as a vector.
In the code, the sample space and probabilities are provided to the sample() function. To
begin with, only 1 output is required. Hence, the second parameter is set to 1.
The output of the code is shown below:
Output
[1] 4
The sample may return any number from 1 to 6. In the output shown above, the result is the
single value 4, since the sample size is specified as 1. The result may be different for each
execution.
To simulate the rolling of a pair of dice, we can specify the size (second parameter of the
sample() function) as 2 as shown below. Note that ‘replace’ is set to TRUE: the two dice are
independent of each other, so the same number may appear on both.
# Corresponding probabilities
p = (1/6) * replicate(6, 1)
Output
[1] 4 2
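A minimal sketch of both experiments, assuming the sample space ‘s’ and probabilities ‘p’ described above. rep(1/6, 6) is used here as a shorter way of building the same probability vector, and the seed is an arbitrary value added so that a run can be repeated; the names ‘one_roll’ and ‘pair’ are illustrative.

```r
set.seed(1)                   # arbitrary seed, only for repeatability

s = c(1:6)                    # sample space of a fair die
p = rep(1/6, 6)               # equal probability for each face

one_roll = sample(s, 1, prob = p)                   # a single die
pair = sample(s, 2, replace = TRUE, prob = p)       # a pair of dice

print(one_roll)
print(pair)
```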
To repeat this experiment (rolling a die or a pair of dice) several times, we can use the
replicate function.
Syntax
replicate(n, expression)
Here ‘n’ is the number of repetitions and ‘expression’ is the expression to be evaluated each
time.
The code below shows the sampling done with a pair of dice executed 10 times.
# Corresponding probabilities
p = (1/6) * replicate(6, 1)
Output
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4 6 4 5 4 2 3 5 1 1
[2,] 3 2 6 1 6 5 3 2 1 2
The output is a matrix, or 2D array. The outcomes of each run of the random experiment
appear column-wise. The first simulation returns 4 and 3, the second 6 and 2, and so on.
Since ‘nsimulations’ is set to 10, there are 10 pairs of entries.
In the code below, the number of simulations is increased to 100 and the result is stored in
a variable named ‘simulated_data’. The structure of the same is displayed below.
# Corresponding probabilities
p = (1/6) * replicate(6, 1)
Output
int [1:2, 1:100] 6 5 6 3 4 2 6 2 6 5 ...
The resulting structure, as shown above, is a matrix with 100 columns and 2 rows.
As shown in the code below, we set the number of simulations to 10 or 1e1 and replicate
the simulations. The result, a 2D array, is stored in the variable named simulatedData. Now,
the frequency for each sum must be calculated. Frequency is the number of times some
event occurs. To do this, we create a user-defined function named checkEvent. The syntax
for a user-defined function is given below
Syntax
function_name = function(arguments) {
# Statements
}
Syntax
apply(X, MARGIN, FUN)
apply() invokes the function FUN on each row or each column of the data X. Whether the
function is applied row-wise (MARGIN = 1) or column-wise (MARGIN = 2) is determined by
the second parameter of the apply() function. In this case, the function must be applied
column-wise since each column contains the output of a single simulation. Hence, the
apply() function is applied on simulatedData with the second parameter as 2 and the
function as ‘checkEvent’.
nsimulations = 1e1
simulatedData = replicate (nsimulations, sample ( s, 2, replace = TRUE, prob = p))
Output
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 4 4 2 1 5 3 6 1 4
[2,] 6 3 6 1 2 1 3 1 1 6
[1] 1 1 1 0 0 0 0 1 0 1
The output displays the generated numbers and the result. The result is a series of 1s and
0s, obtained by applying the ‘checkEvent’ function to each column of the simulated data.
For example, the first outcomes are 1 and 6 (values of the 1st column). Their sum is 7, which
meets the condition (greater than or equal to 7), hence the result is 1. The fourth outcome
is 2 and 1. The sum is less than 7, hence the output is 0, and so on.
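A sketch of what such a function and the apply() call could look like, assuming the event is ‘the sum of the two dice is at least 7’ as described here. The body of checkEvent is an assumption based on that description. The matrix holds the ten simulated pairs printed in the output, entered by hand so that the result can be compared.

```r
# Assumed definition: returns 1 when the two outcomes add up to 7 or more, else 0
checkEvent = function(data) {
  if (sum(data) >= 7) {
    return(1)
  } else {
    return(0)
  }
}

# The ten pairs from the output above, one simulation per column
simulatedData = rbind(c(1, 4, 4, 2, 1, 5, 3, 6, 1, 4),
                      c(6, 3, 6, 1, 2, 1, 3, 1, 1, 6))

result = apply(simulatedData, 2, checkEvent)   # 2 = apply column-wise
print(result)
```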
To find the probability, we count the simulations in which the event occurred (sum greater
than or equal to 7) and divide the count by the total number of simulations. Alternatively, we
can apply the mean() function to the result.
In the code shown below, the number of simulations has been increased to 100,000 (1e5)
and the mean() function has been applied to the result of the simulations. Typically, the
experiment has to be simulated a large number of times to get a reliable estimate of the
probability.
nsimulations = 1e5
simulatedData = replicate (nsimulations, sample ( s, 2, replace = TRUE, prob = p))
Output
[1] 0.58498
The output will be close to 0.58 in every run, indicating that the probability of the sum of two
die rolls being greater than or equal to 7 is about 0.58 (the exact value is 21/36).
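A sketch of the whole estimate, under the same assumptions about ‘s’, ‘p’ and ‘checkEvent’ (sum of at least 7). A smaller number of simulations and an arbitrary seed are used to keep the run short and repeatable. For comparison, the exact probability is computed by listing all 36 equally likely pairs with outer().

```r
set.seed(42)                  # arbitrary seed, only for repeatability

s = c(1:6)
p = rep(1/6, 6)

# Assumed definition of the event: sum of the pair is 7 or more
checkEvent = function(data) {
  if (sum(data) >= 7) 1 else 0
}

nsimulations = 1e4            # fewer than in the text, to keep the run short
simulatedData = replicate(nsimulations, sample(s, 2, replace = TRUE, prob = p))
estimate = mean(apply(simulatedData, 2, checkEvent))

# Exact value: share of the 36 possible pairs whose sum is at least 7
exact = mean(outer(s, s, '+') >= 7)

print(estimate)
print(exact)
```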
The code given below is another example of finding the probability of an event. In this case,
the probability of getting an even number on the first die of a roll needs to be found. As in
the previous case, the data received by the function checkEvent1 is a 1D array containing
the result of a single experiment. We can check the 1st element and return 1 if it is even and
0 otherwise.
Output
[1] 0.4977
The output as expected would be around 0.5, indicating that there is a 50% chance of
getting an even number.
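A sketch of how checkEvent1 could be written from this description. The body is an assumption based on the text, and the seed and the number of simulations are arbitrary choices.

```r
set.seed(7)                   # arbitrary seed, only for repeatability

s = c(1:6)
p = rep(1/6, 6)

# Assumed definition: 1 if the first die shows an even number, else 0
checkEvent1 = function(data) {
  if (data[1] %% 2 == 0) {
    return(1)
  } else {
    return(0)
  }
}

nsimulations = 1e4
simulatedData = replicate(nsimulations, sample(s, 2, replace = TRUE, prob = p))
estimate = mean(apply(simulatedData, 2, checkEvent1))
print(estimate)
```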
In rbinom(n, size, prob), ‘n’ is the number of random values to generate (the number of times
the experiment is repeated), ‘size’ is the number of trials in a single experiment, and ‘prob’
is the probability of success in each trial.
The code below demonstrates the generation of random variables using rbinom(). Each
experiment consists of drawing 10 balls, where drawing a black ball counts as a success
(probability 0.6), and the experiment is repeated 10 times.
Output
[1] 8 7 4 7 5 8 6 4 6 9
The output is a set of random values between 0 and 10 indicating how many black balls
were found in each repetition. There are 8, 7, and 4 black balls in the first 3 repetitions
respectively.
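A minimal sketch of the call described above, with named arguments to make the role of each value explicit. The seed is arbitrary and the name ‘draws’ is illustrative.

```r
set.seed(10)   # arbitrary seed, only for repeatability

# 10 repetitions of an experiment with 10 trials each, success probability 0.6
draws = rbinom(n = 10, size = 10, prob = 0.6)
print(draws)
```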
The experiment can be repeated many more times (e.g., 100000), and the result can be
stored in the variable simulatedData. The frequency of occurrence of each number can be
calculated using the table() function as shown below. This can then be converted into a
dataframe using the as.data.frame() function.
Output
Value Frequency
0 13
1 161
2 1005
3 4258
4 11268
The output displays the count of times a particular number of black balls were taken. That
is, the number of times zero black balls were taken is 13 (out of 100000 times the experiment
was repeated). One black ball was taken 161 times, two black balls were taken 1005 times,
and so on.
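One way to build such a frequency table, sketched with the column names ‘Value’ and ‘Frequency’ seen in the output. The seed is arbitrary, so the counts will differ from those printed above.

```r
set.seed(2022)   # arbitrary seed, only for repeatability

simulatedData = rbinom(n = 100000, size = 10, prob = 0.6)

# table() counts how often each value occurs; as.data.frame() turns it into two columns
df = as.data.frame(table(simulatedData))
names(df) = c('Value', 'Frequency')

print(head(df, 5))
```

Note that the ‘Value’ column produced this way is a factor, which suits a bar chart with one bar per value.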
library(ggplot2)  # provides ggplot(), geom_col() and the other plotting functions
p=ggplot(data=df) +
  geom_col(aes(x=Value, y = Frequency), width=0.7, fill='steelblue') +
  ggtitle("Simulating a binomial random variable") +
  labs(x="Values", y="Frequency") +
  theme_minimal()
p
Output
The result is a bar chart that is almost symmetrically distributed around the value 6, which
is the expected number of successes (10 × 0.6).
rpois(100, lambda=10)
Output
[1] 7 5 9 4 10 6 9 13 10 11 11 9 12 9 15 7 14
[18] 4 8 9 9 8 13 14 8 12 14 15 13 12 10 10 10 6
[35] 12 6 9 14 8 11 9 4 12 9 10 6 12 11 12 10 16
[52] 15 14 11 6 6 9 11 7 8 5 6 13 7 10 11 13 10
[69] 14 12 9 10 4 6 10 12 9 11 14 11 10 15 8 10 6
[86] 13 10 12 7 9 11 12 11 8 11 13 9 8 13 6
The output shows how many customers showed up in each five-minute slot. Most of the
numbers are around the value 10, the value of lambda. This has very interesting practical
applications. For example, we can also simulate the number of people arriving at a bus stop
in the next minute, or the number of photons that strike a pixel on a sensor, and so on.
Syntax
rnorm(n, mean, sd)
In this syntax, ‘n’ represents the count of numbers to be generated, ‘mean’ the mean of the
distribution from which they are drawn, and ‘sd’ its standard deviation.
Sample code to generate 10 continuous random values with mean 170 and standard
deviation 8 is given below.
Output
149.4088 159.3580 181.0153 166.8341 165.4002
171.8139 163.5023 172.2372 170.2813 171.7389
Output
Height
2 181.4812
3 175.4368
4 166.8967
5 169.0739
The data thus generated can be used to calculate the probability of occurrence of events.
For example, the data can be used to calculate the probability that a random person’s height
is between 170-171 cm. The code below displays TRUE if the number is greater than or equal
to 170 and less than or equal to 171 for all the 100000 simulated values.
Output
Height
[1,] FALSE
[2,] FALSE
[3,] FALSE
[4,] TRUE
[5,] FALSE
[6,] FALSE
[7,] FALSE
[8,] FALSE
[9,] FALSE
[10,] FALSE
To determine the probability, we must calculate the fraction of total success to the total
count. Alternatively, we can use the mean() function to do the same. ‘TRUE’ is considered
as 1 and ‘FALSE’ is considered as 0 in the calculation of the mean().
Output
[1] 0.05
The output implies that for normally distributed data with mean 170 and standard
deviation 8, the probability of a value being between 170 and 171 is approximately 0.05.
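A sketch of the whole calculation, with the simulated estimate set against the value given by the normal distribution function pnorm(). The seed and the variable names ‘heights’, ‘inRange’, ‘estimate’ and ‘exact’ are illustrative assumptions.

```r
set.seed(170)                 # arbitrary seed, only for repeatability

mu = 170
sigma = 8
heights = rnorm(100000, mean = mu, sd = sigma)

# TRUE where the simulated height lies between 170 and 171 (inclusive)
inRange = heights >= 170 & heights <= 171
estimate = mean(inRange)      # TRUE counts as 1, FALSE as 0

# Probability of the same interval from the normal distribution itself
exact = pnorm(171, mean = mu, sd = sigma) - pnorm(170, mean = mu, sd = sigma)

print(estimate)
print(exact)
```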
2.3.2 Visualisation
The random continuous values can be visualised using histograms. The sample code is
shown below. We first initialise the plot object. To this, a geom_histogram layer is added.
The x-axis is set to the ‘Height’ column of the new dataframe and the y-axis is initially set to
the internal variable ‘..count..’ (written after_stat(count) in ggplot2 3.4.0 and later). Since
this is a histogram, we need to specify the boundaries of the bars. This is done using the
seq() function, which generates a sequence of numbers from (mean-4*standard deviation)
to (mean+4*standard deviation) in steps of 2. The sequence is provided as the value for the
argument ‘breaks’ in the geom_histogram() function. Other attributes like colour, fill, alpha,
and labels can be specified as shown below.
delta = 2
p1 = ggplot(df) +
  geom_histogram(aes(x=Height, y = ..count.. ),
                 breaks = seq(mu-4*sigma, mu+4*sigma, by=delta),
                 color = 'black', fill= 'steelblue', alpha = 0.4 ) +
  labs (x='Height', y= 'Count' )
p1
Output
The height of the histogram indicates the count of the simulated values that fall within the
range specified in the x-axis. For example, around 10,000 simulated values fall between 170
and 172, etc.
The code below displays the relative frequency. In this, the y-axis is ‘..count../sum(..count..)’.
delta = 2
p1 = ggplot(df) +
  geom_histogram(aes(x=Height, y = ..count../sum(..count..) ),
                 breaks = seq(mu-4*sigma, mu+4*sigma, by=delta),
                 color = 'black', fill= 'steelblue', alpha = 0.4 ) +
  labs (x='Height', y= 'Relative frequency' )
p1
Output
In the above output, the height of each bar is the relative frequency of its range, a value
between 0 and 1; the heights of all the bars add up to 1.
We can also use the ‘..density..’ variable on the y-axis. This divides the relative frequency by
the width of each bar. In this case, the width is 2, so the relative frequency is divided by 2.
delta = 2
p1 = ggplot(df) +
  geom_histogram(aes(x=Height, y = ..density.. ),
                 breaks = seq(mu-4*sigma, mu+4*sigma, by=delta),
                 color = 'black', fill= 'steelblue', alpha = 0.4 ) +
  labs (x='Height', y= 'Density' )
p1
Output
In the above output, since ‘..density..’ is used, the relative frequency is divided by the bar
width, 2, so that the total area of the bars is 1.
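The three y-axis choices are related by simple arithmetic, which can be checked without plotting. The sketch below bins simulated heights with cut() and table() instead of ggplot2; the seed and the variable names are illustrative assumptions.

```r
set.seed(8)                   # arbitrary seed, only for repeatability

mu = 170
sigma = 8
delta = 2
heights = rnorm(100000, mean = mu, sd = sigma)

breaks = seq(mu - 4*sigma, mu + 4*sigma, by = delta)

# Count per bar; values that fall outside the breaks become NA and are not counted
counts = as.vector(table(cut(heights, breaks)))

relative = counts / sum(counts)   # relative frequency: heights add up to 1
dens = relative / delta           # density: bar areas add up to 1

print(sum(relative))
print(sum(dens * delta))
```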