
Advanced R

Chapter 2.4: Functions

Daniel Horn & Sheila Görz

Summer Semester 2022

Daniel Horn & Sheila Görz Advanced R Summer Semester 2022 1 / 92


What is a function?


Example

square = function(x) {
# Simply return x^2
return(x^2)
}
square(5)

## [1] 25

A function in R:
is (unsurprisingly) defined by the keyword function.
bundles instructions and executes them when it is called.
is called using round brackets.
can depend on parameters.
has been used by all of us 10 000 times already.


Defining a function
A function is defined via the keyword function, followed by round
brackets containing the arguments and by an R expression (the body). The brackets may be empty.
function(arg1 = default1, arg2 = default2, ...)
expression

If we want to execute multiple instructions when calling the function, we
have to enclose them in curly brackets.
function(arg1 = default1, arg2 = default2, ...) {
expression1
expression2
expression3
...
}

Usually, we assign the function object to a variable. This is not strictly
necessary, however.
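Since the assignment is optional, a function can also be used without ever receiving a name (an anonymous function), e.g. when it is only needed once as input for another function. A minimal sketch (R 4.1.0 and later additionally accept the shorthand `\(x) x^2`, which is not used here):

```r
# An anonymous function, defined and called immediately
(function(x) x^2)(4)

# An anonymous function passed directly to sapply()
sapply(1:3, function(x) x^2)
```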

Statements and expressions

Over the course of this chapter, we’ll frequently encounter the term
expression. Expressions are one of the 24 data types in R and describe R
code which is to be executed.

An R statement is a syntactically correct string, e.g. x + 2 or sin(2),
whereas sin(2 +) is not a statement.
An expression is a sequence of unevaluated R statements
Expressions can be evaluated inside of an environment using eval().
The result may differ depending on the environment.

In this chapter, we’ll never explicitly generate expressions. Some
components of functions are, however, represented by expressions of length
1, e.g. the default values of parameters.
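Although we won’t need this later on, a short sketch makes the point about environments concrete: one unevaluated expression, created with expression(), evaluated with eval() in two environments that bind x differently. The names env1 and env2 are just illustrative.

```r
# expression() creates an unevaluated expression vector
ex = expression(x + 2)
typeof(ex)               # "expression"

# Two environments with different bindings for x
env1 = new.env()
assign("x", 1, envir = env1)
env2 = new.env()
assign("x", 10, envir = env2)

# The same expression yields different results
eval(ex, envir = env1)   # 3
eval(ex, envir = env2)   # 12
```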


Functions are objects


Functions are just objects in R, and we can do almost everything with them
that we have learned in the previous chapters. We can
assign them to a variable (which we just did),
provide them with arbitrary attributes,
add them to a list,
use them as input for another function (e.g. apply()).
However: functions are not a basic vector type like integers. Thus, we cannot
create an atomic vector containing two or more functions. In this
case, R immediately falls back to a list.
str(c(sum, mean))

## List of 2
## $ :function (..., na.rm = FALSE)
## $ :function (x, ...)
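Because functions are ordinary objects, a list of functions can be processed like any other list. A small sketch (the list names are arbitrary) that applies several functions to the same vector:

```r
# Functions stored in a named list, just like any other objects
funs = list(minimum = min, maximum = max, average = mean)
x = c(3, 1, 4, 1, 5)

# Apply every function in the list to the same vector
results = sapply(funs, function(f) f(x))
results   # minimum 1, maximum 5, average 2.8
```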


Attributes and functions

attr() and attributes() work just like before.
Functions have a special attribute: srcref. It contains the function’s
source code including comments.

attr(square, "name") = "Squarefunction"


attributes(square)

## $srcref
## function(x) {
## # Simply return x^2
## return(x^2)
## }
##
## $name
## [1] "Squarefunction"

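This can be turned off via options(keep.source = FALSE). By default, keep.source is only TRUE in interactive sessions, so functions created by a script run via Rscript usually have no srcref attribute.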


Different types of functions

There are three types of functions in R: closures, builtins and specials.
Closures are the most common type, and the only type of function users can
create. Builtin and special functions are exclusively found in the base
package.
typeof(square)

## [1] "closure"

typeof(round)

## [1] "special"

typeof(sum)

## [1] "builtin"


Closures
Closure
A closure in R consists of three to four components:
1 Body: Contains the R code from the definition of the function.
2 Formals: A list of the function’s arguments and default values.
3 Environment: The function’s enclosing environment.
4 Optional: Bytecode: Compiled code.

The term ‘closure’ refers to the combination of a function and its enclosing
environment. All three/four components are displayed when printing the
function:
mean

## function (x, ...)
## UseMethod("mean")
## <bytecode: 0x0000000008898858>
## <environment: namespace:base>
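The combination of function and enclosing environment becomes visible when a function is created inside another function. A minimal sketch, using the hypothetical name make_adder: the returned function keeps access to n through its enclosing environment.

```r
# A function that creates and returns another function
make_adder = function(n) {
  function(x) x + n   # n is not defined here: it is found in the enclosing environment
}
add2 = make_adder(2)
add2(5)               # 7
environment(add2)$n   # 2: the enclosing environment still holds n
```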


Builtin and special

Neither type contains R code; instead, they directly call the
underlying C code. Thus, they can be more efficient.
formals, body and environment are NULL.
Both types can behave differently from closures in every imaginable
way. We’ll take a look at some examples.
They exist only in the base package, which is why only the R core developers
are able to define new ones.
Builtin functions evaluate their arguments before passing them to the
internal (C) function, whereas special functions directly pass
unevaluated arguments.
From here on, we’ll be dealing almost exclusively with closures, though at
some points we’ll have to take a look at the behavior of builtins and
specials.
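The statements about missing components can be checked directly. A short sketch using only base functions: is.primitive() distinguishes builtins/specials from closures, and args() returns a closure carrying the argument list of a primitive.

```r
# Primitives (builtin and special) have no R-level components
formals(sum)              # NULL
body(sum)                 # NULL
environment(sum)          # NULL
is.primitive(sum)         # TRUE
is.primitive(mean)        # FALSE: mean is a closure

# args() returns a closure with the same arguments as the primitive
names(formals(args(sum))) # "..." "na.rm"

# `if` is a special: its branches are passed on unevaluated
typeof(`if`)              # "special"
```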



First component: Formals


The list of arguments

Also known as: the formal arguments / parameters of a function.
It can be extracted and set via formals().
Type: a so-called pairlist.

formals(square)

## $x

typeof(formals(square))

## [1] "pairlist"

str(formals(square))

## Dotted pair list of 1
## $ x: symbol


Pairlists
Pairlists are deprecated for normal use! Thus, we’ll never actually use
them, but internally, they are used quite frequently.
Differences: List vs. pairlist
Empty pairlists are NULL, as opposed to a list of length 0.
Pairlists are singly linked lists: each element only knows its
successor, so the total length is not stored and has to be determined by traversing the pairlist.
Lists are stored as a vector in which every element can be accessed directly.
Every pairlist can be treated just like a normal list. However, normal lists
are usually more efficient, which is why pairlists are often automatically
converted to lists.
str(formals(square)[1])

## List of 1
## $ x: symbol
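Two of these differences can be observed directly. A small sketch: an empty pairlist is NULL, and as.list() turns the pairlist returned by formals() into a regular list.

```r
# An empty pairlist is NULL, an empty list is not
is.null(pairlist())     # TRUE
is.null(list())         # FALSE
length(list())          # 0

# formals() returns a pairlist; as.list() converts it into a regular list
fm = formals(function(a = 1, b = 2) NULL)
typeof(fm)              # "pairlist"
typeof(as.list(fm))     # "list"
```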


Content of formals
So, formals are merely a named list of arguments. The names are
mandatory: each formal argument has a name.
groceries = function(Milk = 1, Bread, Butter = "Landliebe"){
list(Milk = Milk, Bread = Bread, Butter = Butter)
}
str(formals(groceries))

## Dotted pair list of 3
## $ Milk : num 1
## $ Bread : symbol
## $ Butter: chr "Landliebe"

There are two types of arguments:
1 With default values: a named list element holding the default value
(for a computed default such as 10 * x, the unevaluated expression).
2 Without default values: a named element of type symbol, namely the empty
symbol (a name ‘without any content’).
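Both types of arguments can also be created by hand when setting formals. A sketch assuming the base function alist(), which does not evaluate its arguments: `x = ` yields an argument without default (the empty symbol), `y = 2` one with a default.

```r
# A function whose formals we set by hand
f = function() x + y

# "x = " creates an argument without default, "y = 2" one with default 2
formals(f) = alist(x = , y = 2)
names(formals(f))   # "x" "y"
f(1)                # 3: y falls back to its default
f(1, 5)             # 6
```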

The input’s type is not specified


Formals don’t specify the type of arguments, parameters are not typed.
This differentiates R from many other programming languages.
f = function(x = 2)
return(x)
f(1)

## [1] 1

f("a")

## [1] "a"

R doesn’t check the type of inputs. This can lead to errors:


f = function(x = 2)
return(x + 1)
f("a")

## Error in x + 1: non-numeric argument to binary operator
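Since R does not check types, a function that relies on a particular type has to check it itself. A minimal sketch with is.numeric() and a custom error message; try() is only used so the failing call does not abort the example.

```r
f = function(x = 2) {
  # R does not check types for us, so we check explicitly
  if (!is.numeric(x))
    stop("'x' must be numeric, not ", class(x)[1])
  return(x + 1)
}
f(1)          # 2
try(f("a"))   # Error in f("a") : 'x' must be numeric, not character
```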


Argument matching I

When a function is called, the specified parameters (calling parameters) are
matched to the function’s formal parameters. This process is called
argument matching.
str(groceries(5, Bread = 4, B = "Rama"))

## List of 3
## $ Milk : num 5
## $ Bread : num 4
## $ Butter: chr "Rama"

Three possible ways of argument matching
Name of the named calling parameter (complete matching)
Partial match of the name (partial matching)
Position of the unnamed calling parameter (positional matching)


Argument matching II

These three possibilities are checked in exactly this order until every calling
parameter is matched to a formal parameter.
1 Is there a formal parameter with an identical name? If so, the
respective calling parameter is matched to it.
2 If not: is there a formal parameter whose name begins with the name
of the calling parameter? If there is exactly one such parameter, then
they are matched. If there are multiple fitting parameters, an error
occurs. (Formal parameters after ... can only be matched by their complete name.)
3 If there are named calling parameters that can’t be matched, an error
occurs (exception: the ‘...’ argument).
4 All unnamed calling parameters are matched to the yet unmatched formal
arguments in order. If more calling parameters remain than there are
formal arguments, an error occurs.
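Step 2 can be tried out with groceries() from the next slides (redefined here so the sketch is self-contained). "B" is a prefix of both Bread and Butter, so it is ambiguous, while "Br" is unique. The option warnPartialMatchArgs makes R warn about every partial match, which can help to find them in existing code.

```r
groceries = function(Milk = 1, Bread, Butter = "Landliebe") {
  list(Milk = Milk, Bread = Bread, Butter = Butter)
}

# "B" matches Bread and Butter: error "argument 1 matches multiple formal arguments"
try(groceries(B = 4))

# Let R warn about partial matches, then restore the old setting
op = options(warnPartialMatchArgs = TRUE)
str(groceries(Br = 4))   # works, but now warns about the partial match
options(op)
```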


Argument matching: Examples I

Already matched arguments are skipped when checking for position.


str(groceries(Bread = 4, 2, "Landliebe"))
## List of 3
## $ Milk : num 2
## $ Bread : num 4
## $ Butter: chr "Landliebe"

We can mix the matching possibilities arbitrarily. Complete matching
occurs before partial matching.
str(groceries(Bread = 4, 2, B = "Landliebe"))
## List of 3
## $ Milk : num 2
## $ Bread : num 4
## $ Butter: chr "Landliebe"


Argument matching: Examples II

When using positional matching, arguments can be skipped by leaving their position empty.


str(groceries(, 2, 4))
## List of 3
## $ Milk : num 1
## $ Bread : num 2
## $ Butter: num 4

Arguments that don’t exist throw errors, even when using positional
matching.
str(groceries(Apple = 2))
## Error in groceries(Apple = 2): unused argument (Apple = 2)
str(groceries(Bread = 4, Butter = "Landliebe", , 2))
## Error in groceries(Bread = 4, Butter = "Landliebe", , 2): unused argument (2)


Argument matching: How should it be used?


R offers a very flexible system allowing for many possibilities – however,
most of them lead to incomprehensible code.
mean(, TRUE, x = c(1:10, NA))

## [1] 5.5

mean(x = 1:10)

## [1] 5.5

Good practices
1 Set the first or the first two arguments via positional matching.
2 Match all other arguments via complete matching.

mean(c(1:10, NA), trim = 0.15, na.rm = TRUE)

## [1] 5.5


Alternative function call: do.call()


In some cases, the arguments of a function are available as a named list. For
this particular situation, the function do.call() exists.
Arguments of do.call()
what: the function (or the name of the function) to be called.
args: a (named) list of arguments with which what is called.

li = list(Milk = 1, Bread = 2, Butter = "Landliebe")


str(do.call(groceries, li))

## List of 3
## $ Milk : num 1
## $ Bread : num 2
## $ Butter: chr "Landliebe"

Sometimes do.call() is the only practical way of calling a function. A
popular example is do.call(rbind, lapply(...)).
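The do.call(rbind, lapply(...)) pattern can be sketched as follows: lapply() produces a list of data frames (one per iteration, here with made-up columns id and square), and do.call() spreads that list into a single rbind() call.

```r
# A list of small data frames, e.g. one result per iteration
rows = lapply(1:3, function(i) data.frame(id = i, square = i^2))

# rbind() takes an arbitrary number of arguments, but our data frames sit
# inside a list: do.call() turns it into rbind(rows[[1]], rows[[2]], rows[[3]])
result = do.call(rbind, rows)
result
```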

Default values

Using argument matching, we have now matched every calling parameter
to a formal argument. But it doesn’t work the other way around: it’s
possible to have formal arguments that are not matched to any calling
parameter. So what happens when such a parameter is accessed during a
function’s execution? The following cases are checked in order:
1 If a calling parameter has been matched to it, that value is used.
missing() can be used to check whether this is the case.
2 Otherwise, the parameter’s default value is used.
3 If there is no default value either, an error occurs.
Beware: an error only occurs if said parameter is accessed. If the
parameter is never accessed during the function’s execution, no error
can occur. This is a side effect of R’s lazy evaluation.
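The three cases and the role of missing() in a small sketch (the functions f, g and h are purely illustrative): a missing argument only causes an error once it is actually accessed.

```r
f = function(x, y) {
  if (missing(y))
    return("y was not supplied")
  y
}
f(1)       # "y was not supplied"
f(1, 2)    # 2

# x has no default, but it is never accessed: no error
g = function(x) 10
g()        # 10

# Here x is accessed: error 'argument "x" is missing, with no default'
h = function(x) x + 1
try(h())
```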


Lazy evaluation

Lazy evaluation
Arguments in R are not evaluated until they are actually needed. Until then,
they are so-called promise objects.

f = function(x) {
return(10)
}
f(stop("This is an error!"))

## [1] 10

Here, an expression is passed as the argument x. Once it’s evaluated, it
throws an error. However, the parameter x is never actually accessed inside
the function. As such, it’s never evaluated and doesn’t throw an error,
which is why the function can return the value 10.


I know the value, I promise

A promise is the pledge that the value of a variable will be available when it
is needed. At first, however, only the way to calculate this value is known, not the value itself.
A promise object consists of three components:
1 value: The promised value. It doesn’t exist at first.
2 expression: The expression matched to the argument.
3 environment: The environment in which the expression is to be evaluated (for calling parameters, the environment of the call).
Once the promise object is accessed, the expression is evaluated inside of
the environment and the result stored as the value. If the object isn’t
accessed, this evaluation doesn’t occur.

Promises can be explicitly created using the delayedAssign() function.
Otherwise, they only occur in the context of lazy evaluation.
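A sketch of an explicitly created promise: delayedAssign() stores the expression, which is evaluated on first access and never again. The environment tracker is a hypothetical helper that only counts how often the expression runs.

```r
tracker = new.env()
tracker$evaluations = 0

# Create a promise: the expression is stored, but not evaluated yet
delayedAssign("x", {
  tracker$evaluations = tracker$evaluations + 1
  42
})
tracker$evaluations   # 0: x is still an unevaluated promise

x + 1                 # 43: the first access evaluates the expression
tracker$evaluations   # 1

x                     # 42: the stored value is reused
tracker$evaluations   # still 1
```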


Use case 1: Dependent default values I


Default values for parameters can arbitrarily depend on each other. All values
involved only need to be known when a parameter is evaluated for the first time.
f = function(x, y = 10 * x) {
return(y)
}
f(2)

## [1] 20

They can even depend on values that are defined in the function:
f = function(x = 2 * internal.param) {
internal.param = 10
return(x)
}
f()

## [1] 20


Use case 1: Dependent default values II

It’s possible to define defaults recursively in a syntactically correct manner.
However, evaluating them then leads to an error:
f = function(x = x)
return(x)
f()

## Error in f(): promise already under evaluation: recursive default argument reference or earlier problems?

This can also happen through the interplay of two parameters:


f = function(x = y, y = x)
return(x + y)
f()

## Error in f(): promise already under evaluation: recursive default argument reference or earlier problems?


Use case 2: Avoiding (expensive) evaluations

Evaluating some R expressions can be costly in terms of runtime, or might lead to
errors in particular situations. Lazy evaluation allows us to evaluate
these expressions only when it’s actually necessary and reasonable:
f = function(name, fun = get(name), value) {
if (missing(fun) && (missing(name) || !exists(name)))
stop("My own error message.")
return(fun(value))
}

f(fun = sin, value = 2)
## [1] 0.9092974
f(name = "sin", value = 2)
## [1] 0.9092974
f(name = "sine", value = 2)
## Error in f(name = "sine", value = 2): My own error message.

By the way, the && and || operators used here are another example of this: they only evaluate their second argument if the first one does not already determine the result.
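A quick check of this short-circuit behavior: the stop() calls below are never reached, because the left-hand side already decides the result. Both operators are specials, so they receive their arguments unevaluated.

```r
# The right-hand side is only evaluated if it can still change the result
FALSE && stop("never evaluated")   # FALSE
TRUE || stop("never evaluated")    # TRUE

# Both operators are special primitives
typeof(`&&`)                       # "special"
typeof(`||`)                       # "special"
```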

Forcing an evaluation: force()


Sometimes it’s necessary to circumvent the lazy evaluation and force the
evaluation of a parameter. This can be done with force():
f = function(x) {
force(x)
return(10)
}
f(stop("This is an error!"))

## Error in force(x): This is an error!

The definition of force() is very simple, since all it has to do is evaluate x.


force

## function (x)
## x
## <bytecode: 0x0000000006de70f8>
## <environment: namespace:base>
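A typical situation where force() matters is a function that creates other functions. In this sketch (make_lazy and make_forced are hypothetical names), the two factories differ only in the force() call, yet they behave differently once the variable passed in changes before the first use.

```r
make_lazy = function(factor) {
  function(x) x * factor
}
make_forced = function(factor) {
  force(factor)              # evaluate the promise right now
  function(x) x * factor
}

k = 2
times.lazy = make_lazy(k)
times.forced = make_forced(k)
k = 10                       # change k before the functions are used

times.lazy(1)     # 10: factor is only evaluated now, after k has changed
times.forced(1)   # 2: factor was evaluated when the function was created
```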


Calling parameters versus default values

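Caution: Calling parameters and default values are not evaluated in the same
context. A calling parameter sees the variables of the caller, while a default value
is evaluated from within the running function. This leads to different
results for these two calls, because ls() is evaluated in different environments (and, for the default, only after a has been created).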
f = function(x = ls()) {
a = 1
return(x)
}
f()

## [1] "a" "x"

f(x = ls())

## [1] "f" "groceries" "li" "square"


The ‘...’ argument

If there are calling parameters (named or unnamed) that can’t be matched
to any formal parameter, an error usually occurs. However, if the function
has a ... argument, all these parameters are collected by
this special argument:
fun = function(x, y, z, ...){
return(list(x = x, y = y, z = z))
}
str(fun(1, y = 2, z = 3, a = 7, x = 4, "a name"))

## List of 3
## $ x: num 4
## $ y: num 2
## $ z: num 3

Thus, no errors occur anymore and the collected calling parameters can be
processed inside of the function.


Use case 1: Passing arguments to function calls I

A function with the ... argument can pass the arguments collected by it
on to another function:
absoluteMean = function(x, ...) {
abs.vals = abs(x)
abs.mean = mean(abs.vals, ...)
return(abs.mean)
}
absoluteMean(c(1, -1, 2, NA), na.rm = TRUE)

## [1] 1.333333

An important example of the use of ... is plot(). plot() itself
only has a few parameters, like x, y or xlim. However, a variety of
additional graphical parameters (e.g. col) can be passed to plot(); they
are passed on via ... to more basic plotting functions (see ?par for the graphical parameters).


Use case 1: Passing arguments to function calls II

S3 methods regularly use the ... parameter as well. More on this in
the chapter on object-oriented programming.
One way of examining the contents of ... is to pass it to
list():
fun = function(...) {
return(list(...))
}
str(fun(2, 4, name = "a name"))
## List of 3
## $ : num 2
## $ : num 4
## $ name: chr "a name"
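Besides list(...), base R offers a few helpers for looking into ... without copying it into a list. A sketch assuming R 3.5.0 or later for ...length(); ..1 refers to the first element of ... (inspect is a hypothetical name).

```r
inspect = function(...) {
  list(
    n     = ...length(),        # number of arguments in ... (R >= 3.5.0)
    first = ..1,                # the first argument in ...
    names = names(list(...))    # "" for unnamed arguments
  )
}
str(inspect(2, 4, name = "a name"))
```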


Use case 2: An arbitrary number of arguments

Functions like sum() use ... to process an arbitrary number of named or
unnamed parameters.
sum(1, 3, y = 6, z = 8)

## [1] 18

We can replicate this concept (though in a rather cumbersome way):


mySum = function(...) {
args = unname(unlist(list(...)))
while (length(args) > 1)
args = c(args[1] + args[2], args[-(1:2)])
return(args)
}
mySum(1, 3, y = 6, z = 8)

## [1] 18


Caution!
The ... argument is a powerful tool – but it also circumvents some safety
mechanisms. For example, typos are simply collected by the ... argument
instead of resulting in an error:
sum(1, 2, NA, na.mr = TRUE)

## [1] NA

Therefore: Use with care. Sometimes it’s safer to use a list with additional
arguments and do.call():
absoluteMean2 = function(x, additional.args = list()) {
args = list(x = abs(x))
abs.mean = do.call("mean", c(args, additional.args))
return(abs.mean)
}
absoluteMean2(c(1, -1, 2, NA), additional.args = list(na.rm = TRUE))

## [1] 1.333333
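Another way to catch typos like na.mr is to reject unexpected named arguments inside ... explicitly. The following is only a sketch with the hypothetical wrapper safeSum(): since na.rm comes after ..., it can only be matched by its complete name, so a misspelled name ends up in ... and triggers the check.

```r
# Hypothetical wrapper: only unnamed arguments may end up in ...
safeSum = function(..., na.rm = FALSE) {
  arg.names = names(list(...))
  if (!is.null(arg.names) && any(arg.names != "")) {
    stop("unexpected named argument(s): ",
         paste(arg.names[arg.names != ""], collapse = ", "))
  }
  sum(..., na.rm = na.rm)
}
safeSum(1, 2, NA, na.rm = TRUE)        # 3
try(safeSum(1, 2, NA, na.mr = TRUE))   # error: unexpected named argument(s): na.mr
```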



Second component: Body


The function body

Type: usually language. Can be extracted and set via body().
Doesn’t contain any comments (contrary to the srcref attribute).
Otherwise: best left alone.

body(square)

## {
## return(x^2)
## }

typeof(body(square))

## [1] "language"
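Setting the body works through the replacement function body<-. A minimal sketch (cube is an illustrative name) that swaps in a new unevaluated expression created with quote():

```r
cube = function(x) x^2
body(cube) = quote(x^3)   # replace the body by a new unevaluated expression
cube(2)                   # 8
```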



Third component: Environment


The function environment

Type: environment, as presented in Chapter 2.3.
Can be extracted and set using environment().
Is set to the environment in which the function was created.

environment(square)

## <environment: R_GlobalEnv>

typeof(environment(square))

## [1] "environment"


The different environments of a function

A function is tied to one environment, yet it comes into contact with
many different environments. They can be categorized into four classes:
1 The enclosing environment: Every function has exactly one
environment as one of its components. It can be accessed via
environment(functionname).
2 The execution environment: Whenever a function is executed, a
new execution environment is created. It’s the current environment
during the execution.
3 Binding environments: Every environment in which the function is
bound to a variable.
4 Calling environments: Every environment from which the function is
called.
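Three of these environments can be inspected from inside a running function. A sketch with hypothetical names: make_fun() creates the function (so its execution environment becomes the enclosing one), caller() calls it, and the binding environment of fun is wherever the sketch is run, usually the global environment.

```r
make_fun = function() {
  function() {
    list(
      execution = environment(),              # created anew for this call
      enclosing = parent.env(environment()),  # where the function was created
      calling   = parent.frame()              # where the call came from
    )
  }
}
fun = make_fun()            # fun is bound here: its binding environment
caller = function() fun()   # calls fun from caller's execution environment
envs = caller()

identical(envs$enclosing, environment(fun))   # TRUE
identical(envs$calling, globalenv())          # FALSE: the call came from caller()
```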


The execution environment


Since a function usually contains assignments and each assignment
takes place inside an environment, an execution environment exists.
The execution environment is created anew upon each function call
and is therefore empty at the beginning of the execution.
Its parent is always the enclosing environment.

f = function() {
x = 5
print(environment())
print(parent.env(environment()))
return(x)
}
environment(f)
## <environment: R_GlobalEnv>
f()
## <environment: 0x0000000007996900>
## <environment: R_GlobalEnv>
## [1] 5
f()
## <environment: 0x00000000078d4648>
## <environment: R_GlobalEnv>
## [1] 5


A new beginning
Every time a function is called, it starts from its beginning. This
might seem obvious, but it isn’t always what we expect:

counter = function() {
ex.count = exists("count", inherits = FALSE)
if (!ex.count) {
message("Defining count")
count = 1
} else {
count = count + 1
}
return(count)
}
counter()
## Defining count
## [1] 1
counter()
## Defining count
## [1] 1

Thus, a function can’t remember anything (at least not in this way). It’s
memoryless, so to speak.

The enclosing environment I

As soon as a function is created, it’s provided with an enclosing
environment, namely the current environment:
f = function()
return(x)
f
## function()
## return(x)
environment()
## <environment: R_GlobalEnv>
environment(f)
## <environment: R_GlobalEnv>

If the global environment is the enclosing environment, it’s
not displayed when printing the function.


The enclosing environment II

The main purpose is to integrate the function into the existing search
tree, i.e. to allow for scoping.
x = 2
f()
## [1] 2
x = x + 1
f()
## [1] 3

Since x is not defined inside the function, it is looked up and found in the
enclosing environment.


Lexical scoping – here we go again

Just as before, R has to perform scoping inside of functions as well, and
once more it uses the concept of lexical scoping with its familiar set of
simple rules.

The lexical scoping rule


1 Look for the variable inside of the current environment.
2 If it is not there, continue looking for it in the parent environment.
3 If there is no parent environment, then the variable does not exist.

And that’s really all there is to say about this, though we’ll see in the
exercises that reality might prove more complicated than that
at times.
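The rule in action with nested functions (outer.fun and inner.fun are illustrative names): inner.fun does not define x, so the search continues in its parent environment, which is the execution environment of outer.fun, and stops there before reaching the global x.

```r
x = "global"
outer.fun = function() {
  x = "outer"
  inner.fun = function() {
    x          # not defined in inner.fun: look in the parent environment
  }
  inner.fun()
}
outer.fun()   # "outer": found in the execution environment of outer.fun
x             # "global": unchanged
```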


Using the enclosing environment

Finding functions. Inside of a function, we frequently use other
functions. Usually, they are defined somewhere in the search path
(often in the base package), and we access them
through the enclosing environment. Access to the base package is
of utmost importance.
Using global variables. Sometimes we only need π (pi), and that’s okay.
But sometimes, we forget to define a variable inside the function and
accidentally use a global variable with the same name. This is a
common source of errors.
Modifying global variables. We can also use assign() to modify
variables in the enclosing environment. This should be avoided;
also see side effects in the chapter on return values.
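The accidental use of a global variable can be reproduced in a few lines. As a debugging sketch (one possible trick, not the only one; the recommended package codetools also offers findGlobals()), setting the enclosing environment to baseenv() hides all global variables while keeping base functions such as + visible.

```r
y = 100
f = function(x) {
  x + y      # intended to use a local y, which was never defined
}
f(1)         # 101: the global y is used silently

# With the base environment as enclosing environment, globals are invisible
environment(f) = baseenv()
try(f(1))    # Error in f(1) : object 'y' not found
```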


The memory of a function

Sometimes, it’s sensible to provide a newly created function with an
individual environment. The function can then use this environment to
remember things:

g = function() {
encl.env = parent.env(environment())
# inherits = FALSE: only look in encl.env itself, not in its parents
new = !exists("count", envir = encl.env, inherits = FALSE)
if (new) {
message("Defining count")
assign("count", 1, envir = encl.env)
} else {
count <<- count + 1
}
count
}
environment(g) = new.env()
g()
## Defining count
## [1] 1
g()
## [1] 2
count
## Error in eval(expr, envir, enclos): object ’count’ not found


The deep assignment <<-

Aside from the regular assignment via <- and =, there also is a deep
assignment in R. It’s defined in the following way:
Definition: x <<- val
1 Does the current environment have a parent environment?
No: Perform the assignment x <- val in the global environment.
Yes: Enter the parent environment of the current environment.
2 Look for a variable named x in the current environment:
x exists: Overwrite its value with val.
x doesn’t exist: Go back to step 1.

The deep assignment (almost) never performs an assignment in the active
environment. It can therefore create unwanted variables in the global
environment and should be used with care. Its sensible use is to perform
assignments in the enclosing environment of a function.
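The sensible use case in a short sketch: a hypothetical function factory make_counter() whose returned function updates count in its own enclosing environment via <<-. Each call of make_counter() creates a fresh enclosing environment, so the counters are independent and nothing is written to the global environment.

```r
make_counter = function() {
  count = 0
  function() {
    count <<- count + 1   # found in and written to the enclosing environment
    count
  }
}
counter1 = make_counter()
counter1()   # 1
counter1()   # 2
counter2 = make_counter()
counter2()   # 1: every counter has its own enclosing environment
```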


Binding environments

Through assignments, functions are bound to names, just like
variables.
The binding environment doesn’t have any influence on the
enclosing environment and thus no influence on scoping either.
The same holds true for the calling environment: It can’t influence
scoping.

f = function()
return(x)
x = 10
f()
## [1] 10
f2 = f
environment(f2) = new.env()
environment(f2)$x = 2
f2()
## [1] 2
environment(f2)
## <environment: 0x0000000006bedd18>


Scoping in the calling environment

But R wouldn’t be R if it didn’t also have a way of accessing the calling
environment after all. That’s what the somewhat unfortunately named
function parent.frame() is for.

f = function() {
cal.env = parent.frame()
print(cal.env)
x = get("x", envir = cal.env)
return(x)
}
x = 10
environment(f) = new.env()
environment(f)$x = 2
f()
## <environment: R_GlobalEnv>
## [1] 10
environment(f)
## <environment: 0x00000000072612d0>

Scoping that searches for variables in the calling environment is also called
dynamic scoping. In R, it has to be requested explicitly, e.g. by using
parent.frame().
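Explicit dynamic scoping in a small sketch: the hypothetical helper caller_value() looks up a variable in the environment of whoever called it, so it can see a local variable of h() that lexical scoping would never reach.

```r
# Hypothetical helper: look up a variable in the environment of its caller
caller_value = function(name) {
  get(name, envir = parent.frame())
}
h = function() {
  local.var = "defined in h"
  caller_value("local.var")
}
h()   # "defined in h"
```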

Flashback: Defaults versus calling parameters

Recall the following example:


f = function(x = ls()) {
a = 1
return(x)
}
f()

## [1] "a" "x"

f(x = ls())

## [1] "absoluteMean" "absoluteMean2" "counter" "f"
## [5] "f2" "fun" "g" "groceries"
## [9] "li" "mySum" "square" "x"

We’re now able to explain it: Default values are evaluated in the execution
environment whereas calling parameters are evaluated in the calling
environment. ls() then returns differing contents.
Component 4: Byte code


Compiled R code

Recall from our first lecture:


Interpreted code is executed by the interpreter at runtime.
Compiled code is translated to machine language in advance.
R is an interpreted language. That’s great, because interpreted
programming languages offer many advantages. In particular, they are more
interactive.

However: compiled program code is usually more efficient. Wouldn’t it be
great if we could compile our slow R code as well? Well, we can! That’s
what the fourth, optional component of a closure, the byte code, is for.

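Caution: Compiled R code is not to be confused with calling compiled
C code (e.g. via .Call()). The R byte-code compiler translates our R code into byte code, which is
then executed by R’s own virtual machine; no native machine code is produced.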


Compiling an R function
Most closures from the base package are already compiled and as
such contain byte code:
mean
## function (x, ...)
## UseMethod("mean")
## <bytecode: 0x0000000008898858>
## <environment: namespace:base>

To compile our own functions, we can use cmpfun() from the
compiler package.
library(compiler)
squareCMP = cmpfun(square)
squareCMP
## function(x) {
## # Simply return x^2
## return(x^2)
## }
## <bytecode: 0x0000000008fe0878>
## attr(,"name")
## [1] "Squarefunction"

Efficiency through compiling I

Sometimes, compiling makes a function significantly faster. However,
vectorized R functions are usually just as efficient, or even more so.
library(microbenchmark)
f = function(x) {
res = 0
for (i in 1:x) {
res = res + i^2
}
res
}
fCMP = cmpfun(f)
print(microbenchmark(f(1000), fCMP(1000), sum((1:1000)^2)), signif = 2)

## Unit: microseconds
## expr min lq mean median uq max neval
## f(1000) 30.0 31.0 57.0 32.0 34 2400 100
## fCMP(1000) 30.0 31.0 33.0 32.0 34 60 100
## sum((1:1000)^2) 3.2 3.9 5.2 4.5 6 18 100


Efficiency through compiling II

Wait a minute! The compiled version isn’t that much faster than the
original version. So, compiling it doesn’t help at all?

Changelog R 3.4.0
The JIT (‘Just In Time’) byte-code compiler is now enabled by default at
its level 3.

Help page for enableJIT()


enableJIT enables or disables just-in-time (JIT) compilation. JIT is disabled
if the argument is 0. If level is 1 then larger closures are compiled before
their first use. If level is 2, then some small closures are also compiled
before their second use. If level is 3 then in addition all top level loops are
compiled before they are executed.


Efficiency through compiling III

enableJIT(level = 0)

## [1] 3

f = function(x) {
res = 0
for (i in 1:x) {
res = res + i^2
}
res
}
fCMP = cmpfun(f)
print(microbenchmark(f(1000), fCMP(1000), sum((1:1000)^2)), signif = 2)

## Unit: microseconds
## expr min lq mean median uq max neval
## f(1000) 410.0 430.0 510.0 450.0 570.0 1600 100
## fCMP(1000) 29.0 30.0 33.0 31.0 32.0 130 100
## sum((1:1000)^2) 3.2 4.3 6.9 6.1 7.6 29 100
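Since enableJIT() returns the previous JIT level (the [1] 3 printed above), the original setting can be restored after such an experiment. A minimal sketch of that pattern:

```r
library(compiler)

# enableJIT() returns the previous level, so it can be restored afterwards
old.level = enableJIT(0)   # disable the JIT compiler, e.g. for a benchmark
# ... code that should run without JIT compilation ...
enableJIT(old.level)       # restore the previous setting
```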



The return value


Regular return values I

A function maps a set of inputs onto an output. Since we’ve
already talked about the input, now it’s time to talk about the output.
Paradigm 6
A function in R always returns exactly one value.

An R function can’t return two values. This is not a limitation, however,
since the output can also be a list:
f = function(x, y) {
list(x = x, y = y)
}
str(f(1, 2))

## List of 2
## $ x: num 1
## $ y: num 2
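Since the result is a single list, the caller picks out the individual components by name. A small sketch with a hypothetical helper summary_stats() that bundles two statistics into one return value:

```r
# Hypothetical helper: returns two statistics bundled in one list
summary_stats = function(x) {
  list(mean = mean(x), sd = sd(x))
}
res = summary_stats(c(2, 4, 6))
res$mean  # 4
res$sd    # 2
```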


Regular return values II

A function’s return value can be defined in two ways:


1 By calling the function return(). Upon calling return(), the
function’s execution is terminated.
f = function(x) {
return(x)
x + 2
}
f(0)
## [1] 0

2 If there is no return() call, the value of the last evaluated R statement is the return value.
f = function(x) {
x + 2
}
f(0)
## [1] 2
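Two consequences of the second rule are easy to overlook: the last evaluated statement may sit inside an if/else branch, and if the last statement is an assignment, its value is still returned, but invisibly. A short sketch (the names sign_label and g are made up for illustration):

```r
# The last evaluated statement can sit inside an if/else branch
sign_label = function(x) {
  if (x < 0) {
    "negative"
  } else {
    "non-negative"
  }
}
sign_label(-3)  # "negative"

# If the last statement is an assignment, its value is returned invisibly
g = function(x) {
  y = x + 2
}
g(0)        # prints nothing
res = g(0)  # but the value is still returned
res         # 2
```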


Regular return values III

The return value is NOT defined by print() or similar functions.


f = function(x) {
print(x)
}
f(1)

## [1] 1

The result does seem correct: print(x) usually returns x (invisibly), so f() actually does return x. But the print() call is redundant here, and it creates the impression that it is what determines the return value.

Caution: print(x) does not always return x (a print method for some class may return something else), so the code above can indeed be wrong in some cases.
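To see how this can go wrong, consider an object of a hypothetical class "myclass" whose (equally hypothetical) print method does not follow the convention of returning its argument invisibly:

```r
# Hypothetical print method that returns NULL instead of x
print.myclass = function(x, ...) {
  cat("<myclass object>\n")
  invisible(NULL)
}
f = function(x) {
  print(x)
}
obj = structure(1, class = "myclass")
res = f(obj)  # prints "<myclass object>"
is.null(res)  # TRUE: f() did not return obj
```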


Invisible output

Some functions don’t seem to return anything. This is only an illusion, since every function in R has a return value. This behavior can be achieved using the function invisible():
f = function(x) {
return(invisible(x))
}
f(1)
y = f(1)
y

## [1] 1

invisible() prevents a function’s return value from being printed automatically to the console. However, the value can still be assigned to a variable or processed further explicitly.
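The value is merely flagged as invisible, which base R's withVisible() makes explicit. Wrapping the call in round brackets forces the value to be printed:

```r
f = function(x) {
  return(invisible(x))
}
withVisible(f(1))  # list: value 1, visible FALSE
(f(1))             # round brackets force printing: [1] 1
```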


What if something goes wrong? I

When executing functions, things can go wrong, e.g. when the input
parameter has the wrong type. In these situations, the user should be
notified and in extreme cases the execution of the function should be
aborted with an error. R offers three types of notifications:
1 message(): Nothing severe has happened, but the user should still be notified.
2 warning(): A potentially harmful situation where the execution is still
possible but probably not in the way intended by the user.
3 stop(): Everything is lost. The function’s execution is stopped and
the user is presented with an error message (in red!)
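All three notification types can appear in one function. The following is a sketch around a hypothetical wrapper safe_log(); the conditions checked and the wording of the messages are our own choice:

```r
# Hypothetical example: log() with explicit notifications
safe_log = function(x) {
  if (!is.numeric(x))
    stop("'x' must be numeric, but has class '", class(x)[1], "'")
  if (length(x) == 0)
    message("'x' is empty, returning numeric(0)")
  if (any(x <= 0, na.rm = TRUE)) {
    warning("'x' contains values <= 0; they are set to NA")
    x[which(x <= 0)] = NA
  }
  log(x)
}
safe_log(c(1, exp(2)))  # 0 and 2 (up to rounding)
safe_log(c(-1, 1))      # warning, then NA and 0
safe_log(numeric(0))    # message, then numeric(0)
# safe_log("a")         # error: 'x' must be numeric, ...
```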


What if something goes wrong? II

Warnings can be handled differently in R, depending on the value set via options(warn = value):

value      Behavior
negative   Warnings are ignored
0          Warnings are collected and printed after the top-level call has finished; warnings() shows them again
1          Warnings are printed to the console immediately
>= 2       Warnings are turned into errors

Important: Self-implemented warnings and error messages should be meaningful. The message ’There’s something wrong here!’ isn’t very helpful. It should be clear what went wrong.
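Setting warn to 2 is handy when debugging, because the first warning then stops execution. A minimal sketch that does this temporarily and restores the previous setting (options() returns the old values); as.integer("a") is just a convenient call that normally only warns:

```r
# Temporarily turn warnings into errors
old = options(warn = 2)
res = tryCatch(
  as.integer("a"),  # normally only a warning: NAs introduced by coercion
  error = function(e) "converted to an error"
)
options(old)  # restore the previous setting
res
```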


What if something goes wrong? III


Under no circumstances should error messages be generated via return():
f = function(x) {
if (x == 0)
return("This is an error message")
else
return(x)
}

This way, the return type depends on the input without the user having any
influence over it. If the output is supposed to be processed further, new
errors will occur.

The popular practice of using specific numbers to encode errors (e.g. return value 99 = error) should also be avoided in R.

The correct way is using stop() to stop the execution with an error.
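Applied to the function above, a sketch of the corrected version: the return type is now always the type of x, and the error case aborts with a message that can still be caught, e.g. with tryCatch():

```r
f = function(x) {
  if (x == 0)
    stop("'x' must not be 0")
  x
}
f(5)  # 5
# f(0) aborts with: Error in f(0) : 'x' must not be 0
msg = tryCatch(f(0), error = function(e) conditionMessage(e))
msg
```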

Side effects

Most functions in R map a set of inputs onto an output. Such functions are called pure functions. In contrast, some functions have side effects in addition to returning a value.
Definition: Side effects
Pure functions only compute a return value. Other impacts of a function, e.g. on the state of the R session or on the file system, are called side effects.
Examples of side effects are creating a variable in the global environment, drawing a graphic, or writing a file.

A simple example of a function with a side effect uses the superassignment operator <<-:
fun = function(x, value) x <<- value
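The difference can be seen side by side in a sketch with two hypothetical functions: add_pure() only returns a value, while add_side() changes the variable total outside of its own environment:

```r
total = 0

# Pure: only computes and returns a value
add_pure = function(total, value) {
  total + value
}

# Side effect: modifies the outer variable total via <<-
add_side = function(value) {
  total <<- total + value
}

new_total = add_pure(total, 5)
total        # still 0
add_side(5)
total        # now 5
```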


Famous functions with side effects


print(): Printing an object to the console.
plot(): Creating a plot.
library(): Loading a package.
save(), load(): Saving and loading objects.
source(): Sourcing an .R file.
help(): Displaying a help page.
And many more...

All of these functions have one thing in common: They are called specifically for their side effect, and their actual return value is usually disregarded. Yet, they always have a return value.
x = library(gtools)
x

## [1] "gtools" "microbenchmark" "compiler" "knitr"
## [5] "stats" "graphics" "grDevices" "utils"
## [9] "datasets" "methods" "base"
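The same holds for print() and cat(), which are almost always called for their output on the console. A quick check of their return values:

```r
# print() returns its argument invisibly, cat() returns NULL invisibly
val_print = print("hello")
val_cat = cat("hello\n")
val_print        # "hello"
is.null(val_cat) # TRUE
```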


Preventing unintended side effects

We have seen: There are many desirable side effects. However, side effects frequently occur without the function’s user noticing.

f = function(y) {
x <<- y + 1
return(x)
}
x = 1
f(x)
## [1] 2
x
## [1] 2

Such side effects can lead to errors that are hard to find.

Rule
A function that is not explicitly called for its side effect should not have a side effect.
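Following this rule, the unintended side effect disappears if f() works with a local variable instead of using <<-. A sketch of a side-effect-free version of the function above:

```r
x = 1
f = function(y) {
  x = y + 1  # local variable: exists only inside f
  return(x)
}
f(x)  # 2
x     # still 1
```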


Side effects: Example of how it should not be

nastyFunction = function() {
all.var.names = ls(pos = 1)
for (name in all.var.names) {
value = get(name, pos = 1)
if (is.numeric(value)) {
assign(name, pos = 1, value = value + 1)
}
}
return("gnahahahaha")
}
nastyFunction()

## [1] "gnahahahaha"

## [1] 3


Cleaning up I

In some cases, side effects can’t be avoided. For example, a function might need to change global graphics parameters:
par()$mfrow

## [1] 1 1

myPlot = function(){
par(mfrow = c(2, 1))
# Do some nice plotting
return(invisible(NULL))
}
myPlot()
par()$mfrow

## [1] 2 1

From the outside, the global graphics parameters appear to have changed as if by magic, probably without us ever intending this.

Cleaning up II

Responsible programmers clean up after themselves and restore the global state. To do this, R offers the function on.exit().

on.exit()
Calling on.exit() inside a function defines program code that is executed
upon exiting the function. It doesn’t matter if the function terminates
successfully (e.g. via return()) or with an error. Code defined through
on.exit() is always executed.

Nice to know: on.exit() works based on a side effect itself.
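The guarantee that on.exit() code also runs after an error can be checked with a small sketch. The function with_digits() below is hypothetical: it temporarily changes the global option digits and restores it via on.exit(), using the fact that options() returns the old values. The argument add = TRUE appends to any exit code registered earlier instead of replacing it:

```r
# Hypothetical helper: formats pi * x with a temporary 'digits' setting
with_digits = function(x, digits) {
  old = options(digits = digits)     # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore them however we exit
  format(pi * x)
}
before = getOption("digits")
with_digits(1, 3)                        # "3.14"
try(with_digits("a", 3), silent = TRUE)  # fails: pi * "a" is an error
getOption("digits") == before            # TRUE: restored in both cases
```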


Cleaning up III

So, if we want to change graphics parameters, for example, we should remember the old state and restore it via on.exit():
par()$mfrow

## [1] 1 1

myPlot = function(){
old.pars = par(no.readonly = TRUE)
on.exit(do.call(par, old.pars))
par(mfrow = c(2, 1))
# Do some nice plotting
}
myPlot()
par()$mfrow

## [1] 1 1


Good practices

The return value of a function should always have the same type and structure.
The return value of a function is usually specified by return(). In particular, return() should be used to terminate the function early in simple situations.
You’re welcome to replace return(value) at the end of a function with just value.
Side effects should be avoided. Examples of acceptable side effects:
  Loading packages, objects, ... (Beware: existing variables can be overwritten this way.)
  Storing objects in data files.
  Printing output to the console (don’t overdo it!).
  Generating or displaying a graphic.
Functions ’without output’ should return invisible(NULL).
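The first three practices can be illustrated with a small hypothetical function: it terminates early with return() as soon as it finds a negative entry, ends with a plain value otherwise, and always returns an integer of length 1:

```r
# Hypothetical example: position of the first negative entry of x
first_negative = function(x) {
  for (i in seq_along(x)) {
    if (x[i] < 0) return(i)  # early exit via return()
  }
  NA_integer_  # plain value at the end; still an integer of length 1
}
first_negative(c(3, -1, -2))  # 2
first_negative(c(1, 2, 3))    # NA
```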


Function calls


Prefix functions I

We have now dealt with pretty much every aspect of functions:

How to define them
Argument matching
Default parameters and lazy evaluation
Execution environments and scoping rules
The return value
Side effects

The only thing that remains is: How do you call a function? Usually, this is
done using round brackets:
Definition: Prefix function
A function that is called according to the pattern:
’functionname’(name1 = arg1, name2 = arg2, ...)
is called a prefix function.


Prefix functions II

This comes as no surprise, since we have called functions like this in every previous example.
Prefix: the name of the function precedes the arguments.
Every R function can be called using prefix notation.
Most functions we write are always called in prefix form.
Some functions allow for an alternative calling method while still allowing a prefix call.
In fact, prefix notation is not even the most common way to call functions in R. Instead, so-called infix functions, where the function name comes between its arguments, are used much more frequently:


Infix function I

Definition: Infix functions


A function that is called according to the pattern:
arg1 ’functionname’ arg2
is called an infix function.

Infix functions are also called infix operators.


The most important infix functions are arithmetic and logical
operations as well as assignments from the base package.
There is a function call behind all these operators in R.
There exists an equivalent prefix call for every infix operator.


Infix function II

1 + 2
## [1] 3
`+`(1, 2)
## [1] 3

x = 1:2
`=`(x, `:`(1, 2))

x[1]
## [1] 1
`[`(x, 1)
## [1] 1

5 %in% 1:10
## [1] TRUE
`%in%`(5, `:`(1, 10))
## [1] TRUE

Beware: The names of all infix functions contain special characters. Therefore, backticks need to be used for their prefix call.
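Because the backticked names refer to ordinary function objects, operators can also be passed to other functions. A few examples with base R's Reduce(), sapply() and Map():

```r
# The prefix names are ordinary function objects and can be passed around
Reduce(`+`, 1:4)                # 1 + 2 + 3 + 4
sapply(list(1:3, 4:6), `[`, 2)  # second element of each vector
Map(`*`, 1:3, 4:6)              # element-wise products as a list
```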


Creating your own infix functions


We can create our own infix functions; there are just two simple rules to follow: The name of an infix function always begins and ends with a %, and the function always has exactly two formal parameters.
For example, we can easily define a function to ’add’ two character strings. Caution: Since % is a special character, we need backticks here as well.
`%+%` = function(a, b) {
return(paste(a, b))
}
"new" %+% "string"

## [1] "new string"

This call is still equivalent to:


`%+%`("new", "string")

## [1] "new string"
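A further sketch of a self-defined infix function is a null-default operator that returns its left argument unless it is NULL. The name %||% is our choice here; recent versions of base R (4.4.0 and later) and the rlang package provide an operator of the same name, which this definition would mask. Thanks to lazy evaluation, the right argument is only evaluated when it is needed:

```r
`%||%` = function(a, b) {
  if (is.null(a)) b else a
}
NULL %||% "default"             # "default"
"value" %||% "default"          # "value"
1 %||% stop("never evaluated")  # 1: b is never evaluated
```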


Unary operators
In addition to these binary infix operators, there are also a handful of unary operators in R:

x = -5
+x
## [1] -5
-x
## [1] 5

x = TRUE
!x
## [1] FALSE
(2 + 2)
## [1] 4

Here, the argument always follows the operator.
The respective prefix calls are `+`(), `-`(), `!`(), `(`().
Functions like these can’t be created by users.
Beware: Round brackets can fulfill two purposes: They are used for prefix calls, but also simply to group mathematical expressions.
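The prefix calls of the unary operators can be written out directly. Note that `-` acts as unary minus with one argument and as subtraction with two:

```r
`-`(5)      # unary minus: -5
`!`(TRUE)   # negation: FALSE
`(`(2 + 2)  # the round bracket as a function: 4
`-`(5, 2)   # with two arguments, `-` is binary subtraction: 3
```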

Infix functions: Evaluation order


Infix functions are generally evaluated from left to right. Round brackets may alter the order of evaluation:
`%m%` = function(a, b) a - b

10 %m% 5 %m% 3

## [1] 2

(10 %m% 5) %m% 3

## [1] 2

10 %m% (5 %m% 3)

## [1] 8

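This also holds for self-defined infix functions, which are always evaluated from left to right! For the operators defined in base R, the order of evaluation is more complicated: R also has to respect operator precedence (e.g. multiplication before addition), and some operators are evaluated from right to left.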

Evaluation order of standard operators

Operators higher up in the following list are always evaluated first. Within a class, evaluation usually occurs from left to right.

 1. :: :::            Access (**)
 2. $ @               Extraction
 3. ^ **              Exponentiation (*)
 4. - +               Sign (unary)
 5. :                 Sequence
 6. %any%             Special operators, e.g. %% (remainder), %*% (matrix multiplication)
 7. * /               Multiplication, division
 8. + -               Addition, subtraction (binary)
 9. < > <= >= == !=   Comparisons (**)
10. !                 Negation
11. & &&              And
12. | ||              Or
13. ~                 Formulas
14. -> ->>            Assignment
15. <- <<-            Assignment (*)
16. =                 Assignment (*)

(*): Here, evaluation occurs from right to left.
(**): Here, nested expressions like 1 == 1 == 1 are not allowed.
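A few expressions where the precedence and direction rules from the list matter in practice:

```r
-2^2     # ^ before unary minus: -(2^2) = -4
2^3^2    # ^ from right to left: 2^(3^2) = 512
-1:2     # unary minus before ':': (-1):2
2 * 1:3  # ':' before '*': 2 * (1:3)
```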

Evaluation order: Specifics and common mistakes

Nested assignments:
x = y = z = 2

Nested colon operator:
1:2:3
## Warning in 1:2:3: numerical expression has 2 elements: only the first used
## [1] 1 2 3

Colon operator comes first:
1:2 * 2
## [1] 2 4
1:(2 * 2)
## [1] 1 2 3 4

Comparisons can’t be nested:
1 == 1 == 1
## Error: <text>:1:8: unexpected '=='
## 1: 1 == 1 ==
##           ^
4 < 5 < 6
## Error: <text>:1:7: unexpected '<'
## 1: 4 < 5 <
##          ^


Brackets

The four bracket types `(`, `[`, `[[` and `{` are also just functions in R. However, their behavior is different from every other function: They require a further character, the matching closing bracket.
Every pair of brackets encloses an R statement (`{` may even enclose several), which is evaluated before the brackets themselves.
`[` and `[[` can take more than two arguments (e.g. for matrix subsetting).
This atypical behavior is possible because they aren’t closures:

typeof(`(`)
## [1] "builtin"
typeof(`[`)
## [1] "special"
typeof(`[[`)
## [1] "special"
typeof(`{`)
## [1] "special"
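Like the operators, the brackets can be called in prefix form, including the extra arguments of `[` for matrices:

```r
m = matrix(1:6, nrow = 2)
`[`(m, 2, 3)                     # same as m[2, 3]
`[[`(list(a = 1, b = "z"), "b")  # same as list(a = 1, b = "z")[["b"]]
`{`(1, 2, 3)                     # `{` returns the value of its last statement
```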


Replacement functions I

In some assignments, there is a function call on the left-hand side, e.g. when setting the names attribute of an object.
y = x = 1:2
names(x) = c("a", "b")
x

## a b
## 1 2

We’ve all seen this kind of assignment before and have come to accept it.
But at this point, we want to ask ourselves: How is it possible to assign a
value to the result of a function call? This appears to contradict everything
we have learned so far in this chapter.


Replacement functions II

Just like infix functions, these so-called replacement functions are just
another notation for a special kind of prefix functions. Alternatively, we can
also use the prefix call:
y = `names<-`(y, c("a", "b"))
y

## a b
## 1 2

Replacement functions are characterized by names ending in <-. They usually have two arguments, but they can also have more.


Replacement functions III

Definition: Replacement functions


A function that is defined according to the pattern:

`functionname<-` = function(x, ..., value)

and that returns a modified x is called a replacement function. The call

functionname(x, ...) = value

is equivalent to:

x = `functionname<-`(x, ..., value = value)


Replacement functions: Examples

x = 1
names(x) = "a"
x
## a
## 1

y = 1
y = `names<-`(y, "a")
y
## a
## 1

x[2] = 2
x
## a
## 1 2

y = `[<-`(y, 2, 2)
y
## a
## 1 2

We can also define our own replacement functions:


`modify<-` <- function(x, position, value) {
x[position] = value
return(x)
}
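A sketch of how the self-defined `modify<-` is used, once with the replacement syntax and once with the equivalent prefix call. The last argument has to be called value, because R passes the right-hand side of the assignment under that name:

```r
`modify<-` <- function(x, position, value) {
  x[position] = value
  return(x)
}
x = c(1, 2, 3)
modify(x, 2) = 10         # replacement call
x                         # 1 10 3
y = c(1, 2, 3)
y = `modify<-`(y, 2, 10)  # equivalent prefix call
identical(x, y)           # TRUE
```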


Further special functions

Key words like for, if, while and even function are just functions
themselves.
typeof(`for`)

## [1] "special"

typeof(`function`)

## [1] "special"

Their call behavior differs wildly from other functions – they are neither
prefix nor infix nor replacement. This is possible because their type is
special or builtin and thus, they don’t have to adhere to the behavior
of closures. Fortunately, we can’t define functions with such a call
behavior ourselves.


Prefix calls of specific functions


The following calls of key words defined in base R are equivalent. Since
they are functions, they also have a return value.

val = for (i in 1:2) print(i)
## [1] 1
## [1] 2
val
## NULL

val = `for`(i, 1:2, print(i))
## [1] 1
## [1] 2
val
## NULL

val = if (i == 1) print(1) else print(2)
## [1] 2
val
## [1] 2

val = `if`(i == 1, print(1), print(2))
## [1] 2
val
## [1] 2


It’s all about functions

If even a for loop in R is internally a function call, is there anything at all that is not based on a function call?
Paradigm 7
Everything happening in R is the effect of a function call.

And everything really means everything, even if we might not expect it in many cases, e.g.:
Arithmetic operations: e.g. `+`,
Conditionals and loops: e.g. `if`, `for`,
Text output on the console: print(), cat(),
Quitting R: q().


Summary


Functions behave just like any other object and can be treated as such.
Aside from the internal types builtin and special, we usually
encounter closures. They consist of a list of arguments, a body and
an environment.
Argument matching matches calling parameters to the list of formal
parameters.
The enclosing environment integrates the function into the lexical
scoping used by R.
Functions can be compiled for the purpose of efficiency.
A function in R always has exactly one return value.
Functions are not just an object in R; they play a key role as
everything in R results from function calls.
