
Unit 2: FUNCTIONS

The function Command

To define a function, use the function command and assign the results to an
object name. Once you’ve done this, you can call the function using that
object name just like any other built-in or contributed function in the
workspace.

Function Creation

A function definition always follows this standard format:

functionname <- function(arg1,arg2,arg3,...){
    do any code in here when called
    return(returnobject)
}

The function name placeholder can be any valid R object name, which is
what you’ll ultimately use to call the function. Assign to this function name
a call to function, followed by parentheses with any arguments you want
the function to have. The pseudocode includes three argument placeholders
plus an ellipsis. Of course, the number of arguments, their tags, and
whether to include an ellipsis all depend on the particular function you’re
defining. If the function does not require any arguments, simply include
empty parentheses: (). If you do include arguments in this definition, note
that they are not objects in the workspace and they do not have any type or
class attributes associated with them—they are merely a declaration of
argument names that will be required by functionname.

When the function is called, it runs the code in the braced area (also called
the function body or body code). It can include if statements, loops, and
even other function calls. In the braced area, you can use arg1, arg2, and
arg3, and they are treated as objects in the function’s lexical environment.
Depending on how those declared arguments are used in the body code,
each argument may require a certain data type and object structure. If you’re
writing functions that you intend for others to use, it’s important to have
sound documentation to say what the function expects.
Often, the function body will include one or more calls to the return
command. When R encounters a return statement during execution, the
function exits, handing control back to wherever it was called from (the
command prompt, if that is where you called it). This mechanism is what
allows you to pass results from operations in the function back to the
user. This output is denoted in the pseudocode by returnobject, which is
typically assigned an object created or calculated earlier in the function
body. If there is no return statement, the function will simply return the
value of the last evaluated expression.
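
As a small illustration of the two return behaviors just described, here is a minimal sketch using two hypothetical functions (the names sign_label and add_one are invented for this example and do not appear elsewhere in this unit). The first exits early through explicit calls to return. The second has no return statement, so it hands back the value of its last expression.

```r
# Hypothetical example: explicit return() calls exit the function early.
sign_label <- function(x){
    if(x < 0){
        return("negative")   # the function exits here when x is negative
    }
    if(x == 0){
        return("zero")
    }
    return("positive")
}

# Hypothetical example: no return(), so the value of the last
# evaluated expression is what the function returns.
add_one <- function(x){
    x + 1
}

sign_label(-3)   # "negative"
add_one(4)       # 5
```

Either style works; an explicit return tends to be easier to follow when a function has several possible exit points.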

myfib <- function(){
    fib.a <- 1
    fib.b <- 1
    cat(fib.a,", ",fib.b,", ",sep="")
    repeat{
        temp <- fib.a+fib.b
        fib.a <- fib.b
        fib.b <- temp
        cat(fib.b,", ",sep="")
        if(fib.b>150){
            cat("BREAK NOW...")
            break
        }
    }
}

I’ve named the function myfib, and it doesn’t use or require any arguments.
Before you can call myfib from the console, you have to send the function
definition there. Highlight the code in the editor and press CTRL-R (in the
Windows R editor) or COMMAND-RETURN (in the macOS R editor).
R> myfib <- function(){
    fib.a <- 1
    fib.b <- 1
    cat(fib.a,", ",fib.b,", ",sep="")
    repeat{
        temp <- fib.a+fib.b
        fib.a <- fib.b
        fib.b <- temp
        cat(fib.b,", ",sep="")
        if(fib.b>150){
            cat("BREAK NOW...")
            break
        }
    }
}

This imports the function into the workspace (if you enter ls() at the
command prompt, "myfib" will now appear in the list of present objects).
This step is required anytime you create or modify a function and want
to use it from the command prompt.
Now you can call the function from the console.

R> myfib()
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, BREAK NOW...

It computes and prints the Fibonacci sequence until it reaches the first
term greater than 150 (here, 233), just as instructed.

Adding Arguments
Rather than printing a fixed set of terms, let’s add an argument to control
how many Fibonacci numbers are printed. Consider the following new
function, myfib2, with this modification:
myfib2 <- function(thresh){
    fib.a <- 1
    fib.b <- 1
    cat(fib.a,", ",fib.b,", ",sep="")
    repeat{
        temp <- fib.a+fib.b
        fib.a <- fib.b
        fib.b <- temp
        cat(fib.b,", ",sep="")
        if(fib.b>thresh){
            cat("BREAK NOW...")
            break
        }
    }
}

This version now takes a single argument, thresh. In the body code, thresh
acts as a threshold determining when to end the repeat procedure, halt
printing, and complete the function—once a value of fib.b that is greater
than thresh is calculated, the repeat statement will exit after encountering
the call to break. Therefore, the output printed to the console will be the
Fibonacci sequence up to and including the first fib.b value bigger than
thresh. This means that thresh must be supplied as a single numeric value—
supplying a character string, for example, would make no sense.
After importing the definition of myfib2 into the console, note the same
results as given by the original myfib when you set thresh=150.

R> myfib2(thresh=150)
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, BREAK NOW...

But now you can print the sequence to any limit you want (this time
using positional matching to specify the argument):
R> myfib2(1000000)
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584,
4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811,
514229, 832040, 1346269, BREAK NOW...
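
The requirement that thresh be a single numeric value can also be enforced in code. The following is only a sketch: myfib2_checked is a hypothetical name, and the particular checks are one reasonable choice rather than the only one. Without such a check, a character string would not necessarily trigger an error, because R would compare the two values as strings and the loop could stop at a misleading point.

```r
# Hypothetical variant of myfib2 that checks its argument before looping.
myfib2_checked <- function(thresh){
    # Assumption: thresh should be one non-missing numeric value.
    stopifnot(is.numeric(thresh), length(thresh) == 1, !is.na(thresh))
    fib.a <- 1
    fib.b <- 1
    cat(fib.a,", ",fib.b,", ",sep="")
    repeat{
        temp <- fib.a+fib.b
        fib.a <- fib.b
        fib.b <- temp
        cat(fib.b,", ",sep="")
        if(fib.b>thresh){
            cat("BREAK NOW...")
            break
        }
    }
}

myfib2_checked(150)       # prints the same sequence as myfib2(150)
# myfib2_checked("150")   # stops with an error before any term is printed
```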

Returning Results
If you want to use the results of a function in future operations (rather than
just printing output to the console), you need to return content to the user.
Continuing with the current example, here’s a Fibonacci function that stores
the sequence in a vector and returns it:

myfib3 <- function(thresh){
    fibseq <- c(1,1)
    counter <- 2
    repeat{
        fibseq <- c(fibseq,fibseq[counter-1]+fibseq[counter])
        counter <- counter+1
        if(fibseq[counter]>thresh){
            break
        }
    }
    return(fibseq)
}

First you create the vector fibseq and assign it the first two terms of the
sequence. This vector will ultimately become the returnobject. You also
create a counter initialized to 2 to keep track of the current position in
fibseq. Then the function enters a repeat statement, which overwrites fibseq
with c(fibseq,fibseq[counter-1]+fibseq[counter]). That expression
constructs a new fibseq by appending the sum of the most recent two terms
to the contents of what is already stored in fibseq. For example, with
counter starting at 2, the first run of this line will sum fibseq[1] and
fibseq[2], appending the result as a third entry onto the original fibseq.
Next, counter is incremented, and the condition is checked. If the most
recent value of fibseq[counter] is not greater than thresh, the loop repeats.
If it is greater, the loop breaks, and you reach the final line of myfib3.
Calling return ends the function and passes out the specified returnobject
(in this case, the final contents of fibseq).
After importing myfib3, consider the following code:

R> myfib3(150)
 [1]   1   1   2   3   5   8  13  21  34  55  89 144 233
R> foo <- myfib3(10000)
R> foo
 [1]     1     1     2     3     5     8    13    21    34    55    89   144
[13]   233   377   610   987  1597  2584  4181  6765 10946
R> bar <- foo[1:5]
R> bar
[1] 1 1 2 3 5

Here, the first line calls myfib3 with thresh assigned 150. The output is still
printed to the screen, but this isn’t the result of the cat command as it was
earlier; it is the return object. You can assign this return object to a variable,
such as foo, and foo is now just another R object in the global environment
that you can manipulate.
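
Because the returned sequence is an ordinary numeric vector, it can be passed straight into other functions. The short sketch below repeats the definition of myfib3 so that it runs on its own; the three operations at the end are arbitrary illustrations, not part of the original example.

```r
# myfib3 as defined in the text, repeated so this snippet runs on its own.
myfib3 <- function(thresh){
    fibseq <- c(1,1)
    counter <- 2
    repeat{
        fibseq <- c(fibseq,fibseq[counter-1]+fibseq[counter])
        counter <- counter+1
        if(fibseq[counter]>thresh){
            break
        }
    }
    return(fibseq)
}

foo <- myfib3(150)
length(foo)            # 13 terms were returned
sum(foo)               # 609, the sum of those terms
foo[foo %% 2 == 0]     # the even-valued terms: 2 8 34 144
```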

Recursive Functions
Recursion is when a function calls itself. This technique isn’t commonly
used in statistical analyses, but it pays to be aware of it. This section will
briefly illustrate what it means for a function to call itself.
Suppose you want to write a function that takes a single positive integer
argument n and returns the corresponding nth term of the Fibonacci
sequence (where n = 1 and n = 2 correspond to the initial two terms 1 and
1, respectively). Earlier you built up the Fibonacci sequence in an iterative
fashion by using a loop. In a recursive function, instead of using a loop to
repeat an operation, the function calls itself multiple times. Consider the
following:

myfibrec <- function(n){
    if(n==1||n==2){
        return(1)
    } else {
        return(myfibrec(n-1)+myfibrec(n-2))
    }
}

The recursive myfibrec checks a single if statement that defines a stopping
condition. If either 1 or 2 is supplied to the function (requesting the first or
second Fibonacci number), then myfibrec directly returns 1. Otherwise, the
function returns the sum of myfibrec(n-1) and myfibrec(n-2). That means
if you call myfibrec with n greater than 2, the function generates two more
calls to myfibrec, using n-1 and n-2. The recursion continues until it reaches
a call for the 1st or 2nd term, triggering the stopping condition,
if(n==1||n==2), which simply returns 1. Here’s a sample call that retrieves
the fifth Fibonacci number:

R> myfibrec(5)
[1] 5
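
To make the chain of calls concrete, the expansion of myfibrec(5) is written out by hand in the comments below, followed by a hypothetical traced variant that prints each call as it happens. The name myfibrec_trace and its depth argument are assumptions added for illustration; depth is used only to indent the printed lines.

```r
# Hand expansion of myfibrec(5):
#   myfibrec(5) = myfibrec(4) + myfibrec(3)
#               = (myfibrec(3) + myfibrec(2)) + (myfibrec(2) + myfibrec(1))
#               = ((myfibrec(2) + myfibrec(1)) + 1) + (1 + 1)
#               = ((1 + 1) + 1) + 2
#               = 5

# Hypothetical traced variant: prints one line per call, indented by depth.
myfibrec_trace <- function(n, depth=0){
    cat(strrep("  ", depth), "myfibrec(", n, ")\n", sep="")
    if(n==1||n==2){
        return(1)
    }
    return(myfibrec_trace(n-1, depth+1) + myfibrec_trace(n-2, depth+1))
}

myfibrec_trace(5)   # prints nine calls in total and returns 5
```

The trace shows that some terms, such as myfibrec(3), are computed more than once, which is why this style of recursion becomes slow for large n.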

Note that an accessible stopping rule is critical to any recursive function.
Without one, recursion will continue indefinitely. For example, the current
definition of myfibrec works as long as the user supplies a positive integer
for the argument n. But if n is zero, negative, or not a whole number, the
stopping rule condition will never be satisfied, and the function will recur
indefinitely (though R has some automated safeguards to help prevent this
and should just return an error message rather than getting stuck in an
infinite loop).
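
One way to keep the stopping rule reachable is to reject unsuitable values of n before recursing. The following is a minimal sketch, assuming that n should be a single positive whole number; myfibrec_safe and its error message are hypothetical and not part of the original definition.

```r
# Hypothetical guarded variant of myfibrec.
myfibrec_safe <- function(n){
    # Assumption: n must be a single positive whole number.
    if(!is.numeric(n) || length(n)!=1 || is.na(n) || n<1 || n!=round(n)){
        stop("n must be a single positive integer")
    }
    if(n==1||n==2){
        return(1)
    }
    return(myfibrec_safe(n-1)+myfibrec_safe(n-2))
}

myfibrec_safe(5)      # 5
# myfibrec_safe(-1)   # stops immediately with an error message
```

In this sketch the check is repeated on every recursive call, which keeps the code short at the cost of some redundant work.
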
Recursion is a powerful approach, especially when you don’t know ahead
of time how many times a function needs to be called to complete a task. For
many sort and search algorithms, recursion provides a natural and efficient
solution.
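
As one illustration of recursion in a search algorithm, here is a sketch of a recursive binary search on a sorted numeric vector. The function binsearch and its arguments lo and hi are hypothetical names chosen for this example. Each call halves the range still to be searched, so the number of calls depends on the data and is not fixed in advance.

```r
# Hypothetical recursive binary search on a sorted numeric vector.
# Returns the index of target in x, or NA if target is not present.
binsearch <- function(x, target, lo=1, hi=length(x)){
    if(lo > hi){
        return(NA)                  # stopping rule: nothing left to search
    }
    mid <- (lo + hi) %/% 2          # middle index of the current range
    if(x[mid] == target){
        return(mid)                 # stopping rule: target found
    } else if(x[mid] < target){
        return(binsearch(x, target, mid+1, hi))   # search the upper half
    } else {
        return(binsearch(x, target, lo, mid-1))   # search the lower half
    }
}

sorted <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233)
binsearch(sorted, 21)   # 7
binsearch(sorted, 4)    # NA, since 4 is not in the vector
```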
