16 Function documentation
Second edition
You are reading the work-in-progress second edition of R Packages. This chapter is undergoing
heavy restructuring and may be confusing or incomplete.
16.1 Introduction
Documentation is one of the most important aspects of a good package: without it, users won’t
know how to use your package! Documentation is also useful for future-you (so you remember
what your functions were supposed to do) and for developers extending your package.
In this chapter, you’ll learn about function documentation, as accessed by ? or help(). Function
documentation works like a dictionary: it’s helpful if you want to know what a function does, but
it won’t help you find the right function for a new situation. That’s one of the jobs of vignettes,
which you’ll learn about in the next chapter. In this chapter we’ll focus on documenting
functions, but the same ideas apply to documenting datasets, classes and generics, and packages.
Base R provides a standard way of documenting the functions in a package: you write .Rd files in
the man/ directory. These files use a custom syntax, loosely based on LaTeX, and are rendered to
HTML, plain text, or PDF for viewing. We are not going to write these files by hand. Instead, we’ll
use the roxygen2 package to generate them from specially formatted comments. There are a few
advantages to using roxygen2:
Code and documentation are intermingled so that when you modify your code, it’s easy
to remember to also update your documentation.
You can write using markdown, rather than having to remember another text formatting
syntax.
roxygen2 dynamically inspects the objects that it documents, so you can skip some
boilerplate that you’d otherwise need to write by hand.
It provides a number of tools for sharing text between documentation topics and
even between topics and vignettes.
You’ll see the generated .Rd files when you work with git, but you’ll otherwise rarely need to look at
them.
16.2 roxygen2 basics
To get started, we’ll work through the basic roxygen2 workflow and discuss the overall structure
of roxygen2 comments, which are organised into blocks and tags.
The documentation workflow starts when you add roxygen comments, comments that start
with #', to your source file. Here’s a simple example:
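A minimal sketch of such a source file, reconstructed from the generated .Rd output that follows. The function body and the wording of the `x` argument are assumptions, since only the tail of the output survives here.

```r
#' Add together two numbers
#'
#' @param x A number.
#' @param y A number.
#' @return The sum of `x` and `y`.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  # Assumed implementation: the simplest body consistent with the docs
  x + y
}
```

With a file like this under R/, running devtools::document() converts the comments into an .Rd file in man/, and ?add previews the result. The tail of the generated file looks like this: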
\item{y}{A number.}
}
\value{
The sum of \code{x} and \code{y}.
}
\description{
Add together two numbers
}
\examples{
add(1, 1)
add(10, 1)
}
If you’re familiar with LaTeX, this should look familiar since the .Rd format is loosely based on
it, and you can read more about it in Writing R Extensions. You’ll commit this file to git (since it’s what
R requires to make its interactive documentation work), but typically you won’t otherwise need
to look at it.
To preview the development documentation, devtools uses some tricks to override the usual help
functions so they know where to look in your source packages. To activate these tricks, you need
to run devtools::load_all() once. So if the development documentation doesn’t appear, you may
need to load your package first.
Now that you understand the basic workflow, let’s talk a little more about roxygen2’s syntax.
roxygen2 comments start with #' and the set of roxygen2 comments preceding a function is
called a block. Blocks are broken up by tags, which look like @tagName tagValue. The content of
a tag extends from the end of the tag name to the start of the next tag. A block can contain text
before the first tag, which is called the introduction. By default, each roxygen2 block will
generate a single documentation topic, corresponding to one .Rd file in the man/ directory.
Throughout this chapter I’m going to show you roxygen2 comments from real tidyverse
packages, focusing on stringr since the functions there tend to be fairly straightforward, leading to
documentation that is easier to excerpt. Here’s a simple example to start with, using the
documentation for stringr::str_unique():
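The stringr source is not reproduced here. As a stand-in, the sketch below shows a block with the same layout for a hypothetical base-R analogue: `unique_strings()` and its `ignore_case` argument are invented for illustration and are not part of stringr.

```r
#' Remove duplicated strings
#'
#' `unique_strings()` removes duplicated values, optionally ignoring case
#' when deciding whether two strings are duplicates.
#'
#' @param string Input vector. Either a character vector, or something
#'   coercible to one.
#' @param ignore_case If `TRUE`, strings that differ only in case are treated
#'   as duplicates. Defaults to `FALSE`.
#' @returns A character vector, usually shorter than `string`.
#' @seealso [unique()] and [duplicated()] in base R.
#' @examples
#' unique_strings(c("a", "b", "c", "b", "a"))
#' unique_strings(c("a", "A", "b"), ignore_case = TRUE)
#' @export
unique_strings <- function(string, ignore_case = FALSE) {
  string <- as.character(string)
  # Compare on a case-folded key when requested, but return the originals
  key <- if (ignore_case) tolower(string) else string
  string[!duplicated(key)]
}
```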
Here the introduction includes the title (“Remove duplicated strings”) and a basic description of
what the function does. It’s followed by six tags: two @params, one @returns, one @seealso,
one @examples, and one @export. Note that I’ve wrapped each line of the roxygen2 block 80
characters wide, to match the wrapping of my code, and I’ve indented the second and subsequent
lines of the long @param tag so it’s easier to scan. You can get more documentation style advice
in the tidyverse style guide.
The following sections will work through the most important tags. We’ll start with the
introduction which provides the title, description, and details, then we’ll cover the inputs (the
function arguments), outputs (the return value), and examples.
16.3 Title, description, details
The title is taken from the first sentence. The title is shown in various function indexes, so
it is what the user will see when browsing functions.
The description is taken from the next paragraph. It comes first in the documentation and
should briefly describe the most important features of the function.
The details are taken from any additional text. Details are optional, but can be any length,
so they are useful if you want to dig deep into some important aspect of the function.
The following sections describe each component in more detail, and then discuss a few useful
related tags.
16.3.1 Title
The title should be written in sentence case and not end in a full stop. When figuring out what to
use as a title, I think it’s most important to consider the functions in your package holistically.
When the user is skimming an index of functions, how will they know which function to use?
What do functions have in common which doesn’t need to be repeated in every title? What is
unique to that function and should be highlighted?
Here we’ve tried to succinctly describe what the function does, making sure to describe whether
it affects rows, columns, or groups. Where possible, we’ve tried to use synonyms of the function
name in the title to hopefully give folks another chance to understand the intent of the function.
At the time we wrote this, I don’t think the function titles for stringr were that successful. But
they provide a useful negative case study.
There’s a lot of repetition (“pattern”, “from a string”) and the verb used for the function name is
repeated in the title, so if you don’t understand the function already, the title seems unlikely to help
much. (In hindsight, it also seems like the function names could have been better chosen.)
Hopefully we’ll have improved those titles by the time you read this.
16.3.2 Description
The purpose of the description is to summarize the goal of the function, usually in under a
paragraph. This can be challenging because the title of the function is also a very concise
summary of the function. And it’s often especially hard if you’ve just written the function
because the purpose seems so intuitively obvious it’s hard to understand why anyone would need
an explanation.
It’s okay for the description to be a little duplicative of the rest of the documentation; it’s often
useful for the reader to see the same thing expressed in two different ways. It’s a little extra work
keeping it all up to date, but the extra effort is often worth it.
If you want to use multiple paragraphs or other structures like a bulleted list, you can use the
explicit @description tag. Here’s an example from stringr::str_like(), which mimics
the LIKE operator from SQL:
#' Detect a pattern in the same way as `SQL`'s `LIKE` operator.
#'
#' @description
#' `str_like()` follows the conventions of the SQL `LIKE` operator:
#'
#' * Must match the entire string.
#' * `_` matches a single character (like `.`).
#' * `%` matches any number of characters (like `.*`).
#' * `\%` and `\_` match literal `%` and `_`.
#' * The match is case insensitive by default.
You can also use explicit @title and @details tags if needed, but we don’t generally recommend it
because it adds extra noise to the docs without enabling any additional functionality.
16.3.3 Details
If you have a lot of information to convey in the details, I recommend using markdown headings
to break up the documentation into sections. Here’s an example from dplyr::mutate(). We’ve elided
some of the details to keep this example short, but you should still get a sense of how we used
headings to break up the content into skimmable chunks.
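The dplyr source is not reproduced here. As a rough sketch of the same layout, the block below uses markdown headings in the details of a hypothetical function, `rescale01()`, assuming markdown support is enabled for the package.

```r
#' Rescale a numeric vector
#'
#' `rescale01()` linearly rescales `x` so that it ranges from 0 to 1.
#'
#' # Missing values
#'
#' Missing values are ignored when computing the range and are returned
#' unchanged.
#'
#' # Constant input
#'
#' If all non-missing values are identical, the range has zero width and
#' the result is `NaN` for those values.
#'
#' @param x A numeric vector.
#' @returns A numeric vector the same length as `x`.
#' @export
rescale01 <- function(x) {
  # Range of the non-missing values
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```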
Note that even though these headings come immediately after the description they are shown
much later (after the function arguments and return value) in the rendered documentation.
In older code, you might also see the use of @section title: which was used to create headings
before roxygen2 fully supported RMarkdown. You can move these below the description and
turn them into markdown headings.
16.4 Arguments
For most functions, the bulk of your work will go towards documenting how each argument
affects the output of the function. For this purpose, you’ll use @param (short for parameter, a
synonym of argument) followed by the argument name and a description of its action.
The most important job of the description is to provide a succinct summary of the allowed inputs
and what the parameter does. For example, here are three of the arguments to str_flatten():
#' @param collapse String to insert between each piece. Defaults to `""`.
#' @param last Optional string to use in place of the final separator.
#' @param na.rm Remove missing values? If `FALSE` (the default), the result
#'   will be `NA` if any element of `string` is `NA`.
Note that @param collapse and @param na.rm describe their default values. This is good practice
because the function usage (which shows the default values) and the argument description are
quite far apart in the docs. The primary downside is that introducing this duplication means that
you’ll need to update the docs if you change the default value; we believe this small amount of
extra work is worth it to make the life of the user easier.
If an argument has a fixed set of possible values, you should list them. If they’re simple, you
can just list them in a sentence, like in str_trim():
If they need more explanation, you might use a bulleted list, as in str_wrap():
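The stringr sources are not reproduced here. Both styles can be sketched on one hypothetical function: in `shorten()` below (invented for illustration), `side` lists its values in a sentence while `method` uses a bulleted list, which assumes markdown support is enabled.

```r
#' Shorten strings
#'
#' @param x A character vector.
#' @param width Maximum number of characters in each output string.
#' @param side Side from which characters are removed: "left" or "right".
#' @param method How the removal is signalled:
#'
#'   * `"truncate"`: remove the characters silently.
#'   * `"ellipsis"`: replace the removed characters with `"..."`, which
#'     counts towards `width`.
#' @returns A character vector the same length as `x`.
shorten <- function(x, width, side = c("right", "left"),
                    method = c("truncate", "ellipsis")) {
  side <- match.arg(side)
  method <- match.arg(method)
  mark <- if (method == "ellipsis") "..." else ""
  keep <- width - nchar(mark)  # characters of the original to keep
  n <- nchar(x)
  short <- if (side == "right") {
    paste0(substr(x, 1, keep), mark)
  } else {
    paste0(mark, substr(x, n - keep + 1, n))
  }
  # Strings that already fit are returned unchanged
  ifelse(n > width, short, x)
}
```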
The documentation for most arguments tends to be relatively short, often one or two sentences.
But you should take as much space as you need, and you’ll see some examples of multi-
paragraph argument documentation shortly.
16.4.1 Multiple arguments
If the behavior of multiple arguments is tightly coupled, you can document them together by
separating the names with commas (with no spaces). For example, in str_equal() x and y are
interchangeable, so they’re documented together:
In str_sub() start and end define the range of characters to replace, and you can use just start if you
pass in a two-column matrix. So it makes sense to document them together:
#' @param start,end Two integer vectors. `start` gives the position
#'   of the first character (defaults to first), `end` gives the position
#'   of the last (defaults to last character). Alternatively, pass a two-column
#'   matrix to `start`.
#'
#'   Negative values count backwards from the last character.
In str_wrap() indent and exdent define the indentation for the first line and all subsequent lines
respectively:
#' @param indent,exdent A non-negative integer giving the indent for the
#'   first line (`indent`) and all subsequent lines (`exdent`).
16.4.2 Inheriting arguments
If your package contains many closely related functions, it’s common for them to have
arguments with the same name and meaning. It would be annoying and error prone to copy and
paste the same @param documentation to every function so roxygen2
provides @inheritParams which allows you to inherit argument documentation from another
function.
@inheritParams only inherits docs for arguments that aren’t already documented, so you can
document some arguments and inherit others. str_match() uses this to inherit its
standard string argument but document its unusual pattern argument:
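The stringr source is not reproduced here. The same pattern can be sketched with two hypothetical functions (both invented for illustration): `count_letter()` inherits the docs for `string` from `count_chars()` and documents only its own `letter` argument.

```r
#' Count characters
#'
#' @param string Input vector. Either a character vector, or something
#'   coercible to one.
#' @returns An integer vector the same length as `string`.
#' @export
count_chars <- function(string) {
  nchar(as.character(string))
}

#' Count occurrences of a letter
#'
#' @inheritParams count_chars
#' @param letter A single character to count. Matching is case sensitive.
#' @returns An integer vector the same length as `string`.
#' @export
count_letter <- function(string, letter) {
  # Split each string into single characters, then count the matches
  chars <- strsplit(as.character(string), "")
  vapply(chars, function(x) sum(x == letter), integer(1))
}
```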
You can inherit documentation from a function in another package by using the
standard :: notation: @inheritParams package::function. This can introduce one small annoyance:
now the documentation for your package is no longer self-contained and the version of the
remote package can affect the generated docs. This can introduce spurious diffs if you have
multiple contributors with different versions installed.
There’s another powerful tool to share not just argument docs but all documentation across multiple
functions: @rdname. By default, each function gets its own documentation topic, but you can
combine the documentation for fun2 into fun1 by using @rdname fun1.
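A minimal sketch of this, using two hypothetical temperature converters (invented for illustration): the second block contributes no text of its own and simply merges `f_to_c()` into the topic created for `c_to_f()`.

```r
#' Convert between Celsius and Fahrenheit
#'
#' @param temp A numeric vector of temperatures.
#' @returns A numeric vector the same length as `temp`.
#' @examples
#' c_to_f(100)
#' f_to_c(212)
#' @export
c_to_f <- function(temp) {
  temp * 9 / 5 + 32
}

#' @rdname c_to_f
#' @export
f_to_c <- function(temp) {
  (temp - 32) * 5 / 9
}
```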
This technique is best used for functions that have not just similar arguments, but also similar
return value and related examples, as discussed next.
16.5 Return value
As important as a function’s inputs are its outputs. Documenting the outputs is the job of
the @returns tag. Here the goal of the docs is not to describe exactly how the values are
computed (which is the job of the description and details), but to roughly describe the overall
“shape” of the output, i.e. what sort of object it is, and its dimensions (if that makes sense). For
example, if your function returns a vector you might describe its type and length, or if your
function returns a data frame you might describe the names and types of the columns and the
expected number of rows.
The return documentation for functions in stringr is straightforward because almost all
functions return some type of vector with the same length as some input. For example,
here’s str_like():
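The stringr source is not reproduced here. As a stand-in, the sketch below documents the return value of a hypothetical function, `starts_with_vowel()`, whose output is a vector with the same length as its input.

```r
#' Detect strings that start with a vowel
#'
#' @param string A character vector.
#' @returns A logical vector the same length as `string`.
#' @export
starts_with_vowel <- function(string) {
  grepl("^[aeiou]", string, ignore.case = TRUE)
}
```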
In more complicated cases, it’s often easier to figure out what to describe by thinking about the set of
functions and how they differ. For example, most dplyr functions return data frames, so just
saying @return A data frame is not useful. Instead we had to sit down and think about exactly what
makes the functions different. It took some thinking, but we eventually decided it makes sense to
describe each function in terms of how it affects the rows, the columns, the groups, and the
attributes. For example, here’s dplyr::filter():
#' @returns
#' An object of the same type as `.data`. The output has the following properties:
#'
#' * Rows are a subset of the input, but appear in the same order.
#' * Columns are not modified.
#' * The number of groups may be reduced (if `.preserve` is not `TRUE`).
#' * Data frame attributes are preserved.
@return is also a good place to describe any important warnings or errors that the user might
see. For example, here’s readr::read_csv():
#' @returns A [tibble()]. If there are parsing problems, a warning will alert you.
#'   You can retrieve the full details by calling [problems()] on your dataset.
Submitting to CRAN
For initial CRAN submission, all functions must document their return value with @return. This is not
required for subsequent submissions, but it’s good practice. There’s currently no way to enforce
this (we’re working on it) which is why you’ll notice some tidyverse functions lack
documentation of their outputs.
16.6 Examples
Describing how a function works is useful, but showing how it works is often even better. That’s
the purpose of the @examples tag, which uses executable R code to show what you can do with
the function. Unlike other parts of the documentation where we’ve focused mainly on what you
should write, here we’ll briefly give some advice as to what to put in the examples and then
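focus on the mechanics. The mechanics are more complex here because examples are commonly
run in four different situations: interactively by users, during your own automated checks, during
checks run by CRAN, and when building a package website.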
16.6.1 Contents
Use examples to show the basic operation of the function, and then to highlight any particularly
important properties. str_detect() starts by showing a few simple variations and then highlights a
property you might easily miss from reading the docs: as well as passing a vector of strings and
one pattern, you can also pass one string and a vector of patterns.
#' @examples
#' fruit <- c("apple", "banana", "pear", "pineapple")
#' str_detect(fruit, "a")
#' str_detect(fruit, "^a")
#' str_detect(fruit, "a$")
#'
#' # Also vectorised over pattern
#' str_detect("aecfg", letters)
Try to stay focused on the most important features without getting into the weeds of every last
edge case: if you make the examples too long, it becomes hard for the user to find the key
application that they’re looking for. If you find yourself writing very long examples, it may be a
sign that you should write a vignette instead.
Try to keep the examples focused on the function that you’re documenting. If you can make the
point with a familiar built-in dataset, like iris, do that. If you find yourself needing to do a bunch
of setup to create a dataset or object to use in the example, it may be a sign that you need to
create a package dataset. See Chapter 8 for details.
16.6.2 Pack it in; pack it out
As much as possible, keep your examples self-contained. You want to
“take nothing but photographs and leave nothing but footprints”. For example, this means:
If you modify the global options(), make sure to reset them at the end of the example.
If you have to create a file, make sure to write somewhere in tempdir() and make sure to
delete it at the end of the example.
Don’t change the working directory.
Don’t write to the clipboard.
These points can be in tension with good documentation if you’re trying to demonstrate
functions that somehow change the state of the world. So if you’re finding it really hard to follow
these rules, it may be that you need to switch to a different form of documentation.
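The first two rules above can be sketched as follows. `write_greeting()` is a hypothetical function invented for illustration; the examples restore the option they change and delete the temporary file they create.

```r
#' Write a greeting to a file
#'
#' @param path Path of the file to write.
#' @returns `path`, invisibly.
#' @examples
#' # Change an option, keeping the old value so it can be restored
#' old <- options(digits = 3)
#' print(pi)
#' options(old)
#'
#' # Write to a temporary file and delete it afterwards
#' path <- tempfile(fileext = ".txt")
#' write_greeting(path)
#' readLines(path)
#' unlink(path)
#' @export
write_greeting <- function(path) {
  writeLines("Hello!", path)
  invisible(path)
}
```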
Submitting to CRAN
Many of these constraints are also mentioned in the CRAN repository policy, which you must
adhere to when submitting to CRAN. Use find in page to search for “malicious or anti-social” to
see the details.
Additionally, you want your examples to send the users on a short walk, not a long hike.
Examples need to execute relatively quickly so that users can run them, building a website
doesn’t take ages, your automated checks finish quickly, and they don’t take up computing
resources when submitting to CRAN.
Submitting to CRAN
If submitting to CRAN, all examples must run in under 10 minutes.
16.6.3 Errors
So what can you do if you want to include code that causes an error for the purposes of teaching?
There are two basic options:
You can wrap the code in try() so that the error is shown, but doesn’t stop execution of the
example.
You can wrap the code in \dontrun{} so it is never run by example().
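Both options can be sketched on a hypothetical function, `safe_sqrt()` (invented for illustration), which deliberately errors on invalid input.

```r
#' Take the square root of a non-negative number
#'
#' @param x A single non-negative number.
#' @returns The square root of `x`.
#' @examples
#' safe_sqrt(4)
#'
#' # The error is displayed, but the rest of the example still runs
#' try(safe_sqrt(-1))
#'
#' \dontrun{
#' # Skipped when the examples are run
#' safe_sqrt("a")
#' }
#' @export
safe_sqrt <- function(x) {
  if (!is.numeric(x) || length(x) != 1 || x < 0) {
    stop("`x` must be a single non-negative number.")
  }
  sqrt(x)
}
```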
16.6.4 Dependencies
You can only use packages in examples that your package depends on (i.e. that appear in Imports or
Suggests). You do need to explicitly load these packages because example code is run in the
user’s environment, not the package environment.
For either imported or suggested packages we recommend using ::, unless it’s really a lot of
duplication.
Previously, we recommended protecting any suggested packages inside an if block that used if
(requireNamespace("suggested_package", quietly = TRUE)). However, we no longer recommend that
because:
We expect that suggested packages are installed when running R CMD check (since they
are effectively required for developers).
The cost of wrapping code in {} is high, because you don’t see intermediate results. And
the cost for a package not being installed is low, because users generally know how to
recognise the package not loaded error and can resolve it themselves.
As much as possible, avoid accessing websites in examples. If the website is down, your
example will fail and hence R CMD check will error.
16.6.5 Conditional execution
In other cases, you might want code to run only in specific scenarios. In the most common case,
you don’t want to run code on CRAN because you’re doing something that is usually best
avoided (see below) or your examples need other setup that CRAN won’t have. In this case you
can use @examplesIf instead of @examples. The code in an @examplesIf block will only be
executed if some condition is TRUE:
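A minimal sketch, assuming a hypothetical function `ask_yes_no()` whose example needs a person at the keyboard. The condition here is `interactive()`; any R expression that evaluates to `TRUE` or `FALSE` could take its place.

```r
#' Ask the user a yes/no question
#'
#' @param question Text of the question.
#' @returns `TRUE` if the answer starts with "y" (in either case), otherwise
#'   `FALSE`.
#' @examplesIf interactive()
#' # Only run in interactive sessions
#' ask_yes_no("Continue?")
#' @export
ask_yes_no <- function(question) {
  answer <- readline(paste0(question, " [y/n] "))
  grepl("^y", answer, ignore.case = TRUE)
}
```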
For example, googledrive uses @examplesIf in almost every function because the examples can
only work if you have an active, authenticated connection to Google Drive, as judged
by googledrive::drive_has_token(). For example, here’s googledrive::drive_publish():
Submitting to CRAN
For initial CRAN submission of your package, all functions must contain some runnable
examples (i.e. there must be examples and they must not all be wrapped in \dontrun{}).
It’s also possible to show example code in the text with code blocks, either ```R if you just want
to show some code or ```{r} if you want the code to be run.
16.6.7 Organisation
tidyr::chop() + tidyr::unchop():
#' @examples
#' # Chop ==============================================================
#' df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
#' # Note that we get one row of output for each unique combination of
#' # non-chopped variables
#' df %>% chop(c(y, z))
#' # cf nest
#' df %>% nest(data = c(y, z))
#'
#' # Unchop ============================================================
#' df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3))
#' df %>% unchop(y)
#' df %>% unchop(y, keep_empty = TRUE)
#'
#' # Incompatible types -------------------------------------------------
#' # If the list-col contains types that can not be natively
#' df <- tibble(x = 1:2, y = list("1", 1:3))
#' try(df %>% unchop(y))
Useful tags
@seealso allows you to point to other useful resources, either on the web, in your
package [function_name()], or another package [pkg::function()].
If you have a family of related functions where every function should link to every other
function in the family, use @family. The value of @family should be plural.
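A rough sketch of both tags, using two hypothetical wrappers (invented for illustration). The shared `@family` value is what makes each topic link to the other.

```r
#' Convert text to upper case
#'
#' @param x A character vector.
#' @returns A character vector the same length as `x`.
#' @seealso [toupper()] for the base function this wraps.
#' @family case converters
#' @export
to_upper <- function(x) {
  toupper(x)
}

#' Convert text to lower case
#'
#' @param x A character vector.
#' @returns A character vector the same length as `x`.
#' @seealso [tolower()] for the base function this wraps.
#' @family case converters
#' @export
to_lower <- function(x) {
  tolower(x)
}
```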
When you start using links and images, you’ll also need to use a new documentation workflow,
as the workflow described above does not show images or links between topics. If you’d like
to see them too, you can use this slower but more comprehensive workflow:
16.8 Re-using documentation
There is a tension between the DRY (don’t repeat yourself) principle of programming and the
need for documentation to be self-contained. It’s frustrating to have to navigate through multiple
help files in order to pull together all the pieces you need. roxygen2 provides a number of ways
to avoid you having to repeat yourself as a developer, while not forcing the user to follow a
spiderweb of links to find everything they need. Here we’ll focus on two:
There are two RMarkdown features supported by roxygen2 that you can use to share
documentation.
16.8.1 @inherits
16.8.2 Child documents
16.8.3 Inline R code