R: Determine Duplicate Elements


duplicated {base} R Documentation

Determine Duplicate Elements

Description


duplicated() determines which elements of a vector or data frame are duplicates of elements
with smaller subscripts, and returns a logical vector indicating which elements (rows) are
duplicates.

anyDuplicated(.) is a generalized, more efficient shortcut for any(duplicated(.)).

Usage


duplicated(x, incomparables = FALSE, ...)

## Default S3 method:
duplicated(x, incomparables = FALSE,
           fromLast = FALSE, nmax = NA, ...)

## S3 method for class 'array'
duplicated(x, incomparables = FALSE, MARGIN = 1,
           fromLast = FALSE, ...)

anyDuplicated(x, incomparables = FALSE, ...)

## Default S3 method:
anyDuplicated(x, incomparables = FALSE,
              fromLast = FALSE, ...)

## S3 method for class 'array'
anyDuplicated(x, incomparables = FALSE,
              MARGIN = 1, fromLast = FALSE, ...)

Arguments


x               a vector or a data frame or an array or NULL.

incomparables   a vector of values that cannot be compared. FALSE is a special value, meaning
                that all values can be compared, and may be the only value accepted for methods
                other than the default. It will be coerced internally to the same type as x.

fromLast        logical indicating if duplication should be considered from the reverse side, i.e.,
                the last (or rightmost) of identical elements would correspond to duplicated =
                FALSE.

nmax            the maximum number of unique items expected (greater than one).

...             arguments for particular methods.

1 of 3 08/Sep/2017, 7:34 PM

MARGIN          the array margin to be held fixed: see apply, and note that MARGIN = 0 may be
                useful.

Details


These are generic functions with methods for vectors (including lists), data frames and arrays
(including matrices).

For the default methods, and whenever there are equivalent method definitions for duplicated
and anyDuplicated, anyDuplicated(x, ...) is a generalized shortcut for any(duplicated(x,
...)), in the sense that it returns the index i of the first duplicated entry x[i] if there is one, and 0
otherwise. Their behaviours may be different when at least one of duplicated and
anyDuplicated has a relevant method.
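
As a small illustration of the relationship described above, the following sketch compares the two functions on a character vector. The variable names (x, dup, first_dup) are chosen for this example only.

```r
x <- c("a", "b", "a", "c", "b")

# Logical vector of the same length as x: TRUE marks a repeat of an earlier element
dup <- duplicated(x)

# Index of the first duplicated entry, or 0 if there is none
first_dup <- anyDuplicated(x)

print(dup)
print(first_dup)

# anyDuplicated() carries the same yes/no information as any(duplicated()),
# plus the position of the first duplicate
stopifnot(identical(any(dup), first_dup > 0),
          first_dup == which(dup)[1])

# With no duplicates the result is 0
no_dup <- anyDuplicated(c("a", "b", "c"))
print(no_dup)
```

Because 0 is the only "no duplicates" value, the result can be used directly in a condition such as if (anyDuplicated(x)) ....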

duplicated(x, fromLast = TRUE) is equivalent to but faster than rev(duplicated(rev(x))).
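
The equivalence stated above can be checked on a small vector. This is only a sketch; the vector x is an arbitrary example.

```r
x <- c(1, 2, 1, 3, 2, 1)

# Default: the first occurrence of each value is kept as non-duplicate
fwd <- duplicated(x)

# fromLast = TRUE: the last occurrence of each value is kept as non-duplicate
bwd <- duplicated(x, fromLast = TRUE)

print(fwd)
print(bwd)

# Same result as reversing, scanning forwards, and reversing back
stopifnot(identical(bwd, rev(duplicated(rev(x)))))
```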

The data frame method works by pasting together a character representation of the rows separated
by \r, so may be imperfect if the data frame has characters with embedded carriage returns or
columns which do not reliably map to characters.
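
For well-behaved columns the data frame method simply flags repeated rows. The sketch below uses a hypothetical data frame (df, with columns id and grp) and shows only the observable result, not the internal row representation.

```r
# Hypothetical data: row 3 repeats row 1; row 4 shares id with row 2 but not grp
df <- data.frame(id  = c(1, 2, 1, 2),
                 grp = c("a", "b", "a", "c"),
                 stringsAsFactors = FALSE)

# One logical value per row
dup_rows <- duplicated(df)
print(dup_rows)

# Keep only the first occurrence of each distinct row
unique_rows <- df[!dup_rows, ]
print(unique_rows)
```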

The array method calculates for each element of the sub-array specified by MARGIN if the
remaining dimensions are identical to those for an earlier (or later, when fromLast = TRUE)
element (in row-major order). This would most commonly be used to find duplicated rows (the
default) or columns (with MARGIN = 2). Note that MARGIN = 0 returns an array of the same
dimensionality attributes as x.
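
A minimal sketch of the array method on a 3 x 3 matrix, showing the row, column and MARGIN = 0 cases. The matrix m is an arbitrary example constructed so that row 3 repeats row 1 and column 3 repeats column 1.

```r
m <- rbind(c(1, 2, 1),
           c(3, 4, 3),
           c(1, 2, 1))

# Default MARGIN = 1: compare rows
dup_rows <- duplicated(m)

# MARGIN = 2: compare columns
dup_cols <- duplicated(m, MARGIN = 2)

# MARGIN = 0: compare individual elements, keeping the shape of m
dup_elts <- duplicated(m, MARGIN = 0)

print(dup_rows)
print(dup_cols)
print(dup_elts)
```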

Missing values (NA) are regarded as equal to each other, and numeric and complex NAs are treated
as different from NaN; character strings are compared in a common encoding. For details, see match
(and unique), which use the same concept.
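
The treatment of NA and NaN can be seen on a short numeric vector. This is a sketch with an arbitrary example vector.

```r
x <- c(NA, NaN, NA, NaN, 1)

# The second NA duplicates the first NA, and the second NaN duplicates the
# first NaN, but NA and NaN are not duplicates of each other
dup <- duplicated(x)
print(dup)

# unique() uses the same concept, so it keeps one NA and one NaN
print(unique(x))
```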

Values in incomparables will never be marked as duplicated. This is intended to be used for a
fairly small set of values and will not be efficient for a very large set.
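
One possible use of incomparables is to exempt a placeholder value from duplicate detection. In the sketch below the empty string is assumed, purely for illustration, to mean "not recorded".

```r
x <- c("a", "", "b", "", "a")

# By default the second "" is reported as a duplicate of the first
dup_default <- duplicated(x)

# With incomparables = "", empty strings are never marked as duplicated
dup_skip_empty <- duplicated(x, incomparables = "")

print(dup_default)
print(dup_skip_empty)
```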

When used on a data frame with more than one column, or an array or matrix when comparing
dimensions of length greater than one, this tests for identity of character representations. This will
catch people who unwisely rely on exact equality of floating-point numbers!

Except for factors, logical and raw vectors the default nmax = NA is equivalent to nmax =
length(x). Since a hash table of size 8*nmax bytes is allocated, setting nmax suitably can save
large amounts of memory. For factors it is automatically set to the smaller of length(x) and the
number of levels plus one (for NA). If nmax is set too small there is liable to be an error: nmax = 1
is silently ignored.
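
When a long vector is known to contain only a few distinct values, nmax can be supplied as an upper bound. The sketch below assumes, for illustration, a character vector with three distinct values and an arbitrary bound of 8; the result is the same as with the default.

```r
set.seed(1)
x <- sample(c("low", "mid", "high"), 1e5, replace = TRUE)

# Default: nmax = NA, i.e. sized for length(x) unique items
dup_default <- duplicated(x)

# Assumed bound: at most 8 unique items expected (must not be too small)
dup_bounded <- duplicated(x, nmax = 8)

# Both calls mark the same elements
stopifnot(identical(dup_default, dup_bounded))

# Number of distinct values found
print(sum(!dup_bounded))
```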

Long vectors are supported for the default method of duplicated, but may only be usable if nmax
is supplied.

Value


duplicated(): For a vector input, a logical vector of the same length as x. For a data frame, a
logical vector with one element for each row. For a matrix or array, and when MARGIN = 0, a
logical array with the same dimensions and dimnames.

anyDuplicated(): an integer or real vector of length one with value the 1-based index of the first
duplicate if any, otherwise 0.

Warning


Using this for lists is potentially slow, especially if the elements are not atomic vectors (see
vector) or differ only in their attributes. In the worst case it is O(n^2).


References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth &
Brooks/Cole.

See Also



Examples

x <- c(9:20, 1:5, 3:7, 0:8)

## extract unique elements
(xu <- x[!duplicated(x)])
## similar, same elements but different order:
(xu2 <- x[!duplicated(x, fromLast = TRUE)])

## xu == unique(x) but unique(x) is more efficient

stopifnot(identical(xu, unique(x)),
          identical(xu2, unique(x, fromLast = TRUE)))


duplicated(iris3, MARGIN = c(1, 3))

anyDuplicated(iris) ## 143

anyDuplicated(x, fromLast = TRUE)

[Package base version 3.4.1 Index]
