Professional Documents
Culture Documents
Edwin Olson
University of Michigan
ebolson@umich.edu
http://april.eecs.umich.edu
I. I NTRODUCTION
Visual fiducials are artificial landmarks designed to be
Fig. 1. Example detections. This paper describes a visual fiducial system
easy to recognize and distinguish from one another. Although based on 2D planar targets. The detector is robust to lighting variation and
related to other 2D barcode systems such as QR codes [1], occlusions and produces accurate localizations of the tags.
they have significantly goals and applications. With a QR
code, a human is typically involved in aligning the camera
with the tag and photographs it at fairly high resolution generate user interfaces that overlay robots’ plans and task
obtaining hundreds of bytes, such as a web address. In assignments onto a head-mounted display [6].
contrast, a visual fiducial has a small information payload Performance evaluation and benchmarking of robot sys-
(perhaps 12 bits), but is designed to be automatically detected tems have become central issues for the research community;
and localized even when it is at very low resolution, unevenly visual fiducials are particularly helpful in this domain. For
lit, oddly rotated, or tucked away in the corner of an example, fiducials can be used to generate ground-truth
otherwise cluttered image. Aiding their detection at long robot trajectories and close control loops [7]. Similarly,
ranges, visual fiducials are comprised of many fewer data artificial features can make it possible to evaluate Simulta-
cells: the alignment markers of a QR tag comprise about neous Localization and Mapping (SLAM) algorithms under
268 pixels (not including required headers or the payload), controlled algorithms [8]. Robotics applications have led to
whereas the visual fiducials described in this paper range the development of additional tag detection systems [9], [10].
from about 49 to 100 pixels, including the payload. Designing robust fiducials while minimizing the size
Unlike 2D barcode systems in which the position of the required is a challenging both from a marker detection
barcode in the image is unimportant, visual fiducial systems standpoint (which pixels in the image correspond to a tag?)
provide camera-relative position and orientation of a tag. and from a error-tolerant data coding standpoint (which tag
Fiducial systems also are designed to detect multiple markers is it?)
in a single image. In this paper, we describe a new visual fiducial system that
Visual fiducial systems are perhaps best known for their significantly improves performance over previous systems.
application to augmented reality, which spurred the devel- The central contributions of this paper are:
opment of several popular systems including ARToolkit [2] • We describe a method for robustly detecting visual fidu-
and ARTag [3]. Real-world objects can be augmented with cials. We propose a graph-based image segmentation
visual fiducials, allowing virtually-generated imagery to be algorithm based on local gradients that allows lines to be
super-imposed. Similarly, visual fiducials can be used for precisely estimated. We also describe a quad extraction
basic motion capture [4]. method that can handle significant occlusions.
Visual fiducial systems have been used to improve hu- • We demonstrate that our detection system provides
man/robot interaction, allowing humans to signal commands significantly better localization accuracy than previous
(such as “follow me” or “wait here”) by flashing an appro- systems.
priate card to a robot [5]. Planar tags have also been used to • We describe a new coding system that address problems
unique to 2D barcoding systems: robustness to rotation, finally Studierstube Tracker [12]. These versions introduced
and robustness to false positives arising from natural digitally-encoded payloads like those used in ARTag. Despite
imagery. As demonstrated by our experimental results, being later work, our experiments show that these encoding
our coding system provides significant theoretical and systems do not perform as well as that used by ARTag, which
real-world benefits over previous work. in turn is outperformed by our coding system.
• We specify and provide results on a set of benchmarks In addition to monochrome tags, other coding systems
which will allow better comparisons of fiducial systems have been developed. For example, color information has
in the future. been used to increase the amount of information that can be
In contrast to previous methods (including ARTag and encoded [13], [14]. Tags using retro-reflectors [15] have also
Studierstube Tracker), our implementation is released under been used. A particularly interesting approach is that used
an Open Source license, and its algorithms and implementa- by Bokode [16], which exploits the bokeh effect to detect
tion are well documented. The closed nature of these other extremely small tags by intentionally defocusing the camera.
systems was a challenge for our experimental evaluation.
For the comparisons in this paper, we have used the limited
publicly-available information to enable as many objective
comparisons as possible. On the other hand, ARToolkitPlus
is open source and so we were able to make a more detailed
comparison. In addition to code, we are also making our
evaluation code available in order to make it easier for future
authors to perform comparisons.
In the next section, we review related work. We describe
our method in the following two sections: the tag detector
in Section 3, and the coding system in Section 4. In Section
5, we provide an experimental evaluation of our methods,
comparing them to previous algorithms.
II. P REVIOUS W ORK
Fig. 2. Input image. This paper describe the processing of this sample
ARToolkit [11] was among the first tag tracking systems, image which contains two tags. This example is purposefully simple for
and was targeted at artificial reality applications. Like the explanatory reasons, though note that the tags are not rigidly planar. See
systems that would follow, its tags contained a square-shaped Fig. 1 for a more challenging result.
payload surrounded by a black border. It differed, however, in
that its payload was not directly encoded in binary: instead, it Besides two-dimensional barcodes, a number of other
used symbols such as the latin character ’A’. When decoding artificial landmarks have been developed. Overhead cameras
a tag, the payload of the tag (sampled at high resolution) was have been used to track robots equipped with blinking
correlated against a database of known tags, with the best- LEDs [17]. In contrast, the NorthStar system puts the vi-
correlating tag reported to the user. A major disadvantage sual fiducials on the ceiling [18]. Two-dimensional planar
of this approach is the computational cost associated with systems, like the one described in this paper, have two
decoding tags, since each template required a separate, main advantages over LED-based systems: the targets can be
slow correlation operation. A second disadvantage is that cheaply printed on a standard printer, and they provide 6DOF
it is difficult to generate templates that are approximately position estimation without the need for multiple LEDs.
orthogonal to each other.
The tag detection scheme used by ARToolkit is based on III. D ETECTOR
a simple binarization of the input image based on a user- Our system is composed of two major components: the tag
specified threshold. This scheme is very fast, but not robust detector and the coding system. In this section, we describe
to changes in illumination. In general, ARToolkit’s detections the detector whose job is to estimate the position of possible
can not handle even modest occlusions of the tag’s border. tags in an image. Loosely speaking, the detector attempts to
ARTag [3] provided improved detection and coding find four-sided regions (“quads”) that have a darker interior
schemes. Like our own approach, the detection mechanism than their exterior. The tags themselves have black and white
was based on the image gradient, making it robust to changes borders in order to facilitate this (see Fig. 2).
in lighting. While the details of the detector algorithm are not The detection process is comprised of several distinct
public, ARTag’s detection mechanism is able to detect tags phases, which are described in the following subsections and
whose border is partially occluded. ARTag also provided the illustrated using the example shown in Fig. 2.
first coding system based on forward error correction, which Note that the quad detector is designed to have a very
made tags easier to generate, faster to correlate, and provided low false negative rate, and consequently has a high false
greater orthogonality between tags. positive rate. We rely on the coding system (described in
The performance of ARTag inspired several improvements the next section) to reduce this false positive rate to useful
to ARToolkit, which evolved into ARToolkitPlus [2], and levels.
Fig. 3. Early processing steps. The tag detection algorithm begins by computing the gradient at every pixel, computing their magnitudes (first) and direction
(second). Using a graph-based method, pixels with similar gradient directions and magnitude are clustered into components (third). Using weighted least
squares, a line segment is then fit to the pixels in each component (fourth). The direction of the line segment is determined by the gradient direction, so
that segments are dark on the left, light on the right. The direction of the lines are visualized by short perpendicular “notches” at their midpoint; note that
these “notches” always point towards the lighter region.