

Fundamentals of Wearable Computers and Augmented Reality

Second Edition
edited by

Woodrow Barfield

Boca Raton London New York

CRC Press is an imprint of the
Taylor & Francis Group, an informa business

MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does
not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks
of a particular pedagogical approach or particular use of the MATLAB® software.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20150616
International Standard Book Number-13: 978-1-4822-4351-2 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
and the CRC Press Web site at http://www.crcpress.com

Editor...................................................................................................................... xiii
Contributors.............................................................................................................. xv

Section I Introduction
Chapter 1 Wearable Computers and Augmented Reality: Musings
and Future Directions............................................................................3
Woodrow Barfield
Chapter 2 Wearable Computing: Meeting the Challenge.................................... 13
Thad Starner
Chapter 3 Intimacy and Extimacy: Ethics, Power, and Potential
of Wearable Technologies................................................................... 31
Patricia Flanagan, Despina Papadopoulos, and Georgina Voss

Section II The Technology
Chapter 4 Head-Mounted Display Technologies for Augmented Reality........... 59
Kiyoshi Kiyokawa
Chapter 5 Optics for Smart Glasses, Smart Eyewear, Augmented Reality,
and Virtual Reality Headsets.............................................................. 85
Bernard Kress
Chapter 6 Image-Based Geometric Registration for Zoomable Cameras
Using Precalibrated Information....................................................... 125
Takafumi Taketomi




Chapter 7 Visual Tracking for Augmented Reality in Natural Environments......151
Suya You and Ulrich Neumann
Chapter 8 Urban Visual Modeling and Tracking............................................... 173
Jonathan Ventura and Tobias Höllerer
Chapter 9 Scalable Augmented Reality on Mobile Devices: Applications,
Challenges, Methods, and Software.................................................. 195
Xin Yang and K.T. Tim Cheng
Chapter 10 Haptic Augmented Reality: Taxonomy, Research Status,
and Challenges.................................................................................. 227
Seokhee Jeon, Seungmoon Choi, and Matthias Harders

Section III  Augmented Reality
Chapter 11 Location-Based Mixed and Augmented Reality Storytelling........... 259
Ronald Azuma
Chapter 12 Dimensions of Spatial Sound and Interface Styles of Audio
Augmented Reality: Whereware, Wearware, and Everyware.......... 277
Michael Cohen
Chapter 13 Applications of Audio Augmented Reality: Wearware,
Everyware, Anyware, and Awareware..............................................309
Michael Cohen and Julián Villegas
Chapter 14 Recent Advances in Augmented Reality for Architecture,
Engineering, and Construction Applications.................................... 331
Amir H. Behzadan, Suyang Dong, and Vineet R. Kamat
Chapter 15 Augmented Reality Human–Robot Interfaces toward
Augmented Robotics......................................................................... 399
Maki Sugimoto
Chapter 16 Use of Mobile Augmented Reality for Cultural Heritage................. 411
John Krogstie and Anne-Cecilie Haugstvedt



Chapter 17 Applications of Augmented Reality for the Automotive Industry......433
Vincent Gay-Bellile, Steve Bourgeois, Dorra Larnaout,
and Mohamed Tamaazousti
Chapter 18 Visual Consistency in Augmented Reality Compositing.................. 457
Jan Fischer
Chapter 19 Applications of Augmented Reality in the Operating Room............ 485
Ziv Yaniv and Cristian A. Linte
Chapter 20 Augmented Reality for Image-Guided Surgery................................ 519
Marta Kersten-Oertel, Pierre Jannin, and D. Louis Collins

Section IV Wearable Computers and Wearable
Chapter 21 Soft Skin Simulation for Wearable Haptic Rendering................................. 551
Gabriel Cirio, Alvaro G. Perez, and Miguel A. Otaduy
Chapter 22 Design Challenges of Real Wearable Computers............................. 583
Attila Reiss and Oliver Amft
Chapter 23 E-Textiles in the Apparel Factory: Leveraging Cut-and-Sew
Technology toward the Next Generation of Smart Garments........... 619
Lucy E. Dunne, Cory Simon, and Guido Gioberto
Chapter 24 Garment Devices: Integrating Energy Storage into Textiles............. 639
Kristy Jost, Genevieve Dion, and Yury Gogotsi
Chapter 25 Collaboration with Wearable Computers.......................................... 661
Mark Billinghurst, Carolin Reichherzer, and Allaeddin Nassani
Author Index......................................................................................................... 681
Subject Index......................................................................................................... 707

In the early 1990s, I was a member of the coordinating committee that put together
the first conference on wearable computers, which, interestingly, was followed by
a highly publicized wearable computer fashion show. Speaking at the conference,
I recall making the following comment about wearable computers: “Are we wearing
them, or are they wearing us?” At the time, I was thinking that eventually advances
in prosthetics, sensors, and artificial intelligence would result in computational tools
that would have amazing consequences for humanity. Developments since then have
proven that vision correct. The first edition of Fundamentals of Wearable Computers
and Augmented Reality, published in 2001, helped set the stage for the coming
decade, in which an explosion in research and applications for wearable computers
and augmented reality occurred.
When the first edition was published, much of the research in augmented reality
and wearable computers was primarily proof-of-concept projects; there were
few, if any, commercial products on the market. There was no Google Glass® or
handheld smartphones equipped with sensors and the computing power of a mid-1980s supercomputer. And the apps for handheld smartphones that exist now
were nonexistent then. Fast forward to today: the commercial market for wearable
computers and augmented reality is in the millions of dollars and heading toward the
billions. From a technology perspective, much of what is happening now with wearables and augmented reality would not have been possible even five years ago. So,
as an observation, Ray Kurzweil’s law of accelerating returns seems to be alive and
well with wearable computer and augmented reality technology, because 14 years
after the first edition of this book, the capabilities and applications of both technologies are orders of magnitude faster, smaller, and cheaper.
As another observation, the research and development of wearable computers
and augmented reality technology that was once dominated by U.S. universities
and research laboratories is truly international in scope today. In fact, the second
edition of Fundamentals of Wearable Computers and Augmented Reality contains
contributions from researchers in the United States, Asia, and Europe. And if one participates in conferences in this field, they are as likely to be held these days in Europe
or Asia as they are in the United States. These are very positive developments and
will lead to even more amazing applications involving the use of wearable computers
and augmented reality technology in the future.
Just as the first edition of this book provided a comprehensive coverage of the
field, the second edition attempts to do the same, specifically by including chapters
from a broad range of topics written by outstanding researchers and teachers within
the field. All of the chapters are new, with an effort to again provide fundamental
knowledge on each topic so that a valuable technical resource is provided to the
community. Specifically, the second edition contains chapters on haptics, visual displays, the use of augmented reality for surgery and manufacturing, technical issues
of image registration and tracking, and augmenting the environment with wearable



audio interfaces. The second edition also contains chapters on the use of augmented
reality in preserving our cultural heritage, on human–computer interaction and
augmented reality technology, on augmented reality and robotics, and on what we
termed in the first edition as computational clothing. Still, even with this wide range
of applications, the main goal of the second edition is to provide the community with
fundamental information and basic knowledge about the design and use of wearable
computers and augmented reality with the goal to enhance people’s lives. I believe
the chapter authors accomplished that goal showing great expertise and breadth of
knowledge. My hope is that this second edition can also serve as a stimulus for
developments in these amazing technologies in the coming decade.
Woodrow Barfield, PhD, JD, LLM
Chapel Hill, North Carolina
The images for augmented reality and wearable computers are essential for the
understanding of the material in this comprehensive text; therefore, all color images
submitted by the chapter authors are available at
MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508-647-7000
Fax: 508-647-7001

I offer special thanks to the following chapter authors for providing images that
appear on the cover of the book: Kiyoshi Kiyokawa, an occlusion-capable optical
see-through head-mounted display; Miguel A. Otaduy, Gabriel Cirio, and Alvaro
G. Perez, simulation of a deformable hand with nonlinear skin mechanics; Vineet
R. Kamat, Amir H. Behzadan, and Suyang Dong, augmented reality ­visualization
of buried utilities during excavation; Marta Kersten-Oertel, virtual vessels of an
arteriovenous malformation (AVM) (with color-coded vessels [blue for veins,
red for arteries, and purple for the AVM nidus]) overlaid on a live image of a 3D
printed nylon anthropomorphic head phantom; Seokhee Jeon, Seungmoon Choi, and
Matthias Harders, an example of a visuo-haptic augmented reality system, doing a
modulation of real soft object stiffness; and Kristy Jost, Genevieve Dion, and Yury
Gogotsi, 3D simulations of knitted smart textiles (rendered on the Shima Seiki Apex
3 Design Software).
Several members of CRC Press contributed in important ways to this book’s
publication and deserve recognition. First, I thank Jessica Vakili, senior project
coordinator, for answering numerous questions about the process of editing the book
and those of the chapter authors in a timely, patient, and always efficient manner.
I also thank and acknowledge Cindy Renee Carelli, senior acquisition editor, for
contacting me about editing a second edition, championing the proposal through
the publisher’s review process, and her timely reminders to meet the deadline. The
project editor, Todd Perry, is thanked for the important task of overseeing the coordination, copyediting, and typesetting of the chapters. Gowthaman Sadhanandham
is also thanked for his work in production and assistance provided to authors.
Most importantly, in my role as editor for the second edition, I acknowledge and
thank the authors for their hard work and creative effort to produce outstanding
chapters. To the extent this book provides the community with a valuable resource
and stimulates further developments in the field, each chapter author deserves much
thanks and credit. In many ways, this book began 14 years ago, when the first edition
was published. To receive contributions from some of the original authors, to see
how their careers developed over the years, and the contributions they made to the
field, was a truly satisfying experience for me. It was a great honor that such a distinguished group again agreed to join the project.
Finally, in memoriam, I thank my parents for the freedom they gave me to follow
my interests and for the Erlenmeyer, distilling, and volumetric flasks when I was a
budding teenage scientist. Further, my niece, Melissa, is an inspiration and serves
as the gold standard in the family. Last but not least, I acknowledge my daughter,
Jessica, student and college athlete, for keeping me young and busy. I look forward
to all she will achieve.


Woodrow Barfield, PhD, JD, LLM, has served as professor of engineering at the
University of Washington, Seattle, Washington, where he received the National
Science Foundation Presidential Young Investigator Award. Professor Barfield
directed the Sensory Engineering Laboratory, where he was involved in research on
sensors and augmented and virtual reality displays. He has served as a senior editor for Presence: Teleoperators and Virtual Environments and is an associate editor
for Virtual Reality. He has more than 350 publications and presentations, including
invited lectures and keynote talks, and holds two degrees in law.


Contributors

Oliver Amft, ACTLab Research Group, University of Passau, Passau, Germany
Ronald Azuma, Intel Labs, Santa Clara, California
Woodrow Barfield, Chapel Hill, North Carolina
Amir H. Behzadan, Department of Civil, Environmental, and Construction Engineering, University of Central Florida, Orlando, Florida
Mark Billinghurst, Human Interface Technology Laboratory New Zealand, University of Canterbury, Christchurch, New Zealand
Steve Bourgeois, Vision and Content Engineering Laboratory, CEA LIST, Gif-sur-Yvette, France
K.T. Tim Cheng, Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, California
Seungmoon Choi, Pohang University of Science and Technology, Pohang, South Korea
Gabriel Cirio, Department of Computer Science, Universidad Rey Juan Carlos, Madrid, Spain
Michael Cohen, Computer Arts Laboratory, University of Aizu, Aizu-Wakamatsu, Japan
D. Louis Collins, Department of Biomedical Engineering, Department of Neurology & Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, Quebec, Canada
Genevieve Dion, Shima Seiki Haute Technology Laboratory, ExCITe Center, Antoinette Westphal College of Media Arts and Design, Drexel University, Philadelphia, Pennsylvania
Suyang Dong, Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, Michigan
Lucy E. Dunne, Department of Design, Housing, and Apparel, University of Minnesota, St Paul, Minnesota
Jan Fischer, European Patent Office, Munich, Germany
Patricia Flanagan, Wearables Lab, Academy of Visual Arts, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Vincent Gay-Bellile, Vision and Content Engineering Laboratory, CEA LIST, Gif-sur-Yvette, France
Guido Gioberto, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota
Yury Gogotsi, Department of Materials Science and Engineering, College of Engineering, A.J. Drexel Nanomaterials Institute, Drexel University, Philadelphia, Pennsylvania
Matthias Harders, University of Innsbruck, Innsbruck, Austria
Anne-Cecilie Haugstvedt, Computas A/S, Lysaker, Norway
Tobias Höllerer, University of California, Santa Barbara, Santa Barbara, California
Pierre Jannin, INSERM Research Director, LTSI, Inserm UMR 1099, University of Rennes, Rennes, France
Seokhee Jeon, Kyung Hee University, Seoul, South Korea
Kristy Jost, Department of Materials Science and Engineering, College of Engineering, A.J. Drexel Nanomaterials Institute and Shima Seiki Haute Technology Laboratory, ExCITe Center, Antoinette Westphal College of Media Arts and Design, Drexel University, Philadelphia, Pennsylvania
Vineet R. Kamat, Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, Michigan
Marta Kersten-Oertel, Department of Biomedical Engineering, Montreal Neurological Institute, McGill University, Montreal, Quebec, Canada
Kiyoshi Kiyokawa, Cybermedia Center, Osaka University, Osaka, Japan
Bernard Kress, Google [X] Labs, Mountain View, California
John Krogstie, Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
Dorra Larnaout, Vision and Content Engineering Laboratory, CEA LIST, Gif-sur-Yvette, France
Cristian A. Linte, Department of Biomedical Engineering, Rochester Institute of Technology, Rochester, New York
Allaeddin Nassani, Human Interface Technology Laboratory New Zealand, University of Canterbury, Christchurch, New Zealand
Ulrich Neumann, Department of Computer Science, University of Southern California, Los Angeles, California
Miguel A. Otaduy, Department of Computer Science, Universidad Rey Juan Carlos, Madrid, Spain
Despina Papadopoulos, Interactive Telecommunications Program, New York University, New York, New York
Alvaro G. Perez, Department of Computer Science, Universidad Rey Juan Carlos, Madrid, Spain
Carolin Reichherzer, Human Interface Technology Laboratory New Zealand, University of Canterbury, Christchurch, New Zealand
Attila Reiss, Chair of Sensor Technology, University of Passau, Passau, Germany
Cory Simon, Johnson Space Center, National Aeronautics and Space Administration, Houston, Texas
Thad Starner, School of Interactive Computing, Georgia Institute of Technology, Atlanta, Georgia
Maki Sugimoto, Department of Information and Computer Science, Faculty of Science and Technology, Keio University, Tokyo, Japan
Takafumi Taketomi, Nara Institute of Science and Technology, Nara, Japan
Mohamed Tamaazousti, Vision and Content Engineering Laboratory, CEA LIST, Gif-sur-Yvette, France
Jonathan Ventura, University of Colorado, Colorado Springs, Colorado
Julián Villegas, Computer Arts Laboratory, University of Aizu, Aizu-Wakamatsu, Japan
Georgina Voss, Science and Technology Policy Research, University of Sussex, Sussex, United Kingdom
Xin Yang, Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, California
Ziv Yaniv, TAJ Technologies, Inc., Mendota Heights, Minnesota, and Office of High Performance Computing and Communications, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
Suya You, Department of Computer Science, University of Southern California, Los Angeles, California

Section I Introduction


1 Wearable Computers and Augmented Reality: Musings and Future Directions

Woodrow Barfield

CONTENTS
1.1 Public Policy.................................................................................................7
1.2 Toward a Theory of Augmented Reality......................................................9
1.3 Challenges and the Future Ahead..............................................................10
References........................................................................................................11

In this chapter, I briefly introduce the topic of wearable computers and augmented reality, a brief historical perspective, and a glimpse into the future of a sensor-filled, wearable computer and augmented reality (AR) world, with the goal to provide the reader a roadmap to the book. While each technology alone (AR and wearables) is providing people with amazing applications and technologies to assist them in their daily life, the combination of the technologies is often additive and, in some cases, multiplicative, as, for example, when virtual images, spatialized sound, and haptic feedback are combined with wearable computers to augment the world with information whenever or wherever it is needed. Often, the platform to deliver augmented reality is a wearable device or, in the case of a smart phone, a hand-held computer.
Let me begin to set the stage by offering a few definitions. Azuma (1997) defined an augmented reality application as one that combines the real world with the virtual world, is interactive and in real-time, and is registered in three dimensions. Additionally, most people think of a wearable computer as a computing device that is small and light enough to be worn on one's body without causing discomfort. And unlike a laptop or a palmtop, a wearable computer is constantly turned on and is often used to interact with the real-world through sensors that are becoming more ubiquitous each day. In this regard, the computational model of wearable computers differs from that of laptop computers and personal digital assistants. Furthermore, information provided by a wearable computer can be very context and location sensitive, especially when combined with GPS.
In the early days of research in developing augmented reality, many of the same researchers were also involved in creating immersive virtual environments. We began to discuss different degrees of reality and virtuality. Early on, Paul Milgram from the University of Toronto codified the thinking by proposing a virtuality continuum

which represents a continuous scale ranging between the completely virtual world, a virtuality, and a completely real world, reality (Milgram et al., 1994). The area between the two extremes, where both the real and the virtual are mixed, is the so-called mixed-reality—which Paul indicated consisted of both augmented-reality, where the virtual augments the real, and augmented virtuality, where the real augments the virtual. The reality–virtuality continuum therefore encompasses all possible variations and compositions of real and virtual objects.
Another prominent early researcher in wearables, and a proponent of the idea of mediating reality, was Steve Mann (2001, 2002). Steve, now at the University of Toronto, describes wearable computing as miniature body-borne computational and sensory devices. Early on, and with numerous examples, Steve showed how wearable computing could be used to augment, mediate, or diminish reality (Mann, 2001, 2002). In fact, he expanded the discussion of wearable computing to include the more expansive term "bearable computing," by which he meant wearable computing technology that is on or in the body.
When I think of the different types of computing technology that may be worn on or in the body, I envision a continuum that starts with the most basic of wearable computing technology and ends with wearable computing that is actually connected to a person's central nervous system, that is, in my view, their brain (Figure 1.1). As humans are becoming more-and-more equipped with wearable computing technology, we are just now at the cusp of wearable computing and sensor technology breaking the skin barrier and moving into the human body, and eventually into the brain. The extension of computing integrated into a person's brain could radically enhance human sensory and cognitive abilities.

FIGURE 1.1  A microchip is used to process brain waves that are used to control a cursor on a computer screen. (Image courtesy of Wikimedia Commons.)

One of the early adopters of wearable computing technology, especially with regard to implantable sensors within the body, was Professor Kevin Warwick, who in 1998 at the University of Reading was one of the first people to hack his body when he participated in a series of proof-of-concept studies involving a sensor implanted into the median nerves of his left arm, a procedure which allowed him to link his nervous system directly to a computer. Using the neural interface, Professor Warwick was able to control an electric wheelchair and an artificial hand. In addition to being able to measure the signals transmitted along the nerve fibers in Professor Warwick's left arm, the implant was also able to create artificial sensation by stimulating the nerves in his arm using individual electrodes. This bi-directional functionality was demonstrated with the aid of Kevin's wife and a second, less complex implant which connected to her nervous system. According to Kevin, this was the first solely electronic communication between the nervous systems of two humans. Since then, many have extended Kevin's seminal work in wearable computers using RFID chips and other implantable sensors (and there is even an anti-chipping statute enacted in California and other states). And neuroscientists and robotics engineers have just recently demonstrated the viability of direct brain-to-brain communication in humans using electroencephalogram (EEG) and image-guided transcranial magnetic stimulation (TMS) technologies.
Further, consider people with debilitating diseases such that they are essentially locked in their own body. Sadly, just in the United States, about 5000 people yearly are diagnosed with just such a disease that ultimately shuts down the motor control capabilities of their body—amyotrophic lateral sclerosis, sometimes called Lou Gehrig's disease. This disease is a rapidly progressive, invariably fatal neurological disease that attacks the nerve cells responsible for controlling voluntary muscles. Already there are experimental systems (computing technology integrated into a person's brain) in-field now that are helping those with severe physical disabilities. Most notably, researchers at Brown University and Cyberkinetics in Massachusetts are devising a microchip that is implanted in the motor cortex just beneath a person's skull that will be able to intercept nerve signals and reroute them to a computer. With the appropriate wearable computing technology consisting of a microchip that is implanted onto the surface of the brain (where it monitors electronic thought pulses), such people may use a computer by thought alone, which will then wirelessly send a command to any of various electronic devices, including computers, stereos, and electric wheelchairs, allowing them to communicate with their family, caregivers, and through the internet, the world at large. I highlight this example to show that while many uses of AR/wearables will be for gaming, navigation, shopping, and so on, there are very transformative uses of wearable computing technology, either being developed now, or soon to be developed, that will benefit humanity in ways we are just now beginning to realize.
Other types of innovative and interesting wearable devices are being developed at a rapid pace. For example, consider a German team that has designed a microvibration device and a wireless low-frequency receiver that can be implanted in a person's tooth. The vibrator acts as microphone and speaker, sending sound waves along the jawbone to a person's eardrum.

2  The implantable miniature telescope (IMT) is designed to improve vision for those experiencing age-related macular degeneration. In fact. and is the leading cause of irreversible vision loss and legal blindness in people over the age of 65.6 Fundamentals of Wearable Computers and Augmented Reality Setpoint. Google’s technology consists of contact lens built with special sensors that measures sugar levels in tears using a tiny wireless chip and miniature sensor embedded between two layers of soft contact lens material.2).S. As interesting and innovative as this solution to monitoring diabetes is.) . The IMT technology reduces the impact of the central vision blind spot due to end-stage AMD and projects the objects the patient is looking at onto the healthy area of the light-sensing retina not degenerated by the disease. for example. a spinoff company of Cambridge University. which works like the telephoto lens of a camera (Figure 1. Smart Holograms. As for developing a telephoto lens. and others are developing eyeworn sensors to assist those with the disease. we may see people equipped with contact lens or retinal prosthesis that monitor their health. (Images provided courtesy of VisionCare Ophthalmic Technologies. because if not controlled such people are at risk for dangerous complications. and heart. is a main reason why people will become equipped with wearable computing technology and sensors that monitor their body’s health. millions of p­ eople worldwide with diabetes could benefit from implantable sensors and wearable ­ ­computers designed to monitor their blood-sugar level. FDA approved an implantable miniature telescope (IMT). detect energy in the x-ray or infrared range. this isn’t the only examples of eye-oriented wearable technology that will be developed. for the approximately 20–25 million people worldwide who have the advanced form of age-related macular degeneration (AMD). 
This device works by activating the body’s natural inflammatory reflex to dampen inflammation and improve clinical signs and symptoms. Google. including damage to the eyes. is developing computing therapies to reduce systemic ­inflammation by stimulating the vagus nerve using an implantable pulse generator. and have telephoto capabilities. Saratoga. an implantable telescope could offer hope. FIGURE 1. kidneys. Medical necessity. to manage debilitating disease such as ­diabetes. in 2010. In the future. detailed vision. To help people monitor their blood-sugar level. a disease which affects the region of the retina responsible for central. CA. the U. In fact.

The surgical procedure involves removing the eye's natural lens, as with cataract surgery, and replacing the lens with the IMT. While telephoto eyes are not coming soon to an ophthalmologist's office, this is an intriguing step in that direction and a look into the future of wearable computers. On this point, I should point out that in the United States any device containing a contact lens or other eye-wearable technology is regulated by the Federal Drug Administration as a medical device, the point being that much of wearable computing technology comes under government regulation. In addition, body-worn sensors are being used to monitor blood pressure, heart rate, weight, and blood glucose, and can link to a smartphone, often with wireless sensors.

1.1  PUBLIC POLICY

Although not the focus of this book, an important topic for discussion is the use of augmented reality and wearable computers in the context of public policy, especially in regard to privacy. Just consider one of the most common technologies equipped with sensors: a cell phone. It can contain an accelerometer to measure changes in velocity, a gyroscope to measure orientation, and a camera to record the visual scene. With these senses the cell phone can be used to track a person's location. While the vast amount of information captured by all the wearable digital devices is valuable on its own, sensor data derived from wearable computers will be even more powerful when linked to the physical world. With sensor technology, everything from the clothing we wear to the roads we drive on will be embedded with sensors that collect information on our every move, including our goals and our desires. In effect, location information will link the physical world to the virtual meta-world of sensor data, and integrate that information with comprehensive satellite, aerial, and ground maps to generate multi-layered real-time location-based databases. For example, knowing where a photo was taken, or when a car passed by an automated sensor, will add rich metadata that can be employed in countless ways.

Steve Mann presents the idea that wearable computers can be used to film newsworthy events as they happen or people of authority as they perform their duties. This example brings up the issues of privacy and whether a person has a legal right to film other people in public. Consider the following case decided by the U.S. First Circuit Court of Appeals; note it is not the only legal dispute involving sensors and wearable computers. In the case, Simon Glik was arrested for using his cell phone's digital video camera (a wearable computer) to film several police officers arresting a young man on the Boston Common (Glik v. Cunniffe, 2011). The charges against Glik, which included violation of Massachusetts's wiretap statute and two other state-law offenses, were subsequently judged baseless and were dismissed. Glik then brought suit under a U.S. Federal Statute (42 U.S.C. 1983), claiming that his very arrest for filming the officers constituted a violation of his rights under the First (free speech) and Fourth (unlawful arrest) Amendments to the U.S. Constitution. The court held that, based on the facts alleged, Glik was exercising clearly established First Amendment rights in filming the officers in a public space, and that his clearly established Fourth Amendment rights were violated by his arrest without probable cause.

In addition, given the ability of hackers to access networks and wireless body-worn devices, the cybersecurity of wearable devices is becoming a major concern.

On just this point, former U.S. Vice President Dick Cheney, who is equipped with a pacemaker, had its wireless feature disabled in 2007. Furthermore, another policy issue to consider for people equipped with networked devices is what liabilities, if any, would be incurred by those who disrupt the functioning of their computing prosthesis. For example, would an individual be liable if they interfered with a signal sent to an individual's wearable computer, or with Steve's wearable computer technology and other displays, if that signal was used to assist the individual in seeing and perceiving the world? As many of the chapters in this book show, sensors on the outside of the body are rapidly moving under the skin as they begin to connect the functions of our body to the sensors external to it (Holland et al., 2001).

Changing directions, the use of wearable computers combined with augmented reality capabilities can be used to alter or diminish reality, in which a wearable computer can be used to replace or remove clutter in the real world, say, an unwanted advertisement on the side of a building. On this topic, I published an article, "Commercial Speech, Intellectual Property Rights, and Advertising Using Virtual Images Inserted in TV, Film, and the Real World," in UCLA Entertainment Law Review (2001). In the article, I discussed the legal and policy ramifications of placing ads consisting of virtual images projected in the real world. We can think of virtual advertising as a form of digital technology that allows advertisers to insert computer-generated brand names, logos, or animated images into television programs or movies. In the case of TV, a reported benefit of virtual advertising is that it allows the action on the screen to continue while displaying an ad viewable only by the home audience. What may be worrisome about the use of virtual images to replace portions of the real world is that corporations and government officials may be able to alter what people see based on political or economic considerations. Thus, an altered reality may then become the accepted norm, the consequences of which seem to bring up the dystopian society described in Huxley's Brave New World.

Furthermore, what about privacy issues and the use of wearable computers to film people against their will? Consider an extreme case, video voyeurism, which is the act of filming or disseminating images of a person's private areas under circumstances in which the person had a reasonable expectation of privacy, regardless of whether the person is in a private or public location. Video voyeurism is not only possible but being done using wearable computers (mostly hand-held cameras). In the United States, such conduct is prohibited under State and Federal law (see, e.g., Video Voyeurism Prevention Act of 2004, 18 U.S.C. 1801). And what about the privacy issues associated with other wearable computing technology, such as the ability to recognize a person's face, then search the internet for personal information about the individual (e.g., police record or credit report), and tag that information on the person as they move throughout the environment?

Restaurants have also entered into the debate about the direction of our wearable computer future. Taking a stance against Google Glass, Lost Lake Cafe, a Seattle-based restaurant, actually kicked out a patron for wearing Glass; the restaurant is standing by its no-glass policy, despite mixed responses from the local community. In another incident, a theater owner in Columbus, Ohio, saw enough of a threat from Google Glass to call the Department of Homeland Security. The Homeland Security agents removed the programmer who was wearing Google Glass connected to his

prescription lenses. Continuing the theme of how wearable computers and augmented reality technology impact law and policy, a San Francisco bar frequented by a high-tech crowd has banned patrons from wearing Google Glass while inside the establishment. San Francisco seems to be ground zero for cyborg disputes, as a social media consultant who wore Glass inside a San Francisco bar claimed that she was attacked by patrons objecting to her wearing the device inside the bar. Further, a reporter for Business Insider said he had his Google Glass snatched off his face and smashed to the ground in San Francisco's Mission District, marking some of the first clashes over the nascent wearable technology.

In fact, in addition to FDA regulations, some jurisdictions are just beginning to regulate wearable computing technology if its use poses a danger to the population. And in a high-profile California case that raised new questions about distracted driving, a driver wearing Google Glass was ticketed for wearing the display while driving after being stopped for speeding. The ticket was for violating a California statute which prohibited a visual monitor in her car while driving. Later, the ticket was dismissed due to lack of proof the device was actually operating while she was driving. Further, sparsely populated Wyoming is among a small number of U.S. states eyeing a ban on the use of wearable computers while driving. To show the power and influence of corporations in the debate about our wearable computer/AR future, Google is lobbying officials in at least three U.S. states to stop proposed restrictions on driving with headsets such as Google Glass, restrictions proposed over concerns that drivers wearing Google Glass may pay more attention to their email or other online content than the road.

By presenting the material in the earlier sections, my goal was to inform the readers of this book that while the technology presented in the subsequent chapters is fascinating and even inspiring, there are still policy and legal issues that will have to be discussed as wearable computer and augmented reality technologies improve and enter more into the mainstream of society. I can conclude that while technology may push society further, technology is also influenced by society, including its laws and regulations; in effect, there is a feedback loop.

1.2  TOWARD A THEORY OF AUGMENTED REALITY

As a final comment, one often hears people discuss the need for theory to provide an intellectual framework for the work done in augmented reality. When I was on the faculty at the University of Washington, I became interested in the topic of how people performed cognitive operations on computer-generated images. With Jim Foley, now at Georgia Tech, I performed experiments to determine how people mentally rotated images rendered with different lighting models. Later, my students and I built a head-tracked augmented reality system in which, as one looked around the space of the laboratory, they saw a corresponding computer-generated image that was rendered such that it occluded real objects in that space. We noticed that some attributes of the virtual images allowed the person to more easily view the virtual object and real world in a seamless manner. This led to thinking about how virtual images could be seamlessly integrated into the real world. Later, I asked the question of whether there was any theory to explain how different characteristics of virtual images combined to form a seamless whole with the environment they were projected into, or whether virtual images projected in the real world appeared separate from the surrounding space (floating and disembodied from the real world scene).

On this question, I recalled a paper I had read while in college by Garner and Felfoldy (1970) on the integrality of stimulus dimensions in various types of information processing. We say that two dimensions (features) are integral when they are perceived holistically; integral dimensions combine into relatively unanalyzable, unitary wholes, an example being colors varying in hue, brightness, and saturation, where it's hard to visually decode the value of one independently from the other. By contrast, the authors of the paper noted that separable dimensions remain psychologically distinct when in combination, an example being forms varying in shape and color. A vast amount of converging evidence suggests that people are highly efficient at selectively attending to separable dimensions. Although people can selectively attend to integral dimensions to some degree, the process is far less efficient than occurs for separable-dimension stimuli (Shepard, 1964). I think that much can be done to develop a theory of augmented, mediated, or diminished reality using the approach discussed by Garner and Felfoldy and by Shepard, applied to virtual images projected into the real world. Such research would have to expand the past work, which was done on single images, and I encourage readers of this book to do so.

1.3  CHALLENGES AND THE FUTURE AHEAD

While the chapters in this book discuss innovative applications using wearable computer technology and augmented reality, the chapters also focus on providing solutions to some of the difficult design problems in both of these fields. For example, consider a technical problem: image registration. GPS lacks accuracy, but I expect vast improvements in image registration as the world is filled with more sensors. I also expect that wearable computing technology will become more and more integrated with the human body, especially for reasons of medical necessity. And with continuing advances in miniaturization and nanotechnology, head-worn displays will be replaced with smart contact lenses, and further into the future bionic eyes that record everything a person sees, along with the capability to overlay the world with graphics (essentially information). Such technology will provide people augmented reality capabilities that would be considered the subject of science fiction just a few years ago. Clearly, there are still many design challenges to overcome and many amazing applications yet to develop; such goals are what designing the future is about. While this chapter focused more on a policy discussion and futuristic view of wearable computers and augmented reality, the remaining chapters focus far more on technical and design issues associated with the two technologies. The reader should keep in mind that the authors of the chapters which follow are inventing the future, but we should all be involved in determining where technology leads us and what that future looks like.

REFERENCES

Azuma, R. T., A survey of augmented reality, PRESENCE: Teleoperators and Virtual Environments, 6(4), 355–385, 1997.
Garner, W. R. and Felfoldy, G. L., Integrality of stimulus dimensions in various types of information processing, Cognitive Psychology, 1, 225–241, 1970.
Glik v. Cunniffe, 655 F.3d 78 (1st Cir. 2011) (case at the United States Court of Appeals for the First Circuit that held that a private citizen has the right to record video and audio of public officials in a public place, and that the arrest of the citizen for a wiretapping violation violated the citizen's First and Fourth Amendment rights).
Holland, D., Roberson, J., and Barfield, W., Computing under the skin, in Barfield, W. and Caudell, T. (eds.), Fundamentals of Wearable Computing and Augmented Reality, Lawrence Erlbaum Associates, Inc., Mahwah, NJ, pp. 747–792, 2001.
Mann, S., Mediated reality with implementations for everyday life, Presence Connect, the online companion to the MIT Press journal PRESENCE: Teleoperators and Virtual Environments, MIT Press, Boston, MA, August 6, 2002.
Mann, S. and Niedzviecki, H., Cyborg: Digital Destiny and Human Possibility in the Age of the Wearable Computer, Doubleday Canada/Anchor Canada Publisher, Toronto, 2001.
Milgram, P., Takemura, H., Utsumi, A., and Kishino, F., Augmented reality: A class of displays on the reality–virtuality continuum, in Proceedings of the SPIE Conference on Telemanipulator and Telepresence Technologies, vol. 2351, pp. 282–292, 1994.
Shepard, R. N., Attention and the metric structure of the stimulus space, Journal of Mathematical Psychology, 1, 54–87, 1964.


2 Wearable Computing: Meeting the Challenge

Thad Starner

CONTENTS
2.1   Networking.................................................................................. 14
2.2   Power and Heat........................................................................... 15
2.3   Mobile Input................................................................................ 17
2.4   Display........................................................................................ 18
2.5   Virtual Reality............................................................................. 19
2.6   Portable Video Viewers.............................................................. 20
2.7   Industrial Wearable Systems.......................................................22
2.8   Academic/Maker Systems for Everyday Use..............................24
2.9   Consumer Devices.......................................................................26
2.10  Meeting the Challenge................................................................28
References............................................................................................28

Wearable computers and head-mounted displays (HMDs) are in the press daily. Why have they captured our imaginations now, when the technology has been available for decades? While Fitbit's fitness tracking devices are selling in the millions in 2014, what prevented FitSense (see Figure 2.5) from having similar success with such devices in 2000? Since 1993 I have been wearing a computer with an HMD as part of my daily life, and Reddy Information Systems had a commercial wearable with Reflection Technology's Private Eye HMD in 1991 (Eliason 1992). Yet over 20 years later, Google Glass is generating more excitement than any of those early devices.

Many new classes of devices have followed a similar arc of adoption. Often, the perceived need for a technology lags behind innovation. The fax machine was invented in 1846 but became popular over 130 years later. In 1994, the IBM Simon touchscreen smartphone had many features familiar in today's phones, but it was the Apple iPhone in 2007 that seized the public's imagination (Sager 2012). When the cellular phone was introduced in the early 1980s, who would have guessed that increasingly we would use it more for texting than talking? Some pundits look for a killer app to drive the adoption of a new class of device. Yet that can be misleading, and sometimes developers can be surprised by the ways in which users run with a technology. As of mid-2014, tablets are outselling laptops in Europe, yet there is no single killer app that drives adoption. Instead, the tablet offers a different set of affordances (Gibson 1977) than the smartphone or the laptop, making

it more desirable in certain situations. For example, for reading in bed the tablet is lighter than a laptop and provides an easier-to-read screen than a smartphone. The tablet is controlled by finger taps and swipes that require less hardware and dexterity than trying to control a mouse and keyboard on a laptop.

Wearable computers have yet a different set of affordances than laptops, tablets, and smartphones. Take, for example, a digital music player. It is often used while a user is exercising, studying, or commuting, and the interface is used in short bursts and then ignored. Often the wearable's interface is secondary to a user's other tasks and should require a minimum of user attention. Such a secondary interface in support of a primary task is characteristic of a wearable computer and can be seen in smartwatches, fitness monitors, some spacesuits, and even smartphones for some applications. Some of these devices are already commonplace. In fact, my personal definition of a wearable computer is any body-worn computer that is designed to provide useful services while the user is performing other tasks.

On-the-go use is one aspect of wearable computers that makes them distinct from other devices. I often use my wearable computer while walking, and no other device enables such on-the-go use. I find it helps me think to be moving when I am composing. I also often lie on a couch in my office, put the focus of my HMD at the same depth as the ceiling, and work on large documents while typing using a one-handed keyboard called a Twiddler. This position is very comfortable, much more so than any other interface I have tried, but students often think that they are waking me when they walk into my office.

However, here I will focus on wearable computers that include an HMD, as these devices are at the threshold of becoming popular and are perhaps the most versatile and general-purpose class of wearable computers. Like all wearable computers, those based on HMDs have to address fundamental challenges in networking (both on and off the body), power and heat, and mobile input. First I will describe these challenges and show how, until recently, they severely limited what types of devices could be manufactured. Then I will present five phases of HMD development that illustrate how improvements in technology allowed progressively more useful and usable devices.

2.1  NETWORKING

Turn-by-turn navigation, voice-based web search, and cloud-based office tools are now commonplace on smartphones, but only in the past few years has the latency of cellular networks been reduced to the point that computing in the cloud is effective. A decade ago, the throughput of a cellular network in cities like Atlanta could be impressive, yet the latency would severely limit the usability of a user interface depending on it. Today when sending a message, a Google Glass user might say, "OK Glass, send a message to Thad Starner. Remember to pick up the instruction manual," and the experience can be a seamless interplay of local and cloud-based processing. The three commands OK Glass, send a message to, and Thad Starner are processed locally because the speech recognizer simply needs to distinguish between one of several prompts, but the message content "Remember to pick up the instruction manual" requires the increased processing power of the cloud to be recognized accurately.
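This split between a small on-device grammar and a cloud recognizer for free-form content can be sketched as follows. The grammar, function names, and the cloud stub are assumptions for illustration, not Glass's actual implementation.

```python
# Illustrative sketch of hybrid speech command routing: a tiny, fixed
# command grammar is matched on-device, while the free-form message
# content is handed to a (stubbed) large-vocabulary cloud recognizer.

LOCAL_GRAMMAR = ["ok glass", "send a message to", "thad starner"]

def cloud_transcribe(text):
    # Stand-in for a network round-trip to a cloud speech recognizer.
    return text.strip().capitalize()

def route(utterance):
    """Peel off locally recognized prompts; send the remainder to the cloud."""
    remainder = utterance.lower()
    matched = []
    for command in LOCAL_GRAMMAR:            # cheap on-device matching
        if remainder.startswith(command):
            matched.append(command)
            remainder = remainder[len(command):].lstrip(" .")
    return matched, cloud_transcribe(remainder)

commands, content = route("OK Glass. Send a message to Thad Starner. "
                          "remember to pick up the instruction manual")
print(commands)  # the three locally matched prompts
print(content)   # free-form payload for the cloud recognizer
```

The point of the split is latency and power: matching a handful of fixed prompts is cheap enough to run constantly on-device, while only the unconstrained payload pays the cost of a network round-trip.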

With an LTE cellular connection, the content is processed quickly, and the user may barely notice a difference in performance between local and remote services. However, with a GPRS, EDGE, or sometimes even an HSPDA connection, the wait for processing in the cloud can be intolerable.

WiFi (IEEE 802.11) might seem a viable alternative to commercial cellular networks. Wearable computers in the late 1990s often used WiFi, but until 2000 open hotspots were rare. On-body networking has also been a challenge. Bluetooth (IEEE 802.15) was originally intended as a replacement for RS232 connections on desktop PCs, not as a body network. The standard was not designed with power as a foremost concern, and even basic implementations were unstable until 2001. Early radios required adapters that were the size of a small mobile phone and required significant power; today, a part of a single chip can provide this service. Only recently, with the widespread adoption of Bluetooth Low Energy by the major mobile phone manufacturers, have wearable devices really had an appropriate body-centered wireless network. Fundamental issues still remain. Both WiFi and Bluetooth use 2.4 GHz radio, which is blocked by water and the human body. For example, a sensor mounted in a shoe to monitor footfalls might have difficulty maintaining connection to an earbud that provides information as to a runner's performance.

Most positioning systems also involve networks. For example, the location-aware Active Badge system made by Olivetti Research Laboratory in 1992 used a network of infrared receivers to detect transmissions from a badge to locate a wearer and to unlock doors as the user approached them. When the user was walking through the lab, the system could also re-route phone calls to the nearest phone (Want 2010). Similarly, the Global Positioning System uses a network of satellites to provide precisely synchronized radio transmissions that a body-worn receiver can use to determine its position on the surface of the planet. GPS is probably one of the most commonly used technologies for on-body devices; it is hard to imagine life without it. Yet before 2000, GPS was accurate to within 100 m due to the U.S. military intentionally degrading the signal with Selective Availability, and turn-by-turn directions were impossible. Today, civilian accuracy has a median open-outdoor accuracy of 10 m (Varshavsky and Patel 2010), and modern GPS units can even maintain connection and tracking through wooden roofs.

2.2  POWER AND HEAT

In 1993, my first HMD-based wearable computer was powered by a lead-acid gel cell battery that massed 1.3 kg. Today, a lithium-ion camcorder battery stores the same amount of power but weighs a quarter as much. While that seems like an impressive improvement, battery life will continue to be a major obstacle to wearable technology, since improvements in battery technology have been modest compared to other computing trends. For example, while disk storage density increased by a factor of 1200 during the 1990s, battery energy density only increased by a factor of three (Starner 2003). In a mobile device, the battery will often be one of the biggest and most expensive components. Since battery technology is unlikely to change during a normal 18-month consumer product development cycle, the battery should be specified first, as it will often be the most constraining factor on the product's industrial design and will drive the selection of other components.
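A back-of-envelope runtime estimate shows why the battery tends to be specified first. The capacity and load figures below are illustrative assumptions, not measurements of any particular device; the 95% and 70% converter efficiencies echo the figures discussed in this section.

```python
# Back-of-envelope battery sizing, the kind of estimate behind specifying
# the battery first. All numbers are illustrative assumptions, not
# measurements of any real device.

def runtime_hours(capacity_wh, avg_load_mw, converter_efficiency=0.95):
    """Hours of use for a battery of capacity_wh under a constant average load."""
    return capacity_wh * converter_efficiency / (avg_load_mw / 1000.0)

# A hypothetical 2.1 Wh wearable cell under a 350 mW average load, with a
# modern ~95% efficient switching DC-DC converter:
print(round(runtime_hours(2.1, 350), 1))                             # 5.7 h

# The same cell with a pre-2000 converter losing ~30% of its power as heat:
print(round(runtime_hours(2.1, 350, converter_efficiency=0.70), 1))  # 4.2 h
```

Working backward from a target runtime and estimated average load gives the required capacity, which in turn fixes much of the device's size and mass budget.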

One of those components is the DC–DC power converter, and an improvement in mobile consumer electronics that often goes underappreciated is the efficiency of these converters. A typical converter might accept between 3.4 and 4.2 V from a nominal 3.6 V lithium battery and produce several constant voltages for various components. Before 2000, the device might lose 30% of its power as heat in conversion, and just the DC–DC converter for Google Glass could mass 30 g (Glass itself is 45 g). Due to efficiency improvements, today's switching DC–DC converters are often more than 95% efficient and are just a few grams.

One bright spot in designing wearable computers is the considerable effort that has been invested in smartphone CPUs and the concomitant power benefits. Modern embedded processors with dynamic voltage scaling can produce levels of computing power equivalent to a late-1980s supercomputer in one instant and then, in the next moment, can switch to a maintenance mode which draws milliwatts of power while waiting for user input. Designing system and user software carefully for these CPUs can have significant benefits. Slower computation over a longer period can use significantly less power than finishing the same task at a higher speed and then resting. This slow-and-steady technique has cascading benefits: power converters are generally more efficient at lower currents, and lithium-ion batteries last longer with a steady discharge than with bursty uses of power. And with a reduction in power use, there is a corresponding reduction in heat production. Similarly, system software can exploit knowledge about its networking to help flatten the battery load. Wireless networking requires significant power when the signal is weak, so for non-crucial tasks, waiting for a better signal can save power and heat.

Heat often limits how small a mobile device can be. A wearable device is often in contact with a user's skin, and it must have enough surface area and ventilation to cool, or it will have to throttle its performance considerably to stay at a comfortable temperature for the user (Starner and Maguire 1999). Will the device have the ability to perform the expected services and not become uncomfortable to wear? If not, can the package be made larger to spread the heat, lowering the temperature at the surface? Or can lower-heat alternatives be found for the electronics? Unfortunately, many industrial design tools do not model heat, and in practice the design of a wearable device is often iterative: given a battery size, an industrial designer creates a fashionable package, and that package should be optimized in part for thermal dissipation given its expected use. The iteration cycle between fashion and mechanical engineering constraints can be slow. This tension between performance and physical size can be quite frustrating to designers of wearable devices. Users often desire small jewelry-like devices to wear but are also attracted to power-hungry services like creating augmented reality overlays with registered graphics or transmitting video remotely. Yet in consumer products, fashion is the key, and physical size and form are major components of the desirability of a device. Unless the consumer is willing to put on the device, it does not matter what benefits it offers. Designing maintenance and background tasks (e.g., caching email and social networking feeds) to be thermally aware allows more headroom for on-demand interactive tasks.
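The slow-and-steady effect can be made concrete with a toy dynamic-voltage-scaling model. CMOS dynamic power scales roughly as C·V²·f, and higher clock rates require higher supply voltage, so racing through a task costs more energy than stretching it out. The constants below are illustrative, not taken from any real CPU, and the model ignores static and idle power.

```python
# Toy dynamic-voltage-scaling model of the slow-and-steady technique:
# dynamic power ~ C * V^2 * f, and the required voltage rises with
# frequency, so finishing the same work slowly costs less energy.
# All constants are illustrative assumptions.

def task_energy_joules(cycles, freq_hz, volts, switched_capacitance=1e-9):
    power_watts = switched_capacitance * volts ** 2 * freq_hz
    seconds = cycles / freq_hz
    return power_watts * seconds  # reduces to C * V^2 * cycles

CYCLES = 2e9  # total work in the task

fast = task_energy_joules(CYCLES, freq_hz=1.0e9, volts=1.2)  # race, then rest
slow = task_energy_joules(CYCLES, freq_hz=0.5e9, volts=0.9)  # half speed, lower V

print(round(fast / slow, 2))  # racing costs ~1.78x the energy for the same work
```

Because the energy reduces to C·V²·cycles, the entire saving comes from the lower voltage the slower clock permits, which is why voltage scaling, not frequency scaling alone, is the lever that matters.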

If the wearable is thought of as a leaky cup, and heat as water filling it, then one goal is to keep the cup as empty as possible at any given time so that when a power-hungry task is required, we have as much space as possible to buffer the heat produced and not overflow the cup.

2.3  MOBILE INPUT

Wearable computing interfaces often aspire to be hands-free. This term is a bit of a misnomer. A wristwatch that senses a wearer's gesture to decline a phone call or to change the track on a digital music player is certainly not hands-free, but it's clearly better for use while jogging compared to stopping and manipulating a touchscreen. What the user really wants is an interface that is unencumbering. However, an on-the-go wearable user has reduced dexterity, eyesight, hearing, attention, and sense of touch compared to when stationary, which makes an unencumbering interface design particularly challenging.

Speech interfaces seem like an obvious alternative. Modern, big data machine learning techniques are enabling ever-better speech recognition. As enough examples of speech are captured on mobile devices with a large variety of accents and background noises, recognition rates are improving, and with low-latency cellular networks and processing in the cloud, speech recognition on Android and iOS phones has become ubiquitous. Unfortunately, there are many situations in which a user might feel uncomfortable interacting with a device via speech. For example, dictating personal notes during a business conversation or a university class is not socially appropriate.

A method of mobile touch typing is needed. To my knowledge, the Twiddler keyboard, first brought on the market in 1992, is still the fastest touch typing mobile device. Learning the Twiddler requires half the learning time (25 h for 47 wpm on average) of the desktop QWERTY keyboard to achieve the greater-than-40 wpm required for high school typing classes (Lyons 2006). Yet the device remains a niche market item for dedicated users. Perhaps as more users type while on-the-go and after the wireless Twiddler 3 is introduced, more people will learn it. Unfortunately, today's mini-QWERTY and virtual keyboards require a lot of visual attention when mobile. Thus, mobile keyboards will continue to be a necessary part of mobile interfaces, and such silent, eyes-free mobile text entry still remains an opportunity for innovation, especially for any technology that can accelerate the learning curve.

Navigating interfaces while on-the-go also remains a challenge. In a series of studies, Bruce Thomas's group at the University of South Australia explored both what types of pointing devices are most effective while on-the-go and where they should be mounted (Thomas et al. 2002, Zucco et al. 2009). His results suggest that mini-trackpads and mini-trackballs can be highly effective, even while moving. One option is to mount a remote controller elsewhere on the body and use Bluetooth Human Interface Device profiles for connection. Some self-contained headsets use trackpads or simple d-pad interactions, but some users would like a more subtle method of interaction. Launched in 2002, Xybernaut's POMA wearable computer suggested another interesting variant on this theme: a user could run his finger over a wired, upside-down optical mouse sensor to control a cursor.

Perhaps with today's smaller and lower-power components, a wireless version of such pointing devices could be made. Besides pointing, researchers and startups are spending considerable energy creating gestural interfaces using motion sensors, often focusing on rings and bracelets, and a plethora of community-funded Bluetooth Human Interface Devices are being developed. Typically, these interfaces associate gestures with particular commands, such as silencing a phone or waking up an interface. False triggering, however, is a challenge in the mobile environment: an interface that keeps triggering incorrectly throughout the user's workday is annoying at best.

Meanwhile, smartphones have broken the former monopoly on graphical user interfaces. Swipes, taps, and gestures on phone and tablet touchscreens can be made without much precision, and many of the features of Android and iOS can be accessed through these cruder gestures. Yet these devices still require a flat piece of glass, which can be awkward to manipulate while doing other tasks. More recently, Zeagler and Starner explored textile interfaces for mobile input (Komor et al. 2009; Profita et al. 2013). Interfaces are evolving quickly, there is much room for innovation, and there will be an exciting market for third-party interfaces for consumer wearable computers.

18 Fundamentals of Wearable Computers and Augmented Reality

2.4 DISPLAY

While visual displays often get the most attention, auditory and tactile displays are excellent choices for on-the-go users. Traditional windows, icon, menu, pointer (WIMP) interfaces are difficult to use while on-the-go as they require too much visual and manual attention. Audio displays, in contrast, are a good choice for on-the-go interaction; smartphones and mobile music players are almost always shipped with earbuds included. Rendering audio in 3D can help the user monitor several ambient information sources at once or can improve the sense of participant presence during conference calls. Ambient audio interfaces (Sawhney and Schmandt 2000) allow the wearer to monitor information sources, like the volume of stock market trading, for sudden changes without devoting much attention to the process. Bone conduction, such as is used with Google Glass and by the military and professional scuba divers, allows the wearer to hear notifications from the computer without blocking the ear canals.

Tactile displays are also useful. Almost all mobile phones have a simple vibration motor to alert the user to an incoming call. Unfortunately, a phone vibrating in a pants pocket or purse can be hard to perceive while walking. In the future, I expect the closer contact with the skin made available by smartwatches to enable more reliable and expressive tactile interfaces than a simple on/off vibration motor.

HMDs can range from devices meant to immerse the user in a synthetic reality to a device with a few lights to provide feedback regarding the wearer's performance while biking. HMDs can be created using lasers, CRTs, LCDs, scanning mirrors, holographic optics, and many other types of technologies. For any given HMD, design trade-offs are made between size, weight, power, resolution, color, brightness and contrast, transparency, focus, eyebox (the 3D region in which the eye can be placed and still see the entire display in focus), and many other factors. One device will not satisfy all needs, and the intended purpose of the HMD often forces very different form factors and interactions.

19 Wearable Computing

For the purpose of discussion, I've clustered these into five categories: virtual reality, portable video viewers, industrial wearable systems, academic/maker wearables for everyday use, and consumer devices.
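The ambient audio monitoring described in Section 2.4 can be sketched as a mapping from an information stream (say, trading volume) to a background sound level, with an explicit alert only on sudden change, in the spirit of Sawhney and Schmandt's Nomadic Radio. The smoothing constant and threshold below are invented for illustration, and actual audio output is omitted.

```python
def ambient_level(samples, alpha=0.5):
    """Exponentially smoothed activity, mapped to a volume in [0, 1]."""
    level = 0.0
    for s in samples:
        level = alpha * s + (1 - alpha) * level  # smooth out jitter
    return min(1.0, level)

def sudden_change(prev, curr, threshold=0.5):
    """Only a large jump interrupts the user; small drifts stay ambient."""
    return abs(curr - prev) > threshold

quiet = ambient_level([0.1, 0.1, 0.2])            # normal trading volume
spike = ambient_level([0.1, 0.1, 0.2, 0.9, 1.0])  # sudden surge
print(sudden_change(quiet, spike))  # True: this warrants an audible alert
```

The point of the design is that small fluctuations only nudge the background volume, which the wearer can ignore, while a genuine spike crosses the threshold and earns an explicit notification.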

2.5 VIRTUAL REALITY

In the late 1980s and early 1990s, LCD and CRT displays were large, heavy, power hungry, and required significant support electronics. Yet it was during this time that virtual reality was popularized. VPL Research, Virtual Research, Virtual I/O, Nintendo, and many others generated a lot of excitement with virtual reality headsets for professionals and gamers (Figure 2.1). See Kress et al. (2014) for a more technical discussion of typical optics of these types of displays.

An example of an early professional system was the 1991 Flight Helmet by Virtual Research. It has a 100-degree diagonal field of view and 240 × 120 pixel resolution. It weighs 1.67 kg and uses 6.9 cm LCD screens with LEEP Systems' wide-angle optics to provide an immersive stereoscopic experience. For its era, the Flight Helmet was competitively priced at $6000. Subsequent Virtual Research devices employed smaller lenses and a reduced field of view to save weight and cost, and by the mid-1990s, HMDs began to be affordable. By 1994, the LCDs in the company's VR4 had twice the resolution at half

FIGURE 2.1 Virtual reality HMDs. (a) Virtual Research's Flight Helmet (1991, $6000). (b) Nintendo Virtual Boy video game console (1995, $180). (c) Virtual i-O i-glasses! Personal 3D viewer head-mounted display (1995, $395). (d) Oculus Rift DK1 (2013, $300). (Images courtesy of Tavenner Hall.)

the size. The user controls the experience through head tracking and instrumented gloves as well as standard desktop interfaces such as keyboards and joysticks. Minimizing head weight and simulation sickness continues to be a major concern with modern VR HMDs. Today, with lighter weight panels and electronics, the Oculus Rift Developer Kit 1 slightly surpasses the original Flight Helmet's field of view and has 640 × 480 pixel resolution per eye while weighing 379 g. Still, some users quickly complain of simulation sickness issues. The biggest difference between 1991 and today, though, is the price—the Rift DK1 is only $300, whereas the Flight Helmet, adjusted for inflation, would be the equivalent of over $10,000 today.

The 1995 Nintendo Virtual Boy game console is an interesting contrast to the Flight Helmet. It costs $180. It uses Reflection Technology's scanning, mirror-style, monochromatic display in which a column of 224 LEDs is scanned across the eye with an oscillating mirror as the LEDs flash on and off, creating an apparent 384 × 224 pixel resolution display with persistence of vision. It is portable and includes the full computing system in the headset (the wired controller includes the battery pack for the device). As a table-top head display, the Virtual Boy avoids the problem of too much weight on the head, but it has no possibility of head tracking or the freedom of motion available with most VR headsets. Unlike many consumer VR devices, the Virtual Boy provides adjustments for focus and inter-eye distance. The Virtual Boy introduced many consumers to immersive gameplay, and with over a million devices sold, it ranks among the largest-selling HMDs.

While these VR HMDs are not wearables by my definition, since they are mostly for stationary use and attach to desktop or gaming systems, they are examples of early major efforts in industrial and consumer devices and share many features with the next class of device, mobile video viewers. Unlike wearables, though, power and network are rarely a concern with these devices; networking was not required, and user input consisted of a few button presses.

2.6 PORTABLE VIDEO VIEWERS

By the late 1990s, camcorder viewfinders were a major market for small LCD panels. Lightweight and inexpensive LCD-based mobile HMDs were now possible (Figure 2.2). However, there were no popular mobile computing devices that could output images or videos, and most did not have the capability for controlling an external screen. Smartphones would only become prevalent after 2007, and the video iPod, which can stream video, would not be released until 2005. In-seat entertainment systems were rare, so manufacturers envisioned a small HMD connected to a portable DVD player, which allowed the wearer to watch a movie during a flight or car ride. HMD manufacturers thus started focusing on portable DVD players for entertaining the traveler. Battery life needed to be at least 2 h, and some devices, like the Eyetop (Figure 2.2), offered packages with an external battery powering both the display and the DVD player. With the Glasstron HMD (and the current HMZ line), Sony was more agnostic about whether the device should be used while mobile or at home. One concept was that the headsets could be used in place of a large-screen television for those apartments with little space. However, the Glasstron line did include a place to mount Sony's rechargeable camcorder batteries for mobile usage.
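The claim that the $6000 Flight Helmet is "over $10,000 today" can be checked with a one-line Consumer Price Index adjustment. The CPI values below are approximate U.S. annual averages (about 136 for 1991 and 237 for 2014), so this is a ballpark sanity check rather than an official figure.

```python
def adjust_for_inflation(price, cpi_then, cpi_now):
    """Scale a historical price by the ratio of consumer price indices."""
    return price * cpi_now / cpi_then

# 1991 Flight Helmet price expressed in ~2014 dollars
flight_helmet_now = adjust_for_inflation(6000, cpi_then=136.2, cpi_now=236.7)
print(round(flight_helmet_now))  # 10427: over $10,000, as the chapter states
```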

FIGURE 2.2 Portable video viewers first concentrated on interfacing with portable DVD players, then flash-based media players like the video iPod, and most recently started integrating enough internal memory to store movies directly. (a) Sony Glasstron PLM-A35 (2000, $499). (b) Eyetop Centra DVD bundle (2004, $599). (c) MyVu Personal Viewer (2006, $270). (d) Vuzix iWear (2008, $250). (e) Vuzix Wrap 230 (2010, $170). (f) Epson Moverio BT-100 (2012, $700). (Images courtesy of Tavenner Hall.)

As small, flash memory-based mobile video players became common, portable video viewers became much more convenient, with the units even making appearances in vending machines at airports. Companies such as MyVu and Vuzix sold several models and hundreds of thousands of devices (Figure 2.2). Modern video viewers, like the Epson Moverio, can be wireless, having an internal battery and using a micro-SD reader or internal memory for loading the desired movie directly to the headset.

The Moverio BT-100 (Figure 2.2) is especially interesting as it sits astride three different classes of device: portable video viewer, industrial wearable, and consumer wearable. It is self-contained; its battery and trackpad controller is in a wired pendant, giving it ease of control and a good battery life. It has WiFi and a removable micro-SDHC for loading movies and other content. It is two-eyed and see-through and can run standard Android applications. Unfortunately, the HMD itself is a bit bulky and the nose-weight is too high—both problems the company is trying to address with the new BT-200 model.

Unlike the modern Moverio, many older devices do not attempt 3D viewing; instead, these displays play the same image on both eyes, as simulator sickness was a potential issue for some users and 3D movies were uncommon until the late 2000s. An argument could be made that a more immersive system, like an Oculus Rift, would provide a higher quality experience that consumers would prefer, but such a wide field of view system is even more awkward to transport. Studies on mobile video viewing show diminishing returns in perception of quality above 320 × 240 pixel resolution, which can still provide a high quality experience (Weaver et al. 2010). This suggests that once video quality is good enough, whether 2D or 3D, the perceived value of the video system will be more determined by other factors such as convenience, ease-of-use, and price. Unfortunately, video viewers suffer a certain apathy from consumers. Carrying the headset in addition to a smartphone or digital video player is a burden, and most consumers prefer watching movies on their pocket media players and mobile phones instead of carrying the extra bulk of a video viewer.

2.7 INDUSTRIAL WEARABLE SYSTEMS

Historically, industrial HMD-based wearable computers have been one-eyed, with an HMD connected to a computer module and battery mounted on the waist (Figure 2.3). Instead of removing the user from reality, these systems are intended to provide computer support while the wearer is focused on a task in the physical world such as inspection, repair, maintenance, and order picking. Improvements in performance for industrial tasks can be dramatic. A study performed at Carnegie Mellon University showed that during Army tank inspections, an interactive checklist on a one-eyed HMD can cut in half the required personnel and reduce the required time for completing the task by 70% (Siewiorek et al. 2008). For order picking, a process during which a worker selects parts from inventory to deliver to an assembly line or for an outgoing package to a customer, a graphical guide on a HMD can reduce pick errors by 80% and completion time by 38% over the current practice of using paper-based parts lists (Guo et al. 2014).

Some HMD uses provide capabilities that are obviously better than current practice. For instance, when repairing a car, the HMD might show each step in a set of installation instructions. For example, when testing an electrical circuit, technicians must often hold two electrical probes and a test meter; repairing telephone lines adds the extra complication of clinging to a telephone pole at the same time. The Triplett VisualEYEzer 3250 multimeter (Figure 2.3) provides a head-up view of the meter's display, allowing the user to hold a probe in each hand. The result is that the technician can test circuits more quickly and have a better ability to handle precarious situations.

FIGURE 2.3 Wearable systems designed for industrial, medical, and military applications. (a) Xybernaut MA-IV computer (1999, $5000). (b) Triplett VisualEYEzer 3250 multimeter (2000, $500). (c) Xybernaut MA-V computer (2001, $7500). (d) Xybernaut/Hitachi VII/POMA/WIA computer (2002, $1500). (e) MicroOptical SV-6 display (2003, $1995). (f) Vuzix Tac-Eye LT head-up display (2010, $3000). (Images courtesy of Tavenner Hall.)

In the operating room, anesthesiologists use HMDs in a similar way. Current practice often requires anesthesiologists to divert their gaze to a monitor elsewhere in the room, which reduces the speed at which dangerous situations are detected and corrected. The HMD instead overlays vital statistics on the doctor's visual field while monitoring the patient (Liu et al. 2009).

With more case studies showing the advantages of HMDs in the workplace, industry has shown a steady interest in the technology. From the mid-1990s to after 2000, companies such as FlexiPC and Xybernaut provided a general-purpose line

of systems for sale. See Figure 2.3 for the evolution of Xybernaut's line. Industrial customers often insisted on Microsoft Windows for compatibility with their other systems, which dictated many difficult design choices in weight, cost, system complexity, user experience, and the number of parts required to make a full machine. Windows was not optimized for mobile use, and x86 processors were particularly bad at power efficiency. Thus, wearables had to be large to have enough battery life and to dissipate enough heat during use. Switching to a real-time operating system could help with better battery life, but the default Windows WIMP user interface still required significant hand-eye coordination to use, which caused wearers to stop what they were doing and focus on the virtual interface before continuing their task in the physical world. User input to a general-purpose industrial system might be in the form of small vocabulary, isolated-word speech recognition, a portable trackball, a dial, or a trackpad mounted on the side of the main computer. Most on-body components were connected via wires, as wireless Bluetooth implementations were often unstable or non-existent. Wireless networking was often by 802.11 PCMCIA cards; CDPD, a digital standard implemented on top of analog AMPS cellular service, was used when the wearer needed to work outside of the corporate environment. Meanwhile, specialty display companies like MicroOptical and Vuzix (Figure 2.3) made displays designed for industrial purposes but encouraged others to integrate them into systems for industry.

After smartphones and tablets introduced popular, lighter-weight operating systems and user interfaces designed for grosser gesture-based interactions, many corporate customers began to consider operating systems other than Windows. The popularization of cloud computing also helped break the Windows monopoly, as corporate customers considered wearables as thin client interfaces to data stored in the wireless network. Today, self-contained Android-based HMDs like Google Glass, Vuzix M100, and Optinvent ORA are ideal for manufacturing tasks such as order picking and quality control, and companies like APX-Labs are adapting these devices to the traditional wearable industrial tasks of repair, inspection, and maintenance. Interfaces are evolving quickly, but mobile input is still a fundamental challenge. One device is not suitable for all tasks, and I foresee an array of specialized devices in the future.

2.8 ACADEMIC/MAKER SYSTEMS FOR EVERYDAY USE

Industrial systems focused on devices donned like uniforms to perform a specific task, but some academics and makers started creating their own systems in the early 1990s that were intended for everyday private use. These devices were worn more like eyeglasses or clothing. Audio and visual displays were often optimized for text, and chording keyboards such as a Twiddler (shown in Figure 2.4b) or any of the 7- or 8-button chorders enabled desktop-level touch typing speeds. Applications included listening to music, taking notes during classes and meetings, texting, navigation, and scheduling—apps that became mostly the domain of smartphones 15 years later. Taking notes during face-to-face conversations was a common additional use of these devices beyond what is seen on smartphones today. Users often explained that having the devices was like having an extra brain to keep track of detailed information. Due to the use of

lighter-weight interfaces and operating systems, battery life tended to be better than the industrial counterparts. Networks included analog dial-up over cellular, CDPD, amateur radio, and WiFi as they became available. Displays were mostly one-eyed and opaque; in general, opaque displays provide better contrast and brightness than transparent displays in daylight environments. The opaque displays might be mounted up and away from the main line of sight or mounted directly in front of the eye, depending on the illusion in the human visual system by which vision is shared between the two eyes. These displays appear see-through to the user because the image from the occluded eye and the image of the physical world from the non-occluded eye are merged to create a perception of both. Reflection Technology's Private Eye (Figure 2.4b) and MicroOptical's displays (Figure 2.4d) were popular choices due to their relatively low power and good sharpness for reading text. The CharmIT, Lizzy, and Herbert 1 concentrated the electronics into a centralized package, but the MIThril and Herbert 3 (not shown) distributed the electronics in a vest to create a more balanced package for wearing. Several of the everyday users of these homebrew machines from the 1990s would later join the Google Glass team and help inform the development of that project.

FIGURE 2.4 Some wearable computers designed by academics and makers focused on creating interfaces that could be used as part of daily life. (a) Herbert 1, designed by Greg Priest-Dorman in 1994. (b) Lizzy wearable computer, designed by Thad Starner in 1995 (original design 1993). (c) MIThril, designed by Rich DeVaul in 2000. (d) CharmIT, designed as a commercial, open-hardware wearable computing kit for the community by Charmed, Inc. in 2000. (Images courtesy of Tavenner Hall.)

2.9 CONSUMER DEVICES

Consumer wearable computers are fashion and, above all, must be designed as such. Unless a user is willing to put on the device, it does not matter what functionality it promises. Making a device that is both desirable and fashionable places constraints on the whole system: the size of the battery, heat dissipation, input, networking, and the design of the HMD itself. Even so, consumer wearable devices continue to gain acceptance (Figure 2.5).

Consumer wearable computers often strive to be aware of the user's context. As opposed to explicit input, these devices may sense the wearer's movement, location, and environmental information in the background, which requires leveraging low power modes on CPUs, flash memory, networking, and sensors to monitor the user throughout the day. For example, the Fitbit One (Figure 2.5), clipped on to clothing or stored in a pocket, monitors steps taken, elevation climbed, and calories burned during the day. This information is often uploaded to the cloud for later analysis through a paired laptop or phone using the One's Bluetooth LE radio. The Fitsense FS-1 from 2000 had a similar focus but also included a wristwatch so that the user can refer to his statistics quickly while on-the-go. Since Bluetooth LE did not yet exist when the FS-1 was created, it used a proprietary, low-power, on-body network to communicate between its different components as well as a desktop or laptop.

FIGURE 2.5 As technology improves, consumer wearable devices continue to gain acceptance. (a) Fitsense heart band, shoe sensor, and wristwatch display (2000, $200). (b) Fitbit One (2012, $100). (c) Recon MOD Live HMD and watch band controller for skiing (2011, $400). (d) 2012 Ibex Google Glass prototype; released Glass Explorer edition (2014, $1500). (Images courtesy of Tavenner Hall.)

This choice was necessary because of battery life and the lack of stability of wireless standards-based interfaces at the time, but it meant that mobile phones could not interface with the device. Now that Bluetooth LE is becoming common, an increasing number of devices, including the Recon MOD Live and Google Glass (Figure 2.5), will leverage off-body digital networks by piggybacking on the connection provided by a smartphone.

Both consumer wristwatches, such as the FS-1, and HMDs, such as the Recon MOD and Google Glass, can provide information to the wearer while on-the-go. Because these displays are fast to access, they reduce the time between when the user first has the intention to check some information and the action to do so. Whereas mobile phones might take 23 s to access (to physically retrieve, unlock, and navigate to the appropriate application), wristwatches and HMDs can shorten that delay to only a couple of seconds (Ashbrook et al. 2008). This reduction in time from intention to action allows the user to glance at the display and get useful information while performing other tasks.

The Recon MOD Live takes advantage of both approaches, pairing a wrist-mounted controller with an opaque HMD and Android computer mounted in a compatible pair of goggles. The system is designed for use while skiing to provide information like location, speed, descent, and jump airtime. With the HMD, status information can be provided in a head-up manner with little to no control required by the user, much like the speedometer in a car's dashboard. Such manual control is easy for the user to learn, is precise, and can be subtle. When the user has more attention (and hands) to spare—for example, if the user is walking and a text arrives—he can use the wrist interface to select and scroll through text messages, select music to play, or interact with apps. The information can be shared to others via a Bluetooth connection to a smartphone.

An HMD has several advantages over a wristwatch, one of which is that it can be actually hands-free. By definition, a wristwatch requires at least one arm to check the display and often another hand to manipulate the interface. On the other hand, a wrist-mounted system can sense the user's hand motions and may even be able to distinguish different types of actions and objects by their sounds (Ward et al. 2006). HMDs are mounted closer to the wearer's primary senses of sight and hearing. This location provides a unique first-person view of the world, matching the user's perspective. One use, of course, is pairing the HMD with a camera so that the user can capture what he sees while on-the-go. Being mounted on the head can also allow HMD-based systems to sense many signals unavailable to a wrist interface, including head motion, eye movement, eye blinks, speech, and even brain signals in ideal circumstances.

Google Glass, another Android-based wearable HMD, uses head motion, speech, and a multi-touch trackpad on one earpiece for its input. The display is transparent and mounted high. It is easily ignorable and designed for short microinteractions lasting a few seconds. This focus on microinteractions helps preserve battery life. Glass's interface is designed to be used throughout the day and while on-the-go. Common uses include texting, email, calendar, clock, turn-by-turn directions, weather, traffic, stock quotes, remembering one's parking location, taking pictures and videos (10 s in length by default), and suggestions for restaurants, events, tourist spots, and photo spots. When a text arrives, Glass alerts the user with a

sound. If the user ignores the alert, nothing happens. However, if the user tilts his head up, the screen lights and shows the text. The user can read and dismiss it with a nudge of his head upward. Alternatively, the user can say "OK Glass, reply" and dictate a response. Because Glass displays a limited amount of text on the screen at once, interactions are short or broken into multiple small interactions. Ideally, such on-the-go interactions should be around four seconds or less (Oulasvirta et al. 2005) to help keep the user focused in the physical world.

Glass is also designed to interfere as little as possible with the user's senses. Not only is the display mounted high enough that it keeps both pupils unobstructed for full eye-contact while the user is conversing with another person, but sound is rendered by a bone conduction transducer, which sends sound through the user's head directly to the cochlea. The ears are kept clear so that the user maintains normal, unobstructed, binaural hearing. Both the Recon MOD Live and Google Glass are monocular systems with relatively small fields of view. This design choice minimizes size and weight—in particular, the weight supported by the nose—and current large field of view displays burden the nose and face too much.

2.10 MEETING THE CHALLENGE

The challenges of networking, power and heat, display, and mobile input will continue for wearable computing for the foreseeable future. However, with improvements in optics technology, electronics miniaturization, and network standards, self-contained HMD-based wearable computers can be relatively minimal devices that are comfortable to wear all day. Comfort is more important than features when designing something intended to be worn for an extended period of time. Now that the historical challenges are being addressed, the field of wearable computing is being confronted with too many opportunities. The challenge now is in taking advantage of this new way to augment humanity. It will take ten years and many companies to capitalize on the potential, but I hope to see the same sort of revolution and improvements in efficiency and lifestyle that happened around the PC and the smartphone.

REFERENCES

Ashbrook, D., Clawson, J., Lyons, K., Patel, N., and Starner, T. 2008. Quickdraw: The impact of mobility and on-body placement on device access time. In: ACM Conference on Human Factors in Computing Systems (CHI), Florence, Italy, April 2008, pp. 219–222.

Eliason, C. 1992. A wearable manual called Red. New York Times, March 29.

Gibson, J. 1977. The theory of affordances. In: Perceiving, Acting, and Knowing, Shaw, R. and Bransford, J. (eds.). Erlbaum: Hillsdale, NJ, pp. 67–82.

Guo, A., Raghu, S., Xie, X., Ismail, S., Luo, X., Simoneau, J., Gilliland, S., Baumann, H., Southern, C., and Starner, T. 2014. A comparison of order picking assisted by head-up display (HUD), cart-mounted display (CMD), light, and paper pick list. In: IEEE ISWC, Seattle, WA, September 2014, pp. 71–78.

Komor, N., Gilliland, S., Clawson, J., Bhardwaj, M., Garg, M., Zeagler, C., and Starner, T. 2009. Is it gropable?—Assessing the impact of mobility on textile interfaces. In: IEEE ISWC, Linz, Austria, September 2009, pp. 71–74.

Kress, B., Saeedi, E., and Brac-de-la-Perriere, V. 2014. The segmentation of the HMD market: Optics for smart glasses, smart eyewear, AR and VR headsets. In: Proceedings of the SPIE 9202, Photonics Applications for Aviation, Aerospace, Commercial, and Harsh Environments V, p. 92020D.

Liu, D., Jenkins, S., and Sanderson, P. 2009. Clinical implementation of a head-mounted display of patient vital signs. In: IEEE ISWC, Linz, Austria, September 2009, pp. 47–54.

Lyons, K., Starner, T., and Gane, B. 2006. Experimental evaluations of the Twiddler one-handed chording mobile keyboard. HCI Journal 21(4), 343–392.

Oulasvirta, A., Tamminen, S., Roto, V., and Kuorelahti, J. 2005. Interaction in 4-second bursts: The fragmented nature of attentional resources in mobile HCI. In: ACM Conference on Human Factors in Computing Systems (CHI), Portland, OR, pp. 919–928.

Profita, H., Clawson, J., Gilliland, S., Zeagler, C., Starner, T., Budd, J., and Do, E. 2013. Don't mind me touching my wrist: A case study of interacting with on-body technology in public. In: IEEE ISWC, Zurich, Switzerland, pp. 89–96.

Sager, I. 2012. Before IPhone and Android came Simon, the first smartphone. Bloomberg Businessweek, June 29. http://www. (Accessed March 17, 2015).

Sawhney, N. and Schmandt, C. 2000. Nomadic radio: Speech and audio interaction for contextual messaging in nomadic environments. ACM Transactions on Computer–Human Interaction (TOCHI) 7(3), 353–383.

Siewiorek, D., Smailagic, A., and Starner, T. 2008. Application Design for Wearable Computing. Synthesis Lecture Series Monograph. Morgan & Claypool: San Rafael, CA.

Starner, T. and Maguire, Y. 1999. Heat dissipation in wearable computers aided by thermal coupling with the user. ACM Journal on Mobile Networks and Applications (MONET), Special issue on Wearable Computers 4(1), 3–13.

Starner, T. 2003. Powerful change part 1: Batteries and possible alternatives for the mobile market. IEEE Pervasive Computing 2(4), 86–88.

Thomas, B., Grimmer, K., Zucco, J., and Milanese, S. 2002. Where does the mouse go? An investigation into the placement of a body-attached touchpad mouse for wearable computers. Personal and Ubiquitous Computing 6, 97–112.
Varshavsky, A. and Patel, S. 2010. Location in ubiquitous computing. In: Pervasive Computing, Krumm, J. (ed.). CRC Press: Boca Raton, FL, pp. 285–319.

Want, R. 2010. An introduction to ubiquitous computing. In: Pervasive Computing, Krumm, J. (ed.). CRC Press: Boca Raton, FL, pp. 1–35.

Ward, J., Lukowicz, P., Troester, G., and Starner, T. 2006. Activity recognition of assembly tasks using body-worn microphones and accelerometers. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 28(10), 1553–1567.

Weaver, K., Starner, T., and Hamilton, H. 2010. An evaluation of video intelligibility for novice American Sign Language learners on a mobile device. In: ACM ASSETS, Orlando, FL, pp. 107–114.

Zucco, J., Thomas, B., Grimmer-Somers, K., and Cockburn, A. 2009. A comparison of menu configurations and pointing devices for use with wearable computers while mobile and stationary. In: IEEE ISWC, Linz, Austria, September 2009, pp. 63–70.


3 Intimacy and Extimacy: Ethics, Power, and Potential of Wearable Technologies

Patricia Flanagan, Despina Papadopoulos, and Georgina Voss

CONTENTS

3.1 Introduction .......................................................... 33
3.2 Future Scenarios: Ethical and Speculative Implications of How Our Embodied Materiality Is Affected by Emerging Technologies .......... 34
    (Personal Technologies; Start with Value; Think about the System; Quantifying the Intended User; Merger of the Body and Technology; Requirements and Specifications Are for Humans Too; Prototypes and Iterative Design; Experimenting with the Future; Coloring; Questioning the Present; Life As We Know It—The Qualified Self)
3.3 Self and the Social Politic of Wearable Technologies ............... 43
    (Sperm, Stars, and Human-Centric Perception; Regional Innovation; Tracking in the Factories; Bodies at Work; Inversion of the Design Process; Bridging Materiality and Information)
3.4 Synaptic Sculpture: Vibrant Materiality and the Interconnected Body ... 48
    (Garment as Anchor)
3.5 Conclusion: Synthesis and Synaptics ................................. 52
References .............................................................. 53

3.1 INTRODUCTION

The chapter is founded on the premise that current wearable technology design practices represent a reductionist view of human capacity. What questions should we be asking to engage a more critical design perspective? This chapter extends traditional functionalist approaches to design to engage cultural, experience-based, and techno-futurist approaches. The democratization of technology into work, play, home, and mobile social networks in recent years has seen traditional human–computer interaction (HCI) design methodology broadened through the integration of other methodologies and knowledge from the humanities such as social science, anthropology, and ethnography. The field of HCI is inherently interdisciplinary and its history is one of the inevitable disciplinary multiculturalisms spawned by the expansive impact of technological growth. This perspective promulgates the space of design to be in the interface as mediator of experience, rather than design of objects or products.

The chapter is organized into three sections. The first section proposes the role of the designer to be one that includes a cultural approach to designing future scenarios—one that considers ethical and speculative implications of how our embodied materiality is affected by emerging technologies. What is the relationship of the self to the proliferating wearable technologies? How is our sense of self changing as new technologies mediate the space between our experience of self and the world? We develop a methodology that asks designers and technologists to build future scenarios and envision how our embodied materiality is affected by emerging technologies. Using a philosophical framework we explore design and its implications on the relationship of the self to the self and to social relationships. We then investigate how technologies such as Google Glass and Quantified Self applications inform our relationship to our self and redefine our social interactions.

The second section discusses the self and the social politic of wearable technologies from macro to micro perspectives. Moving from the micro (technology/body) to the macro (systems of production), we consider where control lies across these networks, at what unit of analysis, and what their impact could be on the wider world as they are dispersed, considering wider supply and production chains and regulatory systems whose existence shapes the production and meaning of wearables—both their material form and design—and the movement of gathered data from the body into wider dispersed networks of power.

The final section adopts a techno-futurist approach, proposing synaptic sculpture as a process for creative design that engages vibrant materiality and the interconnected body. The section describes the emergence of a new paradigm in terms of our augmented perspective—our perception of scale expanding our awareness and sensitivity across macro- and nanolevels. These new spheres of awareness become our normative environment—ones with an amplified awareness of the instability, fungibility, and interconnectedness of things. Wearable technologies are therefore discussed in terms of their critical, ethical, political, and speculative potential, and case studies are presented to illustrate and exemplify the ideas promulgated. We propose the need to develop a connoisseurship of somesthetic

qualities surrounding the design of wearables. Wearable environments are laden with symbolic, cultural, and emotional meaning and therefore provide a unique space to investigate questions of physicality, presence, and intimacy. In many ways the wearable environment is that interface that connects, and at the same time creates a boundary with, the world.

3.2 FUTURE SCENARIOS: ETHICAL AND SPECULATIVE IMPLICATIONS OF HOW OUR EMBODIED MATERIALITY IS AFFECTED BY EMERGING TECHNOLOGIES

What is the relationship of the self to the proliferating wearable technologies? How is our sense of self changing as new technologies mediate the space between our experience of self and the world? As we create more and more wearable devices, it is important that we also develop a methodology that asks designers and technologists to build future scenarios and envision how our embodied materiality is affected by emerging technologies. This same framework can be used to investigate how technologies like Google Glass and Quantified Self applications inform our relationship to our self and redefine our social interactions.

In the past 20 years we have seen increased development in the realm of wearable technologies. From the early MIT experiments of a cyborgian self (spearheaded by Steve Mann and Thad Starner) to today's Google Glass and Quantified Self applications, the focus has been on a singular vision of what it means to be human.

Our clothing has always expressed our relationship to social structures and to the ways we perceive others and want to be perceived by them. It also reflects ideological relationships not only to means of production—the industrial revolution after all ultimately presaged ready-to-wear and a democratization of access to fashion—but also to morality. When the zipper was introduced to male pants in 1901, critics at the time considered it a sign of moral decline. Deborah Cohen, in an article for The Atlantic, "Why we look the way we look now" (Cohen, 2014), writes:

Look closely at the emergence of our modern style, and you can see politics in the fabric seams. Economic collapse and the search for social unity—the conditions that made the New Deal possible—created an unlikely alignment of tastes. Streamlined clothes appealed to the still prosperous, anxious to hide their wealth, and to the downwardly mobile, who hoped to conceal their slide.

Today, corsets, high heels, and casual Fridays all exemplify our collective attitude toward capability, physicality, and the way we engage with the world and others. Similarly, as we are developing a new range of wearable devices, it would be instructive to use a framework that explores design and its implications to the relationship of the self to the self and to social relationships.

Part science fiction, part optimized efficiency, these visions tend to a reductionism of what creates meaning: a causal (and casual) relationship to self and others, and a mostly materialistic and reductionistic relationship to data and their meaning. As our uses of technology and the devices that surround us are now a defining part of our material culture, we need to critically consider what we want our culture to evolve toward and the ways in which these technologies will mediate the space between ourselves, others, our relationship to the social, and our increasingly complex environments.

More fundamentally, ethics asks us to consider what is a life worth living and how we create meaning for ourselves. Central to the discourse of ethics are questions of capability, responsibility, understanding, awareness, and being in the world. What are the responsibilities of designers and technologists when dealing with the intersecting nodes and dwellings of existence? What are the future worlds we want to build and inhabit? Why are physicality and tangibility important, and what has design offered in the thinking of our relationship with ourselves, others, and the world at large? By framing a discourse within experimentation with physical and computational materials we can overcome the duality and reductionism that has informed most of the current and future vision of technology and technological devices.

Critical design and even ethics have an important role to play in reframing our uses of technology and in developing a methodology for design and the building of tangible, incorporated, wearable devices and interfaces. Countered to this approach, we can ground human-to-computer interaction on principles of sustainability, experimentation with textile technologies, refocused notions of connectivity and intimacy, a design methodology centered around explorations of physicality, and humanism. By starting with human-to-human interaction, considering foundational questions of ethics and human capabilities as well as looking at definitions of usefulness and its relationship to design, mindfulness and notions of sociability can help us ensure more thoughtful and useful applications. Can these questions be incorporated in a design and product development practice? In many ways they already are, tacitly and inevitably, but often are not specified in an articulated manner. As we build specifications matrices we can include these fundamental questions to expand the horizon of what wearable devices can offer and how they create meaningful experiences.

3.2.1 Garment as Anchor

As part of this process we should remember that wearables should first and foremost be wearable. Approaching them through the spectrum of the human body should be our starting point. They become an extension of the human body, and we should use the

conventions and techniques used in garments and accessories. Our relationship to materials, clothing, and how it has evolved to fashion spans thousands of years, and we must take cues and inspiration from the process of making garments and the rituals of donning clothes.

3.2.2 Start with Value

A recent paper from Endeavour Partners found that "one-third of American consumers who have owned a wearable product stopped using it within six months. What's more, while one in 10 American adults own some form of activity tracker, half of them no longer use it" (Endeavour, 2014). This statistic has been repeated often and calls into focus questions of value. Why is the drop-off rate so high? What is the actual and perceived value that wearable, Google Glass, and quantified-self applications and devices deliver? Questions of value should ground each product development and clearly articulate the value provided. Value has multiple dimensions, both external and internal, and while we are accustomed to measuring value in financial terms, it is necessary to qualify value, in human terms, as a network of relationships and clearly map how these relationships, interactions, and exchanges evolve over time. These relationships almost always include our relationship to our self (self-reflection), to those close and near to us (our intimate relationships and the social at large), and to our relationship with the world (our urban and natural environment).

3.2.3 Think about the System

In other words, value and the way we formulate relationships are part of a larger system of interactions. How do proposed functions and features extend possibility, discovery, serendipity, support, and sociability? What are the intersecting areas of activity and interest that emerge? By analyzing current infrastructures and mapping the cultural, social, physical, and institutional components of a system, we might be able to better understand the interactions that create, support, and challenge current systems. Adopting a view of the system of interactions and mapping the nodes and points where devices and their features connect to various touchpoints in the system will provide designers and technologists with insights for richer interactions and help find opportunities for innovation and adoptability.

3.2.4 Requirements and Specifications Are for Humans Too

Too often requirements and specifications account for the needs of the device and not the needs of the user. There is tremendous opportunity in developing requirements and specifications that articulate the value for users and how this value is created and touches their entire system of use and interactions. Working with users to identify the leverage points for change and growth and to reimagine systems that enable better flows of value consistently results in designs that allow users to imagine their own uses—therefore overcoming barriers to adoptability.

3.2.5 Prototypes and Iterative Design

While the importance of early prototypes and iterative design is well understood and embraced as part of a design methodology, additional emphasis must be placed when developing wearable devices. The relationship we have with our clothes and accessories touches deeply our sense of self, identity, comfort, and expression. Creating early prototypes that imagine how wearable devices conform to the body and its rituals reveals opportunities for value and avoids the ergonomic and social pitfalls that many wearable devices in the market have fallen into.

In many ways Google Glass is a widely released early prototype. A little over a year after it released a version of the device to developers, Google published a set of social guidelines, a social etiquette of sorts. The list includes advice such as:

Ask for permission. Standing alone in the corner of a room staring at people while recording them through Glass is not going to win you any friends.

Glass-out. Glass was built for short bursts of information and interactions that allow you to quickly get back to doing the other things you love. If you find yourself staring off into the prism for long periods of time you're probably looking pretty weird to the people around you. So don't read War and Peace on Glass. Things like that are better done on bigger screens.

Be creepy or rude (aka, a Glasshole). Respect others and if they have questions about Glass don't get snappy. Be polite and explain what Glass does and remember, a quick demo can go a long way. In places where cell phone cameras aren't allowed, the same rules will apply to Glass. If you're asked to turn your phone off, turn Glass off as well. Breaking the rules or being rude will not get businesses excited about Glass and will ruin it for other Explorers.

Google, by releasing its product at an early stage, has been able to generate a vigorous discourse on privacy, socialization, and the way social cues can be built into interaction design. It will be interesting to see how the insights gathered from this initial release can be used to experiment with social cues and overcome some of the backlash that it has attracted as a product so far.

3.2.6 Experimenting with the Future, Questioning the Present

Imagining the future and engaging in speculative design can help designers and technologists explore edge-case scenarios and draw out the possible implications of design and technology decisions. The future is often an exaggeration of the present, and speculative design can be used to highlight the ramifications of design decisions. Could Google Glass be designed in such a way as to make the list of do's and don'ts obsolete? Experimenting with scenarios of use and observing users in the street, cafes, parties, and at work can yield insightful observations that can be translated into design decisions and reflected in specifications and requirements.

At the same time it asks us to consider how our wearable devices can provide us with a new vocabulary and range for expression and communication.

3.2.7 Coloring

Coloring is a hypothetical consumer health product that is launched in the year 2046 and was developed by School of Visual Arts, MFA Interaction Design (SVA NYC, 2014) students Matt Brigante, Melody Quintana, Sam Wander, and Amy Wu as part of a Future Wearables class. The project assumes that by the year 2046 significant leaps in psychology and neuroscience research will have taken place, transforming our understanding of mental health. The project also assumes that innovations in materials technology will introduce new possibilities for treatment, such as brain chip implants. Coloring is imagined as a skin interface for people who use brain chip implants to track and manage their mental health. It communicates with the user's brain chip to display a real-time visualization of their emotional state, right in the palm of their hand. Emotions are mapped to a 7000-color spectrum. The spectrum is richer and more precise than our verbal emotional vocabulary, empowering people with a new language to understand their feelings. Rather than having to use blunt and unpredictable prescription drugs, users are given the agency to self-medicate when appropriate:

They can simply blend harmonizing colors into their Coloring to balance their mood.

Coloring (2014)

The project took as a starting point the work of John Rogers, professor of materials science and engineering at the University of Illinois at Urbana-Champaign, in implantable technologies, and speculated on future scenarios of use. This future scenario can help explore current opportunities and create a framework for inquiry and extend what is possible (Figure 3.1).

3.2.8 Life As We Know It—The Qualified Self

Another student project, developed at NYU's (ITP, 2014) graduate Interactive Telecommunications Program, looks critically at the uses of quantified-self applications and devices. "Life As We Know It—The Qualified Self" is Asli Aydin's graduate thesis. Aydin, fascinated by the quantified self movement, decided to use a series of life-logging techniques to track herself through a very difficult time in her life following her father's cancer diagnosis and through his death. The project asks questions such as the following: "Why do we collect data? Do data tell us something we don't know about ourselves? Does it change our behavior?" Aydin set out to discover whether or not her data could tell the story of her experience. She writes: "The more I tried to put it together, the less I felt like it connected to my experience. I decided to create a book that compared the two states of my data during the process of death and how I felt." Aydin used the following applications and devices to life-log her experience: Jawbone, Openpaths, Happiness Survey, Moodscope, and Reporter. After months of intense self-quantification Aydin concluded that the qualified self is far from the quantified, and she realized that her journal entries

provided the insight and reflection that eluded her devices and applications. At the end of her thesis presentation she writes:

Every time we experience these moments the self is shaped. They shape who we are. They shape our expectations, our confidence, our expression. While data can be useful with specific set goals, my biggest takeaway throughout this journey has been to remember to track my soul first. The truth is simple and it is not embedded in a set of data that tells me how many steps I've taken. The self is fascinating… that fascination cannot be quantified…

Her experience and reflections can be used as direct input and create guidelines for developing wearable devices that aim to change behavior and provide insight into the human experience.

FIGURE 3.1 Coloring by Matt Brigante, Melody Quintana, Sam Wander, and Amy Wu. The figure shows the color emotion spectrum: 7000 colors in a hierarchical structure, with 1000 colors in each of seven core emotional "families" (Happiness, Surprise, Disgust, Anger, Contempt, Fear, Sad). Following discrete emotion theory, these seven core emotions are biologically determined emotional responses whose expression and recognition is fundamentally the same for all individuals regardless of ethnic or cultural differences. The hierarchical structure enables simple, top-level readings without referencing a chart; more fine-tuned readings can be gleaned with the help of an interactive map.
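The palette described above (seven discrete-emotion families of 1000 shades each, 7000 colors in total) can be read as a simple hierarchical code. The sketch below is purely illustrative: the Coloring project specifies no actual encoding, and the index scheme here is an assumption made to show how such a structure supports both a coarse family-level reading and a fine-grained shade.

```python
# Hypothetical sketch of a hierarchical color-emotion code in the spirit of
# Coloring: 7000 colors = 7 core-emotion "families" x 1000 shades each.
# Family names follow discrete emotion theory as cited in the chapter;
# the numeric encoding itself is invented for illustration.

FAMILIES = ["happiness", "surprise", "disgust", "anger", "contempt", "fear", "sad"]
SHADES_PER_FAMILY = 1000

def encode(family: str, shade: int) -> int:
    """Map an emotion family plus a fine-grained shade (0-999) to one of 7000 indices."""
    if not 0 <= shade < SHADES_PER_FAMILY:
        raise ValueError("shade must be in 0..999")
    return FAMILIES.index(family) * SHADES_PER_FAMILY + shade

def decode(color: int) -> tuple[str, int]:
    """Recover (family, shade); reading only the family gives the top-level reading."""
    family, shade = divmod(color, SHADES_PER_FAMILY)
    return FAMILIES[family], shade

print(encode("anger", 250))  # 3250
print(decode(3250))          # ('anger', 250)
```

The point of the hierarchy is visible in `decode`: a viewer (or device) that only needs the simple reading can stop at the family, while the shade carries the precision the chapter attributes to the 7000-color vocabulary.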

Thich Nhat Hanh is a Buddhist monk who was invited by Google (Confino, 2013) to give a series of workshops and provide inspiration to its developers and product managers on how to engage users and develop applications and devices that can yield the insights that evaded those devices and applications that Aydin used, and possibly account for the drop-off rate of current wearables. In discussing the goals of these workshops, Thich Nhat Hanh commented:

When they create electronic devices, they can reflect on whether that new product will take people away from themselves, their family, and nature. Instead they can create the kind of devices and software that can help them to go back to themselves, to take care of their feelings. By doing that, they will feel good because they're doing something good for society.

Engaging with the totality of human experience, and probing into what creates value, the rituals of dressing and undressing, the delight of a soft material against the human skin, the systems that we inhabit, and the relationships that we create in them, form the grounding framework for creating wearable devices; all are fundamental in the creation of meaningful and useful wearable devices. We have adopted a far too reductionistic approach for too long and have been leading product development based on a mechanistic model of what it is to be human. We stand at the precipice of incredible innovation in materials, sensors, and computational and power technologies. We have the opportunity to create new models of expression, communication, engagement, connectedness, and reflection, and in order to do so, we should adopt a methodology that is grounded in humanistic and ethical principles and critically consider how we want to use these innovations to interact with our communities and ourselves.

3.3 SELF AND THE SOCIAL POLITIC OF WEARABLE TECHNOLOGIES

3.3.1 Personal Technologies, Regional Innovation

Wearable computing devices are personal, particular, and corporeal. They offer intimate understandings of the body—its rhythms, its movements, and its biochemical impulses. They offer intimacies across larger systems, networks, and communities—see, for example, the Betwine wristband (Imlab, 2014), which allows distal users to gently nudge and race against each other. Yet these devices are not the bespoke ornamentations or cumbersome user-designed apparatus of previous decades (Mann, 1997); they are mass-produced, and they are legion. They are FuelBand, Glass, FitBit; the modern wave of wearables have moved from clunky early adopter prototypes and spread out into mainstream markets. Which systems of production are employed to bring wearable technologies to mass markets? Where does control lie across these networks? By considering the sites of production and supply, we can interrogate how these systems shape the meaning of wearables. To consider the ethics of this current generation of wearables—intimate, yet manifold—involves bringing together the bodies on which they sit with the bodies that produce them,

both in the materiality of their design, the configuration of their intended use, and the politics of the data that they gather. Market researchers predict that wearable computing devices will explode in popularity in coming years to the extent that they will become the norm (ABI, 2013). The numbers are enormous: by 2018, there are expected to be 485 million annual device shipments, all of which have to be manufactured somewhere.

Many of the current generations of wearables are designed by people in the global North and made by people in the global South. Despite rhetoric of a shrinking world, regional patterns of innovation and industry remain embedded into the earth: certain places are better at doing some things than others (Howells, 1999). The San Francisco Bay Area in Northern California is home to the Silicon Valley information technology cluster (Saxenian, 1996): after an early history around microprocessors and semiconductors, the area transformed into a hub for software and Internet service companies and plays host to some of the world's largest technology companies. Many of these firms, including Google and Facebook, are now edging into the wearables market, pulling together teams of designers and engineers to haul together the concept and intent around these devices. In doing so, the intent becomes material. Seventeen time zones away, China is one of the largest and most rapidly developing economies in the world, expanding the industrial capacity of its high tech industries to act as the global economy's world factory, answering Western desire for ICT consumer goods (Bound et al., 2013).

FitBit is the market leader in the wearable activity band market, with estimates that 61% of the wearable technology market in 2013 was attributed to sports and activity trackers. While the founder company is based in San Francisco, FitBit locates its manufacturing in China; while the device retails for around U.S. $100, it costs less than one-fifth of that to make (Electronica, 2013). Yet these devices are also designed for users in the global North. FitBit was, its founder explained, designed as a quiet and personal device:

From early on we promoted a notion of a more introverted technology that is more about the connection between yourself and your goal, rather than having a third party like an athletics company telling you how fit you should be and what's the proper weight for you.

Amit, G. (2014)

In doing so, the technology falls into not only Western trends around commercialized self-improvement (Maguire, 2008) but also trajectories laid down by the earlier quantimetric self-tracking movement.

3.3.2 Quantifying the Intended User

Unless something can be measured, it cannot be improved. So we are on a quest to collect as many personal tools that will assist us in quantifiable measurement of ourselves. We welcome tools that help us see and understand bodies and minds so that we can figure out what humans are here for.

Kelley (2007)

The term quantified self emerged in 2007 to describe the way that people—initially an elite group of Bay Area inhabitees, including editors of WIRED magazine—sought to find answers to cosmic questions ("Who are we? What does it mean to be human?") through rational corporeal self-knowledge, giving rise to tools that offered insight into the data found within their own bodies. By this framing, wearables became a way of reducing wider physical—and mental—healthcare systems of infrastructure down to the level of the individual: self-tracking as a form of self-care, reconfiguring the relationship that might otherwise be formed between a patient and a medical professional to that between a user, a circuit board, a piece of rubber, and a software algorithm (while, in the wings, a company sits quietly, waiting to mop the data up).

While the devices themselves are manufactured in their millions, numerous software apps have also crawled into the world to make sense of this data: see, for example, the MapMyFitness tool—compatible with devices such as the FitBit and Jawbone—which, as of May 2014, has 16 million registered users who log over 200,000 health and fitness activities daily. The business models around the market indicated where the true value of wearables lies: not in the plastic and electronics of the hardware devices themselves but in the fog of data that they extracted from the human body. Research done by the Centre for Creative and Social Technology (CAST) at Goldsmiths, University of London, found that 63% of U.S. and 71% of U.K. respondents thought that wearable technology had improved their health and fitness. We are already seeing wearable technology being used in the private sector, with health insurance firms encouraging members to use wearable fitness devices to earn rewards for maintaining a healthier lifestyle, and with one in three willing to wear a monitor that shared personal data with a healthcare provider (Rackspace, 2013). As Chris Bauer, the codirector of CAST, described it:

The rich data created by wearable tech will drive the 'human cloud' of personal data… With this comes countless opportunities to tap into this data, whether it's connecting with third parties to provide more tailored and personalized services or working closer with healthcare institutions to get a better understanding of their patients.

Bauer (2013)

For the users of wearable tech in the global North, ethical issues have emerged around privacy—the tipping point between sousveillance and surveillance. Questions have been raised about whether data can be sold on to third parties, whether it is securely stored, and who, ultimately, owns it (Ng, 2013). Participants in CAST's research cited privacy concerns as the main barrier to adoption. These suspicions emerge from the primacy of the idea of control and choice: that the users who make the choice to use wearable tech as a way to figure out what humans are here for may unknowingly and unwittingly relinquish control of the data it generates—that someone else may be using rational means to see and understand bodies and minds. These are the fears of the intended user, the perfect persona who chooses to explore self-knowledge through the body, and who has the leisure time to engage in fitness activities. Control, consent, and choice are keys: over half of CAST's respondents felt that wearable technology

helped them feel more in control of their lives.

3.3.3 Tracking in the Factories

We are machines, we are robots, we plug our scanner in, we're holding it, but we might as well be plugging it into ourselves. We don't think for ourselves, maybe they don't trust us to think for ourselves as human beings.

Adam Littler, Amazon picker (Littler, 2013)

The notion of the quantified self derives from a core concept of agency and sousveillance, in which the motions of the body are willingly recorded by a participant in the body's activity. Down across the supply chain, in the factories and the warehouses, monitoring is not an autonomous choice made with agency about enlightenment and self-knowledge, but an act placed onto individuals within the power dimensions of the workplace itself. Here, choice is abstracted and bodies are intended to be surveilled, only by outside agents. The body is quantified—not for self-directed self-improvement, but as a means to wring maximum physical efficiency out of it for an outside body: the boss.

Yet there is a much longer heritage of using rational metrics to measure the activity of the human body. In his work published in 1911, The Principles of Scientific Management, Frederick Taylor described how the productivity of the workforce could be improved by applying the scientific method to labor management. These included techniques such as time-and-motion studies, in which a worker's series of motions around various tasks—bricklaying, moving pig iron—were timed to ensure the most efficient way to perform a job.

Wearables in the workplace are becoming more prevalent: CAST reported that 18% of employees now wear some kind of device, and that 6% of employers provide a wearable device for their workers. A host of software solutions supports this surveillance of workplace bodies, such as Cogiscan's "Tracking and Route Control," which uses real-time information to track the physical location and quantities of all products on the factory floor, and in doing so "minimises unnecessary movements of employees" (Cogiscan, 2014). Innovations in this space include Hitachi's Business Microscope, a lanyard packed with sensors that recognize face, body, and rhythm data between employees, gathering data that can be turned into interaction-based organizational and network diagrams. The British supermarket chain Tesco equipped its employees with data bands—and determined that it thus needed 18% less of those same workers (Wilson, 2013). As Anna Coote notes, we live in an era of instant communication and mobile technologies with global reach, where people can increasingly work anywhere, and "there is no end to what employers can demand" (Coote et al., 2014). Yet unlimited work does not necessarily map onto quantified labor—indeed, it is possibly its antithesis. Unsurprisingly, the bodies at work that are the most quantifiable are those engaged in routine manual labor—not the creative knowledge-intensive work done in the designing and prototyping of wearables by engineers and designers, but repetitive replicable tasks that are only an inch away from being replaced by the automated machines that can mimic the actions of human bodies, but without need for sleep, fuel, or rights (Frey and Osborne, 2013).

Adam Littler's quote, given earlier, was taken from a BBC documentary in the enormous warehouses of the online retailer, Amazon. Littler, an undercover

Workers who miss the productivity targets set down and enforced by the technologies of power (McKinlay and Starkey, 1998) face disciplinary action. An undercover reporter, Littler took a job as a picker in the Amazon warehouse in Swansea, Wales, where he collected orders from around the 800,000 ft² of storage. To assist him—to track him—he was given a handset that told him what to collect but that also timed his motions, counting down the set number of seconds that he had to find and pick each item; if he made a mistake, the scanner beeped. The handsets were introduced by Amazon to provide analysis of their inventory, but also to "increase worker productivity by reducing the time it takes pickers to find products in a vast distribution center" (Master, 2013). For the pickers, the scanners increase productivity by leading to the intensification of tasks, increasing the stress on their own bodies—Littler himself ended up running around the warehouse during his nightshifts, covering nearly eleven miles in a night. There is no incentive for introspective self-betterment and self-knowledge from this device: the scanner observes, it tracks, and it punishes.

3.3.4 Bodies at Work

In their piece 75 Watt (2013), artists Cohen Van Balen collaborated with a choreographer and Chinese factory workers to create a piece which reverse engineers the values of a supply chain by creating a useless physical object. Seventy-five watts is the average output of energy a human expends in a day, a measure that could be tracked by sousveillance through a consumer wearable. The product of the labor is the dance done by the workers as they assemble the clunky white plastic device. The quantified self was conceived as a path to asking questions about the meaning of human life; yet down along the supply chain, in the factories and the warehouses, the same transformative power of digital hardware around wearable technology answers the question differently: the human life is capital, the bodies themselves only actions.

3.4 SYNAPTIC SCULPTURE: VIBRANT MATERIALITY AND THE INTERCONNECTED BODY

As technology is rapidly evolving it is becoming invisible, embodied within the materials of everyday life. Textiles have a heritage, tradition, and cultural function that are evolving in a mash-up with science and technology. Wearables' ability to interconnect changes our perspective and relationships with others, imbuing them with capacities to extend our perception of ourselves and of the world and the way we live in it. The ability to focus more explicitly and explore intimately at nanoscopic levels, combined with macroscopic and virtual perspectives, opens possibilities of completely new experiences of being in the world.

3.4.1 Sperm, Stars, and Human-Centric Perception

It was once believed that human spermatozoa contained all the elements needed for human reproduction: to the human eye it appeared that a tiny figure was visible in the head of the spermatozoon. A woman's role in reproduction was simply a vessel to nurse the spermatozoon until it developed enough to enter the world.

The invention of the microscope revealed the process of spermatozoa fertilization with ovum and changed the role of women profoundly.

With the invention of the telescope, the cartographer's job changed drastically. Early star charts depicted animals, figures, and objects in the sky; the images and mythologies that went along with them were used to aid memory and help recall the location of stars in the visible night sky. Maps became factual documents plotting out the heavens above in ever increasing detail.

The teleological and ocular-centric faith in technology has deep-seated historical roots. With the invention of photography, journals wrote articles in awe of this new science—it seemed that we had procured the magical ability to capture moments of life in factual documents of light on photo-sensitive paper. A zealous appeal made in the comments by Oliver Wendell Holmes in an article published in 1859 heralds "this greatest human triumph over earthly conditions, the divorce of form and substance. … What is to come of the stereoscope and the photograph … Form is henceforth divorced from matter. In fact, matter as a visible object is of no great use any longer, except as the mold on which form is shaped" (Holmes, 1859).

Human consciousness is altered as new technology enables us to see things differently. When we landed on the moon in 1969, images looking back at the earth were projected into people's living rooms via television, and they enabled us to imagine ourselves as part of a greater whole and see the earth—rather than endless and boundless in natural resources—as a delicate intertwined ecosystem of which we are just a small part. Our comprehension of the world is mediated by technology and is dependent on our ability to adapt and make sense of the information the technologies provide.

Floating eye is a wearable work by Hiroo Iwata performed at Ars Electronica in 2000, where a floating blimp suspended above the wearer's body supports a camera. The head of the wearer is encased in a dome, and from the inside they view a panoramic screen projecting what is being filmed; normal vision is superseded and interaction with the environment estranged. The experience is that of observing oneself from above. This work predates a perspective that we are becoming accustomed to: that of navigating space by looking down into the screen of a digital device, guided by a plan view and prompted by Google maps or the like.

Wearable technologies may at first seem to disorient or give a feeling of estrangement, but as we explore new ways to understand the world around us, we are profoundly changing the way we live and interact. The shift in perspective that we are fast approaching involves both time and scale. We are witnessing the emergence of a new paradigm in terms of our augmented perspective—our perception of scale expanding our awareness and sensitivity across macro and nanospheres that we will learn to accommodate and that ultimately will become our normative environment. The wearable technology that surrounds and permeates our bodies will mediate this experience and augment our senses.

Since the mid-1990s we have lived in environments supported by digitization. This is long enough to evaluate the theoretical hype of the late 1990s surrounding the digital and virtual world of the Internet that hypothesized homogenization of culture and the divorce of information from materiality.

In retrospect, the Internet has ultimately refocused our attention on matter, at the same time connecting materiality to anthropology and the lived experience of the individual, as the dividing line between digital and material evaporates within our human-technogenesis.

The development of computing is deeply indebted to the development of materials technologies. Textiles manufacture and techniques helped conceptualize digital technologies—from the protocol logic of knitting, to the matrix of heddle structures in weaving machines, to the dpi and pixilation of hand techniques such as cross-stitch. Teshome Gabriel's Notes on Weavin' Digital: T(h)inkers at the Loom (Gabriel and Wagmister, 1997) explores traditional non-Western weaving in this light, while Otto von Busch's Zen and the Abstract Machine of Knitting (von Busch, 2013) and Sadie Plant's The Future Looms: Weaving Women and Cybernetics (Plant, 1996) and Zeros + Ones: Digital Women and the New Technoculture (Plant, 1997) evidence strong connection between the material and the digital. Strings of 0's and 1's in raw state in the computer have no sensory or cognitive effect without material formations that interface with our proprioceptors to make sense of the data. The interface inherently involves a coupling between computer-mediated rendering of data and human response. In the world of contemporary art we have witnessed a transition where the locus of meaning that once lay within the object, and then in the medium, now lies in the interface (Poissant, 2007).

Self-mapping and tracking mean that data that was once considered the domain of a third-party specialist to interpret is available for self-reflection and immediate reconfiguration. Combined with the storage capacity of supercomputers, massive data sets about micro personal information are guiding future strategies for big business design. The Centre for Postnormal Policy at the Hawaii Research Center for Futures Studies forecasts a postnormal condition of chaos, complexity, and contradictions under conditions of uncertainty and accelerated change in three future modes: the extended present, the familiar future, and the un-thought future (Sweeney, 2014).

3.4.2 Inversion of the Design Process

We are witnessing a subversion of the traditional fashion design methodology away from the trickle-down theory to one that can enhance a relationship between designer and user, who become coproducers. Our relationship with the world is evolving from one in which historically we were hunter-gatherers using the products of the world; then we learnt to harness energy in the production of materials, "controlling the natural world" around us through industrialization; and now there is a need for us to imagine the future, to "design and craft our own world." The inversion of the design process from technology-driven design, to applications-driven design, to need-driven design, and ultimately to concept-driven design takes the design process from one led by the enabling technologies to visionary design driven by concepts and principles, focusing on users and tasks; in other words, the evolution has changed the focus from production, to service, to experience (Table 3.1). This holds true throughout design and fashion sectors.

TABLE 3.1
Authors and Concepts That Point to the Growing Prominence of Experience/Interaction/Interface Design (Including HCI)

Author | Past | Present | Future
John Sweeney (2014) | The extended present, governed by trends and weak signals | The familiar future, governed by images of the future(s) | The un-thought future, governed by design and experience
Louise Poissant (2007) | Material object | The medium | The interface
Jannis Angelis and Edson Pinheiro de Lima (2011) | Focus on production | Focus on service | Focus on experience
Neil Gershenfeld (2011) | Computers controlling tools | Machines making machines | Building with materials containing codes
Ishii Hiroshi (1997, 2012) | Technology-driven design | Need-driven design | Concept-driven design
PSFK (2014) | Connected intimacy | Tailored ecosystems | Co-evolved possibilities

3.4.3 Bridging Materiality and Information

In the history of wearable computing the predominant focus has been on ocular-centric ways of knowledge transfer (Margetts, 2003): historically, the emphasis has been placed on vision as the means of input and output, and this legacy has informed our perception of wearables—the classic example being Google Glass. Primary research funding is still spent in this area, although augmentation of the other senses is being explored. Challenging the dominance of vision in an analysis of the senses, David Howes cites both Marx's doctrine and etymology when he proposes that "Late capitalism is much more than a 'civilization of the image' and it cannot be theorized adequately without account being taken of its increasingly multisensory materiality. The difficulty here stems from the sensory bias intrinsic to the very notion of 'theorization': theory comes from the Greek theorein meaning 'to gaze upon'." It is high time for all of the senses (not solely vision) to become "directly in their practice theoreticians" (quotes from Marx 1994, discussed in Howes, 2005).

What we wear not only expresses our identity, protects us, and regulates temperature but also is rapidly becoming the substrate in which to embed sensors, actuators, transmitters, recorders, diffusers, and integrators (Pold, 2005). These six elements expand and augment the body's five senses of sight, sound, touch, taste, and smell; wearables utilize these 11 parameters as new media to sculpt experience. What we wear will record bodily data and exchange information with the environment; the line between personal and global data will blur, and sensory stimulation will be felt at macro and micro spheres of human–computer engagement and interpersonal communication. The focus of wearable technology that concerns itself with experience views technology as the mediator rather than the end product. In order to achieve this, discourse needs to change the language around design away from functional attributes and technical capacities to develop a connoisseurship of somesthetic qualities (Schiphorst, 2011).

Sensors perceive data in the environment. They can be based on the detection of different parameters such as light, movement, stress, heat, humidity, force, and noise, and come in many forms such as microphones, ultrasound detectors, stretch sensors, photovoltaic sheets, data gloves, or thermal imaging. Actuators produce activities such as movement, light, or sound—for example, a fan, a light, or a buzzer—and can have different mechanisms such as electric, pneumatic, or hydraulic. When combined with materials such as shape memory alloys, thermochromic inks, or smart textiles, they can appear to embody autonomy in reaction to changes in conditions. Diffusers are attachments for broadening the spread of a signal into a more even and regulated flow—for example, devices that spread the light from a source evenly across a screen; they could be in the form of an electrostatic membrane or a projection screen such as LCD or plasma. Transmitters nullify distance: they are interfaces ranging from the telegraph to television, radio, facsimile, Internet, and X-bee. They can be analogue or digital. Recorders take samples of reality or traces of activity and collect them, in analogue formats by fixing them onto substrates like tape, film, or photo-paper, and through numeric coding in digital formats. Recordings can be transformed and altered; they offer potential to reconsider time, space, and interaction, and to augment memory. Integrators involve the integration of technologies into living organisms, for example, the mash-up between biology, medicine, tissue engineering, nanotechnology, and artificial life.

The body's sensual capacities can adapt and accommodate new experiences, and wearables provide a platform for experimentation that enables exploration of the traditional boundaries that govern human perception. Translation of data from one type of input expressed through another form of output is a natural function within the fungible realm of data. An example is a wearable device that explores sensory dissonance—Bamboo Whisper translates language from one wearer into percussive sounds and vibration felt by the wearer of a second device (Figure 3.2).

It is well known that people without or with minimal function of certain senses become more acute in the function of others. For example, by producing oral clicking noises, Ben Underwood is able to echolocate and visualize spaces around him even though he is blind. Tests showed that when he performed echolocation his calcarine cortex, the part of the brain that normally deals with visuals, was stimulated (McCaffrey, 2014). Neil Harbisson is an artist with achromatopsia, meaning he cannot see colors. He has legally registered as a cyborg and wears a permanent head-mounted computer that enables him to hear color by converting light waves to sound waves; he is a painter and produces artworks based on music. His senses have been augmented and his body adapted to the expanded somesthetic so that he perceives more than the natural human visible spectrum, to include infrared and ultraviolet (Harbisson, 2012). As such technologies have evolved, so has the way we live in the world; this has been as profound as the effect of the lens on our visual perspective, from micro to macro understandings of the world.
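The input-to-output translation described above for a device like Bamboo Whisper—one wearer's voice re-expressed as another wearer's haptic sensation—can be sketched in code. The sketch below is purely illustrative and not the actual device's firmware: the threshold and the on/off pulse scheme for a vibration motor are assumptions.

```python
# Illustrative sketch only: maps a spoken-audio amplitude envelope to
# vibration-motor pulses, in the spirit of the input-to-output translation
# described above. Threshold and pulse scheme are hypothetical.

def envelope_to_pulses(envelope, threshold=0.2, max_ms=200):
    """Convert normalized amplitude samples (0.0-1.0) into
    (duty, duration_ms) pulses for a vibration motor."""
    pulses = []
    for level in envelope:
        if level < threshold:
            pulses.append((0.0, max_ms))  # quiet passage -> motor off
        else:
            # louder speech -> stronger, longer vibration
            pulses.append((round(level, 2), int(level * max_ms)))
    return pulses

if __name__ == "__main__":
    spoken = [0.05, 0.4, 0.9, 0.1]  # one syllable rising and falling
    print(envelope_to_pulses(spoken))
```

The point of the sketch is that the mapping is arbitrary and designable—the same envelope could as easily drive percussive clicks or light—which is what makes data "fungible" across senses.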

FIGURE 3.2 Bamboo Whisper. Tricia Flanagan and Raune Frankjaer, 2012. (Photo: © Tricia Flanagan.)

Bamboo Whisper extends earlier experiments by the Dadaist Hugo Ball with his nonlexical phonetic poems, and what the Russian futurist poets Khlebnikov and Krucënykh termed Zaoum (Watts, 1988). By exploring voice in terms of vibration, and stripping away semiotic analysis of language, the sensual capacities to communicate emotion and understanding between bodies could reveal a predemic universal Ursprache. Parents intuitively adopt techniques of rhythm, breathing, or singing to nurse babies. What happens when body rhythms overlay one another? Does one regulate the activity of the other? Do the bodies adapt to the exchange and find new rhythms and harmonies in unison? We experience similar effects when dancing or sleeping close together.

Adopting a language that enables effective design of emotional experiences and fosters a connoisseurship of the interface, through the use of the eleven parameters described earlier, is an attempt to address somesthetic issues as primary to design development, where the technology itself does not govern but is a tool in the design of human experience. Experimentation in wearable technology can develop and adapt such strategies in order to generate fundamental questions and employ them in speculative design. This methodology represents an alternative to design processes that use design to answer preconceived questions generated from historical or market-generated data and formulated as problems to be solved. Affirmative design practices such as the latter are limited in their capacity and do not support design mavericks.

3.4.4 Merger of the Body and Technology

PSFK predicts a future of coevolved possibilities where "technologies are evolving alongside human behaviors to augment, replicate or react to natural abilities and inputs, creating an increasingly connected relationship between people and their devices" (PSFK, 2014). The person as computer embodies new forms of intuitive computer control; Steve Mann calls this Humanistic Intelligence (Mann, 2001).

Anticipating the merger of the body and technology, Ray Kurzweil proposed singularity (Kurzweil, 2006) as the point in the future when the capacity and calculation speeds of computers equal that of human neural activity, and our understanding of how the mind works enables us to replicate its function. Kurzweil promulgates artificial intellects superior to human ones, which poses the question: In the future, will we be outsmarted by our smart-clothes? Artificial intellects known as artilects will conceivably have rights. Nonhuman entities are already represented in our juridical systems in the form of corporations, landscapes, and trees (Dator, 2008); following the attainment of universal human rights, and then the rights of animals, artilects could attain rights in a similar manner (Sudia, 2001).

The separation between human and machine intelligence traditionally lies in the human realm of emotions, thought of as metaphysical. Recent scientific discovery has given us insight into emotions such as retaliation, empathy, and love that can now be understood within the frame of scientific knowledge.

We innately understand and interpret information from people's eye gestures. Flanagan and Vega's research into Humanistic Intelligence produced Blinklifier, a wearable device that uses eye gestures to communicate with an onboard computer. By wearing electroplated false eyelashes and conductive eyeliner, bio-data from blinking communicates directly with the processor without engaging cognitive action. The body's natural gestures are augmented and amplified into a head-mounted light array (Flanagan and Vega, 2012); by amplifying these everyday gestures, Blinklifier leverages the expressive capacity of the body (Figure 3.3).

FIGURE 3.3 Blinklifier. Tricia Flanagan and Katia Vega, 2012. (Photo: Dicky Ma. © Tricia Flanagan.)
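The Blinklifier interaction—a blink closing the conductive-eyeliner circuit and advancing a light array with no cognitive mediation—can be caricatured in a few lines. This is a hedged illustration, not the published implementation: the sampling scheme, the rising-edge blink detection, and the LED pattern are all invented here.

```python
# Illustrative sketch: treat blinks (circuit-closed samples) as events that
# advance a head-mounted LED pattern. Sampling and pattern are hypothetical.

def blinks_to_frames(samples, n_leds=8):
    """samples: sequence of 0/1 eyelid-circuit readings, one per tick.
    Each rising edge (open -> closed) lights one more LED, wrapping around;
    returns one ASCII frame of the light array per tick."""
    frames, lit, prev = [], 0, 0
    for s in samples:
        if s == 1 and prev == 0:  # rising edge = one detected blink
            lit = (lit + 1) % (n_leds + 1)
        frames.append("*" * lit + "." * (n_leds - lit))
        prev = s
    return frames

if __name__ == "__main__":
    # two blinks: the second is held closed, so it counts only once
    print(blinks_to_frames([0, 1, 0, 1, 1, 0]))
```

The edge detection is the key design choice: it distinguishes a deliberate blink event from the eye simply remaining closed, which is the kind of low-level signal shaping that lets the gesture bypass conscious control of the display.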

Lower levels of the neurotransmitter serotonin may affect your ability to keep calm when you think someone is treating you badly, and promote your tendency to retaliate (Crockett, 2008). Tests indicate that higher levels of oxytocin in females and vasopressin in males may foster trust and pair bonding at a quicker rate. Dopamine is a neurotransmitter which activates the same circuitry that drugs like nicotine, cocaine, and heroin do to produce euphoria and addiction. Brain activity in dopamine-related areas of the brain is active when mothers look at photos of their offspring or people look at photographs of their lovers. Love therefore can be described as "an emergent property of a cocktail of ancient neuropeptides and neurotransmitters" (Young, 2009).

Mirror neurons have been discovered in the brain, which produce the same chemical reaction in your body when you witness an experience happening to another as is being produced by the body you are watching (Keysers, 2009). Put simply, when someone falls over and hurts himself or herself, you may instinctively say "ouch" and actually produce small amounts of the same chemical reaction in your body as if it happened to you. Empathy could therefore be described as physiological rather than a purely emotional condition. We tend to anthropomorphize robots: our bodies produce mirror neurons in reaction to their behaviors in a similar way that we do to human entities (Gazzola et al., 2007). Biologists are endeavoring to interpret emotional states into biological chains of events.

Can the experience of digitally mediated touch produce physiological chemistry in the recipient? Cute Circuit's Hug Shirt senses the pressure and length of a hug, the heart rate, and the skin temperature of the hugger, and sends this data via Bluetooth to a recipient whose corresponding Hug Shirt actuators provide a simulated hug. Put simply, the sender hugs their own body and a recipient body feels the experience. Can wearables designed to actuate physical haptic stimulus on another induce chemical emotional effect? What are the potential implications for health, medicine, and well-being?

The interconnected networks that mirror neurons imply, between human but also nonhuman entities, pose fundamental problems to Ray Kurzweil's and Norbert Wiener's (Wiener, 1989) assumptions that by mechanistic analysis of the materials of the body we will ultimately understand and replicate them. Quantum physics proposes that to understand the mind we must look outside the body and consider the interconnected nature of everything as porous.

Textiles of the future merge science and technology. Nobel laureate Alexis Carrel headed the first tissue culture laboratory, exploring one of the most complex of all materials—the skin. At SymbioticA lab, Oron Catts has been growing cultured skins from enzymes to produce kill-free leather, an approach that tackles ethical and sustainability issues. Future textiles will be designed with highly engineered specifications—like skin—combining areas that are thicker, thinner, more flexible, or ridged, and that have the ability to adapt to the task or the environment.

Stelarc's Ear on Arm (2006–ongoing) was also cultured in the SymbioticA lab: the ear was grown from tissue culture around a frame and then sutured to Stelarc's forearm. A microphone was then embedded in the prosthetic, enabling visitors to Stelarc's website to listen to whatever his third ear hears. In a future iteration of the project he plans to implant a speaker into his mouth, so that people can speak to him through transmitters—for example, from his website or a mobile telephone—and he will hear the sounds inside his head; or, if he opens his mouth, someone else's voice could speak from within it.
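The Hug Shirt exchange described above—sample pressure, duration, heart rate, and skin temperature on the sender, replay them through actuators on the recipient—amounts to a small sensor-to-actuator protocol. A minimal sketch follows; the field names, value ranges, and JSON framing are assumptions standing in for the Bluetooth link, not Cute Circuit's actual format:

```python
# Hypothetical sketch of a Hug Shirt-style exchange: the sender packs hug
# telemetry into a message, the recipient unpacks it into actuator settings.
# Field names and scaling are invented for illustration.
import json

def pack_hug(pressure, duration_s, heart_rate_bpm, skin_temp_c):
    """Serialize one hug as a JSON message (stand-in for the Bluetooth link)."""
    return json.dumps({
        "pressure": pressure,        # normalized 0.0-1.0 squeeze strength
        "duration_s": duration_s,
        "heart_rate_bpm": heart_rate_bpm,
        "skin_temp_c": skin_temp_c,
    })

def actuate_hug(message):
    """Turn a received hug message into simple actuator commands."""
    hug = json.loads(message)
    return {
        "motor_strength": round(hug["pressure"], 2),  # pressure -> squeeze
        "motor_seconds": hug["duration_s"],
        "pulse_bpm": hug["heart_rate_bpm"],           # replay heartbeat rhythm
        "heat_pad_c": hug["skin_temp_c"],             # mimic skin warmth
    }

if __name__ == "__main__":
    msg = pack_hug(0.8, 3, 72, 33.5)
    print(actuate_hug(msg))
```

What the sketch makes visible is the chapter's open question: everything transmitted is mechanical and thermal telemetry, and whether replaying it can induce the recipient's own neurochemical response is precisely what remains untested.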

Through the convergence of biological engineering and nanotechnology we are becoming capable of 3D printing organs to replace and create new body parts that will integrate with fashion and technology, affecting the social and political agency of the future. From the inside out, future clothing and accessories simply could grow from our bodies.

Nancy Tilbury's speculative designs explore new definitions of designs without cloth or conventional fabrication methods (Quinn, 2013). Tilbury's research promulgates notions of garments formed from gases and nanoelectronic particles that automatically assemble on the body; liquids that thicken when they come into contact with the body and form a second skin; and surfaces that emerge from inside the body, induced by swallowable technologies in the form of tablets or nanoprobes that create changes in the color, pattern, or textural surface of the skin.

The swallowable Peptomics, developed by Johannes Blank and Shaila C. Rössle, convert the alphabetic language used to identify protein code into new configurations of molecular word modeling (Blank and Rössle, 2014). Their project has recreated the seven deadly sins of wrath, greed, envy, sloth, pride, lust, and gluttony as new three-dimensional chains of amino acid monomers linked by peptide bonds, complete with their specific biological functions, and encapsulated them into word pills. Health projects of the future such as Coloring, pictured earlier in the chapter, combined with biotech approaches like Peptomics, provide examples of the merger of electronic and chemical synapses that could create consumer mood-management products closer to realization than we think. Tonita Abeyta's Sensate (Lupton and Tobias, 2002) collection takes the tools of sexual hygiene and makes them into fashionable intimate apparel made from latex with built-in male/female condoms.

At a micro scale, the borders between inside and outside the body become permeable and fluid. Lindsay Kelley's project Digesting Wetlands considers the space of the body as a micro biome, a wetland environment, and an interconnected ecosystem. Kelley designs molecular gastronomy events and edible objects as forms of environmentalist interventionism: "Digestion becomes a way of figuring the landscape and encountering animals, plants, bacteria, and fungi, with molecular gastronomy techniques providing metaphoric and literal frameworks for imagining how bodies and landscapes interrelate" (Kelley, 2014). From this approach, bodies are viewed as floating islands where metaphors of atmospheric shift, drift, sedimentation, erosion, and blooming open up productive spaces for intervention, where fauna and flora from in and outside our bodies interact with changing populations of viruses, bacteria, and fungi—for example, to better digest oil spills or absorb heavy metals. Digital ecosystems can be viewed in a similar fluid manner, where keywords for future design describe events like calcification, erosion, and swarm behaviors.

Interestingly, as we have traced the changing perception of the world aided by new optical apparatus, the microscope, the telescope, and the stereoscope are all lens-based technologies. The most dramatic change in our perception is occurring as we incorporate nanoscale environments. Atomic force microscopy (AFM) enables optical imaging of the world at the nanolevel, where scanning the surface with AFM produces haptic vibrations that are translated by computer programs into visual images. One example born from this technology is a self-cleaning textile substrate that was developed by mimicking the cell structure of the lotus leaf.

Macro perspectives gained through global information networks, cloud computing, and supercomputers that allow access to information instantaneously have enabled us to envisage an interconnected worldview. Simultaneously, nanoperspectives reveal a world with completely different parameters and support a reconsideration of vitalism, fundamental to string theory and quantum physics, that questions our current understanding of materiality. What is described earlier is indebted to the notion of vitalism, an idea that has been around for some time and has been explored in the work of Spinoza, Nietzsche, Thoreau, Darwin, Adorno, Bergson, Deleuze, and Driesch. Across contemporary literature, theorists are describing a vivacious new landscape: the Internet of things (Ashton, 2009), vibrant matter (Bennett, 2010), a universe of bits (Poissant, 2007), a postvitalist point of view (Doyle, 2003), tangible bits and radical atoms (Ishii et al., 2012), and synaptic sculpture (Flanagan, 2011), all of which call for a transvital approach "where all matter–energy, air, code, and water are seen as relevant and related" (Thomas, 2013). An amplified awareness of the instability, fungability, and interconnectedness of things is emerging as we acknowledge the vibracity of the world at a molecular level. Materials and thinking are recombining, and agency is paramount in our understanding of both human and nonhuman entities. Traditional consumer-based capital structures are undermined from the bottom up (Howes, 2005).

Synaptic sculpture is an approach that views materials (biological or electronic) in terms of their potential as actants, and bodies and things as agentic. It is a neologism, "the combination of three words: haptic, synaptic and súnopsis. Haptic—of or relating to the sense of touch, in particular relating to the perception and manipulation of objects using the senses of touch and proprioception. Synaptic—of or relating to a synapse or synapses between nerve cells. And súnopsis—an ancient Greek word meaning sun—to combine or form, plus ópsis—to view" (Flanagan, 2013). A synapse is a specialized junction where transmission of information takes place through electronic or chemical signals; the term was introduced at the end of the nineteenth century by the British neurophysiologist Charles Sherrington. Traditionally used in biology, it lends itself well for use in relation to biotechnology and the hybrid spaces emerging from biological (chemical) and electronic (data flow) worlds. It is a relational space at a micro level, a communication device for transmitting information, layered by digital ecology like calcification, where buildup accumulates and changes the environment.

3.4.5 Conclusion: Synthesis and Synaptics

The current way we view HCI is predominantly mediated through a screen and keyboard. Linking computing directly to the body—biology and technology—goes way beyond consideration of semiotic analysis: it is not simply a process of signs, as Barthes (1973) would have it, but a process involving all the senses. There is a need to establish a connoisseurship in the somesthetic design of wearable interfaces—be they physical or digital—incorporating the body's full potential of senses. The importance of haptic engagement within communication is gaining recognition. This perspective further promulgates the space of design to be in the interface as mediator of experience, rather than in the design of objects or products.

Intimacy and Extimacy 53

Bio-data and big-data combine to produce unprecedented detail of personal information, enabling the tailoring of design to personal desires. The biosphere and the data-sphere become one, through which sustainable design solutions will emerge. Vibrant materials will be crafted into bespoke manifestations of experience—apparel as extensions of natural systems. Shifting from production to service to experience-based operations, apparel and artifacts are created with molecular aesthetics; they are synaptic sculptures where experience becomes a material to be molded and shaped in the design of interaction. A humanistic intelligence approach to wearable technologies considers a seamless integration, extending the reach of the systems of the body into body coverings and into the world beyond. The field of wearable technology explores the function of the mechanistic, while at the other end of the spectrum, human life is subsumed as a widget in the production line.

At the nanolevel it is revealed that everything we thought was fixed and stable is chaotic and in motion. There is a growing awareness of the porosity of the world and the fungibility of materials. The peripheral borders where physical atoms meet digital bits are fertile new spaces for design. An awareness of interconnectedness will promulgate designers to create works responsibly and tackle research problems by proposing creative solutions. Borders between public and private, and lines of control, are all to be renegotiated. If you can print your own…, you take control away from legislation in terms of birth control, or make DIY medical decisions because you have access to specific data sets that in the past were left to teams of experts—the questions of who will control society and how governance systems will function in these new terrains remain unanswered.

REFERENCES

ABI Research. 2014. Wearable computing devices, like Apple's iWatch, will exceed 485 million annual shipments by 2018. https://www.abiresearch.com/. Accessed May 20, 2014.
Angelis, J. 2011. In: Service Design and Delivery, M. Macintyre, G. Parry, and J. Angelis (eds.). New York: Springer, pp. 83–84.
Ashton, K. 2009. That "Internet of Things" thing. RFID Journal, June 22. http://www.rfidjournal.com/. Accessed May 20, 2014.
Barthes, R. 1973 (1957). Mythologies. London, U.K.: Granada.
Bauer, quoted in Dezeen. 2014. Design of future wearable tech: Wearable technology that ignores emotional needs is a "major error." Dezeen. Accessed June 14, 2014.
Bennett, J. 2010. Vibrant Matter: A Political Ecology of Things. Durham, NC: Duke University Press.
de Lima, R., and M. Rössle. 2014. Peptomics—Molecular word modeling. Paper presented at the Third International Conference on Transdisciplinary Imaging at the Intersection of Art, Science and Culture—Cloud and Molecular Aesthetics, Pera Museum, Istanbul, Turkey, June 26–28. Abstract accessed August 30, 2014. http://ocradst.org/cloudandmolecularaesthetics/peptomics/.

Bound, K., T. Saunders, J. Wilsdon, and J. Adams. 2013. China's Absorptive State: Research, Innovation and the Prospects for China–UK Collaboration. London, U.K.: Nesta. http://www.nesta.org.uk/.
Charny, D. (ed.). 2011. Power of Making: The Importance of Being Skilled. London, U.K.: V&A Publishing and the Crafts Council.
Cogiscan. 2014. WIP tracking and route control. http://www.cogiscan.com/. Accessed August 30, 2014.
Cohen. 1997. Social Identities 3(3): 333–344.
Coloring. Paper presented at the International Conference for the Integration of Science and Technology into Society, Daejeon, Korea, July 14–17.
Coote, A., J. Franklin, and A. Simms. 2010. 21 Hours: Why a Shorter Working Week Can Help Us All to Flourish in the 21st Century. London, U.K.: New Economics Foundation.
Crockett, M. 2008. Psychology: Not fair. Nature 453: 827.
Dator, J. 2008. On the rights and rites of humans and robots. http://www.futures.hawaii.edu/publications/ai…/RitesRightsRobots2008.pdf. Accessed August 30, 2014.
Doyle, R. 2003. Wetwares: Experiments in Postvital Living. Minneapolis, MN: University of Minnesota Press.
Electronics360. 2014. Teardown: Fitbit Flex. February. http://electronics360…. Accessed May 14, 2014.
Endeavour Partners. 2014. Inside wearables. January. http://endeavourpartners.net/white-papers/. Accessed August 30, 2014.
Flanagan, T. 2012. Blinklifier: The power of feedback loops for amplifying expressions through bodily worn objects. Paper presented at the 10th Asia Pacific Conference on Computer Human Interaction (APCHI 2012), Matsue, Japan.
Flanagan, T. Notes on weavin' digital: T(h)inkers at the loom. Ctrl+P Journal of Contemporary Art 14: 37–50.
Flanagan, T. 2014. http://www.triciaflanagan.com/. Accessed November 25, 2014.
Frey, C., and M. Osborne. 2013. The Future of Employment: How Susceptible Are Jobs to Computerisation? Oxford, U.K.: OMS Working Paper.
Gazzola, V., G. Rizzolatti, B. Wicker, and C. Keysers. 2007. The anthropomorphic brain: The mirror neuron system responds to human and robotic actions. Neuroimage 35(4): 1674–.
Gershenfeld, N. 2012. The making revolution.
The Guardian. 2013. Google seeks out wisdom of Zen Master Thich Nhat Hanh. September 5. http://www.theguardian.com/sustainable-business/global-technology-ceos-wisdom-zen-master-thich-nhat-hanh. Accessed August 30, 2014.
Harbisson, N. 2012. I listen to color. TED Global, June. http://www.ted.com/. Accessed May 20, 2014.
Holmes, O.W. 2001 (1859). The stereoscope and the stereograph. In: Art in Theory, 1815–1900: An Anthology of Changing Ideas, C. Harrison, P. Wood, and J. Gaiger (eds.). Malden, MA: Blackwell, pp. 668–672. Originally published in The Atlantic Monthly, Vol. 3 (Boston, June 1859): 738–748.
Howells, J. 1999. Regional systems of innovation. In: Innovation Policy in a Global Economy, D. Archibugi, J. Howells, and J. Michie (eds.). Cambridge, U.K.: Cambridge University Press.
Why we look the way we look now. 2014. The Atlantic Magazine, April 16. http://www.theatlantic.com/…waywe-look-now/359803/. Accessed August 30, 2014.

Amazon: The Truth Behind the Click. 2014. Television documentary.
Chaney, D. 1994. The Cultural Turn: Scene-Setting Essays on Contemporary Cultural History. New York: Routledge.
Extraordinary People: The Boy Who Sees Without Eyes. 2007. Television documentary, produced by Michael Price. U.K.: BBC.
Howes, D. 2005. Hyperesthesia, or the sensual logic of late capitalism. In: Empire of the Senses: The Sensual Culture Reader (Sensory Formations Series), D. Howes (ed.). Oxford, U.K.: Berg, pp. 281–303.
Howes, D. (ed.). 2005. Empire of the Senses: The Sensual Culture Reader. Sensory Formations Series. Oxford, U.K.: Berg.
Ishii, H., and B. Ullmer. 1997. Tangible bits: Towards seamless interfaces between people, bits and atoms. In: Proceedings of CHI '97, pp. 245–250. New York: ACM.
Ishii, H., D. Lakatos, L. Bonanni, and J.-B. Labrune. 2012. Radical atoms: Beyond tangible bits, toward transformable materials. Interactions 19: 38–47.
ITP Tisch. 2014. http://itp.nyu.edu/. Accessed June 29, 2014.
Kelley. 2014. Five privacy concerns about wearable tech.
Kelley, L. 2014. Digesting wetlands. Paper presented at the Third International Conference on Transdisciplinary Imaging at the Intersection of Art, Science and Culture—Cloud and Molecular Aesthetics, Pera Museum, Istanbul, Turkey. Abstract accessed August 30, 2014.
Keysers, C. 2009. Mirror neurons—Are we ethical by nature? In: What's Next?: Dispatches on the Future of Science: Original Essays from a New Generation of Scientists, M. Brockman (ed.). New York: Vintage Books.
Kurzweil, R. 1999. The Age of Spiritual Machines: When Computers Exceed Human Intelligence. New York: Viking Press.
Lupton, E. 2002. Skin: Surface, Substance + Design. New York: Princeton Architectural Press.
Maguire, J.S. 2008. Leisure and the obligation of self-work: An examination of the fitness field. Leisure Studies 27: 59–75.
Mann, S. 1997. Wearable computing: A first step toward personal imaging. Computer 30(2): 25–32.
Mann, S. 1998. Humanistic intelligence/humanistic computing: "Wearcomp" as a new framework for intelligent signal processing. Proceedings of the IEEE 86(11): 2123–2151.
Mann, S. 2001. Wearable computing: Toward humanistic intelligence. IEEE Intelligent Systems 16(3): 10–15.
Margetts, M. 2011. Action not words. In: Power of Making: The Importance of Being Skilled, D. Charny (ed.). London, U.K.: V&A Publishing and the Crafts Council, pp. 74–75.
McKinlay, A., and K. Starkey (eds.). 1998. Foucault, Management and Organization Theory: From Panopticon to Technologies of Self. London, U.K.: Sage.
Plant, S. 1996. The future looms: Weaving women and cybernetics. In: Clicking In: Hot Links to a Digital Culture, L. Hershman-Leeson (ed.). Seattle, WA: Bay Press, pp. 123–135.
Plant, S. 1997. Zeros and Ones: Digital Women and the New Technoculture. New York: Doubleday.
Poissant, L. 2007. The passage from material to interface. In: Media Art Histories, O. Grau (ed.). Cambridge, MA: MIT Press, pp. 229–251.
Quantified Self. 2007. What is the quantified self? http://quantifiedself.com/2007/10/what-is-the-quantifiable-self/. Accessed May 20, 2014.
RFgen. 2014. Barcode scanners used by Amazon to manage distribution center operations. http://www.rfgen.com/…Barcode-scanners-used-by-Amazon-to-manage-distribution-center-operations. Accessed May 20, 2014.
Singularity: Ubiquity interviews Ray Kurzweil. 2006. Ubiquity, January.

Pold, S. 2005. Interface realisms: The interface as aesthetic form. Postmodern Culture 15(2). http://muse.jhu.edu/…/toc/pmc15.2.html. Accessed February 28, 2014.
PSFK Labs. 2014. The future of wearable tech. http://www.slideshare.net/PSFK/psfk-future-of-wearable-technology-report. Accessed January 12, 2015.
Quinn, B. 2013. Textile Visionaries: Innovation and Sustainability in Textiles Design. London, U.K.: Laurence King Publishing.
Rackspace. 2013. The human cloud: Wearable technology from novelty to production. White Paper. San Antonio, TX: Rackspace.
Saxenian, A. 1996. Regional Advantage: Culture and Competition in Silicon Valley and Route 128. Cambridge, MA: Harvard University Press.
Schiphorst, T. 2011. Self-evidence: Applying somatic connoisseurship to experience design. In: CHI '11 Extended Abstracts on Human Factors in Computing Systems, pp. 145–160. New York: ACM.
Sudia, F.W. 2001. A jurisprudence of artilects: Blueprint for a synthetic citizen.
SVA. 2014. MFA interaction design. http://interactiondesign.sva.edu. Accessed August 12, 2014.
The three tomorrows: A method for postnormal times. 2015. https://www.academia.edu/7084893/The_Three_Tomorrows_A_Method_for_Postnormal_Times. Accessed January 8, 2015.
Thomas, P. 2013. Nanoart: The Immateriality of Art. Chicago, IL: Intellect.
Introducing coloring. Video. http://vimeo.com/81510205. Accessed February 20, 2015.
Von Busch, O. 2013. Zen and the abstract machine of knitting. Textile 11(1): 6–19.
Wander. Artifacts from the three tomorrows. Graduate Institute of Futures Studies, Tamkang University.
Watts. 1988. The dada event: From trans substantiation to bones and barking. In: "Event" Arts and Art Events, S. Foster (ed.). Ann Arbor, MI: UMI Research Press, pp. 119–131.
Wiener, N. 1989. The Human Use of Human Beings: Cybernetics and Society. London, U.K.: Free Association.
Wilson, H.J. 2013. Wearables in the workplace. Harvard Business Review Magazine, September. https://hbr.org/…/wearables-in-the-workplace. Accessed August 30, 2014.
Young, L. 2009. Love: Neuroscience reveals all. Nature 457: 148.

Section II
The Technology


4 Head-Mounted Display Technologies for Augmented Reality

Kiyoshi Kiyokawa

CONTENTS
4.1 Introduction
4.2 Brief History of Head-Mounted Displays
4.3 Human Vision System
4.4 HMD-Based AR Applications
4.5 Hardware Issues
    4.5.1 Optical and Video See-Through Approaches
    4.5.2 Ocularity
    4.5.3 Eye-Relief
    4.5.4 Typical Optical Design
    4.5.5 Other Optical Design
4.6 Characteristics of Head-Mounted Displays
    4.6.1 Resolution
    4.6.2 Field of View
    4.6.3 Occlusion
    4.6.4 Depth of Field
    4.6.5 Latency
    4.6.6 Parallax
    4.6.7 Distortions and Aberrations
    4.6.8 Pictorial Consistency
    4.6.9 Multimodality
    4.6.10 Sensing
4.7 Human Perceptual Issues
    4.7.1 Depth Perception
    4.7.2 User Acceptance
    4.7.3 Adaptation
4.8 Conclusion
References

4.1 INTRODUCTION

Ever since Sutherland's first see-through head-mounted display (HMD) in the late 1960s, attempts have been made to develop a variety of HMDs by researchers and manufacturers in the communities of virtual reality (VR), augmented reality (AR), and wearable computers. Ideally, visual stimulation should be presented in a field of view (FOV) of 200°(H) × 125°(V), at an angular resolution of 0.5 min of arc, at a temporal resolution of 120 Hz, with a dynamic range of 80 db, and the device should look like a normal pair of glasses. Such a visual display is difficult to realize, however, and therefore an appropriate compromise must be made considering a variety of technological trade-offs. Because of HMD's wide application domains and technological limitations, no single HMD is perfect. This is why it is extremely important to understand characteristics of different types of HMDs, their capabilities, and limitations. As an introduction to the following discussion, this section introduces three issues related to HMDs: a brief history of HMDs, human vision system, and application examples of HMDs.

4.2 BRIEF HISTORY OF HEAD-MOUNTED DISPLAYS

The idea of an HMD was first patented by McCollum (1945). Heilig also patented a stereoscopic television HMD in 1960 (Heilig, 1960). Then he developed and patented a stationary VR simulator, the Sensorama Simulator, in 1962, which was equipped with a variety of input and output devices including a binocular display to give a user virtual experiences. Comeau and Bryan at Philco Corporation built Headsight in 1961, the first functioning HMD (Comeau and Bryan, 1961). Using a magnetic tracking system and a single cathode ray tube (CRT) monitor mounted on a helmet, Headsight shows a remote video image according to the measured head direction. This was more like today's telepresence system, though computer-generated imagery was not yet used. Bell Helicopter Company studied a servo-controlled camera-based HMD in the 1960s. This display provides the pilot an augmented view captured by an infrared camera under the helicopter for landing at night. In a sense that the real-world image is augmented in real time, this is the first video see-through AR system. The first HMD coupled with head tracking facility and real-time computer-generated image overlay onto the real environment was demonstrated by Sutherland in the late 1960s (Sutherland, 1965, 1968). This tethered display, called Sword of Damocles, has a set of CRT-based optical see-through relay optics for each eye, allowing each eye to observe a synthetic image and its surrounding real environment simultaneously from a different vantage point.

Since the early 1970s, the U.S. Air Force has studied HMD systems as a way of providing the aircrew with a variety of flight information. As the first system in this regard, the AN/PVS-5 series night vision goggle (NVG) was first tested in 1973. In 1982, Furness demonstrated the visually coupled airborne systems simulator (VCASS), the U.S. Air Force's super-cockpit VR system (Furness, 1986). The Honeywell integrated helmet and display sighting system (IHADSS) is one of the most successful see-through systems in army aviation, which was first fielded in 1985 (Rash and Martin, 1988).
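The "ideal display" specification quoted in Section 4.1 can be turned into a rough budget to see why such a compromise is unavoidable. The sketch below is a back-of-the-envelope illustration only; it assumes uniform angular resolution across the FOV, which neither real displays nor the eye provide, and the function name is mine, not from the chapter:

```python
def ideal_display_budget(h_fov_deg=200, v_fov_deg=125,
                         arcmin_per_px=0.5, refresh_hz=120, bits_per_px=24):
    """Pixel counts and raw video bandwidth for one eye of the 'ideal'
    HMD of Section 4.1 (200 x 125 deg FOV, 0.5 arcmin angular
    resolution, 120 Hz), with one pixel per 0.5 arcmin everywhere."""
    w_px = round(h_fov_deg * 60 / arcmin_per_px)  # 60 arcmin per degree
    v_px = round(v_fov_deg * 60 / arcmin_per_px)
    bits_per_s = w_px * v_px * refresh_hz * bits_per_px
    return w_px, v_px, bits_per_s

w, v, bps = ideal_display_budget()
print(w, v)                   # 24000 15000 (pixels per eye)
print(round(bps / 1e12, 2))   # ~1.04 Tbit/s of uncompressed video
```

Even this crude estimate (360 megapixels per eye, terabits per second of raw video) is orders of magnitude beyond what current panels and links deliver, which is why every practical HMD trades off FOV, resolution, and form factor.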

The large expanse extra perspective (LEEP) optical system, developed in 1979 by Howlett, provides a wide FOV (~110°(H) × 55°(V)) stereoscopic viewing. Having a wide exit pupil of about 40 mm, the LEEP requires no adjustment mechanism for interpupillary distance (IPD). The LEEP system, originally developed for 3-D still photography, has been widely used in VR. Employing the LEEP optical system, McGreevy and Fisher have developed the Virtual Interactive Environment Workstation (VIEW) system at the NASA Ames Research Center in 1985. Using the LEEP optics, VPL Research introduced the first commercial HMD, EyePhone, in 1989. The EyePhone encouraged VR research at many institutes and laboratories. Since then a variety of HMDs have been developed and commercialized.

4.3 HUMAN VISION SYSTEM

Vision is the most reliable and complicated sensory, providing more than 70% of the total sensory information. Figure 4.1a shows the structure of a human eye. When light travels through the cornea, it enters the pupil. The pupil is a round opening in the center of the iris, which adjusts the pupil's aperture size. After the light travels through the pupil, it will enter the crystalline lens, which refracts the light on the retina. The retina contains about 7 million cone cells and 120 million rod cells. There are two types of photoreceptor cells, rods and cones, on the retina. As shown in Figure 4.1b, most cones exist in the fovea, while rods widely exist on the retina except for the fovea. The cones function under the daylight (normal) condition and provide very sharp visual acuity. Three types of cone cells, corresponding to different peak wavelength sensitivities, cooperatively provide color perception within the spectral region of 400–700 nm. The rods function even under the dim light condition, though they provide lower visual acuity than cones do.

FOV of the human eye is an oval of about 150°(H) by 120°(V). As both eyes' FOVs overlap, the total binocular FOV measures about 200°(H) by 120°(V) (Barfield et al., 1995). Normal visual acuity, the ability to resolve spatial detail, can identify an object that subtends an angle of 1–0.5 min of arc. The innermost region corresponding to the fovea is only 1.7° in diameter. Outside this region, the visual acuity drops drastically. To compensate, one needs

FIGURE 4.1 (a) Human eye structure and (b) density of cones and rods.

to move the eyes and/or the head. An area in the view where fixation can be accomplished without head motion is called the field of fixation, which is roughly circular with a radius of about 40°–50°. In this case, head motion will normally accompany to maintain the rotation angle of the eyes smaller than 15°. The horizontal FOV slowly declines with age, from nearly 180°(H) at age 20, to 135°(H) at age 80.

The human eye has a total dynamic sensitivity of at least 10^10, by changing the pupil diameter from about 2 to 8 mm. According to the intensity of the light, the dynamic range is divided into three types of vision: photopic, mesopic, and scotopic (Bohm and Schranner, 1990). Photopic vision, experienced during daylight, features sharp visual acuity and color perception. In this case, rods are saturated and not effective. Mesopic vision is experienced at dawn and twilight. In this case, cones function less actively and provide reduced color perception, and peripheral vision can be effective to find dim objects. Scotopic vision is experienced under starlight conditions. In this case, peripheral vision is more dominant than the foveal vision, with poor visual acuity and degraded color perception, because only the rods are active.

Depth perception occurs with monocular and/or binocular depth cues. These cues can be further categorized into physiological and psychological cues. Physiological monocular depth cues include accommodation, monocular convergence, and motion parallax. Psychological monocular depth cues include apparent size, linear perspective, aerial perspective, texture gradient, occlusion, shades, and shadows. Binocular convergence and stereopsis are typical physiological and psychological binocular depth cues, respectively. Binocular convergence is related to the angle between two lines from a focused object to the both eyes, while stereopsis is about the lateral disparity between left and right images. Stereopsis is the most powerful depth cue for distance up to 6–9 m (Boff et al., 1986), and it can be effective up to a few hundreds of meters.

4.4 HMD-BASED AR APPLICATIONS

As readers may find elsewhere in this book, HMDs have a variety of applications in AR including military, medicine, manufacturing, navigation, scientific visualization, training, education, and entertainment. When considering the use of an HMD, it is important to identify crucial aspects in the target application. A wide FOV HMD is preferred when the visual information needs to surround the user. Army aviation is a good example in this regard, where the aviator often needs to see in every direction. Through the HMD, the aviator sees a variety of situational information, including pilotage imagery, tactical, and operational data (Buchroeder, 1987). In this case, a monocular display is often sufficient, as most targets are distant. Size and weight of the HMD are relatively less crucial, as the aviator needs to wear a helmet anyway, which can also be suspended from the cockpit ceiling.

A high-resolution HMD is preferred for a dexterous manipulation task. For example, angular pixel resolution as well as registration accuracy is crucial in medical AR visualization. Medical AR visualization eliminates necessity for frequent gaze switching between the patient's body at hand and live images of the small camera inside the body on a monitor during laparoscopic and endoscopic procedures (Rolland et al., 1996).
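The convergence and stereopsis figures above are easy to put into numbers. A small sketch (the 65 mm IPD is an assumed example value, and the function names are mine, not from the chapter):

```python
import math

def convergence_angle_deg(ipd_m, distance_m):
    """Angle between the two lines of sight from both eyes to a point
    fixated at distance_m straight ahead."""
    return math.degrees(2 * math.atan((ipd_m / 2) / distance_m))

def disparity_arcmin(ipd_m, near_m, far_m):
    """Relative binocular disparity between two points at different
    depths: the difference of their convergence angles, in arc minutes."""
    return 60 * (convergence_angle_deg(ipd_m, near_m)
                 - convergence_angle_deg(ipd_m, far_m))

ipd = 0.065  # 65 mm
print(round(convergence_angle_deg(ipd, 1.0), 2))  # ~3.7 degrees at 1 m
print(round(disparity_arcmin(ipd, 6.0, 9.0), 1))  # ~12 arcmin between 6 and 9 m
```

A disparity of a dozen arc minutes is still well above the 0.5–1 arcmin acuity limit quoted earlier, which is consistent with stereopsis being strongest at close range yet remaining usable out to a few hundred meters.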

4. . such as periphery vision and a mechanism for easy attachment/detachment.. A lightweight.1  Optical and Video See-Through Approaches There are mainly two types of see-through approaches in AR. is not crucial. 1993).2  Typical configurations of (a) optical see-through display and (b) video seethrough display. are more of importance. In these systems.. the real and synthetic images are combined with a Monitor Rendered image Overlaid image Optical combiner (a) Real image Camera Real image Captured image Image composition Rendered image Overlaid image Monitor (b) FIGURE 4. on the other hand. With an optical see-through display.5  HARDWARE ISSUES 4. less-tiring HMD is specifically preferred for end users and/or for tasks with a large workspace. optical and video. KARMA system for end-user maintenance (Feiner et al. Figure 4. Early examples in this regard include Boeing’s AR system for wire harness assembly (Caudell and Mizell. as the image overlay is needed in a small area at hand.Head-Mounted Display Technologies for Augmented Reality 63 Stereoscopic view is also important for accurate operations. Safety and user acceptance issues.2a shows a typical configuration of an optical see-through display. 1992).5. 1997). moderate pixel resolution and registration accuracy often suffice. Wide FOV. and an outdoor wearable tour guidance system (Feiner et al.

Table 4. Instead. Electronic merging can be accomplished by frame grabbers (such as digital cameras) or chroma-keying devices. researchers have often had to build them manually.5. As a result. Compared to optical see-through displays. TABLE 4.64 Fundamentals of Wearable Computers and Augmented Reality partially transmissive and reflective optical device. or to the side of the user’s head with relay optics.2b shows a typical configuration of a video see-through display. for example. where less obtrusive real view is crucial and a stereoscopic synthetic image is not necessary. In the case of a half-silvered mirror. and binocular. It is relatively small and provides unaided real view to the other eye. and finally the combined image is presented to the user. Advantages of video see-through HMDs over optical see-through HMDs include pictorial consistency between the real and the synthetic views and the availability of a variety of image processing techniques. This causes an annoying visual experience called binocular rivalry. geometric and temporal consistencies can be accomplished. it is normally located above the optical combiner. Advantages of optical see-through HMDs include a natural. In most optical see-through HMDs. A monocular HMD is preferable. With appropriate vision-based tracking and synchronous processing of the captured and the rendered images. The army aviation and wearable computing are good examples. commercially available video see-through displays are much less. The imaging device should not block the real environment from the eyes. the real world image is first captured by a video camera. Figure 4. using a closed-view (non-see-through) HMD and one or two small video cameras such as webcams. the optical combiner is normally placed at the end of the optical path just in front of the user’s eyes. whereas the synthetic imagery is reflected on it. This deficiency is prominent when using a monocular video see-through display. for some outdoor situations. 
then the captured image and the synthetic image are combined electronically. typically half-silvered mirror. The real world is left almost intact through the optical combiner. 4. and (generally) simple and lightweight structures. the real scene is simply seen through it. These categories are independent of the type of see-through. A monocular HMD has a single viewing device. either see-through or closed. while the synthetic image is optically overlaid on the real image. the two eyes see quite different images.2  Ocularity Ocularity is another criterion for categorizing HMDs. biocular.1 shows applicability of each combination of ocularity and see-through types in AR. There are three types of ocularity: monocular. seamlessness between aided and periphery views. With a monocular HMD. instantaneous view of the real scene. With a video see-through display.1 Combinations of Ocularity and See-Through Types Optical see-through Video see-through Monocular Biocular Binocular (Stereo) Good Confusing Confusing Good Very good Very good .
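The "electronic merging" step described above can be sketched in a few lines. This is a toy chroma-keying composite (NumPy is assumed to be available; real systems do this in dedicated hardware or on the GPU, synchronized with tracking, and the function name is mine):

```python
import numpy as np

def chroma_key_composite(captured, rendered, key=(0, 0, 0), tol=10):
    """Merging for a video see-through display: wherever the rendered
    frame is close to the key color, show the camera image; elsewhere,
    show the rendered overlay. Arrays are HxWx3, uint8."""
    diff = np.abs(rendered.astype(int) - np.array(key)).sum(axis=2)
    mask = (diff <= tol)[..., np.newaxis]   # True where overlay is "transparent"
    return np.where(mask, captured, rendered).astype(np.uint8)

# toy frames: a gray camera image, an overlay with a white square on a black key
cam = np.full((4, 4, 3), 128, np.uint8)
ovl = np.zeros((4, 4, 3), np.uint8)
ovl[1:3, 1:3] = 255
out = chroma_key_composite(cam, ovl)
print(out[0, 0], out[1, 1])  # background pixel -> camera, square -> overlay
```

Because the merge happens per pixel on digitized frames, the same pipeline can also delay or warp the camera image, which is exactly how video see-through systems achieve the geometric and temporal consistency mentioned above.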

A biocular HMD provides a single image to both eyes. As both eyes always observe an exact same synthetic image, a problem of binocular rivalry does not occur. This is a typical configuration for consumer HMDs, where 2D images such as televisions and video games are primary target contents. Some biocular HMDs have optical see-through capability for safety reasons. However, an optical see-through view with a biocular HMD is annoying in AR systems because accurate registration is achievable only with one eye. As a compromise, biocular video see-through HMDs are preferable for casual applications, where stereo capability is not crucial but a convincing overlaid image is necessary. Entertainment is a good application domain in this regard (Billinghurst et al., 2001).

A binocular HMD has two separate displays with two input channels, one for each eye. There is often confusion between binocular and stereo. A binocular HMD can function as a stereoscopic HMD only when two different image sources are properly provided. Because of the stereo capability, binocular HMDs are preferred in many AR systems.

4.5.3 Eye-Relief

FIGURE 4.3 (a) Eye-relief and viewing distance and (b) locations of the virtual screen in different types of HMDs: a conventional, HOE-based, or waveguide-based HMD forms a virtual screen at a certain distance; a light field display forms an arbitrary shape of virtual screen within a depth of field; a VRD or pinlight display projects the image on the retina (virtual screen at infinity); an HMPD projects the image onto the real environment.

Most HMDs need to magnify a small image on the imaging device to produce a large virtual screen at a certain distance to cover the user's view (Figure 4.3a). Similarly, virtual screens formed by an HMD appear differently in different optical designs (see Figure 4.3b). For small total size and rotational moment of inertia, a short eye-relief (the separation between the eyepiece and the eye) is desirable. However, a too-small eye-relief causes the FOV to be partially shaded off, and it is inconvenient for users with eyeglasses. As a compromise, eye-relief of an HMD is normally set between 20 and 40 mm. Eye-relief and the actual distance between the eye and the imaging device (or the last image plane) are interlinked to each other, because a magnifying lens (the eyepiece functions as a magnifying lens) has normally equivalent front and back focal lengths. For example, when the eye-relief is 30 mm, the distance between the eye and the image will be roughly 60 mm. The larger the eye-relief becomes, the larger the eyepiece diameter needs to be. The eyepiece diameter cannot exceed the IPD normally, which varies among individuals from 53 to 73 mm (Robinett and Rolland, 1992). The exit pupil should be as large as possible, at least around 10 mm in diameter.

4.5.4 Typical Optical Design

There is a variety of optical designs in HMDs and each design has its pros and cons. Optical designs used for HMDs can be divided into two types, pupil forming and non-pupil forming. Pupil forming architecture, also known as relay optics, has often been used in early HMDs to allow large FOV in exchange for large total size and weight (Hayford and Koch, 1989). It produces at least one intermediate image and an exit pupil that are collimated by the eyepiece. Having an intermediate image, optical design can be flexible regarding the size of imaging device, which introduces heavier optics but a larger exit pupil size. Pupil forming systems are normally folded and placed

Figure 4. HOE-based and waveguide-based HMD Virtual screen is formed at a certain distance Light field display An arbitrary shape of virtual screen is formed within a depth of field Eyeball VRD. small imaging devices. around the head to minimize rotational moment of inertia. With the advent of high-resolution. As a drawback of non-pupil-forming architecture. pinlight display Image is projected on the retina (virtual screen is at infinity) (b) HMPD Image is projected onto the real environment FIGURE 4. refractive optics has been used .3  (a) Eye-relief and viewing distance and (b) locations of the virtual screen in different types of HMDs. non-pupil-forming architecture has become more common. the pupil of an eye needs to be positioned within a specific volume called an eye box to avoid eclipse. In early HMDs.4 shows a number of typical eyepiece designs in non-pupil forming architecture.66 Fundamentals of Wearable Computers and Augmented Reality Eyeball Eyepiece Imaging device Eye-relief (20–40 mm) Virtual screen (~2 × Eye-relief ) Viewing distance (1 m ~ infinity) (a) Conventional HMD. In such systems. which allows a modest FOV in a lightweight and compact form factor. optical design is less flexible.
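The eye-relief relation in Section 4.5.3 (eye-to-imaging-device distance roughly twice the eye-relief, and a distant virtual screen) can be checked with a small numerical sketch. This is a simplified thin-lens model, not from the chapter: the function name and the assumption that the eyepiece focal length equals the eye-relief are mine.

```python
def magnifier_layout(eye_relief_mm, display_offset_mm):
    """Thin-lens eyepiece sketch.  Assumes the focal length equals the
    eye-relief (equal front/back focal lengths) and that the display sits
    display_offset_mm inside the focal plane, forming an enlarged virtual
    image on the display side of the lens.
    Returns (eye-to-display distance, eye-to-virtual-screen distance) in mm.
    Note: display_offset_mm = 0 would put the virtual image at infinity."""
    f = eye_relief_mm                  # focal length ~ eye-relief (assumption)
    obj = f - display_offset_mm        # object (display) just inside focus
    img = obj * f / (f - obj)          # virtual image distance from the lens
    return eye_relief_mm + obj, eye_relief_mm + img
```

For a 30 mm eye-relief with the display 1 mm inside the focal plane, this gives an eye-to-display distance of 59 mm (the chapter's "roughly 60 mm") and a virtual screen about 0.9 m away, within the 1 m–infinity range shown in Figure 4.3a.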

Figure 4.4 shows typical eyepiece designs in the non-pupil-forming architecture. In refractive designs (Figure 4.4a), at least three lenses are normally required for aberration correction. The size in depth and weight of the optics are difficult to reduce. Optical see-through capability is achieved by folding the optical path with an optical combiner placed between the eyepiece and the eye.

Catadioptric designs (Figure 4.4b) contain a concave mirror and a half-silvered mirror. Light emitted from the imaging device is first reflected on the half-silvered mirror toward the concave mirror. The light then bounces on the concave mirror, travels through the half-silvered mirror, and enters the eye. Because a mirror performs the magnification, chromatic aberration, which is the inability of a lens to focus different colors to the same point, is not introduced. Optical see-through capability is achieved by simply making the concave mirror semitransparent. A beam-splitting prism is often used in place of the half-silvered mirror to increase the FOV at the expense of weight. However, because the light must travel through the half-silvered mirror twice, the eye receives only one-fourth of the original light of the imaging device at most.

A free-form prism (Figure 4.4c) reduces the thickness and weight without loss of light efficiency. At first, the light from the imaging device bounces off the inner side of the back surface with total reflection; the inner side of the back surface is carefully angled for this purpose. Second, the inner side of the front surface functions as a concave mirror. This time, the reflected light travels through the back surface to the eye. In this case, because of small incident angles, chromatic aberration is not introduced. This configuration reduces the size and weight significantly; for example, 34° horizontal FOV is achieved with a prism thickness of 15 mm. To provide optical see-through capability, a compensating prism can be attached at the front side (on the right side of Figure 4.4c).

FIGURE 4.4 Typical eyepiece designs. (a) Refractive, (b) catadioptric, and (c) free-form prism.
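The "one-fourth at most" figure for the catadioptric design follows directly from the half-silvered mirror being met twice. A one-line sketch (function name mine) makes the bound explicit:

```python
def catadioptric_efficiency(r):
    """Fraction of display light reaching the eye in a catadioptric eyepiece
    (Figure 4.4b), ignoring other losses: the beam meets the partially
    silvered mirror twice, first reflecting (probability r), then
    transmitting (probability 1 - r)."""
    return r * (1.0 - r)
```

The product r·(1 − r) peaks at 0.25 when the mirror is exactly half-silvered (r = 0.5), which is why the eye can receive at most one-fourth of the original light.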
A holographic optical element (HOE), a kind of diffractive grating, has been used for lightweight optics in HMDs. Due to its diffractive power, a variety of curved mirror shapes can be formed on a flat substrate. A HOE can also function as a highly transparent optical combiner due to its wavelength selectivity. Based on these unique characteristics, very thin, lightweight, and bright optical see-through HMDs can be designed (Ando et al., 1998; Kasai et al., 2000). An example of HOE-based stereo HMD is illustrated in Figure 4.5a.

An optical waveguide or a light-guide optical element, together with couple-in and couple-out optics, offers compact, lightweight, wide field of view HMD designs (Allen, 2002). As shown in Figure 4.5b, image components from an image source are first coupled into the waveguide with total internal reflection. Those image components are then coupled out of the waveguide using carefully designed semitransparent reflecting material such as HOE. Some recent HMDs such as Google Glass and the EPSON Moverio Series use a waveguide-based design.

FIGURE 4.5 Examples of (a) HOE-based HMD and (b) waveguide-based HMD.

4.5.5 Other Optical Design

While typical HMDs present a virtual screen at a certain distance in front of the user's eye, some HMDs form no virtual screen (Figure 4.3b). The Virtual Retinal Display (VRD), developed at the University of Washington, scans modulated light directly onto the retina of the eye based on the principle of Maxwellian view. The VRD eliminates the need for screens and imaging optics, theoretically allowing for very high-resolution and wide FOV.


The VRD assures focused images all the time regardless of accommodation of the eye, in exchange for a small exit pupil.

Head-mounted projective displays (HMPD) present a stereo image onto the real environment from a pair of miniature projectors (Fisher, 1996). A typical configuration of HMPD is shown in Figure 4.6a. From the regions in the real environment that are covered with retro-reflective materials, the projected stereo image is bounced back to the corresponding eyes separately. Without the need for an eyepiece, this design is less obtrusive, and it gives smaller aberrations and a larger binocular FOV up to 120° horizontally.

The light field is all the light rays at every point in space travelling in every direction. In theory, light field displays can reproduce accommodation, convergence, and binocular disparity depth cues, eliminating a common problem of the accommodation–convergence conflict within a designed depth of field. In 2013, two novel near-eye light field HMDs were proposed. NVIDIA's non-see-through near-eye light field display (Lanman and Luebke, 2013) presents a light field by using an imaging device and microlens array near to the eye, closer than the eye accommodation distance (see Figure 4.6b). Because of the simple structure and a short distance between the eye and the imaging device, a near-eye light field display can potentially provide a high-resolution, wide FOV with very thin (~10 mm) and lightweight (~100 g) form factors. University of North Carolina's near-eye light field display (Maimone and Fuchs, 2013) is optical see-through. Their approach requires no reflective, refractive, or diffractive components, but instead relies on a set of optimized patterns to produce a focused image when displayed on a stack of spatial light modulators (LCD panels). It is capable of presenting these cues, supporting a wide FOV, selective occlusion, and multiple simultaneous focal depths in a similar compact form factor.

In 2014, UNC and NVIDIA jointly proposed yet another novel architecture, called pinlight (Maimone et al., 2014). A pinlight display is simply composed of a spatial light modulator (an LCD panel) and an array of point light sources (implemented as an edge-lit, etched acrylic sheet). It forms an array of miniature see-through projectors, thereby offering an arbitrary wide FOV in a compact form factor. Their prototype display renders a wide FOV (110° diagonal) in real time by using a shader program to rearrange images for the tiled miniature projectors. Although the image quality of these near-eye light field displays is currently not satisfactory, they are extremely promising because of the unique advantages mentioned earlier.

FIGURE 4.6 (a) Head-mounted projective display and (b) near-eye light field display.

4.6 CHARACTERISTICS OF HEAD-MOUNTED DISPLAYS

4.6.1 Resolution

Resolution of a display system defines the fidelity of the image. Regarding resolution of the synthetic image, angular resolution and the number of total pixels are conveniently used to assess each component. A modulation transfer function (MTF) is often used to quantify the way modulation is transferred through the system. If the system is linear, convolution of the individual components' MTFs gives the MTF of the entire system. Resolution of the total system is limited by optics and imaging device.

For example, an ideal HMD will need to have as many as 12,000 × 7,200 pixels to compete with the human vision (60 pixels per degree (PPD) for the total FOV of 200° × 120°), at which the pixel structure is difficult to notice. This is, unfortunately, not yet easily obtainable with current technology. To compromise, one needs to choose either of three options: (1) higher angular resolution with a narrower FOV, (2) lower angular resolution with a wider FOV, and (3) arraying multiple screens (called tiling). Medical visualization and army aviation are suitable for the first and second options, respectively. The border between the first and second options is not clear, but 50° horizontally is a reasonable threshold. For example, if a 4K display (3,840 × 2,160) is used to cover 150° of horizontal FOV, its PPD is over 25. The third option is promising, but it often suffers from geometric and color discontinuities at display unit borders, increased manufacturing costs, weight, and size of the device. Sensics's piSight provides a wide FOV of 187°(H) × 84°(V) in a 4 × 3 arrangement per eye. Its maximum total input pixel resolution per eye is 1,920 × 1,200, yielding a horizontal PPD of 10. Another way of using multiple screens is a combination of the first and second options (Longridge et al., 1989). The idea is to provide a high-resolution screen and a wide FOV screen in a concentric layout. Mimicking the human vision system, this configuration gives highest resolution to where it is needed. However, as the pixel resolution of a flat panel has been steadily increasing, this angular resolution–FOV trade-off is likely to disappear in future.

In AR systems, resolution of the real scene is a different story. Optical see-through displays provide close to the best scene resolution that is obtained with the unaided eye. Aberrations and distortions introduced by the optical combiner are negligible. Video see-through displays, on the other hand, provide digitized real images. Closed-type HMDs, mentioned earlier, can be used as a video see-through HMD by attaching a digital camera. In the case of video see-through, resolution of the camera must be taken into consideration as well: the resolution of the observed real scene is limited by both the resolution of the camera and that of the display. To avoid unnecessary image deterioration, it is desirable that the camera's pixel resolution is comparable or superior to that of the display unit.

4.6.2 Field of View

A field of view of an HMD for AR can be classified into a number of regions. An aided (or overlay) FOV is the most important visual field in AR, where the synthetic image is overlaid onto the real scene.
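The pixel-budget arithmetic in Section 4.6.1 is easy to check numerically. The helper below is a back-of-envelope sketch (the function name is mine); it treats PPD as a uniform average across the FOV, ignoring optical distortion:

```python
def ppd(pixels, fov_deg):
    """Average angular resolution in pixels per degree (PPD) across a field
    of view: total pixels along an axis divided by the angle they span."""
    return pixels / fov_deg
```

This reproduces the chapter's figures: 12,000 pixels over 200° (or 7,200 over 120°) gives the retinal-limit 60 PPD; a 4K panel (3,840 px) over 150° gives about 25.6 PPD; and piSight's 1,920 px over 187° gives a horizontal PPD of about 10.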

An aided FOV of a stereo HMD typically consists of a stereo FOV and monocular FOVs. Narrow FOV HMDs (<~60°(H)) commonly have 100% overlap, whereas wide FOV HMDs (>~80°(H)) often have a small overlap ratio, for example, 50%. InfinitEye V2, for example, offers a total binocular FOV of 210°(H) × 90°(V) with 90° of stereo overlap. Outside of the aided FOV are the peripheral FOV and occluded regions blocked by the HMD structure. The real scene is directly seen through the peripheral FOV, whereas none of the real or synthetic image is viewed in the occluded regions. In VR, peripheral vision is proven to be important for situation awareness and navigation tasks (Arthur, 2000). Larger peripheral FOVs reduce required head motion and searching time. The occluded regions must be as small as possible, and the real view's transition between the aided and peripheral views is desired to be as seamless as possible. A necessary aided FOV is task-dependent. In medical 3-D visualization, such as breast needle biopsy, only a limited region in the visual field needs to be aided.

In optical see-through HMDs, overlay FOVs larger than around 60°(H) are difficult to achieve with conventional optical designs due to aberrations and distortions. Optical see-through HMDs tend to have a simple and compact structure, leaving a wide peripheral FOV for direct observation of the real scene. L-3 Link Simulation and Training's Advanced HMD (AHMD) achieves a wide view of 100°(H) × 50°(V) optically using an ellipsoidal mirror (Sisodia et al., 2006). Nagahara et al. (2003) proposed a very wide FOV HMD (180°(H) × 60°(V) overlap) using a pair of ellipsoidal and hyperboloidal curved mirrors. This configuration can theoretically achieve optical see-through, provided by a half-silvered curved mirror; however, the image is seen only from a very small sweet spot, the focus of the ellipsoid. Kiyokawa (2007) proposed a type of HMPD, the hyperboloidal HMPD (HHMPD) (see Figure 4.7a), which provides a wide FOV by using a pair of semitransparent hyperboloidal mirrors. With this design, a horizontal FOV wider than 180° is easily achievable. Nguyen et al. (2011) extended this design to be available in a mobile environment by using a semitransparent retroreflective screen (see Figure 4.7b).

Recent advancements in optical designs offer completely new paradigms for optical see-through wide FOV HMDs. Pinlight displays, introduced in the previous section, allow an arbitrary wide FOV in an eyeglass-like compact form factor. Innovega's iOptik architecture also offers an arbitrary wide FOV, by a custom contact lens and eyeglasses with micro projectors. Through the contact lens, one can focus on the backside of the eyeglasses and the real environment at the same time. A wide aided FOV is available if an appropriate image is presented on the backside of the eyeglass.

Closed-type, wide FOV (immersive) HMDs, such as Oculus Rift and Sensics' piSight, typically have no or little peripheral FOV through which the real scene is seen. By attaching appropriate cameras manually, any closed-type HMD can be used as a video see-through HMD. A video see-through option is available on the market for some closed-type wide FOV HMDs, such as Oculus Rift.

However, the actual effects of a wide FOV display on the perception of AR content have not been widely studied. Kishishita et al. (2014) showed that search performance in a divided attention task either drops or increases as the FOV increases up to 100° of horizontal FOV, depending on the view management method used, and that the estimated performances converge at approximately 130°.
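The relation between per-eye FOV, stereo overlap, and total binocular FOV mentioned above can be written as a one-liner (the function name is mine, and the 150° per-eye value below is inferred from the chapter's InfinitEye V2 figures rather than stated in it):

```python
def total_binocular_fov(per_eye_deg, overlap_deg):
    """Total horizontal binocular FOV of a stereo HMD: the two per-eye
    fields added together, with their shared stereo-overlap region
    counted only once."""
    return 2 * per_eye_deg - overlap_deg
```

A 210°(H) total FOV with 90° of stereo overlap is consistent with 150° per eye, and a narrow-FOV HMD with 100% overlap (per-eye field = overlap) has a total FOV equal to its per-eye FOV.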

FIGURE 4.7 A hyperboloidal head-mounted projective display (HHMPD) (a) with and (b) without a semitransparent retroreflective screen and (c) an example of image.

4.6.3 Occlusion

Occlusion is well known to be a strong depth cue. In the real world, the order of objects in depth can be recognized by observing overlaps among them. In terms of cognitive psychology, incorrect occlusion confuses a user. Correct mutual occlusion between the real and the synthetic scenes is often essential in AR applications, such as architectural previewing. The occlusion capability of a see-through display is important in enhancing the user's perception, visibility, and realism of the synthetic scene presented. To present correct occlusion, depth information of both the real and the synthetic scenes is necessary. Depth information of the synthetic image is normally available from the depth buffer in the graphics pipeline.

Real-time depth acquisition in the real scene has been a tough problem, but an inexpensive RGB-D camera is widely available nowadays. Once the depth information is acquired, occlusion is reproduced differently with optical and video see-through approaches. For example, a partially occluded real object can be presented in a video see-through approach simply by rendering the occluding virtual object over the video background. Similarly, a partially occluded virtual object can be presented by depth keying or by rendering phantom objects. On the other hand, the same effect is quite difficult to achieve in an optical way, as the real scene is always seen through the partially transmissive optical combiner. Any optical combiner will reflect some percentage of the incoming light and transmit the rest, making it impossible to overlay opaque objects in an optical way. In both cases, each pixel of the synthetic image is affected by the color of the real image at the corresponding point, and never directly shows its intended color.

Some approaches to tackle this problem include (1) using luminous synthetic imagery to make the real scene virtually invisible, (2) using a pattern light source in a dark environment to make part of real objects invisible (e.g., Maimone et al., 2013), and (3) using an HMPD with retroreflective screens. The first approach is common in flight simulators but it also restricts available colors (to only bright ones). The second and third approaches need a special configuration in the real environment and are thus not available, for example, in a mobile situation. Another approach is a transmissive or reflective light-modulating mechanism embedded in the see-through optics. ELMO displays proposed by Kiyokawa employ a relay design to introduce a transparent LCD panel positioned at an intermediate focus point. The most advanced ELMO display (ELMO-4) features parallax-free optics with a built-in real-time rangefinder (Kiyokawa et al., 2003) (see Figure 4.8). Reflective approaches have also been proposed using a digital micro-mirror device (DMD) or a liquid crystal on silicon (LCoS) (Cakmakci et al., 2004). Although they require a telecentric system, reflective approaches are advantageous in terms of color purity and light efficiency. An optical see-through light field display using a stack of LCD panels has a capability of selective occlusion (Maimone and Fuchs, 2013) and is extremely promising, though its image quality needs to be significantly improved.

4.6.4 Depth of Field

Depth of field refers to the range of distances from the eye (or a camera) in which an object appears in focus; objects outside the depth of field appear blurred. In real life, the eye's accommodation is automatically adjusted to focus on an object according to the distance. In an optical see-through HMD, the synthetic image is normally seen at a fixed distance. Therefore, unless the focused object is at or near the HMD's viewing distance, it is impossible to focus on both the real and the synthetic images at the same time with a conventional optical see-through HMD. This problem does not occur with a video see-through display, though captured real objects can be defocused due to the camera. To avoid blurred video images, the camera is preferably autofocus or has a small aperture size. Besides, the fixed focus of the synthetic image is problematic because accommodation and convergence are closely interlinked in the human vision system.
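The video see-through depth-keying idea described above reduces to a per-pixel depth comparison. The sketch below (names mine, scalar per-pixel form for clarity; a real renderer would do this on the GPU over whole frames) shows the compositing rule:

```python
def composite_pixel(video_rgb, real_depth, virtual_rgb, virtual_depth):
    """Video see-through occlusion by depth keying: show the virtual pixel
    only where it lies in front of the real surface measured by the depth
    (RGB-D) camera; otherwise keep the camera pixel, so real objects
    correctly occlude virtual ones."""
    return virtual_rgb if virtual_depth < real_depth else video_rgb
```

With a real surface at 2 m, a virtual pixel at 1 m is shown (virtual occludes real), while a virtual pixel at 3 m is hidden behind the video background.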


6. There exist some . confusion. a synthetic image larger than the screen resolution is first rendered. University of Arizona. although image quality needs to be improved further. this problem can be minimized by delaying the captured real image to synchronize it with the corresponding synthetic image. AZ. weight. In an optical see-through HMD.5 Latency Latency in HMD-based systems refers to a temporal lag from the measurement of head motion to the moment the rendered image is presented to the user.9) (Liu et al. This approach is advantageous in terms of size. This leads to inconsistency between visual and vestibular sensations.) images at different depths in a time division fashion. In this system. To compensate latency. Tucson. 2008). as it is not presented at a time. and then a portion of it is extracted and presented to the user according to the latest measurement. In 2008. latency is observed as a severe registration error with head motion.. Taking advantage of nonuniformity of visual acuity and/or saccadic suppression.Head-Mounted Display Technologies for Augmented Reality 75 FIGURE 4. Viewport extraction and image shifting techniques take a different approach. and cost.9  A liquid lens-based varifocal HMD. 2001). With these techniques. 4. which further introduces motion sickness. Being able to reproduce accommodation cues. near-eye light field displays are the most promising in this regard. In a video see-through HMD. (Courtesy of Hong Hua. prediction filters such as an extended Kalman filter (EKF) have been successfully used. Frameless rendering techniques can minimize the rendering delay by continuously updating part of the image frame. In such a situation. limiting regions and/ or resolution of the synthetic image using an eye-tracking device helps reduce the rendering delay (Luebke and Hallen. the synthetic image swings around the real scene. at the expense of artificial delay introduced in the real scene. 
FIGURE 4.9 A liquid lens-based varifocal HMD. (Courtesy of Hong Hua, University of Arizona, Tucson, AZ.)

With such a time-division approach, however, it is difficult to control an image depth, as the image is not presented at a time. In 2008, the University of Arizona proposed a varifocal HMD using a liquid lens (see Figure 4.9) (Liu et al., 2008). This approach is advantageous in terms of size, weight, and cost. Being able to reproduce accommodation cues, near-eye light field displays are the most promising in this regard, although their image quality needs to be improved further.

4.6.5 Latency

Latency in HMD-based systems refers to the temporal lag from the measurement of head motion to the moment the rendered image is presented to the user. In an optical see-through HMD, latency is observed as a severe registration error with head motion. In such a situation, the synthetic image swings around the real scene. This leads to inconsistency between visual and vestibular sensations, which further introduces motion sickness, confusion, and disorientation. In a video see-through HMD, this problem can be minimized by delaying the captured real image to synchronize it with the corresponding synthetic image. This approach eliminates apparent latency between the real and the synthetic scenes, at the expense of artificial delay introduced in the real scene.

To compensate latency, prediction filters such as an extended Kalman filter (EKF) have been successfully used. Frameless rendering techniques can minimize the rendering delay by continuously updating part of the image frame. Taking advantage of the nonuniformity of visual acuity and/or saccadic suppression, limiting regions and/or resolution of the synthetic image using an eye-tracking device helps reduce the rendering delay (Luebke and Hallen, 2001). Viewport extraction and image shifting techniques take a different approach. With these techniques, a synthetic image larger than the screen resolution is first rendered, and then a portion of it is extracted and presented to the user according to the latest measurement.
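The viewport extraction idea can be sketched in a few lines. This is an illustrative single-row version (names and the degrees-to-pixels conversion via PPD are mine; systems such as the Reflex HMD do this with dedicated hardware, and modern runtimes do it in shaders):

```python
def extract_viewport_row(row, out_width, yaw_err_deg, ppd):
    """Image-shifting latency compensation: the renderer produced a row
    wider than the screen; just before scan-out, crop the sub-window
    indicated by the freshest head-yaw reading.  yaw_err_deg is the head
    rotation accumulated since the frame was rendered, converted to a
    pixel offset via the display's pixels-per-degree (ppd)."""
    margin = (len(row) - out_width) // 2
    x0 = margin + int(round(yaw_err_deg * ppd))
    x0 = max(0, min(len(row) - out_width, x0))   # stay inside rendered margin
    return row[x0:x0 + out_width]
```

With no residual rotation the centered window is shown; a late-measured 2° yaw at 1 PPD shifts the crop by two pixels, and the clamp keeps the window inside the oversized render.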

There exist some hardware implementations of image shift techniques. Kijima et al. coined the term Reflex HMD, from the vestibulo-ocular reflex, describing an HMD that has a high-speed head pose measurement system independent from the system latency and an image shifting mechanism. They propose a variety of Reflex HMDs (see Figure 4.10) (Kijima and Ojika, 2002). Their system uses a gyro sensor attached to an HMD to estimate the amount of rotation corresponding to the system latency and adjusts the cropping position and rotation angle of the rendered image. This approach is inexpensive and independent from machines, applications, and OS. By using a high-speed (1,000 Hz) inertial measurement unit (IMU) and pixel resampling hardware, it compensates not only for head rotation (inter-frame latency), but also for a rolling shutter effect (intra-frame latency) of a display unit. A similar mechanism is employed in Oculus Rift.

FIGURE 4.10 Reflex HMD. (Courtesy of Ryugo Kijima, Gifu University, Gifu, Japan.)

4.6.6 Parallax

Unlike optical see-through systems, video see-through HMDs are difficult to make free of parallax between the user's eye and the camera viewpoint. It is desirable that the camera lens is positioned optically at the user's eye to minimize the parallax. Mounting a stereo camera above the HMD introduces a vertical parallax, causing a false sense of height. Horizontal parallax introduces errors in depth perception. Examples of parallax-free video see-through HMDs include Canon's COASTAR (Takagi et al., 2000) and State et al.'s (2005) display, using a free-form prism and a half-silvered mirror, respectively. On the other hand, parallax introduced in an optical combiner is negligible and not normally compensated.

As another problem, the viewpoint for rendering must match that of the eye (for optical see-through) or the camera (for video see-through). As a rendering viewpoint, the center of eye rotation is better for position accuracy, whereas the center of the entrance pupil is better for angular accuracy (Vaissie and Rolland, 2000).

4.6.7 Distortions and Aberrations

Image distortions and aberrations cause incorrect registration and incorrectly rendered depths, resulting, for example, in eyestrain and disorientation in AR. Lenses and curved mirrors introduce a variety of optical aberrations. Typical distortions include pincushion, barrel, and trapezoidal. In a stereo HMD, differences in image distortion between the left and right images must be minimized to achieve correct stereopsis. Without introducing additional optical elements, optical distortions can be corrected electronically by predistorting the source image, for example, in inexpensive wide FOV HMDs such as Oculus Rift. This technique greatly contributes to flexibility in optical designs. In an optical see-through HMD, however, the real scene cannot be predistorted, so distortion must be corrected optically. Scanning-based displays such as CRTs and VRDs are prone to image distortion. Because it takes several milliseconds to scan an image, image distortion on the retina will occur with rapid head motion. Rapid head motion also induces annoying color separation with field-sequential color systems.

Spherical aberrations are induced by the spherical shape of the lens surface. Similarly, field curvatures cause blurred imagery in the periphery. Predistorting techniques are not effective to correct these aberrations. Instead, aspheric and/or achromatic lenses can be used, which may increase the weight and size of the optics. Chromatic aberrations occur due to the refractive power (a prism effect) of the lenses. To compensate, achromatic lenses, which consist of convex and concave lenses, are normally used. Reflective optical elements such as concave mirrors do not induce chromatic aberrations. Considering that full-color displays actually have only RGB components, chromatic aberrations can also be compensated by separately predistorting the R, G, and B planes at the expense of increased rendering costs. With lateral shift of the eye, the image gets distorted and blurred. Although the human IPD alters dynamically because of eye rotation, this dynamic IPD has not yet been compensated in real time to the author's knowledge.

4.6.8 Pictorial Consistency

Pictorial consistency between the real and virtual images is important for the sense of reality as well as the visibility of the overlay information. In video see-through systems, pictorial consistency is more easily achieved. However, low contrast (low dynamic range) of the captured image is often a problem. To improve this, real-time high dynamic range (HDR) techniques could be used, though the author is not aware of a successful example in video see-through AR. In optical see-through HMDs, brightness and contrast of the synthetic image should be adjusted to those of the real image. However, it is difficult to match them for a very wide range of luminance values of the real scene. For example, no imaging device is bright enough to be comparable to the sunshine. Instead, some products allow transparency control.
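The electronic predistortion described in Section 4.6.7 is typically a radial remapping of normalized image coordinates. The sketch below uses the common single-coefficient radial model (the function name and the use of only the first coefficient k1 are my simplifications; production lens models add higher-order and tangential terms):

```python
def predistort_point(x, y, k1):
    """Radial (barrel/pincushion) predistortion of a normalized image point:
    r' = r * (1 + k1 * r^2), applied about the image center.  Running this
    with a slightly different k1 for each of the R, G, and B planes also
    pre-compensates lateral chromatic aberration, at the cost of warping
    the image three times."""
    r2 = x * x + y * y
    s = 1.0 + k1 * r2
    return x * s, y * s
```

The center of the image is left untouched (r = 0), while points farther from the axis are pushed outward (k1 > 0) or pulled inward (k1 < 0) to cancel the eyepiece's opposite distortion.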

4.6.9 Multimodality

Vision is a primary modality in AR, and most AR studies and applications are vision oriented. However, other senses are also important. Literally speaking, AR systems target arbitrary sensory information. Receptors of the special senses, including auditory, gustatory, olfactory, and the sense of balance, reside in the head, thus a head-mounted device is a good choice for modulating such sensory information. In this sense, a noise-canceling earphone is considered a hear-through head-mounted (auditory) display, in that it combines modulated sound in the real world with digital sound. Recently, a variety of HMDs for nonvisual senses have been proposed, at least at a lab level, to explore different types of senses in the form of head-mounted devices. Some sensory information is more difficult to reproduce than others. Interplay of different senses can be used to address this problem. Meta Cookie, developed by Narumi et al. (2011), successfully presents different types of tastes with the same real cookie by overriding its visual and olfactory stimuli using a head-mounted device. In this way, multimodal displays have a great potential in complementing and reinforcing missing senses.

4.6.10 Sensing

Unlike a smartphone or a smart watch, a head-mounted device will be cumbersome if a user needs to put it on and take it off frequently. A typical prospect for a future HMD is that it will become light, small, and comfortable so that a wearer can continuously use it for an extended period of time a day for a variety of purposes. However, when the content is not relevant to the current situation, hindering observation of the imminent real environment behind, an HMD will be useless or even harmful. This problem is less prominent with an HMD for wearable computing, where the FOV is relatively small and shown off center of the user's view. It is more crucial with an HMD for AR, as it is expected to have a wide FOV covering the user's central field of vision. Therefore, an AR system must be able to be aware of user and environmental contexts, and switch contents and their presentation style properly and dynamically. Different types of contextual information need to be recognized to determine if and how the AR content should be presented. Such information includes environmental context such as location, time, weather, traffic, and schedule, as well as user context such as body motion (Takada et al., 2010), physiological status, and gaze (Toyama et al., 2014). In this sense, integration of sensing mechanisms into an HMD will become more and more important. An HMD can be combined not only with conventional sensors such as a camera and a GPS unit but also with environmental sensors for light, noise, and temperature as well as biological sensors for EEG, ECG, skin conductance, and body temperature.

Among a variety of sensing information, a large number of attempts have been made on eye tracking. In 2008, Fraunhofer IPMS proposed an HMD, iSTAR, that is capable of both displaying an image and eye tracking at the same time using an OLED on a CMOS sensor, by exploiting the duality of an image sensor and an image display.

Eye tracking is also achieved by analyzing the user's eye images captured at the same time as the user's view. A user's view, as well as the user's gaze, is important in the analysis of the user's interest, for example, for object recognition. It has, however, been difficult to acquire a wide, parallax-free user's view. Mori et al. (2011) proposed a head-mounted eye camera that achieves this by using a hyperboloidal semitransparent mirror (see Figure 4.11). Corneal image analysis is a promising alternative to this system for its simple hardware configuration, offering a variety of applications including calibration-free eye tracking (Nakazawa and Nitschke, 2012) and interaction-free HMD calibration (Itoh and Klinker, 2014).

Head-Mounted Display Technologies for Augmented Reality

FIGURE 4.11  Wide view eye camera. Appearance (a) and captured image (b). [Panel labels: hyperboloidal half-silvered mirror, eye-hole (for eyeball observation), IEEE 1394 camera, first-order mirror.]

For a multifocal HMD, estimation of gaze direction may not be enough; it is more desirable to be able to estimate the depth of the attended point in space. Toyama et al. (2014) revealed that a stereo eye tracker can estimate a focused image distance, by using a prototypical three-layer monocular optical see-through HMD.

4.7  HUMAN PERCEPTUAL ISSUES

4.7.1  Depth Perception

Even when geometrical consistency is achieved, it is often difficult to perceive depths of virtual objects correctly in AR. This is due primarily to (1) an HMD's insufficient capability to support depth cues, (2) lack of standard rendering approaches, and (3) visual congestion. First, as we have seen in this chapter, standard HMDs do not support every depth cue used in the human vision system. Second, virtual objects in an AR application are often rendered in a simple way (e.g., wire-framed) intentionally, so as not to obstruct visibility of the real scene. Such objects are less informative in terms of depth perception. The typical x-ray vision effect also causes confusion in depth perception, making it difficult to perceive the distance of an occluded object. Depth perception can be improved by rendering other types of monocular depth cues, such as shades, shadows, aerial perspective, and texture gradient, when appropriate. However, some of those rendering techniques may not be preferable in some situations. To support correct depth perception in such situations, many research groups such as Livingston et al. (2003) proposed a variety of combinations of visualization techniques.
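The chapter notes that a stereo (binocular) eye tracker can estimate the distance of the attended point. A minimal sketch of the underlying geometry, depth from vergence, is shown below. It assumes a known interpupillary distance, gaze angles measured in a common horizontal plane, and a fixation point on or near the midline; these simplifications are mine, not details of the cited systems.

```python
import math

def vergence_depth(ipd_m: float, left_in_rad: float, right_in_rad: float) -> float:
    """Depth (m) of the fixation point from each eye's inward rotation.

    Eyes sit at (+/- ipd/2, 0); angles are measured inward from straight
    ahead. For a midline target at depth z, tan(angle) = (ipd/2)/z per eye,
    so z = ipd / (tan(aL) + tan(aR))."""
    denom = math.tan(left_in_rad) + math.tan(right_in_rad)
    if denom <= 0:
        return float("inf")  # parallel or diverging gaze: optical infinity
    return ipd_m / denom
```

With a 65 mm IPD, each eye rotating inward by atan(0.0325/1.0) places the fixation point at 1 m, which is the kind of signal a multifocal HMD could use to pick the focal plane to address.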

Techniques examined include, for example, edge drawing styles and surface opacity, as far as the visual performance satisfies the application requirements. Third, in some AR applications, virtual annotations and labels may overlap or congest. Visual congestion degrades the visibility of the object of interest and obstructs situation awareness of the surroundings. To alleviate label overlaps and to increase label visibilities, relocation techniques in the screen space have been proposed by many research groups (e.g., Bell et al. 2001; Grasset et al. 2012). AR applications may need to display minimal information as long as the target task is assisted satisfactorily.

4.7.2  User Acceptance

Inappropriately worn HMDs will induce undesirable symptoms including headaches, shoulder stiffness, motion sickness, or even severe injuries. From an ergonomic point of view, HMDs must be as light, small, and comfortable to wear as possible. To accommodate these problems, the center of mass of an HMD must be positioned as close to that of the user's head as possible; a well-balanced heavy HMD feels much lighter than a poorly balanced lightweight HMD.

Safety issues are of equal importance. AR applications distract the user's voluntary attention from the real environment by overlaying synthetic information, and paying too much attention to the synthetic image could be highly dangerous to the real world activity. Furthermore, by its nature, an HMD restricts peripheral vision, which can be harmful in some situations, and in video see-through, central vision will be lost under a system failure. To prevent catastrophic results when safety issues are of top priority, optical see-through HMDs are recommended, and a flip-up display design is helpful (Rolland and Fuchs, 2001).

From a social point of view, HMDs should have a low profile or cool design to be widely accepted. Video cameras on an HMD have privacy and security issues. Bass et al. (1997) describe the ultimate test of obtrusiveness of an HMD as "whether or not a wearer is able to gamble in a Las Vegas casino without challenge."

4.7.3  Adaptation

The human vision system is quite dynamic; it takes some time to adapt to and recover from a new visual experience. For example, a great ability of adaptation to the inverted image on the retina has been proven for more than 100 years (Stratton, 1896). Even though the visual experience is inconsistent with the real world, the human vision system adapts to the new environment very flexibly. Similar adaptation occurs with AR systems with parallax in video see-through systems. Biocca and Rolland (1998) found that performance in a depth-pointing task was improved significantly over time using a video see-through system with parallax of 62 mm in vertical and 165 mm in horizontal. Also found was a negative aftereffect. In addition, wearing an HMD will cause the pupil to dilate slightly; complete dilation may take over 20 min, whereas complete constriction may take less than 1 min (Alpern and Campbell, 1963).
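The screen-space label relocation idea (view management) can be illustrated with a toy greedy strategy: push each later label away until it stops overlapping the ones already placed. This is only a stand-in for the published techniques; Bell et al.'s view management is considerably more sophisticated.

```python
def relocate_labels(labels, step=5):
    """labels: list of [x, y, w, h] boxes in screen space.
    Greedily push each later label downward until it no longer overlaps
    any already-placed label. A toy sketch of view management."""
    def overlaps(a, b):
        return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2] and
                a[1] < b[1] + b[3] and b[1] < a[1] + a[3])

    placed = []
    for box in labels:
        box = list(box)                       # do not mutate the input
        while any(overlaps(box, p) for p in placed):
            box[1] += step                    # nudge downward
        placed.append(box)
    return placed
```

A production system would also weight label importance, keep leader lines short, and avoid covering the annotated real object, but the overlap test and iterative displacement are the core of the approach.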

Long-term use of an HMD will increase the likelihood of the user encountering a variety of deficiencies, such as red eyes, double vision, fatigue, and motion sickness. The National Institute for Occupational Safety recommends a 15 min rest after each 2 h of continuous use of a video display unit (VDU) (Rosner and Belkin, 1989). Extensive user studies must be conducted to develop similar recommendations for see-through HMDs; therefore, a recovery period should be given to the user whenever needed.

4.8  CONCLUSION

With the advancements in display technologies and an increasing public interest in AR, both research and business on HMDs are now more active than ever. However, there is and will be no single right HMD, due to technical limitations and the wide variety of applications, so an appropriate compromise must be made depending on the target application. Issues discussed in this chapter give some insights into the selection of an HMD. One must first consider whether an optical or a video see-through approach is more suitable for the target task. The next consideration would be a trade-off between the field of view and the angular resolution; this is, in short, a trade-off between real world visibility and pictorial consistency. If true occlusion within nearly intact real views is necessary, occlusion-capable optical see-through displays such as ELMO-4 should be selected. When the user needs to observe both near and far overlay information, an accommodation-capable (e.g., near-eye light field displays) or accommodation-free (e.g., VRDs) HMD may be the first choice. Novel optical designs such as near-eye light field displays and pinlight displays, building on emerging display technologies, offer many preferable features at the same time, such as a wide field of view and a compact form factor. Multimodal output and sensing features will become more important as the demand for more advanced AR applications grows and the HMD becomes an indispensable tool.

REFERENCES

Allen, K. (1998). A new fold in microdisplay optics. Emerging Displays Review, Stanford Resources, San Jose, CA.
Alpern, M. and Campbell, F. W. (1963). The behavior of the pupil during dark adaptation. Journal of Physiology, 165, 5–7.
Ando, T., Yamasaki, K., Okamoto, M., and Shimizu, E. (1998). Head-mounted display using a holographic optical element. Proceedings of SPIE 3293, Practical Holography XII, p. 183. doi:10.1117/12.303654.
Arthur, K. (2000). Effects of field of view on performance with head-mounted displays. Doctoral thesis, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Barfield, W., Hendrix, C., Bjorneseth, O., Kaczmarek, K., and Lotens, W. (1995). Comparison of human sensory capabilities with technical specifications of virtual environment equipment. Presence, 4(4), 329–356.
Bass, L., Mann, S., Siewiorek, D., and Thompson, C. (1997). Issues in wearable computing: A CHI 97 workshop. ACM SIGCHI Bulletin, 29(4), 34–39.
Bell, B., Feiner, S., and Hollerer, T. (2001). View management for virtual and augmented reality. Proceedings of the ACM UIST 2001, pp. 101–110.

Billinghurst, M., Kato, H., and Poupyrev, I. (2001). The magicbook—Moving seamlessly between reality and virtuality. IEEE Computer Graphics and Applications, 21(3), 6–8.
Biocca, F. and Rolland, J. P. (1998). Virtual eyes can rearrange your body: Adaptation to virtual-eye location in see-thru head-mounted displays. Presence: Teleoperators and Virtual Environments (MIT Press), 7(3), 262–277.
Boff, K., Kaufman, L., and Thomas, J. (1986). Handbook of Perception and Human Performance. John Wiley & Sons, New York.
Bohm, H. and Schranner, R. (1990). Requirements of an HMS/D for a night-flying helicopter. Proceedings of SPIE 1290, Helmet-Mounted Displays II, Orlando, FL, pp. 93–107.
Buchroeder, R. A. (1987). Helmet-mounted displays. Tutorial short course notes T2, SPIE Technical Symposium Southeast on Optics, Electro-Optics, and Sensors, Orlando, FL.
Cakmakci, O., Ha, Y., and Rolland, J. P. (2004). A compact optical see-through head-worn display with occlusion support. Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), Arlington, VA, pp. 16–25.
Caudell, T. P. and Mizell, D. W. (1992). Augmented reality: An application of heads-up display technology to manual manufacturing processes. Proceedings of the 1992 IEEE Hawaii International Conference on Systems Sciences, Honolulu, HI, pp. 659–669.
Comeau, C. P. and Bryan, J. S. (1961). Headsight television system provides remote surveillance. Electronics, 34 (10 November), 86–90.
Feiner, S., Macintyre, B., and Seligmann, D. (1993). Knowledge-based augmented reality. Communications of the ACM, 36(7), 53–62.
Feiner, S., Macintyre, B., Hollerer, T., and Webster, A. (1997). A touring machine: Prototyping 3D mobile augmented reality systems for exploring the urban environment. Proceedings of ISWC'97, Cambridge, MA, pp. 74–81.
Fisher, R. (1996). Head-mounted projection display system featuring beam splitter and method of making same. US Patent No. 5572229, issued November 5, 1996.
Furness, T. A. (1986). The super cockpit and its human factors challenges. Proceedings of the Human Factors Society, 30, Dayton, OH, pp. 48–52.
Grasset, R., Langlotz, T., Kalkofen, D., Tatzgern, M., and Schmalstieg, D. (2012). Image-driven view management for augmented reality browsers. Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR), Atlanta, GA, pp. 177–186.
Hayford, M. and Koch, D. (1989). Optical arrangement. US Patent No. 4854688, issued August 8, 1989.
Heilig, M. (1960). Stereoscopic television apparatus for individual use. US Patent No. 2955156, issued October 4, 1960.
Itoh, Y. and Klinker, G. (2014). Interaction-free calibration for optical see-through head-mounted displays based on 3D eye localization. Proceedings of the Ninth IEEE Symposium on 3D User Interfaces (3DUI), Minneapolis, MN, pp. 75–82.
Kasai, I., Tanijiri, Y., Endo, T., and Ueda, H. (2000). A forgettable near eye display. Proceedings of Fourth International Symposium on Wearable Computers (ISWC) 2000, Atlanta, GA, pp. 115–118.
Kijima, R. and Ojika, T. (2002). Reflex HMD to compensate lag and correction of derivative deformation. Proceedings of International Conference on Virtual Reality (VR) 2002, Orlando, FL, pp. 172–179.
Kishishita, N., Kiyokawa, K., Orlosky, J., Mashita, T., Takemura, H., and Kruijff, E. (2014). Analysing the effects of a wide field of view augmented reality display on search performance in divided attention tasks. Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2014, Munich, Germany.
Kiyokawa, K. (2007). A wide field-of-view head mounted projective display using hyperbolic half-silvered mirrors. Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2007, Nara, Japan, pp. 207–210.

Kiyokawa, K., Billinghurst, M., Campbell, B., and Woods, E. (2003). An occlusion-capable optical see-through head mount display for supporting co-located collaboration. Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2003, Tokyo, Japan, pp. 133–141.
Lanman, D. and Luebke, D. (2013). Near-eye light field displays. Proceedings of SIGGRAPH Asia, Hong Kong, China; ACM Transactions on Graphics (TOG), 32(6), Article No. 220.
Liu, S., Cheng, D., and Hua, H. (2008). An optical see-through head mounted display with addressable focal planes. Proceedings of IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2008, Cambridge, UK, pp. 33–42.
Livingston, M. A., Swan, J. E., Gabbard, J. L., Hollerer, T., Hix, D., Julier, S., and Brown, D. (2003). Resolving multiple occluded layers in augmented reality. Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2003, Tokyo, Japan, pp. 56–65.
Longridge, T., Thomas, M., Fernie, A., Williams, T., and Wetzel, P. (1989). Design of an eye slaved area of interest system for the simulator complexity testbed. In T. Longridge (ed.), Area of Interest/Field-Of-View Research Using ASPT, National Security Industrial Association, Washington, DC, pp. 275–283.
Luebke, D. and Hallen, B. (2001). Perceptually-driven simplification for interactive rendering. Proceedings of the ACM 12th Eurographics Workshop on Rendering Techniques, London, UK, pp. 223–234.
Maimone, A. and Fuchs, H. (2013). Computational augmented reality eyeglasses. Proceedings of International Symposium on Mixed and Augmented Reality (ISMAR) 2013, Adelaide, Australia, pp. 29–38.
Maimone, A., Lanman, D., Rathinavel, K., Keller, K., Luebke, D., and Fuchs, H. (2014). Pinlight displays: Wide field of view augmented reality eyeglasses using defocused point light sources. ACM Transactions on Graphics (TOG), 33(4), Article No. 89.
Maimone, A., Yang, X., Dierk, N., State, A., Dou, M., and Fuchs, H. (2013). General-purpose telepresence with head-worn optical see-through displays and projector-based lighting. Proceedings of IEEE Virtual Reality (VR), Orlando, FL, pp. 23–26.
McCollum, H. (1945). Stereoscopic television apparatus. US Patent No. 2,388,170.
Mori, H., Sumiya, E., Mashita, T., Kiyokawa, K., and Takemura, H. (2011). A wide-view parallax-free eye-mark recorder with a hyperboloidal half-silvered mirror and appearance-based gaze estimation. IEEE TVCG, 17(7), 900–912.
Nagahara, H., Yagi, Y., and Yachida, M. (2003). Super wide viewer using catadioptical optics. Proceedings of ACM VRST, Osaka, Japan, pp. 169–175.
Nakazawa, A. and Nitschke, C. (2012). Point of gaze estimation through corneal surface reflection in an active illumination environment. Proceedings of European Conference on Computer Vision (ECCV), Florence, Italy, pp. 159–172.
Narumi, T., Kajinami, T., Tanikawa, T., and Hirose, M. (2011). Meta cookie: An illusion-based gustatory display. Proceedings of the 14th International Conference on Human-Computer Interaction (HCI International 2011), Orlando, FL, pp. 260–269.
Nishizaka, S., Kiyokawa, K., et al. (2011). Subjective image quality assessment of a wide-view head mounted projective display with a semi-transparent retro-reflective screen. Proceedings of the 21st International Conference on Artificial Reality and Telexistence (ICAT 2011), Osaka, Japan.
Omura, K., Shiwa, S., and Kishino, F. (1996). 3-D display with accommodative compensation (3DDAC) employing real-time gaze detection. SID 1996 Digest, pp. 889–892.
Rash, C. E. and Martin, J. S. (1988). The impact of the U.S. Army's AH-64 helmet mounted display on future aviation helmet design. USAARL Report No. 88-13, Fort Rucker, AL: U.S. Army Aeromedical Research Laboratory.

Robinett, W. and Rolland, J. P. (1992). A computational model for the stereoscopic optics of a head-mounted display. Presence: Teleoperators and Virtual Environments (MIT Press), 1(1), 45–62.
Rolland, J. and Fuchs, H. (2001). Optical versus video see-through head-mounted displays. In W. Barfield and T. Caudell (eds.), Fundamentals of Wearable Computers and Augmented Reality, Lawrence Erlbaum Associates, Mahwah, NJ.
Rolland, J. P., Wright, D., and Kancherla, A. (1997). Towards a novel augmented-reality tool to visualize dynamic 3D anatomy. Proceedings of Medicine Meets Virtual Reality, San Francisco, CA.
Rosner, M. and Belkin, M. (1989). Video display units and visual function. Survey of Ophthalmology, 33, 515–522.
Schowengerdt, B. T. and Seibel, E. J. (2004). True 3D displays that allow viewers to dynamically shift accommodation, bringing objects displayed at different viewing distances into and out of focus. Cyber Psychology & Behavior, 7(6), 610–620.
Sisodia, A., Riser, A., and McGuire, J. (2005). Advanced helmet mounted display for simulator applications. Proceedings of SPIE, Helmet- and Head-Mounted Displays XI: Technologies and Applications Conference, SPIE Defense & Security Symposium, Orlando, FL.
State, A., Keller, K., and Fuchs, H. (2005). Simulation-based design and rapid prototyping of a parallax-free, orthoscopic video see-through head-mounted display. Proceedings of IEEE/ACM ISMAR.
Stratton, G. (1896). Some preliminary experiments on vision without inversion of the retinal image. Psychological Review, 3, 611–617.
Sutherland, I. (1965). The ultimate display. Information Processing 1965: Proceedings of IFIP Congress, Vol. 2, pp. 506–508.
Sutherland, I. (1968). A head-mounted three-dimensional display. AFIPS Conference Proceedings, Vol. 33, Fall Joint Computer Conference, pp. 757–764.
Takada, D., Ogawa, T., Kiyokawa, K., and Takemura, H. (2010). A context-aware wearable AR system with dynamic information detail control based on body motion. Transactions of the Human Interface Society, 12(1), 47–56 (in Japanese).
Takagi, A., Yamazaki, S., Saito, Y., and Taniguchi, N. (2000). Development of a stereo video see-through HMD for AR systems. Proceedings of International Symposium on Augmented Reality (ISAR) 2000, Munich, Germany, pp. 68–80.
Toyama, T., Orlosky, J., Sonntag, D., and Kiyokawa, K. (2014). Natural interface for multifocal plane head mounted displays using 3D gaze. Proceedings of the 2014 International Working Conference on Advanced Visual Interfaces, pp. 25–32.
Vaissie, L. and Rolland, J. (2000). Accuracy of rendered depth in head-mounted displays: Choice of eyepoint locations. Proceedings of SPIE AeroSense 2000, Orlando, FL, Vol. 4021, pp. 343–353.

5  Optics for Smart Glasses, Smart Eyewear, Augmented Reality, and Virtual Reality Headsets

Bernard Kress

CONTENTS
5.1  Introduction
5.2  HMD/Smart Eyewear Market Segments
5.3  Optical Requirements
5.4  Optical Architectures for HMDs and Smart Glasses
5.5  Optics for Smart Glasses
5.6  Notions of IPD, Eye Box, Eye Relief, and Eye Pupil
5.7  Optical Microdisplays
5.8  Smart Eyewear
5.9  Examples of Current Industrial Implementations
     5.9.1  Display-Less Connected Glasses
     5.9.2  Immersion Display Smart Glasses
     5.9.3  See-Through Smart Glasses
     5.9.4  Consumer Immersion VR Headsets
     5.9.5  Consumer AR (See-Through) Headsets
     5.9.6  Specialized AR Headsets
5.10  Other Optical Architectures Developed in Industry
     5.10.1  Contact Lens-Based HMD Systems
     5.10.2  Light Field See-Through Wearable Displays
5.11  Optics for Input Interfaces
     5.11.1  Voice Control
     5.11.2  Input via Trackpad
     5.11.3  Head and Eye Gestures Sensors
     5.11.4  Eye Gaze Tracking
     5.11.5  Hand Gesture Sensing
     5.11.6  Other Sensing Technologies
5.12  Conclusion
References

This chapter reviews the various optical technologies that have been developed to implement head-mounted displays (HMDs): as virtual reality (VR) devices, as augmented reality (AR) devices, and more recently as connected glasses, smart glasses, and smart eyewear. We review the typical requirements and optical performances of such devices and categorize them into distinct groups, suited for different (and constantly evolving) market segments, and analyze such market segmentation.

5.1  INTRODUCTION

Augmented reality (AR) HMDs (based on see-through optics) have been around for a few decades now, although being dedicated solely to defense applications until recently (Melzer and Moffitt, 1997; Velger, 1998; Rash, 1999; Martins et al., 2004; Cakmacki and Rolland, 2006; Wilson and Wright, 2007; Hua et al., 2010). Virtual reality (VR) HMDs (also called occlusion or immersive displays) have also been around for decades but have been targeted to various market segments, such as flight simulators and battle training for defense applications. Successive attempts to mass distribute VR HMDs as well as AR HMDs to the consumer market have partially failed during the last two decades, mainly because of the lack of adapted displays and microdisplays, of specific digital imaging, of complex sensors, and because of subsequent problems with high latency (see a few early examples of VR offerings from the 1990s in Figure 5.1).

FIGURE 5.1  Early examples of VR systems from the 1990s.

Today, AR headsets have been applied to various markets, such as firefighting, police, medical, surgery, engineering, logistics, and more. Consumer applications are also emerging rapidly, focused on connectivity and digital imaging capabilities, in an attractive and minimalistic package, with emphasis on sensors, novel microdisplays, novel digital imaging techniques, and strong connectivity. Such segmentation has been possible thanks to recent technological leaps in the smartphone industry (connectivity, on-board CPU power with miniaturization of ICs, embarked sensors, and battery technology).

5.2  HMD/SMART EYEWEAR MARKET SEGMENTS

Traditionally, HMD markets have been split between low-cost gadget occlusion HMDs (low-resolution video players, small FOV, none or minimal sensors, no connectivity) and high-cost see-through defense HMDs (complex see-through optics, large field of view [FOV], high resolution, multiple sensors, and very low latency). Today, thanks to the development of smartphones and associated chips, sensors, and apps, new market segments are emerging, such as the following:

• Connected glasses: Such eyewear devices usually have no display or have single or multiple pixel displays (individual LEDs), but are packed with Bluetooth connectivity, and in some cases also WiFi connectivity through a tethered smartphone. They may incorporate digital imaging with >8 MP still images and high-resolution video feed.
• Smart glasses: Smart glasses come in different flavors (occlusion or see-through), targeted for various applications. They have small displays and small FOV (usually around 10°–20° diagonally). Such smart glasses may also incorporate prescription (Rx) lenses, but the optical combiner is here not part of the Rx lens; it is rather located outside the Rx lens on the world side (occlusion and see-through combiners) or before the Rx lens on the glass frame (VR and large occlusion displays). See-through smart glasses provide a contextual display functionality rather than a true AR functionality that requires merged FOV.
• Smart eyewear: Smart eyewear is an extension of see-through smart glasses that actually has the look and feel of conventional glasses. Smart eyewear devices integrate the optical combiner into the Rx lenses (which could also be a zero diopter lens, such as curved sun shades), with the addition of the Rx lens prescription.
• AR headsets for consumer and enterprise: These are niche markets that have been proven very effective for specific market segments such as medical, engineering, logistics, firefighting, and more. Although the large FOV display challenge has been solved for VR, see-through optics for large FOV (>100°) is still a challenge (especially in terms of size and weight). Major challenges lie ahead for binocular consumer and enterprise AR headsets, such as solving the focus/vergence disparity problem (by using light field displays?).
• Gaming VR devices: VR HMDs have been with us for some time and still look like the devices first developed in the 1990s (bulky and heavy, thus bulky optics, and packed with sensors). Most of the occlusion VR headsets are binocular (Takahashi and Hiroka, 2008) and provide a 3D stereo experience. Gaming VR devices will eventually evolve into a new breed of VR devices, smaller and lighter, and more oriented toward new ways of communicating rather than toward pure entertainment. With this evolution, VR devices will tend to merge with large FOV AR devices. Foveal rendering (both in resolution and color) is an active area of research in VR to reduce the computational burden and connectivity bandwidth (i.e., high resolution rendering over a few degrees only along the fovea, in the direction sensed through low latency gaze trackers).
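The bandwidth motivation for foveal rendering can be roughed out with a purely geometric estimate: render a small foveal window at native resolution and the rest of the field at a reduced linear scale. The window size and peripheral scale below are assumptions chosen for illustration, not figures from the text.

```python
def foveated_fraction(fov_deg: float, fovea_deg: float = 5.0,
                      periphery_scale: float = 0.25) -> float:
    """Fraction of the full-resolution pixel budget needed when only a
    foveal window of `fovea_deg` (per axis) is rendered at native
    resolution and the periphery at `periphery_scale` linear resolution.
    A flat, small-angle approximation of the display's angular area."""
    foveal = (fovea_deg / fov_deg) ** 2          # share of area at full res
    peripheral = (1.0 - foveal) * periphery_scale ** 2
    return foveal + peripheral
```

For a 100° field with a 5° foveal window and quarter-resolution periphery, this comes out well under a tenth of the full-resolution budget, which is the kind of saving that makes gaze-tracked foveation attractive; real pipelines use smoother multi-ring falloffs rather than a single hard step.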

• Defense HMDs: The defense markets will remain a stable market for both large FOV VR (simulation and training) and large FOV AR HMDs (for both rotary wing and fixed wing aircraft, where they tend to replace bulky heads-up displays [HUDs]). A further challenge for AR headsets is providing integrated optical solutions to implement occlusion pixels for realistic augmented reality displays.

Some of the current offerings are depicted in Figure 5.2; for more specific VR headset offerings, refer to Sections 5.9.4 through 5.9.6.

FIGURE 5.2  Some of the current AR HMD and smart glasses products.

5.3  OPTICAL REQUIREMENTS

The optical requirements for the various HMD market segments described in Section 5.2 are very different, both for occlusion and see-through displays, and are linked both to the target application and to the form factor constraints, as summarized in Table 5.1. FOV is one of the requirements that may differ greatly from one application market to the other, which is clearly expressed in the multitude of FOVs and FOV locations developed by industry (see also Figure 5.3). Requirements on the size of the FOV are directly linked to the content and its location, to the physical implementation, and to the application (AR, VR, or smart glasses). Display technologies (such as scanners or switchable optics) that are able to scale the FOV in real time without losing resolution (i.e., keeping the resolution at the eye's resolving limit of 1.2 arc min), or optical combiner technologies that are able to relocate the entire FOV or fracture the available FOV/resolution into different locations in real time, are very desirable, but have not been implemented yet in commercial systems. Eye strain should be one criterion when deciding how large and where to position the FOV within the angular space available to the user.

TABLE 5.1  Requirements for the Various HMD Market Segments. Rows (specs): industrial design, power consumption, costs, weight/size, eye box, Rx glasses integration, full color operation, FOV, system contrast, environmental stability, see-through quality, and mono-/binocular operation; columns: smart glasses, VR headsets, industrial HMDs, and defense HMDs; each cell rated from − (not critical) to ++++ (most critical).

FIGURE 5.3  Display FOVs (both occlusion and see-through) developed by industry: Oculus Rift (115°), Sony Morpheus (90°), Sony HMZ-T2 (51°), Lumus DK-32 (40°), Zeiss Cinemizer (35°), Optinvent ORA (24°), Epson Moverio (23°), Vuzix M-100 (16°), Recon Jet (16°), and Google Glass (15°).
adjustments) +++ − + (Helmet mounted) − (Dial in) ++++ (Full color) ++ (Multicolor) ++++ (>90°) ++ (Occlusion) − ++ (>30°) ++++ (>500.1) ++++ − (Mono-/ multicolor) ++++ (>100°) ++++ (>500.1 Requirements for the Various HMD Market Segments Specs Smart Glasses VR Headsets Industrial HMDs Defense HMDs Industrial design Power consumption Costs Weight/size ++++ ++++ + − − +++ − − +++ +++ (Forgettable) ++ + + ++ Eye box ++++ (Minor mech. Augmented Reality TABLE 5. Smart Eyewear. adjustments) +++ (Combo/ monolithic) + (Mono.1) +++ ++ (Occlusion display) Binocular 3D +++ +++ Monocular Binocular 2D Rx glasses integration Full color operation FOV System contrast Environmental stability See-through quality Mono-/binocular Monocular − (NA) Note: + means critical. − means not critical.89 Optics for Smart Glasses. +++ means most critical.3  Display FOVs (both occlusion and see-through) developed by full color) – (≤15°) + (≥100. −−− means least critical.1) ++ − (Dial in) − (Dial in ) +++ (Minor mech. . Oculus Rift (115°) Sony Morpheus (90°) Sony HMZ-T2 (51°) Lumus DK-32 (40°) Zeiss cinemizer (35°) Google Glass (15°) Optinvent ORA (24°) Epson Moverio (23°) Vuzix M-100 (16°) Recon Jet (16°) Occlusion display See-through display FIGURE 5.

TABLE 5. with FOV of 15°–20°. Figure 5. As one might expect. the angular resolution tends to decrease when the FOV increases. An angular resolution of 50 DPD corresponds roughly to 1. Figure 5.90 Fundamentals of Wearable Computers and Augmented Reality In order to keep the resolution within or below the angular resolution of the human eye. While the FOV increases.5.2 arc min angular resolution for increasing FOVs (diagonally measured) can be quite large when attempting to implement 20/20 vision in VR headsets for FOV over 100°. the most dense pixel count display is a 2 K display (QHD at 2560 × 1440 on Galaxy Note 5) which would allow such resolution over a FOV of 60° only. 4 K displays (3840 × 2160 by Samsung) will be available. The FOV and the resulting resolution for various available HMDs today are listed in Table 5. which is the resolution of the human eye (for 20/20 vision). which is the minimum for VR but quite large already for AR applications. scaling large FOV is today a real challenge for immersed VR headsets. Next year. lower FOV is favored by smart glass and smart eyewear applications. Today. Indeed. even when the display resolution increases (see also Section 5. It is also interesting to organize the various existing HMDs on a graph showing the FOV as a function of the target functionality (smart glasses. professional AR applications tend to be favored. For smart glasses.2 arc min. which require very large FOV. occlusion VR gaming devices are the preferred application. The pixel counts to achieve the 1. and for maximal FOV.25:1 1. pushing the resolution up to nearly 100°. 
Dot per degree (DPD) replaces the traditional resolution criteria of dot per inch (DPI) as in conventional displays.2 FOV and Resulting Angular Resolution for Various Devices Available Today Device FOV Resolution Aspect Ratio Pixels per Degree Google Glass Vuzix M100 Epson Moverio Oculus Rift Zeiss Cinemizer Sony HMZ T2 Optinvent ORA Lumus DK40 15 16 23 115 35 51 24 25 640 × 360 400 × 240 960 × 540 800 × 640 870 × 500 1280 × 720 640 × 480 640 × 480 16:9 16:9 16:9 1. Several major display companies have been developing 4 × 2 K displays over a regular cell phone display area. which should be able to address a high FOV and decent angular resolution for VR systems up to 100° diagonal FOV. As one can expect. or VR)—see Figure 5. and thus also very dense pixel count. nHD (640 × 360 pixels. AR. a high DPI can result in a low DPD as the FOV is large. one ninth of full HD) or at best 720p resolutions are usually sufficient.4b shows how a 16:9 aspect ratio pixel count scales with FOV.2.74:1 16:9 4:3 4:3 48 28 48 9 28 46 33 32 .4).4a shows the angular resolution of some available HMDs in industry as a function of FOV.

AR 60 80 100 120 VR (OLED) 140 160 180 FOV (diag.2 arc resolution over a 16:9 screen for various FOV 40 4 K UHD (3840 × 2160) 30 25 2 K QHD (2560 × 1440) 20 1080p (1920 × 1080) 15 720p (1280 × 720) 10 nHD (640 × 360) 5 0 (b) Sony HMZ2 Samsung annoucement for 2015 Galaxy Note 5 0 20 40 Smart glasses. etc.4  OPTICAL ARCHITECTURES FOR HMDs AND SMART GLASSES We have seen in previous sections that there are very different application sectors for HMDs. Such optical tools include refractives. AR. within this optical zoo. waveguides. deg) 8 K SHD (7680 × 4320) 35 Size of display (Mpix) 40 Oculus DK2 Pixel counts required to achieve 1. Fresnel. immersed reflectives. Smart Eyewear. there are only two main ways to implement a seethrough (or non-see-through) optical HMD architecture: the pupil forming or the nonpupil-forming architectures (see Figure 5. diffractives. segmented reflectives. Most of the tools available to the optical engineer in his toolbox have been used to implement various types of smart glasses.6a). Augmented Reality Eye resolution limit (1. there is . However. reflectives. (b) Pixel counts as a function of FOV. It is therefore not surprising to see that there are very different optical architectures that have been developed to address such different requirements. and VR devices.2 arc min or 50 pixels/deg) 50 Glass 40 Moverio Optinvent Lumus Cinemizer Vuzix m100 30 20 Sony Morpheus 10 Resolution measured in DPD rather than DPI 10 (a) 20 30 50 60 70 Oculus DK1 80 90 100 110 FOV (diag. lightguides. 5. both on optical performance and form factor. MEMS. relying on very different optical requirements. deg) FIGURE 5. holographics. catadioptric.4  (a) Angular resolution as a function of FOV for various existing HMDs.91 Angular resolution (pixels/deg) Optics for Smart Glasses. In the pupil-forming architecture.

FIGURE 5.5  Smart glasses, AR, and VR as a function of FOV (deg, 10–100): single-pixel see-through displays; small FOV—occlusion—monocular connected glasses (Vuzix M100); medium FOV—see-through—monocular smart glasses with Rx integration (Glass, Olympus); large FOV—see-through—mono- or binocular AR (Moverio, Laster, Lumus, Optinvent); very large FOV—occlusion—binocular VR (Sony HMZ2, Oculus Rift, Sony Morpheus).

FIGURE 5.6  (a) Pupil-forming and non-pupil-forming (magnifier) optical architectures for HMDs. (b) Occlusion display magnifiers (VR): Oculus Rift and Sony HMZ (occlusion, huge FOV, large eye box, do-it-yourself), Vuzix M100 and MyVu (partially occluded, small FOV, medium eye box). (c) See-through free-space combiner optics: Laster Sarl and ODA Labs (holographic reflector, mono or full color, large FOV, temple projector), SBG Labs Corp and Composyt Labs (bug-eye optics, large FOV, medium eye box, temple projector). (Continued)

FIGURE 5.6 (Continued)  (d) See-through lightguide combiner optics: MyVu Corp, Vuzix Corp, Olympus Ltd, Kopin Corp (distorted see-through, small FOV, medium eye box); Google Glass, RockChip Ltd, ITRI Taiwan, OmniVision Inc. (good see-through, small FOV, medium eye box). (e) See-through TIR freeform combiner optics: Canon Ltd, Motorola HC1, Kopin Golden eye (without complement piece: distorted see-through or opaque, medium FOV, medium eye box); Imagine Optics, Fraunhofer (with complement piece: good see-through, medium FOV, medium eye box). (Continued)

FIGURE 5.6 (Continued)  (f) See-through single mirror combiner optic: Epson Ltd Moverio 1 & 2 (curved coated reflector combiner, OK see-through, medium FOV, medium eye box); Konica Minolta (volume holographic combiner, good see-through, medium FOV, small eye box). (g) See-through cascaded extractor optics: Optinvent Sarl (injection-molded microprism combiner, OK see-through, medium FOV, large eye box); Lumus Ltd (all-glass cascaded coated mirrors combiner with various coatings, OK see-through, medium FOV, large eye box); Sony Ltd (volume holographic combiner, good see-through, medium FOV, large eye box); Vuzix/Nokia M2000AR and BAE Q-sight (photopolymer diffractive combiner).

In the pupil-forming architecture, there is an aerial image of the microdisplay formed by a relay lens. This aerial image becomes the object to be magnified by the eyepiece lens. Although the non-pupil-forming optical architecture seems to be the simplest, and thus the best candidate to implement small and compact HMDs, the pupil-forming architecture has a few advantages, such as the following:

• For a large FOV, the microdisplay does not need to be located close to the combiner lens (thus providing free space around the temple side), as it does in a non-pupil-forming architecture.
• As the object is an aerial image (thus directly accessible, not located under a cover plate as in the microdisplay), a diffuser or other element can be placed in that plane to yield an adequate diffusion cone in order to expand, for example, the eye box of the system. Other exit pupil expanders (EPEs) can also be used in that pupil plane (microlens arrays [MLAs], diffractive elements, etc.).

• The optical path can be tilted at the aerial image plane; reflective optics might be used (right side of Figure 5.6a) to fold the path, thus providing for head wrap instead of the straight optical path of the non-pupil-forming architecture.

Most of the defense HMDs are using the pupil-forming architecture. Most of the consumer offerings today (see Figure 5.2) are using the non-pupil-forming architecture.

The optical platforms used to implement the optical combining function in smart glasses, smart eyewear, AR, and VR devices are quite diverse. They can be grouped roughly into six categories:

1. Immersion display magnifiers (Figure 5.6b): These are magnifiers placed directly on top of the display for maximum FOV (such as in VR devices), or further away in a folded path, such as in smaller FOV smart glasses. They may be implemented as conventional lenses or as more compact segmented or Fresnel optics.
2. See-through free-space combiner optics (Figure 5.6c): Such optics are usually partially reflective (either through thin metal or dichroic coatings), on flat or curved substrates, as thin elements or immersed in a thicker refractive optical element. They might be reflective, segmented (Fresnel-type), or reflective diffractive/holographic (Kress et al. 2009) in order to reduce the curvature and thus their protrusion. Such surfaces can also be freeform to implement a large FOV, and operate in off-axis mode, making them more complex surfaces than the standard on-axis surfaces in (1).
3. See-through lightguide combiner optics (Figure 5.6d): Very often these architectures are not really lightguides. The aerial image can be bounced off at grazing incidence through a mirror or a prism. This is very desirable in occlusion displays, since it allows the relocation of the display on top or on the side and can allow for a larger FOV. When the guide gets thin, the eye box tends to be reduced, since any light reflecting (through TIR) from the surfaces might produce ghost images (or reduce the contrast) rather than contributing to the desired image.
4. See-through freeform TIR combiner optics (Figure 5.6e): This is a classical design used not only in see-through combiners but also in occlusion HMDs (Talha et al. 2008). Typically, this is a three-surface freeform optical element: first surface transmissive, second surface TIR, and third surface partially reflective. For perfect see-through, a compensating element has to be cemented on the partially reflective coating. Multiple TIR bounces (>3) have also been investigated with this architecture.
5. See-through single mirror TIR combiner optic (Figure 5.6f): This is a true TIR guide that uses either a partially reflective, flat, or curved mirror as a single extractor, or a leaky diffractive or holographic extractor. In see-through mode, the light field is constantly kept inside plastic or glass, keeping it from being affected by hair, scatter from dust, etc. The combiner element (flat or curved) as seen by the eye should have the widest extent possible, in order to produce the largest eye box. This is why the combiner mirror (or half-tinted mirror) should be oriented inside the lightguide in such a way that the user sees the largest possible combiner area, producing, therefore, the largest possible eye box.

6. See-through cascaded waveguide extractor optics (Figure 5.6g): In order to expand the eye box beyond the previous architectures (especially #5), cascaded extractors (Thomson CSF, 1991) have been investigated, ranging from dichroic mirrors to partially reflective prism arrays and variable-efficiency reflective and transmission holographic extractors (Kress and Meyrueis, 1998).

The issues related to potential eye strain are more complex when dealing with bi-ocular or binocular displays (Peli, 1998), although there has been extensive research and development for stereoscopic displays for the consumer market. Most of the HMDs we review in this chapter are monocular designs.

5.5  DIFFRACTIVE AND HOLOGRAPHIC EXTRACTORS

We describe here a particular type of optical combiner that can be implemented either in free space or in waveguide space. The combiner is here a holographic or diffractive optical element. Diffractive (surface relief modulation) and holographic (material index modulation) optics are similar in nature and can implement various optical functionalities, such as depicted in Figure 5.7. Although the optical phenomenon is similar (diffraction through material modulation), the optical effects are very different. For example, the Bragg selectivity in volume holograms (index modulation in the material) cannot be implemented as a surface relief diffractive element. A diffractive element is, however, easier to replicate via embossing or injection molding, or a combination of both. The Bragg selectivity of volume holograms is a very desirable feature that has already been implemented in defense HUDs for decades. Only recently have volume holograms been applied to AR headsets and smart glasses (see Figure 5.9).

FIGURE 5.7  Diffractive and holographic optics implementing various optical functionalities: holographic optical elements (HOEs, sandwiched "goop" with index modulation) and diffractive optical elements (DOEs, surface relief modulation), implementing beam splitters, engineered diffusers, DOE/aspheric lenses, microlens arrays (MLAs), CGH gratings and beam redirection (custom pattern projection), and beam shaping/beam homogenizing.

FIGURE 5.8  Angular and spectral bandwidths of reflection and transmission holograms: diffraction efficiency η (%) at 550 nm and at 30° incidence, as a function of incidence angle α (angular selectivity) and of wavelength λ from 450 to 650 nm (spectral selectivity), for transmission and reflection hologram types.

FIGURE 5.9  Examples of holographic and diffractive combiners: (a) free-space Digilens and Composyt Labs smart glasses using volume reflection holograms; (b) flat Nokia/Vuzix/Microsoft and flat Microsoft "Hololens" digital diffractive combiners with 2D exit pupil expanders; (c) Konica–Minolta full-color holographic vertical lightguide using a single RGB reflective holographic extractor, and Sony monocolor waveguide smart glasses using a 1D reflective holographic in-coupler and an exit pupil expander out-coupler.
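The efficiency curves of Figure 5.8 follow from Kogelnik's coupled wave theory. As a rough on-Bragg sketch (lossless gratings, simplified obliquity handling; the numerical values below are illustrative assumptions, not design data), the peak efficiency of a transmission volume hologram varies as sin² of the coupling strength, while a reflection hologram saturates as tanh²:

```python
import math

def coupling_strength(n1, d_um, wavelength_um, cos_theta=1.0):
    """Kogelnik coupling parameter nu = pi * n1 * d / (lambda * cos(theta)),
    with n1 the index modulation and d the grating thickness."""
    return math.pi * n1 * d_um / (wavelength_um * cos_theta)

def efficiency_transmission(nu):
    """On-Bragg efficiency of a lossless transmission volume hologram."""
    return math.sin(nu) ** 2

def efficiency_reflection(nu):
    """On-Bragg efficiency of a lossless reflection volume hologram."""
    return math.tanh(nu) ** 2

# Illustrative numbers: 10 um thick photopolymer, index modulation 0.02, 550 nm
nu = coupling_strength(0.02, 10.0, 0.55)
print(f"transmission: {efficiency_transmission(nu):.2f}")
print(f"reflection:   {efficiency_reflection(nu):.2f}")
```

The tanh² saturation is one way to see why reflection holograms reach high efficiency robustly, while the sin² behavior of transmission holograms can overshoot and roll off, which is consistent with the text's remark that transmission holograms require a higher index modulation.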

Volume holograms operate in either free-space or TIR waveguide modes: a free-space operation is depicted in Figure 5.6b, and a waveguide operation is depicted in Figure 5.6g. Figure 5.8 shows typical angular and spectral bandwidths derived from Kogelnik coupled wave theory for reflection and transmission volume holograms. The FOV of the display is thus usually limited by the angular spectrum of the hologram, modulated by the spectral bandwidth. In order to reduce spectral spread (when using LED illumination) and increase angular bandwidth (in order to push through the entire FOV without a uniformity hit), it is necessary to use reflection-type holograms (large angular bandwidth and smaller spectral bandwidth), especially when tri-color operation is required. Transmission holograms have wider bandwidths, but also require a higher index modulation.

5.6  NOTIONS OF IPD, EYE RELIEF, EYE BOX, AND EYE PUPIL

Although the eye box is one of the most important criteria in an HMD, it is also the criterion with the loosest definition. The eye box is usually referred to as the metric distance over which the user's eye pupil can move in both directions, at the eye relief (or vertex) distance, without losing the edges of the image (display). However, losing the display is quite subjective and involves a combination of resolution, distortion, and illumination uniformity considerations, making it a complex parameter.

The IPD is an important criterion that has to be addressed for consumer smart glasses, in order to cover a 95th percentile of the potential market (see Figure 5.10). A static system may not address a large enough population. Usually, a combination of optical and mechanical adjustments can lead to a large covering of the IPD (a large exit eye pupil, or eye box), allowing easy viewing of the entire FOV by users having different interpupillary distances (IPDs) or temple-to-eye distances. For obvious aesthetics and wearability reasons, it is desirable to have the thinnest combiner and, at the same time, the largest eye box.

FIGURE 5.10  Interpupillary distance (IPD): adult male (U.S.A.), 5th percentile 55 mm, 95th percentile 70 mm; adult female (U.S.A.), 5th percentile 53 mm, 95th percentile 65 mm; child, low 41 mm, high 55 mm.

FIGURE 5.11  Optical combiner thickness as a function of the eye box size for various optical HMD architectures (free-space curved combiner, lightguide with axis collimator, waveguide with and without EPE), bounded by the aesthetic constraint, the mechanical stability constraint, and minimum/full IPD coverage, which together define the design space for smart glasses.

Designing a thin optical combiner producing a large eye box is usually not easy: when using conventional free-space optics, as in most of the architectures presented in the previous section, the eye box scales with the thickness of the combiner (see, e.g., Figure 5.11), except for architecture #6 (Figure 5.6g), which is based on waveguide optics using cascaded planar extractors. Typically, EPEs are based on cascaded extractors (conventional optics or holographics) and usually act only in one direction (the horizontal direction). For holographic combining and extraction (both free space, Figure 5.6c, and waveguide, Figure 5.6g), various EPE techniques have been investigated to expand the eye box in both directions (Levola, 2006; Urey and Powell, 2005). See, for example, the upper right example of Figure 5.9 (Nokia/Vuzix AR 6000 AR HMD and Microsoft HoloLens), using a diffractive waveguide combiner with both X and Y diffractive waveguide EPEs. However, such complex diffractive structures require subwavelength tilted structures that are difficult to replicate in mass by embossing or injection molding. The EPEs can also be implemented in free-space architectures by the use of diffusers or MLAs.

The eye box is also a function of the size of the eye pupil (see Figure 5.12). Typically, a smaller eye pupil (in bright environments) will produce a smaller effective eye box, and a larger eye pupil (in darker environments) will produce a larger eye box. A standard pupil diameter used in industry is usually 4 mm, but it can vary anywhere from 1 to 7 mm depending on the ambient brightness. The eye box is modeled and measured at the eye relief, or exit pupil (an eye box of 10 mm horizontally and 8 mm vertically is often used as a standard requirement for today's smart glasses), the eye relief being the distance from the cornea to the first optical surface of the combiner. If the combiner is integrated within Rx lenses (such as in smart eyewear), the notion of eye relief is replaced by the notion of vertex distance, that is, the distance from the cornea to the apex of the lens on the eye-side surface. If the combiner is worn with extra lenses (such as in smart glasses), the eye relief remains the distance from the cornea to the exit surface of the combiner (not to the Rx lens).
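The pupil-size dependence described above can be captured in a toy model: if the optics deliver an exit pupil of a given width, the eye can wander roughly that width plus its own pupil diameter before the display edges vignette. The additive formula is a simplifying assumption for illustration, not a design rule from the text:

```python
def effective_eye_box(exit_pupil_mm, eye_pupil_mm):
    """Toy model: the usable eye box grows with the eye pupil diameter,
    since a larger pupil still captures the beam near the exit pupil edges."""
    return exit_pupil_mm + eye_pupil_mm

# 10 mm horizontal exit pupil, bright (2 mm pupil) vs dark (7 mm pupil) conditions
print(effective_eye_box(10.0, 2.0))  # bright environment
print(effective_eye_box(10.0, 7.0))  # dark environment
```

Under this model, the same combiner is noticeably more forgiving in low light, matching the observation that a larger eye pupil produces a larger effective eye box.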

FIGURE 5.12  Eye box size (exit pupil) as a function of the eye pupil diameter: the eye box is scaled by the eye pupil size, from bright conditions (human pupil 2–3 mm) to low-light conditions (human pupil 7 mm).

FIGURE 5.13  Eye box versus eye relief for various optical HMD architectures: the eye box scales with eye relief, with non-pupil-forming and pupil-forming architectures tracing different curves between the aesthetic constraint, the vertex distance, and minimum/full IPD coverage.

In virtually all HMD configurations, the eye box reduces as soon as one gets away from the last optical element in the combiner. For non-pupil-forming architectures, the eye box reduces when the eye relief increases (see Figure 5.13). However, in pupil-forming architectures (refer also to Figure 5.5), the eye box may actually increase until a certain distance (a short distance, usually smaller than the nominal eye relief), and then get smaller.

The eye box is also shared with the FOV (see Figure 5.14). If the HMD can be switched between various FOVs (by either altering the microdisplay, or changing the position of the microdisplay and the focal length of the magnifier), the eye box may vary from a comfortable eye box (small FOV) to a nonacceptable eye box blocking the edges of the image (large FOV). For any position of the combiner, the eye box has to allow the entire FOV to be seen unaltered, at the target eye relief. It may happen that, for a specific position of the combiner, the entire display might be seen indoors (large pupil), but the edges of the display might become blurry outdoors, due to the fact that the eye pupil diameter decreases. Finally, the effective eye box of a smart glass can be much larger than the real optical eye box when various mechanical adjustments of the combiner may be used to match the exit pupil of the combiner to the entrance pupil of the user's eye.

FIGURE 5.14  The eye box and FOV share a common space: as the FOV grows, the available eye box size shrinks.

5.7  OPTICAL MICRODISPLAYS

The microdisplay is an integral part of any AR or VR HMD or smart eyewear. Various technological platforms have been used, from traditional panels, including transmission LCD displays, reflective liquid crystal on silicon (LCoS), and organic LED (OLED) and inorganic LED panels, with illumination engines as LED back or front lights, to optical scanners such as MEMS or fiber scanners using a MEMS projector as a display.

Reflective LCoS panels are depicted in Figure 5.15. The efficiency of either LCoS or LCD transmission microdisplays remains low (2%–4% typically), wasting light after the panel, which reduces the attractiveness of that technology. In order to provide efficient light usage and reduce ghosting for most of the architectures described in Figure 5.6a, the emission cone of the microdisplay should remain partially collimated.

FIGURE 5.15  Liquid crystal on silicon (LCoS) microdisplays.

Traditional illumination engines (see Figure 5.16) range from curved polarization beam splitting (PBS) films (large angular and external bandwidth films, left), to PBS cubes (center) with either free-space LED collimation or a back light, to thin edge-illuminated front lights (right). Although the edge-illuminated front light produces the most compact architecture, it is also the most difficult to implement (front light illumination layers have also been developed for other display systems, such as the Mirasol MEMS displays by Qualcomm).

FIGURE 5.16  Illumination engines for LCoS microdisplays.

Phase LCoS microdisplay panels can also be used to produce a phase image, which, upon coherent laser illumination, will produce an intensity pattern in either the far or the near field (see Figure 5.17a, HoloEye product). Such phase LCoS panels can be considered as dynamic computer-generated holograms (CGHs) and can therefore implement either Fresnel (near field) or Fourier (far field) patterns. Dynamically producing images appearing at different depths, without moving any optical element, is a very desirable feature in HMDs and smart glasses, producing compensation for visual impairments or various depth cues.

OLED as well as inorganic LED panels are exciting alternatives to transmission LCD or LCoS microdisplay technologies. Such panels are emissive displays, which do not require an additional backlight (or front light), but they produce a Lambertian illumination. In order to integrate them either in an HUD or an HMD system, they have to be used in combination with a combiner optic (see Figure 5.6a) and an EPE (such as a diffuser); when using an EPE, an intermediate 2D aerial image has to be produced. The directionality of the emission cone is also a desirable variable to control in order to increase the efficiency of the combiner optics.
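The 2%–4% panel efficiency quoted above compounds with the rest of the optical chain. A toy light-budget multiplication makes the point (all stage efficiencies here are illustrative assumptions, not measured values for any product):

```python
def display_throughput(stages):
    """Multiply the efficiencies of each stage of the optical chain."""
    eff = 1.0
    for _name, stage_eff in stages:
        eff *= stage_eff
    return eff

chain = [
    ("LED illumination engine", 0.30),
    ("LCoS panel (2-4% typical)", 0.03),
    ("combiner optics", 0.50),
]
print(f"end-to-end efficiency: {display_throughput(chain):.4f}")
```

Even generous assumptions for the other stages leave the end-to-end efficiency well under 1%, which is why emissive (OLED/LED) panels and well-collimated emission cones are attractive.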

Today, because of the pixel density required (pixels smaller than 10 µm), most OLED microdisplay panels have a silicon backplane. Instead of using directly patterned OLED pixels, they use color filters on a single OLED material, a requirement for the high density of the pixels; this reduces the efficiency of the panel. R&D in most OLED companies focuses on removing such color filters and patterning OLED pixels directly at sub-10-µm size. OLED panels have been used to implement AR HMDs. Such OLED panels can also be used as bidirectional panels, integrating a detector in the same plane, thus enabling easy eye gaze tracking (see Figure 5.17b). Curved panels, such as with OLED technology, might help in relaxing the constraints on the optics design, especially in off-axis mode or in large-FOV VR headsets (see Figure 5.18). Instead of working with an object plane that is static (planar), this object plane can then be used as a degree of freedom in the global optimization process of the combiner optics.

FIGURE 5.17  (a) Phase LCoS microdisplays (HoloEye PLUTO panel) as dynamic computer-generated holograms for far-field or near-field display. (b) Bidirectional OLED panel used in an HMD (Fraunhofer Institute).

FIGURE 5.18  Curved OLED panels can relax the design constraints for HMD combiner optics.

MEMS micromirror laser scanners are desirable image generators for HMDs, since the laser beams are already collimated (see Figure 5.19), but they cannot produce an acceptable eye box without forming an intermediate aerial image plane to build up the eye box (optional diffusion at this image plane may also create parasitic speckle when used with laser illumination). Most scanners use laser light, which can produce speckle (which then has to be removed by an additional despeckler device). However, if there is no diffuser in the optical path, speckle should not appear to the eye.

FIGURE 5.19  MEMS micromirror laser sources for Pico projectors and HMDs: a PicoP display engine with R, G, and B laser layers, a MEMS scanning mirror, and combiner optics, embedded in a mobile device projecting the image.

An alternative to the micromirror MEMS scanner is the vibrating piezo fiber scanner (such as with Magic Leap's AR HMD). These devices are very small, making them ideal candidates for HMD pattern generators, and the laser or LED source can be located away from the fiber end tip (unlike in a MEMS scanner, which is free space). Such a fiber scanner can be used either as an image-producing device, and thus be integrated in an HMD system (see Figure 5.20), or as a digital image sensor, as the fiber can also be used in reverse mode. Furthermore, eye gaze tracking can be integrated in a bidirectional scheme, such as in the bidirectional OLED device in Figure 5.17. One of the advantages of both MEMS and fiber scanners is that the effective FOV can be rescaled and/or relocated in real time without losing efficiency (provided the combiner optic can still do a decent imaging job at such varying angles).

FIGURE 5.20  Vibrating piezo fiber tip (head strap, brightness control, video camera and IR LEDs, scanning fiber display tube) producing an image, integrated in an HMD via a free-space optical combiner.

TABLE 5.3
Microdisplays and Image Generators Used Today in HMD/Smart Glass Devices

Device                        Type          Display              Resolution                         Aspect Ratio
Google Glass                  See-through   LCOS                 640 × 360                          16:9
Vuzix M100                    Opaque        LCD                  400 × 240                          16:9
Epson Moverio                 See-through   LCOS                 960 × 540                          16:9
Oculus Rift DK1               Opaque        LCD                  1280 × 800                         1.25:1
Oculus Rift DK2               Opaque        OLED                 1920 × 1080                        16:9
Silicon microdisplay ST1080   Opaque        LCOS                 1920 × 1080                        16:9
Zeiss Cinemizer               Opaque        OLED                 870 × 500                          1.74:1
Sony HMZ T3                   Opaque        OLED                 1280 × 720                         16:9
Sony Morpheus                 Opaque        OLED                 1920 × 1080                        16:9
Optinvent ORA                 See-through   LCD                  640 × 480                          4:3
Lumus DK40                    See-through   LCOS                 640 × 480                          4:3
Composyt Labs                 See-through   Laser MEMS scanner   Res and FOV can vary dynamically   Can vary
Vuzix/Nokia M2000AR           See-through   Laser MEMS scanner   Res and FOV can vary dynamically   Can vary
25:1 16:9 1.3 Microdisplays and Image Generators Used Today in HMD/Smart Glass Devices Device Type Display Resolution Google Glass Vuzix M100 Epson Moverio Oculus Rift DK1 Oculus Rift DK2 Silicon microdisplay ST1080 Zeiss Cinemizer Sony HMZ T3 Sony Morpheus Optinvent ORA Lumus DK40 Composyt Labs See-through Opaque See-through Opaque Opaque Opaque Opaque Opaque Opaque See-through See-through See-through Vuzix/Nokia M2000AR See-through LCOS LCD LCOS LCD OLED LCOS OLED OLED OLED LCD LCOS Laser MEMS scanner Laser MEMS scanner 640 × 360 400 × 240 960 × 540 1280 × 800 1920 × 1080 1920 × 1080 870 × 500 1280 × 720 1920 × 1080 640 × 480 640 × 480 Res and FOV can vary dynamically Res and FOV can vary dynamically Aspect Ratio 16:9 16:9 16:9 1.106 Fundamentals of Wearable Computers and Augmented Reality Head strap Brightness control Video camera and IR LEDS Scanning fiber display tube FIGURE 5. TABLE 5.74:1 16:9 16:9 4:3 4:3 Can vary Can vary .20  Vibrating piezo fiber tip produces an image and integrated in an HMD via a free-space optical combiner.

However. Most of smart glasses available today use a combination of glasses and optical combiner (two physical optical elements).22). The most straightforward way to combine Rx lenses and a combiner is to place the Rx lens in between the eye and the combiner (case 1. or convex-convex lenses might then not be used (such lens shapes would however allow for easier integration of flat optical combiner). Augmented Reality 107 combiner optic can still do a decent imaging job at such varying angles).24 shows some of the current products that combine two different optical elements. 5. which may not be fulfilled by using conventional optical elements. it might not be acceptable for aesthetic and weight reasons for the consumer market. microoptics. concave-plano. VR. Figure 5. but not in LCD or LCoS panels in which the backlight would always illuminate the entire display. This is possible to a certain degree in emissive panels. holographic. plano-convex. Table 5. The requirements for medical eyewear is very different than for consumer eyewear and may allow for much thicker eyewear implementing a large eye box combiner within a thick meniscus lens providing adequate eyewear prescription (As in Essilor/ Lumus smart glasses providing relief for age-related macular degeneration or AMD). such as segmented optics.a in Figure 5. it does not produce an acceptable viewing experience for farsightedness. and may require more complex optics. Furthermore. Integrating the optical combiner inside a conventional Rx meniscus lens is thus a complex challenge. While addressing visual impairment issues.8  SMART EYEWEAR The combination of optical combiner and prescription glasses. Integrating a flat optical element inside a curved meniscus will produce a thick lens. and smart glass offerings.3 summarizes the various image sources used for some of the current HMD. producing specific constraints for the users’ adaptation to the display (see Figure 5. 
The Rx lens is worn between the eye and the combiner, producing an effective correction for nearsightedness for both the world and the digital display, but lacking compensation of farsightedness for the digital display.

FIGURE 5.21  Prescription compensation in combiner optics for smart eyewear to address a large potential consumer market. A sample prescription (SPHERE, CYL, AXIS, ADD for O.D. and O.S.) reads as follows: "O.D." is the right eye and "O.S." is the left eye (some prescriptions simply list "L" and "R"). The sphere number is the "spherical error" (that is, nearsighted or farsighted): a "+" means the prescription is farsighted, a "–" means it is nearsighted, and the higher the number after the + or –, the stronger the prescription. The "Cyl" number describes any astigmatism and indicates its severity, while "Axis" tells which way the astigmatism is oriented. The "ADD" number is used for bifocals; it gets added to the regular sphere prescription to obtain the near vision prescription. The figure also sketches nearsightedness (myopia) and farsightedness (hyperopia), uncorrected and corrected with lenses.
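The ADD arithmetic described in the caption can be made concrete. In this illustrative sketch, the data class and the sample values are hypothetical (not taken from the figure); the near-vision sphere is simply the distance sphere plus the ADD power:

```python
from dataclasses import dataclass

@dataclass
class EyePrescription:
    sphere: float  # distance spherical power (diopters); + farsighted, - nearsighted
    cyl: float     # astigmatism severity (diopters)
    axis: int      # astigmatism orientation (degrees)
    add: float     # bifocal/near addition (diopters)

    def near_sphere(self) -> float:
        """ADD is applied on top of the distance sphere for near vision."""
        return self.sphere + self.add

# Hypothetical right-eye (O.D.) prescription
od = EyePrescription(sphere=-2.50, cyl=-0.75, axis=90, add=2.00)
print(od.near_sphere())  # -0.5
```

A smart eyewear design that claims full Rx integration would have to reproduce this sphere/cyl/axis combination, plus the ADD behavior, in the combiner optics.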

FIGURE 5.22  Integration of optical combiner and prescription lenses: (1) Rx lens UNDER the combiner, with (1.a) the combiner independent of the Rx lens or (1.b) the combiner sharing the inner shape of the Rx lens (a generic uncompensated combiner, but problematic for farsightedness, i.e., hyperopia); (2) Rx lens OVER the combiner, with (2.a) the combiner independent of the Rx lens or (2.b) the combiner sharing the outer shape of the Rx lens (a combiner to be compensated for nearsightedness, i.e., myopia); (3) combiner INSIDE the Rx lens, with (3.a) a flat/curved combiner (no TIR + injection port) or (3.b) a flat or curved combiner requiring TIR and an injection port (a combiner to be compensated for both myopia and hyperopia).

FIGURE 5.23  The only acceptable lens shape for smart eyewear is a meniscus: positive lenses (all +4D: meniscus, biconvex, planoconvex) and negative lenses (all –4D: meniscus, biconcave, planoconcave).

FIGURE 5.24  Available prescription lens implementations in smart glasses: (a) Google Glass and (b) Lumus Ltd. smart glasses.

5.9  EXAMPLES OF CURRENT INDUSTRIAL IMPLEMENTATIONS

We review in this section some of the current offerings available on the market for connected glasses, smart glasses, smart eyewear, AR, and VR headsets.

5.9.1  Display-Less Connected Glasses

There is no display here, but a high-resolution camera, with Bluetooth and/or WiFi connectivity (see Figure 5.25a).

FIGURE 5.25  (a) Connected glasses available on the market: Ion Smart Glasses (single LED alert light), Geco eyewear, Mita Mamma eyewear, Fun-Iki eyewear, and the Life Logger Bluetooth WiFi camera headset. (Continued)

FIGURE 5.25 (Continued)  (b) Occlusion smart glasses available on the market: MicroOptical Corp., MyVu Corp., and the Vuzix Corp. STAR 1200 Augmented Reality System. (c) See-through smart glasses available on the market: Google Glass (Google, Mountain View) and various copycats (ChipSiP and Rockchip, Taiwan; OmniVision, Santa Clara), plus the Laster Sarl (France) product. (d) Pseudo-see-through tapered combiner device from Olympus. (Continued)

FIGURE 5.25 (Continued)  (e) Some of the main VR headsets available today on the market: Oculus, Sony, and Silicon Microdisplay ST1080. (f) Oculus latency sensor (left) and optical distortion compensation through software in VR (right): barrel distortion (in-engine) cancels the pin-cushion distortion from the Rift lenses, yielding no distortion in the final observed image. (g) Foveated rendering in high pixel count VR systems, linked to gaze tracking: number of retinal receptors per square millimeter (rods and cones, with the blind spot) as a function of eccentricity angle (0°–80°). (Continued)
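The in-engine pre-distortion of Figure 5.25f is usually a radial polynomial: the renderer squeezes the image (barrel) so that the magnifier's pin-cushion distortion expands it back to rectilinear. A minimal sketch with illustrative coefficients (k1 and k2 are assumptions, not the Rift's actual calibration):

```python
def barrel_predistort(x, y, k1=-0.22, k2=-0.02):
    """Radially pre-distort a normalized image coordinate (origin at the
    lens center) so that the lens's pin-cushion distortion cancels it.
    Negative k1/k2 pull points toward the center (barrel distortion)."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

# A point halfway to the edge is pulled noticeably inward
print(barrel_predistort(0.5, 0.0))
```

In practice the coefficients come from a per-lens calibration, and chromatic aberration is handled the same way with separate coefficients per color channel.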

FIGURE 5.25 (Continued)  (h) Focus/vergence accommodation disparity in stereoscopic VR systems: unlike for a real object in a real 3D scene, the vergence distance to the apparent object location differs from the focus distance to the virtual image of the screen seen through the VR magnifier lenses. (i) Managing the eye convergence in stereoscopic-based VR headsets with the ISD (interscreen distance), IOD (interocular distance), and IPD (interpupillary distance) of the left and right displays. (j) Consumer AR systems available on the market: Optinvent (France, microprisms), Epson Ltd. (Japan, lightguide), Lumus Ltd. (Israel, cascaded mirrors), and Sony Ltd. (Japan, holographic). (Continued)

FIGURE 5.25 (Continued) (k) Specialized AR headsets for law enforcement, firefighting, and engineering; and (l) specialized AR headsets for medical and surgical environments.

5.9.2 Immersion Display Smart Glasses

Such immersive head worn displays were the first available consumer electronic HMDs. Some of them (MicroOptical and Vuzix) have been available since the end of the 1990s as personal video players (no connectivity or camera). More recent designs include both camera and connectivity (Bluetooth and WiFi) as well as an operating system such as Android (e.g., the monocular Vuzix M100 smart glass in Figure 5.25b, upper left).

5.9.3 See-Through Smart Glasses

See-through smart glasses combine the characteristics of the previous monocular occlusion smart glasses with small FOV and of the higher-end AR see-through HMDs, in a smaller package, with the additional feature of Rx lenses. An early example is Google Glass, with a few copy cats (ChipSiP, Rockchip, MicroVision).

An interesting alternative optical architecture has been developed by Olympus and others (Vuzix Pupil, Telepathy One)—see Figure 5.25c and d—in which the combiner is located vertically. The combiner is here an occlusion tapered combiner using a 45° mirror, but the end tip of the combiner is smaller than the usual 4 mm diameter of the eye pupil, making it pseudo transparent to the far field (much like when one takes a knife edge close to the eye and can still see through the edge). Although the external aspects and the optics remain similar from one device to the next, the performances are very different. Such see-through smart glasses or smart eyewear are, however, not an AR system: their limited FOV (limited mainly by the size of the optics) and the angular offset of such FOV make them best suited for contextual display applications rather than for true AR applications. AR headset systems require larger FOV centered on the user's line of sight (see also Section 5.9.5).

5.9.4 Consumer Immersion VR Headsets

VR headsets seem to be a Phoenix rising from the ashes of the defunct 1990s VR wave. Dense pixel counts and very low latency (<20 ms) have enabled a much smoother experience. This is solely technology related (more pixels, better sensors, faster electronics, smoother transitions, shorter pixel latency). Figure 5.25e shows the main contenders (Samsung recently disclosed a high-resolution VR headset with Oculus, and Zeiss developed the VR-One). The key specifications are the resolution of the display (1080p), and especially the sensors (gyro, accelerometer, magnetometer) and their latency to display refresh (<20 ms). Low latency (<20 ms) and software compensation for optical distortion and lateral chromatic spread are some of the key components of an effective VR system such as the Oculus Rift DK2 (see Figure 5.25f). Eye gaze tracking (see also Section 5.11) is a key component for a true foveated rendering in VR systems. Foveated rendering (see Figure 5.25g) is another key technique which will help alleviate the computing and bandwidth requirements for next generation VR systems having very high pixel counts.

There are many problems on the road to the perfect VR system: some of them are related simply to technology, some are related to the market, and some are related to the sickness they induce in some people. VR (as well as AR) sickness occurs at various levels, both in the content and in the hardware, and was partly responsible for the flop of the VR market in the 1990s. Some of these issues are being solved by better technology; however, other VR-induced sickness issues are more related to the architecture, such as the focus/convergence disparity (see Figure 5.25h) in stereoscopic displays such as the ones used in most of the VR systems today. In a true 3D experience, the eye's focus accommodation is not conflicting with the true location of the object, and thus yields the correct eye convergence (also called vergence). For a stereoscopic display such as most of the VR systems today, the eye focus accommodation conflicts with the eye vergence, producing the sickness sensation for the user. The focus/vergence disparity is a function of the IPD and the ISD (interscreen distance) as well as the IOD (interocular distance). Solving for the focus/vergence disparity in conventional VR systems (based on stereoscopic imaging) is a complex task, but doing so would reduce the sickness related to such disparity.
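The IPD/screen geometry above can be put into numbers. The following sketch is illustrative only; the symmetric pinhole-eye model and all numeric values are assumptions, not taken from the chapter. It computes where a stereoscopically rendered point is fused, and the resulting accommodation/vergence conflict in diopters:

```python
# Sketch of the stereoscopic vergence geometry discussed above: two virtual
# image planes at distance D, eyes separated by the IPD. A point rendered with
# on-screen separation s between the left- and right-eye images is fused where
# the two lines of sight cross, Z = D * IPD / (IPD - s); the focus/vergence
# conflict is the dioptric gap |1/Z - 1/D|.

def vergence_distance(ipd_m, image_dist_m, separation_m):
    """Distance at which the two lines of sight cross (similar triangles)."""
    return image_dist_m * ipd_m / (ipd_m - separation_m)

def focus_vergence_conflict(ipd_m, image_dist_m, separation_m):
    """Accommodation/vergence mismatch in diopters (1/m)."""
    z = vergence_distance(ipd_m, image_dist_m, separation_m)
    return abs(1.0 / z - 1.0 / image_dist_m)

ipd = 0.064        # 64 mm interpupillary distance (assumed)
d_image = 2.0      # virtual image (focus) distance of the magnifier, in m

# Zero separation -> point fused at the image plane -> no conflict.
assert focus_vergence_conflict(ipd, d_image, 0.0) < 1e-12

# Uncrossed separation pushes the fused point behind the image plane.
z = vergence_distance(ipd, d_image, 0.032)  # s = IPD / 2
print(round(z, 3), "m")   # fused at 4.0 m while the eyes focus at 2 m
```

With a fixed magnifier focus distance, any rendered depth other than the image plane produces a nonzero dioptric conflict, which is exactly the disparity the dynamic-lens and light field approaches below try to remove.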

Adjusting the screens and/or the lenses or using dynamic lenses can provide a potential solution, as described also in Figure 5.25i. Another way to reduce such disparity in VR and AR systems (and therefore the associated sickness) is to use non-stereoscopic 3D display techniques such as dynamic holographic displays (using Fresnel diffractive patterns producing virtual images at specific distances). Light field displays, such as the ones developed by nVIDIA (Douglas Lanman) or Magic Leap Inc., are another elegant way to produce a true 3D representation accommodating both eye vergence and focus. However, light field display techniques are still in their infancy and require complex rendering computation power as well as higher display resolutions (similar constraints as in light field imaging, such as in the Lytro camera system).

For a true AR experience, it is also important to occlude the reality when a digital image is displayed in the field, so that this image appears as a real object and not a transparency over the field, therefore creating the illusion of seamless integration of reality and virtuality. Most of the AR systems today (see Figure 5.25j as well as figures in the next section) do not implement any occlusion displays, simply because it is very difficult to do with traditional optics. One (simple) way to implement occlusion pixels in AR systems is to take a video of reality and digitally superimpose onto the video feed any virtual objects to be viewed in a conventional VR environment (opaque screen). However, this is of course not true AR. Integrating such occlusion pixels in true AR systems can be done, for example, by using a pupil forming architecture with an LCD shutter on the image plane.

5.9.5 Consumer AR (See-Through) Headsets

AR headsets have been developed for a few decades, long before the current boom in consumer smart glasses and VR headsets, under a vast variety of different optical architectures, especially for the defense market. Consumer AR systems have recently been introduced, such as the products shown in Figure 5.25j. Due to their moderate FOV (30°–40°) and the position of the virtual image directly in the line of sight, they are good candidates for true AR applications. Although these AR devices are integrated in glass frames, they do not integrate any Rx lenses. Integrating Rx lenses on top of such combiners is a difficult task and often produces clunky looking glasses, with unorthodox lens curvatures. The simplest solution is again to introduce standard Rx lenses in between the combiner and the eye (such as described in Section 5.8).

The upper left example in Figure 5.6g (Epson Moverio BT-200) is a single halftone mirror lightguide combiner, relatively thick (10 mm). All three other configurations (Optinvent ORA—upper right—using a microprism array, Lumus—lower left—using cascaded dichroic mirrors, and Sony Smart Glass—lower right—using a holographic waveguide extractor) are waveguide combiners with 1D EPEs, over FOV around 30°–40° diagonally. Such waveguide extractors have the benefit of producing a large horizontal eye box (due to the EPE effect of the linear cascaded combiners) in a relatively thin lightguide (<3 mm). Microsoft's HoloLens AR combiner is similar to the Nokia/Vuzix diffractive combiner, the combiners having the annoying requirement to be perfectly flat to function efficiently.
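The video pass-through route to occlusion mentioned above amounts to ordinary alpha compositing of rendered content over the camera feed. A minimal sketch, assuming synthetic frames and a binary alpha mask (this is a generic illustration, not any product's pipeline):

```python
# Video pass-through occlusion: virtual content with an alpha channel is
# composited over the camera feed, so the virtual object fully occludes
# reality wherever alpha = 1 instead of appearing as a transparency.
import numpy as np

def composite(camera_rgb, virtual_rgb, alpha):
    """Per-pixel over-operator: alpha acts as the occlusion mask."""
    a = alpha[..., None]                        # H x W x 1, values in [0, 1]
    return (a * virtual_rgb + (1.0 - a) * camera_rgb).astype(camera_rgb.dtype)

camera = np.full((4, 4, 3), 100.0)              # stand-in for a video frame
virtual = np.full((4, 4, 3), 255.0)             # stand-in for rendered content
alpha = np.zeros((4, 4))
alpha[1:3, 1:3] = 1.0                           # opaque virtual square

out = composite(camera, virtual, alpha)
assert out[0, 0, 0] == 100.0    # reality untouched outside the mask
assert out[1, 1, 0] == 255.0    # reality fully occluded under the object
```

In an optical see-through system the same mask would instead have to be realized physically, for example with the LCD shutter in a pupil forming architecture, which is why occlusion is so much harder there.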

5.9.6 Specialized AR Headsets

A few industrial headsets have been developed for specialized markets such as law enforcement, firefighting, and engineering, long before the current boom in consumer smart glasses and VR headsets. Figure 5.25k shows some of these devices (Motorola HC1, Kopin-Verizon Golden Eye, and Canon). They might also include specialized gear such as a FLIR camera and special communication channels. Such devices are not consumer devices, and have a price tag only accessible to professional markets (>$5K). Other specialized markets include medical applications, for patient record viewing or assistance in surgery (monitoring the patient's vital signs during surgery, recording surgery for teaching requirements, etc.). Medical applications have been a perfect niche application for AR and smart glasses, and will remain a relatively small but steady market for such devices in the future. Figure 5.25l shows some implementations based on an Epson Moverio see-through AR headset (also shown in Figure 5.6f) (left), a Motorola HC1 (center), and Google Glass (right). Interestingly, these devices are usually built around the architecture described in Figure 5.6e (non-see-through freeform TIR combiner), on the basis of an opaque combiner located in the lower part of the vision.

5.10 OTHER OPTICAL ARCHITECTURES DEVELOPED IN INDUSTRY

We have reviewed in Section 5.7 the various optical architectures developed in industry to produce most of the devices described in the previous section. There are however a few other, more exotic, optical implementations that are based on nonconventional combiner architectures. We review these in the following section.

5.10.1 Contact Lens-Based HMD Systems

As contact lenses are the natural extension of any Rx lenses, could contact lenses therefore be the natural extension of smart glasses? This is not so easy. Contact lenses have already been used to implement various functionality, such as a telescopic contact lens (Prof. Joe Ford, UCSD, 2012) and the glucose sensing and insulin delivery contact lens at Google [X] labs in Mountain View, but a smart contact lens providing a high-resolution display to the user is yet a challenge to be undertaken. There are however a few systems developed in research labs that have introduced the use of a contact lens. One example is the single pixel display contact lens developed by Prof. Babak Parviz at University of Washington; another is the Innovega HMD based on a dual contact lens/smart glass device (see Figure 5.26). The Innovega HMD system relies on a contact lens that has an added microlens surface at its center to collimate the digital display field. The orthogonal polarization coating filters on the microlens and on the rest of the contact lens provide an effective way to collimate the polarized display light originating from a microdisplay without altering the see-through experience other than in polarization.

In one implementation, the microdisplay is located on the temple with a relay lens forming an image on a reflective holographic combiner located at the inner surface of the glass lens, providing a true see-through experience. In another implementation, the microdisplay is located directly on the inner surface of the glass lens, making it thus a non-see-through HMD. Due to the fact that the collimation lens is located close to the cornea, and that it actually moves with the cornea, it provides for an excellent eye box and a large FOV.

FIGURE 5.26 Dual smart glass/contact lens see-through HMD from Innovega, U.S.A. (combo contact lens and glasses: a standard OLED or LCD display panel and video/audio IC on the eyeglasses; a contact lens with embedded optics, an outer filter, and a center filter and display lens; the optical filter conditions the display light).

5.10.2 Light Field See-Through Wearable Displays

Similarly to light field cameras (such as the Lytro integral camera), light field can also provide effective display functionality (Hua and Javidi, 2014). The example in Figure 5.27 produces a light field display from a display and an array of microlenses as the collimator. Multiple images are seen under various angles and then presented to the viewer, which combine in a single image at a particular eye relief distance.

5.11 OPTICS FOR INPUT INTERFACES

The good old keyboard and mouse, along with other devices such as pen tablets and touchscreens, have been implemented over the years as very effective input technologies for consumer electronics (computers, laptops, tablets, smartphones, or smart watches). For a head-worn computer such as in smart glasses and AR–VR HMDs, other types of input mechanisms have been developed. They can be summarized into six different groups: voice control, trackpad input, head and eye gestures sensors, eye gaze tracking, hand gesture sensing, and other miscellaneous sensing techniques.
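The microlens-array principle can be illustrated with a toy pinhole-lenslet model (an assumption made for illustration; this is not NVIDIA's actual rendering code): for a virtual point at a given depth, each lenslet's elemental image shows the point at a slightly shifted panel position, and those shifts encode the parallax that lets the views fuse into one image at the eye:

```python
# Toy 1D model of a near-eye light field display: each lenslet is a pinhole
# at distance `gap` in front of the display panel. A virtual point at depth
# z in front of the array appears, behind each lenslet, at the panel position
# where the ray through the point and the lenslet center meets the panel.

def elemental_positions(x_point, z_point, lens_centers, gap):
    """Panel position behind each lenslet for a virtual point at (x_point, z)."""
    # Extend the ray through (x_point, z_point) and the lenslet center by
    # `gap` to the panel plane (similar triangles).
    return [xl + (xl - x_point) * gap / z_point for xl in lens_centers]

lenses = [-2.0, -1.0, 0.0, 1.0, 2.0]     # lenslet centers, mm (assumed pitch)
xs = elemental_positions(0.0, 500.0, lenses, gap=2.0)  # point 0.5 m away

# On-axis point: the central lenslet shows it exactly on axis, and the
# shifts grow linearly and symmetrically with lenslet eccentricity.
assert xs[2] == 0.0
assert xs[4] == -xs[0]
print([round(x, 4) for x in xs])
```

Rendering a full scene repeats this for every point and every lenslet, which is why the text notes that light field displays demand both high computation and high native display resolution.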

FIGURE 5.27 Light field occlusion HMD from NVIDIA: a bare microdisplay becomes a near-eye light field display through a microlens array. (From Dr. Douglas Lanman at NVIDIA, Santa Clara, CA.)

5.11.1 Voice Control

Voice control for HMDs has been implemented directly as an extension of similar technologies available to the smartphone industry. Although the microphone(s) are always close to the mouth in an HMD, they may require a quiet surrounding to function properly.

5.11.2 Input via Trackpad

Similarly to voice control, a trackpad, located often on the temple side of a smart glass, can provide for an effective input mechanism. Although it might be awkward to be touching one's temple side to control the device, it is very effective for a few operating conditions (hand available, fingers clean, no sweat, no rain, no gloves, etc.).

5.11.3 Head and Eye Gestures Sensors

Thanks to the integrated sensors available in today's smartphones (magnetometers, accelerometers, gyroscopes), the HMD industry was an early adopter of head gesture sensors (even in the early defense HMD years). Eye gesture sensors, such as wink and blink detectors, have also been integrated without the need for an imaging sensor (using a single pixel detector).

5.11.4 Eye Gaze Tracking

Various optical gaze tracking techniques have been implemented in industry, not only in HMDs but also in older camcorder devices (Curatu et al., 2005). Technologies range from conventional flood IR CMOS imaging of glints to scanning via waveguides, by using dedicated optical paths or the reverse imaging path with single or multiple IR source flood illumination or laser structured illumination. Figure 5.28 shows the successive Purkinje images used in conventional gaze tracking applications.
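As a sketch of how the head gesture sensors above are typically fused, here is a generic textbook complementary filter combining gyroscope and accelerometer readings (not any specific HMD's firmware; all values are invented):

```python
# Complementary filter for head pitch: the integrated gyro rate is smooth but
# drifts, while the accelerometer's gravity-derived pitch is noisy but
# drift-free. Blending the two keeps the short-term gyro response with the
# long-term accelerometer stability.

def complementary_pitch(pitch, gyro_rate, accel_pitch, dt, k=0.98):
    """One filter step: k weights the gyro path, (1 - k) the accelerometer."""
    return k * (pitch + gyro_rate * dt) + (1.0 - k) * accel_pitch

# Simulate a stationary head with a gyro bias of 0.01 rad/s at 100 Hz:
# pure integration would drift by 0.5 rad over 50 s, but the filter
# converges to a small bounded offset instead.
pitch = 0.0
for _ in range(5000):
    pitch = complementary_pitch(pitch, gyro_rate=0.01, accel_pitch=0.0, dt=0.01)
assert abs(pitch) < 0.01   # steady state k*bias*dt/(1-k) ≈ 0.0049 rad
print(round(pitch, 4))
```

The same structure extends to yaw with the magnetometer standing in for the accelerometer, which is why the chapter lists all three sensor types together.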

FIGURE 5.28 Four Purkinje images (P1, P2, P3, and P4) used for gaze tracking (the glint is the first Purkinje image—P1).

FIGURE 5.29 The first Purkinje reflections (glints) used in eye gaze tracking.

The first reflection (the glint) is usually the most used one (see also Figure 5.29). Single or multiple IR flood sources may be used to increase the resolution of the tracking (e.g., 4 IR sources located symmetrically to produce good vertical and horizontal gaze tracking). In many combiner architectures, it is desirable to use the same optical path for the display as for the glint imaging. However, as the display uses an infinite conjugate (image at infinity) and the gaze tracking finite conjugates, it is necessary to use either an additional objective lens on the IR CMOS sensor or to position the source outside the display imaging train (see Figure 5.30). A cold (or hot) mirror would allow for the splitting between the finite and infinite conjugates. More complex optical architectures have been developed to implement the finite conjugate IR imaging task within the infinite conjugate imaging task depicted in Figure 5.30. One of them is based on the optical architecture described in Figure 5.6e (freeform TIR combiner with compensation optic). The freeform surfaces in the combiner optic, added to an extra objective lens on the IR CMOS sensor, allow for a very compact gaze tracking architecture including both IR flood illumination and IR imaging (see Figure 5.31).
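The glint imaging step can be illustrated on synthetic data. In this hedged sketch (the frame, threshold, and pupil position are all made up, and real trackers are far more robust), the glint is the brightest blob in the IR image, and its centroid together with the pupil center gives the vector a tracker maps to a gaze direction:

```python
# Detect a corneal glint in a synthetic IR frame: the specular reflection of
# the flood IR LED saturates a few pixels, so a threshold plus an
# intensity-weighted centroid recovers its sub-pixel position.
import numpy as np

def glint_center(ir_frame, thresh=200):
    """Intensity-weighted centroid of the bright (specular) pixels."""
    ys, xs = np.nonzero(ir_frame > thresh)
    w = ir_frame[ys, xs].astype(float)
    return float((xs * w).sum() / w.sum()), float((ys * w).sum() / w.sum())

# Synthetic 64x64 IR frame: dark background plus one 3x3 glint at (40, 22).
frame = np.full((64, 64), 30, dtype=np.uint8)
frame[21:24, 39:42] = 250

gx, gy = glint_center(frame)
assert abs(gx - 40.0) < 1e-9 and abs(gy - 22.0) < 1e-9

# A gaze tracker then maps the pupil-center-to-glint vector to gaze angles.
pupil = (32.0, 32.0)                    # assumed pupil centroid
dvec = (gx - pupil[0], gy - pupil[1])
print(dvec)   # (8.0, -10.0)
```

With four symmetric IR sources, as mentioned above, the same centroid step runs once per glint and the extra geometry improves both vertical and horizontal tracking accuracy.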

FIGURE 5.30 Finite (gaze tracking) and infinite (collimation of the display) conjugates in HMDs: the microdisplay and combiner lens form an infinite conjugate; the IR source and the IR eye gaze sensor (combiner lens and additional lens) form a finite conjugate.

FIGURE 5.31 Eye gaze tracking in a see-through combiner based on freeform TIR surfaces (NIR LED flood illumination, NIR sensor, freeform prism, and a freeform corrector as the complement piece for see-through operation; the exit pupil faces the microdisplay through the prism).

This device has been developed by Canon (Japan). It is however relatively thick and therefore not quite adapted to consumer smart eyewear, but rather to the professional AR market.

5.11.5 Hand Gesture Sensing

Hand gesture sensing has been implemented in various HMDs as add-ons borrowed from gesture-sensing applications developed for traditional gaming consoles, TVs, and computers (see Figure 5.32).
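The finite versus infinite conjugates above follow directly from the thin-lens equation. A quick numeric check, using a generic thin-lens model with an assumed 25 mm focal length (not the actual Canon freeform design):

```python
# Thin-lens conjugates, 1/f = 1/s_o + 1/s_i: a display placed at the focal
# plane images to infinity (collimated, the "infinite conjugate"), while the
# eye at a finite distance images onto the IR sensor at a finite conjugate.

def image_distance(f_mm, object_mm):
    """Thin-lens image distance; returns float('inf') for an object at f."""
    if object_mm == f_mm:
        return float('inf')
    return 1.0 / (1.0 / f_mm - 1.0 / object_mm)

f = 25.0
assert image_distance(f, 25.0) == float('inf')   # display at f -> collimated
si = image_distance(f, 40.0)                     # eye at 40 mm from the lens
assert abs(si - 1000.0 / 15.0) < 1e-9            # ≈ 66.7 mm on the sensor side
print(round(si, 2))
```

Because the two conjugates need different image distances through the same combiner, either an extra objective lens on the IR sensor or a cold/hot mirror split is required, exactly as the text describes.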

Several sensing architectures have been used for such add-ons, such as flood IR illumination and shadow imaging (Leap Motion), time of flight sensors (Soft Kinetic), and structured IR illumination (Primesense Kinect)—see Figure 5.33. All three architectures have been tested with VR systems such as the Oculus Rift DK2, stemming from the earlier efforts produced for conventional gaming devices, and are now commercially available to the public.

FIGURE 5.32 Some of the current hand gesture sensor developers for gaming, TV, computer, and HMDs.

FIGURE 5.33 Gesture sensors integrated as add-ons to HMDs: (a) Oculus DK2 and Leap Motion sensor, (b) Oculus Rift and Soft Kinetic, and (c) Soft Kinetic and Meta Glasses.

5.11.6 Other Sensing Technologies

A few other, and sometimes exotic, input mechanisms have been developed and tested for HMDs and smart glasses input, such as brain wave sensing, body tapping sensing, etc. A few of these technologies have been applied to HMDs, especially VR headsets.

5.12 CONCLUSION

We have reviewed in this chapter the main optical architectures used today to implement smart glasses, smart eyewear, see-through AR HMDs, and occlusion VR HMDs. These various optical architectures produce different size, weight, and functionality constraints, for devices that are applied to new emerging markets, stemming from the original defense and VR markets as we have known them for decades. Adapted sensing such as gaze tracking and hand/eye gesture sensing are being implemented in various HMDs, especially VR headsets, as well as the use of optical occlusion pixels. For AR devices, the challenge resides in reducing thickness, size, and weight: the wide FOV thin see-through optical combiner remains a challenge. For VR occlusion devices, a further challenge is to reduce the size and weight of the magnifier lens while increasing its performance over a very large FOV.

For consumer products such as smart glasses and smart eyewear, the integration of the optical combiner into high wrap sunglass-type lenses or true Rx meniscus lenses is one of the most desired features for next-generation smart eyewear and is also one of the most difficult optical challenges. VR sickness in many users also needs to be addressed for both VR and large FOV AR systems: although display resolution (pixel counts vs. FOV), low sensor/display latency, and high GPU speed have all been dramatically improved since the first VR efforts in the 1990s, many VR sickness issues remain to be addressed today, such as eye convergence/focus accommodation disparity and motion sickness (internal ear vs. perceived motion disparity).

REFERENCES

Cakmakci, O. and Rolland, J. 2006. Head-worn displays: A review. Journal of Display Technology, 2(3), 199–216.
Cheng, D., Wang, Y., Hua, H., and Talha, M. 2010. Design of a compact wide field of view HMD optics system using freeform surfaces. Proceedings of SPIE, 7690, 769009.
Curatu, C., Hua, H., and Rolland, J. 2005. Projection-based head mounted display with eye tracking capabilities. Proceedings of SPIE, 5875, 58750J-1.
Hua, H. and Javidi, B. 2014. A 3D integral imaging optical see-through head-mounted display. Optics Express, 22(11), 13484–13491.
Kress, B. and Meyrueis, P. 2009. Applied Digital Optics, from Micro-Optics to Nano-Photonics. John Wiley Publisher, Chichester, UK.
Kress, B. and Raulot, V. 2008. Digital combiner achieves low cost and high reliability for head-up display applications. SPIE Newsroom, Illumination and Displays, http://spie.org/x35062.xml; US patent #5,076,664 of December 31, 1991 (Thomson CSF, France).
Kress, B. 2007. Near-eye displays: State-of-the-art and emerging technologies. Proceedings of SPIE, 6624, 662416-1.
Levola, T. 2006. Diffractive optics for virtual reality displays. Journal of the SID (Society for Information Display), 14(5), 467–475.
Martins, R. et al. 2004. Projection based head mounted displays for wearable computers. Proceedings of SPIE, 5442.
Melzer, J. and Moffitt, K. 1997. Head Mounted Displays: Designing for the User. McGraw-Hill.
Peli, E. 1998. The visual effects of head-mounted display (HMD) are not distinguishable from those of desk-top computer display. Vision Research, 38, 2053–2066.
Rash, C.E. 1998. Helmet mounted display: Design issues for rotary-wing aircraft. United States Army Aeromedical Research Laboratory, US Government Printing Office, Washington, DC.
Takahashi, H. and Hirooka, S. 2008. Stereoscopic see-through retinal projection head-mounted display. Proceedings of SPIE-IS&T Electronic Imaging, 6803, 68031N.
Urey, H. and Powell, K. 2005. Microlens array based exit pupil expander for full color displays. Applied Optics, 44(23), 4930–4936.
Velger, M. 1998. Helmet Mounted Displays and Sights. Artech House Publisher, Boston, MA.
Wilson, J. and Wright, P. 2007. Design of monocular head-mounted displays for increased indoor firefighting safety and efficiency, with a case study on fire-fighting. Proceedings of IMechE, Part C: Journal of Mechanical Engineering Science, 221, 103–114.


6 Image-Based Geometric Registration for Zoomable Cameras Using Precalibrated Information

Takafumi Taketomi

CONTENTS
6.1 Background ........ 126
6.2 Literature Review ........ 126
6.3 Marker-Based Camera Parameter Estimation ........ 127
6.4 Camera Pose Estimation for Zoomable Cameras ........ 129
  6.4.1 Camera Calibration for Each Zoom Value ........ 130
    6.4.1.1 Parameterization of Intrinsic Camera Parameter Change ........ 130
    6.4.1.2 Intrinsic Camera Parameter Change Expression Using Zoom Variable ........ 131
  6.4.2 Geometric Model for Stereo Camera Considering Optical Zoom Lens Movement ........ 131
  6.4.3 Camera Pose Estimation Using Zoom Camera and Base Camera Pair ........ 132
    6.4.3.1 Camera Calibration for Camera Pair ........ 133
  6.4.4 Camera Pose Estimation by Energy Minimization ........ 133
    6.4.4.1 Definition of Energy Function ........ 134
    6.4.4.2 Energy Term for Epipolar Constraint ........ 134
    6.4.4.3 Energy Term for Fiducial Marker Corners ........ 135
    6.4.4.4 Energy Term for Continuity of Zoom Value ........ 135
    6.4.4.5 Balancing Energy Terms ........ 136
6.5 Camera Parameter Estimation Results ........ 137
  6.5.1 Camera Calibration Result ........ 137
  6.5.2 Quantitative Evaluation in Simulated Environment ........ 139
    6.5.2.1 Free Camera Motion ........ 140
    6.5.2.2 Straight Camera Motion ........ 143
    6.5.2.3 Stereo Camera Case ........ 145
  6.5.3 Qualitative Evaluation in Real Environment ........ 145
6.6 Summary ........ 147
References ........ 148

6.1 BACKGROUND

In video see-through-based augmented reality (AR), estimating camera parameters is important for achieving geometric registration between real and virtual worlds. In AR, extrinsic camera parameters (rotation and translation) are estimated by assuming fixed intrinsic camera parameters (focal length, aspect ratio, principal points, and radial distortions). Early augmented reality applications assume the use of head-mounted displays (HMDs) for displaying augmented reality images to users. When using HMDs, changes in intrinsic camera parameters such as camera zooming are not used, to prevent unnatural sensations in users. Thus, assuming fixed intrinsic camera parameters is not a problem in conventional augmented reality applications. Recently, mobile augmented reality applications that use smartphones and tablet PCs have been widely developed. In addition, augmented reality technology is often used in TV programs. In these applications, changes in the intrinsic camera parameters hardly give unnatural sensations. However, most augmented reality applications still assume fixed intrinsic camera parameters. This is due to the difficulty of estimating extrinsic and intrinsic camera parameters simultaneously. Removing the limitation of fixed intrinsic camera parameters in camera parameter estimation opens possibilities in many augmented reality applications.

In this chapter, two methods are introduced: estimation using a monocular camera and estimation using a stereo camera. Both methods are extended versions of the marker-based camera parameter estimation method. Here, we focus on the marker-based camera parameter estimation method because this method is widely used in augmented reality applications. The remainder of this chapter is organized as follows. In Section 6.2, related works are briefly reviewed. Section 6.3 introduces general marker-based camera parameter estimation. The framework for estimating intrinsic and extrinsic camera parameters using precalibrated information is described in Section 6.4, and its effectiveness is quantitatively and qualitatively evaluated in Section 6.5. Finally, Section 6.6 presents the conclusion and the future work.

6.2 LITERATURE REVIEW

Many vision-based methods for estimating camera parameters have already been proposed in the fields of AR and computer vision. In general, camera parameters are estimated by solving the perspective-n-point (PnP) problem using 2D–3D corresponding pairs. 2D–3D corresponding pairs are obtained by using a 3D model of the environment or a feature landmark database (Drummond and Cipolla 2002; Taketomi et al. 2011). There are two groups of solutions for the PnP problem: camera parameter estimation under the condition of known intrinsic camera parameters, and estimation under the condition of unknown intrinsic camera parameters. Most camera parameter estimation methods belong to the first category: numerous methods have been proposed to solve the PnP problem when the intrinsic camera parameters are known (Fischer and Bolles 1981; Klette et al. 1998; Quan and Lan 1999; Wu and Hu 2006; Lepetit et al. 2009; Hmam and Kim 2010).

Solutions for the PnP problem when the intrinsic camera parameters are not known have also been proposed (Abidi and Chandra 1995; Triggs 1999; Bujnak et al. 2008). These methods can estimate the absolute extrinsic camera parameters and focal length from 2D–3D corresponding pairs. Bujnak et al. proposed a method for estimating extrinsic camera parameters and focal length (Bujnak et al. 2008). Their method uses a Euclidean rigidity constraint in object space. However, the accuracy of the estimated camera parameters decreases depending on the specific geometric relationship of the points. To solve this problem, they improved the computational cost of the method by joining planar and nonplanar solvers (Bujnak et al. 2010); this method can be implemented in real time on a desktop computer. Kukelova et al. proposed the five-point-based method (Kukelova et al. 2013). This method can achieve more stable camera parameter estimation than that of the method in Bujnak et al. (2010). However, the accuracy of the estimated camera parameters still decreases in this method when the optical axis is perpendicular to the plane formed by the 3D points.

In the structure-from-motion technique (Stewenius et al. 2005; Li 2006; Snavely et al. 2006), corresponding pairs of 2D image coordinates in multiple images are used for estimating the intrinsic and extrinsic camera parameters (Hartle and Zisserman 2004). These methods are usually used in 3D reconstruction from multiple images. Although these methods do not need any prior knowledge of the target environment, they cannot estimate absolute extrinsic camera parameters. They are also impractical for some AR applications because they require the user to arrange the CG objects and coordinate system manually. Sturm proposed a self-calibration method for zoom lens cameras, which uses precalibration information (Sturm 1997). In this method, intrinsic camera parameters are calibrated and then represented by one parameter. The estimation of the intrinsic and extrinsic camera parameters uses this precalibration information and is based on the Kruppa equation. The idea of this method is similar to that of the method described in Section 6.4. However, the solution of the Kruppa equation is not robust to noise, and this method cannot estimate absolute extrinsic camera parameters.

In contrast to previous methods, the method that we describe in this section can accurately and stably estimate intrinsic and absolute extrinsic camera parameters using an epipolar constraint and a precalibrated intrinsic camera parameter change. Estimated intrinsic camera parameters are constrained by the precalibrated intrinsic camera parameter change. Natural feature points that do not have 3D positions are used to stabilize the camera parameter estimation results.

6.3 MARKER-BASED CAMERA PARAMETER ESTIMATION

In this section, we introduce a general marker-based camera parameter estimation process. In our method, a fiducial marker is used to obtain 2D–3D corresponding pairs; most marker-based applications use a square marker (Kato and Billinghurst 1999). Our marker-based camera parameter estimation process can be divided into three processes: marker detection, marker identification, and camera parameter estimation.
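As a concrete illustration of square-marker pose recovery, here is a generic homography-based derivation in the spirit of the Kato and Billinghurst approach (not the exact algorithm of this chapter; all camera values are synthetic). The four detected corners of a planar marker (Z = 0) define a homography H = K [r1 r2 t] up to scale, from which the extrinsic parameters follow once K is known:

```python
# Square-marker pose from a plane-to-image homography.
import numpy as np

def homography(src, dst):
    """DLT: 3x3 H mapping src (Nx2 marker-plane points) to dst (Nx2 pixels)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 3)          # null vector = H up to scale

def pose_from_homography(H, K):
    """Recover R, t of a Z = 0 plane from H ~ K [r1 r2 t]."""
    B = np.linalg.inv(K) @ H
    s = 1.0 / np.linalg.norm(B[:, 0])    # r1 must be a unit vector
    if B[2, 2] < 0:                      # keep the marker in front (t_z > 0)
        s = -s
    r1, r2, t = s * B[:, 0], s * B[:, 1], s * B[:, 2]
    return np.column_stack([r1, r2, np.cross(r1, r2)]), t

# Synthetic check: project a 10 cm marker with a known pose, then recover it.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1.0]])
cy, sy = np.cos(0.3), np.sin(0.3)
R_true = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # 0.3 rad yaw
t_true = np.array([0.02, -0.01, 0.6])

corners = np.array([[-0.05, -0.05, 0], [0.05, -0.05, 0],
                    [0.05, 0.05, 0], [-0.05, 0.05, 0]])
proj = K @ (R_true @ corners.T + t_true[:, None])
pix = (proj[:2] / proj[2]).T             # the "detected" marker corners

H = homography(corners[:, :2], pix)
R, t = pose_from_homography(H, K)
assert np.allclose(R, R_true, atol=1e-6)
assert np.allclose(t, t_true, atol=1e-6)
```

In practice the pose obtained this way is only an initial guess from noisy corner detections, and it is refined by minimizing a reprojection cost of the kind formalized in the equations below.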

A fiducial marker is first detected from an input image using image processing techniques such as binarization. Then, the detected marker is matched against the known markers for identification. After detection and identification, the 3D positions of the fiducial marker features are associated with the 2D positions of the fiducial marker in the input image, and these 2D–3D correspondences are used to estimate the camera parameters. In this estimation process, the extrinsic camera parameters are treated as the unknown parameters. Most camera parameter estimation methods employ the following cost function:

E = Σi ‖pi − pi′‖²  (6.1)

where E is the cost, pi is a detected 2D position of the fiducial marker in the input image, and pi′ is the reprojected position of the corresponding 3D point of the fiducial marker feature, as shown in Figure 6.1. The position of the reprojected point can be calculated using the translation vector t, the rotation matrix R, and the intrinsic camera parameter matrix K, as follows:

pi′ ∝ K [R | t] Pi  (6.2)

where Pi is the 3D position of the fiducial marker feature. Note that the lens distortion factor is ignored in this formulation. In general, fixed intrinsic camera parameters are assumed in this camera parameter estimation process. Finally, the extrinsic camera parameters are estimated by minimizing Equation 6.1.

FIGURE 6.1 Geometric relationship between a reprojected point and a detected point.
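The cost of Equation 6.1 can be sketched in a few lines of Python. This is a minimal illustration of the pinhole reprojection model; the intrinsic values and the 3D point below are made up for the example, not taken from the chapter:

```python
import numpy as np

def reproject(K, R, t, P):
    """Project a 3D point P into the image: p' ~ K [R | t] P (Equation 6.2)."""
    p_h = K @ (R @ P + t)        # homogeneous image coordinates
    return p_h[:2] / p_h[2]      # perspective division

def reprojection_cost(K, R, t, points_3d, points_2d):
    """Sum of squared distances between detected and reprojected points (Equation 6.1)."""
    return sum(np.sum((reproject(K, R, t, P) - p) ** 2)
               for P, p in zip(points_3d, points_2d))

# Toy example with an identity pose and a simple intrinsic matrix.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 0.0])
P = np.array([0.1, -0.1, 2.0])               # 3D marker feature
p = reproject(K, R, t, P)                    # a "perfect" detection
print(reprojection_cost(K, R, t, [P], [p]))  # 0.0
```

Estimating the extrinsic parameters then amounts to minimizing this cost over R and t with the four marker corners as correspondences.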

6.4 CAMERA POSE ESTIMATION FOR ZOOMABLE CAMERAS

In the past, intrinsic camera parameters have been fixed during the camera parameter estimation process in augmented reality. These intrinsic camera parameters are obtained in advance by using camera calibration methods (Tsai 1986; Zhang 2000). However, with fixed intrinsics, the accuracy of camera parameter estimation decreases when the camera moves along the optical axis while zooming. On the other hand, simultaneous intrinsic and extrinsic camera parameter estimation methods have been proposed in the field of computer vision (Bujnak et al. 2008, 2010). These methods can estimate intrinsic and extrinsic camera parameters using 2D–3D correspondences; however, the results are unstable when the marker features lie on the same plane.

In this section, two methods for overcoming these problems are introduced: a monocular-camera-based method and a stereo-camera-based method. The monocular-camera-based method can be used for general marker-based augmented reality applications, whereas the stereo-camera-based method can be used in situations wherein an additional camera can be attached to the camera capturing the augmented reality background images. Each method can be divided into two processes: offline camera calibration and online camera parameter estimation. In the offline process, the intrinsic camera parameter change is modeled by calibrating the intrinsic camera parameters for each zoom value. In the online process, intrinsic and extrinsic camera parameters are estimated using this precalibration information. Figure 6.2 shows the camera parameter estimation framework for zoomable cameras.

Offline stage:
1. Camera calibration for each magnification of the camera zooming
2. Third-order spline fitting for each parameter change
3. Stereo camera calibration (only for the stereo camera case)

Online stage (monocular camera case):
1. Fiducial marker detection
2. KLT-based natural feature tracking between successive frames
3. Calculation of the energy function
4. Intrinsic and extrinsic camera parameter estimation by minimizing the energy function

Online stage (stereo camera case):
1. Extrinsic camera parameter estimation for the reference camera
2. Estimation of the magnification of the zoom value
3. Calculation of the energy function
4. Refinement of extrinsic camera parameters by minimizing the energy function

FIGURE 6.2 Flow diagram of camera parameter estimation for zoomable cameras.

Details of these methods are described in the following sections.

6.4.1 Camera Calibration for Each Zoom Value

In the offline process, the intrinsic camera parameter change is modeled using camera calibration results for each zoom value. Unlike in previous research that handles the intrinsic camera parameters independently (Bujnak et al. 2008, 2010), the relationship of each intrinsic camera parameter change is retained, and thus the method described in this section can achieve stable camera parameter estimation during the online process.

6.4.1.1 Parameterization of Intrinsic Camera Parameter Change
In this process, we assume the intrinsic camera parameter matrix to be

K = | fx  0   cx |
    | 0   fy  cy |   (6.3)
    | 0   0   1  |

where fx and fy represent the focal lengths and cx and cy represent the principal point. In this method, we assume zero skew and no lens distortion. This assumption is reasonable for most recent camera devices. Thus, the intrinsic camera parameters have four degrees of freedom. These four values for each zoom value are obtained by using Zhang's camera calibration method (Zhang 2000).

6.4.1.2 Intrinsic Camera Parameter Change Expression Using a Zoom Variable
After obtaining the intrinsic camera parameters for each zoom value, the intrinsic camera parameters are expressed in terms of the zoom variable m:

K(m) = | fx(m)  0      cx(m) |
       | 0      fy(m)  cy(m) |   (6.4)
       | 0      0      1     |

By using this expression, the degrees of freedom of the intrinsic camera parameter matrix are reduced to one, which improves the stability and accuracy of the online camera parameter estimation. Third-order spline fitting is applied to the camera calibration result for each parameter to obtain the intrinsic camera parameter change model. Third-order spline fitting has the properties that the fitted function passes through all the control points and that each polynomial piece is continuously connected at the borders.

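The per-parameter spline model of Section 6.4.1.2 can be sketched as follows, assuming SciPy's `CubicSpline` as the third-order spline fitter; the calibration samples below are invented for illustration and are not the chapter's calibration data:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical calibration results: focal length fx (pixels) measured at
# several zoom magnifications m (these numbers are made up).
m_samples = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 20.0])
fx_samples = np.array([100.0, 150.0, 260.0, 480.0, 900.0, 1100.0])

# Third-order (cubic) spline: it passes through every control point and its
# pieces connect continuously (with continuous derivatives) at the borders.
fx_of_m = CubicSpline(m_samples, fx_samples)

# The spline interpolates the control points exactly ...
print(round(float(fx_of_m(4.0)), 6))  # 260.0
# ... and gives a smooth estimate between them.
print(float(fx_of_m(3.0)))
```

One such spline is fitted per intrinsic parameter (fx, fy, cx, cy), so the whole matrix K(m) is driven by the single variable m.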
These features make the spline model suitable for the energy minimization process used in the online camera parameter estimation.

6.4.2 Monocular Camera Case

In this section, the method for estimating intrinsic and extrinsic camera parameters with a monocular camera is introduced (Taketomi et al. 2014). In this method, the intrinsic and extrinsic camera parameters are estimated based on an energy minimization framework, and the precalibrated information of the zoomable camera is used in the online camera parameter estimation process.

6.4.2.1 Definition of Energy Function
In the online process, two energy terms are added to the conventional cost function of marker-based camera parameter estimation: an energy term based on the epipolar constraint for tracked natural features and an energy term based on the continuity constraint for the temporal change of the zoom value. The cost function Emono for the camera parameter estimation is defined as follows:

Emono = Eep + wmk Emk + wz Ezoom  (6.5)

where wmk and wz are weights for balancing the terms. These weights are automatically determined based on the camera parameters of the previous frame. Emk is used to estimate the absolute extrinsic camera parameters, Eep implicitly gives the 3D structure information, and Ezoom gives the temporal constraint for the zoom value. Eep and Ezoom help achieve stable estimation of the magnification of the zoom value. In the following sections, the details of the energy terms and the weights are described.

6.4.2.2 Energy Term for Epipolar Constraint
In this method, the energy term Eep is calculated from the summation of the distances between the epipolar lines and the tracked natural features, as shown in Figure 6.3. Based on epipolar geometry, a corresponding point must be located on the epipolar line in the other camera image (Hartley and Zisserman 2004).

FIGURE 6.3 Reprojection error based on tracked natural features.

More concretely, the energy term Eep is defined as follows:

Eep = (1/|Sj|) Σi∈Sj di²  (6.6)

where Sj is the set of tracked natural feature points in the jth frame and di is the reprojection error for the natural feature point i. In this method, natural features are tracked between successive frames using the Kanade–Lucas–Tomasi feature tracker (Shi and Tomasi 1994). The reprojection error di is defined as the distance between an epipolar line li and the detected natural feature position qi in the input image. The epipolar line li can be calculated from the epipole ei′ and the projected position pi′ of the natural feature position pi in the key frame. The epipole ei′ and the projected position pi′ are calculated as

ei′ = K(mj) Tj Pkey  (6.7)
pi′ = K(mj) Tj Pi  (6.8)

where Pkey represents the key frame camera position in the world coordinate system and Tj represents the extrinsic camera parameter matrix (camera rotation and translation) in the jth frame. The subscript key denotes the estimated camera parameters in the key frame. Note that Pi in Equation 6.8 is already transformed into the world coordinate system using the matrices Kkey(mkey) and Tkey. By using this notation, we can represent the estimation error between the two frames under the epipolar constraint as a reprojection error.

Note that the first frame is stored as the first key frame in the online process of camera parameter estimation. In addition, frames that satisfy the following criteria are stored as key frames:

1. The distance between the current camera position and the camera position of the previous 10 frames is maximum.
2. All the distances between the current camera position and the key frame positions are larger than a threshold.

6.4.2.3 Energy Term for Fiducial Marker Corners
This term is almost the same as the energy term used in the conventional camera parameter estimation methods. Reprojection errors are calculated from the correspondences between the fiducial marker corners detected in an input image and their reprojected points:

Emk = Σi=1..4 ‖K(mj) Tj Pi − pi‖²  (6.9)

where Pi and pi are the 3D position of a fiducial marker corner and its detected position in the input image, respectively.
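The per-feature distance di of Equation 6.6 can be sketched as a point-to-epipolar-line distance, with the line built from the epipole and the reprojected key-frame feature in homogeneous coordinates. This is a minimal sketch with illustrative values, not the chapter's implementation:

```python
import numpy as np

def epipolar_distance(e_h, p_h, q):
    """Distance from a detected feature q to the epipolar line through the
    epipole e' and the reprojected key-frame feature p' (both homogeneous)."""
    line = np.cross(e_h, p_h)              # l = e' x p' in homogeneous coordinates
    q_h = np.array([q[0], q[1], 1.0])
    return abs(line @ q_h) / np.hypot(line[0], line[1])

# Toy check: epipole at (0, 0) and projected point at (100, 0) make the
# epipolar line the x-axis, so a feature detected at height 5 is 5 px away.
e_h = np.array([0.0, 0.0, 1.0])
p_h = np.array([100.0, 0.0, 1.0])
print(epipolar_distance(e_h, p_h, (50.0, 5.0)))  # 5.0
```

Eep is then the mean of these squared distances over the tracked feature set Sj.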

6.4.2.4 Energy Term for Continuity of Zoom Value
This term is used to achieve stable camera parameter estimation. In augmented reality, camera parameters are estimated from a video feed, so the magnification of the zoom value changes continuously in successive frames. The magnification parameter m of the camera zooming appears in the intrinsic camera parameter matrix K in the jth frame. In order to effectively constrain the continuity of the zoom value, we add the energy term Ezoom to the energy function:

Ezoom = (mj−1 − mj)²  (6.10)

With this constraint, a discontinuous change in the zoom value is suppressed.

6.4.2.5 Balancing Energy Terms
There are three energy terms in the energy function Emono, and balancing them is important for achieving accurate and stable camera parameter estimation. Unlike in the conventional methods, each energy term is automatically balanced using the camera parameters estimated in the previous frame. In the following, the details of this auto-balancing framework are described.

In fiducial marker-based camera parameter estimation, the estimated camera parameters become unstable when the optical axis of the camera is perpendicular to the fiducial marker plane. This is caused by a singularity problem in the optimization process of camera parameter estimation. For this reason, the weight wmk for the energy term Emk is calculated from the angle θ between the optical axis and the normal of the fiducial marker plane, as shown in Figure 6.4:

wmk(θ) = (4/π²)θ² + α  (6.11)

where α is a minimal weight for Emk. This weight function was determined experimentally. In addition, the weight wz is dynamically changed depending on the intrinsic camera parameters estimated in the previous frame.

FIGURE 6.4 Weight for the fiducial marker-based energy term.
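The two constraint terms above can be sketched as follows. The value of α is not specified numerically in the chapter, so the constant here is an assumption for illustration:

```python
import math

ALPHA = 0.1  # minimal weight for E_mk (illustrative; not given in the chapter)

def w_mk(theta):
    """Marker-term weight, Equation 6.11: ALPHA at theta = 0 (optical axis
    aligned with the marker normal, the singular configuration) and
    1 + ALPHA at theta = pi/2."""
    return (4.0 / math.pi ** 2) * theta ** 2 + ALPHA

def e_zoom(m_prev, m_curr):
    """Zoom-continuity term, Equation 6.10."""
    return (m_prev - m_curr) ** 2

print(round(w_mk(0.0), 3))          # 0.1
print(round(w_mk(math.pi / 2), 3))  # 1.1
print(e_zoom(2.0, 2.5))             # 0.25
```

Down-weighting Emk near the singular viewing angle lets the epipolar and zoom-continuity terms stabilize the estimate exactly where the marker term is least reliable.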

6.4.2.6 Camera Pose Estimation by Energy Minimization
To estimate the intrinsic and extrinsic camera parameters, the energy function Emono is minimized by using the Levenberg–Marquardt algorithm. In this minimization, the zoom value mj−1 estimated in the previous frame and the extrinsic camera parameters estimated by using K(mj−1) are used as the initial parameters. We employ an M-estimator to reduce the effect of mis-tracked natural features in the optimization process; concretely, we employ the Geman–McClure function ρ:

ρ(x) = (x²/2) / (1 + x²)  (6.13)

where x represents the residual.

In general, the relationship between the zoom value and the intrinsic camera parameters is not proportional: the focal lengths (fx(m), fy(m)) change drastically at large image magnifications resulting from the camera zooming. Thus, if we used a constant weight wz, its effect might be too strong or too weak in the camera parameter estimation process, and the weight wz should be controlled adequately. For this reason, we employ a weight wz that depends on fx(m):

wz = 1 / fx(mj)  (6.12)

In this term, we use only fx because the change of fx is almost the same as that of fy. By using this weight, we can adequately control wz based on the rate of change of the intrinsic camera parameters.

The results of camera parameter estimation may converge to a local minimum. Experimentally, we confirmed that the local minimum problem occurs along the optical axis of the camera. To avoid this problem, the optimization process is executed for three different initial values generated by adding an offset β to the initial magnification value of the camera zooming. Finally, the trial with the lowest energy value among all trials is chosen, and its estimated camera parameters K(mj) and Tj are adopted as the final result.

6.4.3 Stereo Camera Case
In this section, the method for estimating intrinsic and extrinsic camera parameters using a reference camera is introduced. In this method, the reference camera is fixed on the zoomable camera, and the reference and zoomable camera pair is modeled by considering the optical lens movement. The intrinsic camera parameters of the reference camera are fixed during the camera parameter estimation process of the zoomable camera. In the online process, the extrinsic camera parameters for the reference camera are first estimated using the fiducial marker. By using this model and the estimated extrinsic camera parameters for the reference camera, the intrinsic and extrinsic camera parameters of the zoomable camera can then be obtained by estimating the zoom value. Details of the algorithm are described in the following sections.
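The robust cost of Equation 6.13 and the adaptive weight of Equation 6.12 can be sketched as follows (a minimal illustration; the sample values are made up):

```python
def rho(x):
    """Geman-McClure robust cost (Equation 6.13): behaves like x^2/2 for
    small residuals but saturates toward 1/2, so a grossly mis-tracked
    feature contributes only a bounded amount to the energy."""
    return (x * x / 2.0) / (1.0 + x * x)

def w_z(fx_mj):
    """Zoom-term weight, Equation 6.12: a large focal length (strong zoom,
    rapidly changing intrinsics) gets a smaller continuity weight."""
    return 1.0 / fx_mj

print(round(rho(0.1), 4))    # 0.005
print(round(rho(100.0), 4))  # 0.5
print(w_z(800.0))            # 0.00125
```

The saturation is exactly what makes the M-estimator tolerant of KLT tracking failures that a plain squared error would amplify.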

6.4.3.1 Camera Calibration for the Camera Pair
This method assumes that an additional camera is attached to the zoomable camera, as shown in Figure 6.5. This attached camera is used as a reference to estimate the intrinsic and extrinsic camera parameters of the zoomable camera. The intrinsic camera parameters of the reference camera are calibrated and fixed in the whole process. In addition, the relative geometric relationship Trel between the zoomable camera and the reference camera is calibrated by capturing a calibration pattern. In this calibration process, the magnification of the zoom value of the zoomable camera is set to 1.0 (non-zoom mode), so the intrinsic camera parameters of both the zoomable camera and the reference camera are known. This relative geometric relationship is used to estimate the intrinsic and extrinsic camera parameters of the zoomable camera.

FIGURE 6.5 Stereo camera model.

6.4.3.2 Geometric Model for the Stereo Camera Considering Optical Zoom Lens Movement
In the case of camera zooming using an optical zoom lens, the relative geometric relationship Trel changes depending on the optical lens movement, because the optical center moves along the optical axis. Concretely, the optical lens movement is modeled as the focal length change caused by zooming, using a simple zoomable camera model (Numao et al. 1998). This simple zoomable camera model is shown in Figure 6.6. In this modelization, the relationship between the focal length fi at each zoom value and the minimum focal length fmin is calculated:

fi′ = fi − fmin  (6.14)

A regression line is fitted to the result of this calculation, giving the relationship L(f) between the lens movement and the focal length change:

L(f) = αf + β  (6.15)

where α and β are the parameters of the regression line. The relationship between the lens movement and the focal length change and the relationship between the zoomable camera and the reference camera are used to build the stereo camera model shown in Figure 6.5. In this model, F(f) is the movement of the optical zoom center, a translation of L(f) along the optical axis:

F(f) = | 1  0  0  0    |
       | 0  1  0  0    |   (6.16)
       | 0  0  1  L(f) |

FIGURE 6.6 Modelization of optical lens movement.

The extrinsic camera parameters of the zoomable camera, Tzoom, can be represented using the estimated extrinsic camera parameters of the reference camera, Tref, and the precalibrated information Trel and F(f):

Tzoom = F(f) Trel Tref  (6.17)

where Tzoom and Tref are the extrinsic camera parameters of the zoomable camera and the reference camera in the world coordinate system, respectively.

6.4.3.3 Camera Pose Estimation Using the Zoom Camera and Base Camera Pair
In the online process, the intrinsic and extrinsic camera parameters of the zoomable camera are estimated using the estimated extrinsic camera parameters of the reference camera. First, the extrinsic camera parameters of the reference camera, Tref, are estimated using the known intrinsic camera parameters Kref and the detected fiducial marker pattern. In this step, the extrinsic camera parameters are estimated in the same way as in the conventional marker-based camera parameter estimation process.
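The composition of Equation 6.17 can be sketched with 4 × 4 homogeneous transforms, treating F(f) as a pure translation of L(f) along the optical axis. The regression coefficients and poses below are made up for illustration:

```python
import numpy as np

def lens_offset(f, alpha, beta):
    """Lens movement L(f) = alpha * f + beta (Equation 6.15)."""
    return alpha * f + beta

def F(f, alpha, beta):
    """Optical-center shift along the optical axis (Equation 6.16),
    written as a 4x4 homogeneous transform."""
    m = np.eye(4)
    m[2, 3] = lens_offset(f, alpha, beta)
    return m

def T_zoom(f, T_rel, T_ref, alpha, beta):
    """Extrinsic parameters of the zoomable camera (Equation 6.17)."""
    return F(f, alpha, beta) @ T_rel @ T_ref

# Toy example: identity reference pose; reference camera mounted 50 mm
# to the side; illustrative regression coefficients.
T_ref = np.eye(4)
T_rel = np.eye(4)
T_rel[0, 3] = 50.0
Tz = T_zoom(8.0, T_rel, T_ref, alpha=0.25, beta=-0.5)
print(Tz[0, 3], Tz[2, 3])  # 50.0 1.5
```

Because only f (equivalently, the zoom value) is unknown online, the pose of the zoomable camera follows from the reference camera's pose through this one-parameter model.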

Next, the projected positions pi′ of the 3D points Pi in the zoomable camera image can be represented using the intrinsic camera parameters of the zoomable camera K(m):

pi′ ∝ K(m) F(fx(m)) Trel Tref Pi  (6.18)

In this equation, all parameters are known except for the magnification of the zoom value m. Thus, we can estimate m by minimizing the reprojection errors between the detected marker corners and the reprojected marker corner positions. In this minimization process, the Levenberg–Marquardt method is employed, and the zoom value of the previous frame is used as the initial value for the optimization.

Finally, Tzoom is refined by minimizing the following cost function over the detected marker corner positions in the reference and zoomable camera images:

E = Σi=1..4 ‖K(m) Tzoom Pi − pi_zoom‖² + Σi=1..4 ‖Kref Trel⁻¹ F(fx(m))⁻¹ Tzoom Pi − pi_ref‖²  (6.19)

where pi_ref and pi_zoom represent the detected marker corner positions in the reference and zoomable camera images, respectively. Note that the magnification of the zoom value m is fixed in this refinement process.

6.5 CAMERA PARAMETER ESTIMATION RESULTS

The performances of the methods described in Section 6.4 are shown in this section. The accuracy of the camera parameter estimation of each method is quantitatively and qualitatively evaluated in simulated and real environments. In these evaluations, the estimated camera parameters are compared with those of Bujnak's method (Bujnak et al. 2010), which can estimate focal length, lens distortion, and extrinsic camera parameters from four 2D–3D corresponding pairs by minimizing the distances between the 3D positions of feature points and their observed positions. Although Bujnak's method can estimate lens distortion, we ignore the lens distortion effect here because most consumer camera devices have almost no lens distortion, except for camera-mounted wide-angle lenses. In the experiments in the real environment, the camera parameter estimation results of ARToolKit, which does not handle the intrinsic camera parameter change, are also shown as a reference.

In all experiments, we used a desktop PC (CPU: Core i7, 2.93 GHz; memory: 4.00 GB) and a Sony HDR-AX2000 video camera, which records 640 × 480 pixel images with an optical zoom (1×–20×) and progressive scan at 30 fps. The lens distortion of this camera is almost zero (κ1 = −1.4 × 10⁻⁴). This video camera was used to generate virtual camera motions in the quantitative evaluation and to acquire actual video sequences in the qualitative evaluation. It should be noted that all input video sequences start at the non-zoom setting and that the offset β for the initial values was set to 0.5.

6.5.1 Camera Calibration Result
In this experiment, precalibration information is obtained using the camera calibration process described in Section 6.4.1.
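Because m is the only unknown in Equation 6.18, the zoom estimation reduces to a one-dimensional minimization. The sketch below substitutes a golden-section search for the Levenberg–Marquardt step, and the linear fx(m) model and the 3D points are invented for illustration:

```python
import numpy as np

def fx_of_m(m):
    """Stand-in for the precalibrated spline fx(m) (made-up linear model)."""
    return 100.0 + 50.0 * (m - 1.0)

def project(m, P):
    """Project P = (X, Y, Z) with a centered pinhole of focal length fx(m)."""
    return fx_of_m(m) * np.array([P[0] / P[2], P[1] / P[2]])

def reprojection_error(m, points, detections):
    return sum(np.sum((project(m, P) - q) ** 2)
               for P, q in zip(points, detections))

# Synthetic detections generated at the true zoom value m = 4.0 ...
points = [np.array([0.1, 0.2, 2.0]), np.array([-0.2, 0.1, 3.0])]
detections = [project(4.0, P) for P in points]

# ... recovered by a one-dimensional golden-section search over m.
lo, hi = 1.0, 20.0
phi = (5 ** 0.5 - 1) / 2
while hi - lo > 1e-8:
    a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
    if reprojection_error(a, points, detections) < reprojection_error(b, points, detections):
        hi = b
    else:
        lo = a
print(round((lo + hi) / 2, 4))  # 4.0
```

Any scalar minimizer works here; the point is that the stereo model turns the joint intrinsic/extrinsic problem into a search over a single zoom variable.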

The range of the image magnification resulting from camera zooming is divided into 20 intervals, and the intrinsic camera parameters for each zoom value are obtained using Zhang's camera calibration method (Zhang 2000). Figures 6.7 and 6.8 show the results of the camera calibration; in these figures, the lines indicate the spline fitting results. The results show that the focal length changes drastically when the zoom value is greater than 13. In addition, the center of projection changes cyclically because the lens rotates during zooming. In the following experiments, we used the spline fitting results of fx(m), fy(m), u(m), and v(m).

FIGURE 6.7 Calibration result of focal length.

FIGURE 6.8 Calibration result of center of projection.

6.5.2 Quantitative Evaluation in Simulated Environment

The accuracy of the estimated intrinsic and extrinsic camera parameters was quantitatively evaluated in a simulated environment. In this experiment, two virtual camera motions were acquired using ARToolKit (Kato and Billinghurst 1999) and video sequences captured in the real environment. In this virtual camera motion acquisition process, the intrinsic camera parameters were fixed at the smallest magnification of camera zooming. The differences between the two motions are as follows:

• The camera moves freely during camera zooming in the simulated environment (free motion).
• The camera moves straight along the optical axis during camera zooming in the simulated environment (straight motion).

The camera travels 2173 mm in the free camera motion and 1776 mm in the straight camera motion. In this simulation, 100 3D points were randomly generated in a 3D space (500 mm × 500 mm × 500 mm), and the corresponding pairs were obtained by projecting these 3D points into the virtual cameras. Because the projected points would otherwise be noise free, Gaussian noise with zero mean and standard deviation σ = 2.0 was added. Figure 6.9 shows the geometric relationship between the 3D points and the camera motions in the simulated environment.

To quantitatively evaluate the estimated camera parameters, the camera position errors are measured by the Euclidean distance between the estimated and true camera positions. Additionally, the estimated camera poses are evaluated using the difference between the estimated rotation matrix Rest and the true rotation matrix Rtrue, computed as follows:

1. Multiply the two matrices: Rd = Rtrue RestT.
2. Decompose the matrix Rd into a rotation axis vector w and a rotation angle θ using Rodrigues's formula.
3. Employ the rotation angle θ as the rotation error of the estimated rotation matrix.

This evaluation method is described in the literature (Petit et al. 2011).

FIGURE 6.9 Part of the camera paths and 3D points in the simulated environment (free camera motion and straight camera motion).
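The rotation-error computation of steps 1 through 3 can be sketched as follows. The angle of Rd is recovered from its trace, which is equivalent to the angle term of the Rodrigues decomposition (the axis w is not needed for the error); the test rotations are illustrative:

```python
import numpy as np

def rotation_error_deg(R_true, R_est):
    """Rotation error: the angle of R_d = R_true @ R_est.T."""
    R_d = R_true @ R_est.T
    cos_theta = np.clip((np.trace(R_d) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def rot_z(deg):
    """Rotation about the z-axis, used here only to build a test case."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

print(round(rotation_error_deg(rot_z(30.0), rot_z(25.0)), 6))  # 5.0
```

The clipping guards against floating-point values of the trace slightly outside [−1, 1] when the two rotations are nearly identical.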

6.5.2.1 Free Camera Motion
In this case, the camera moves freely in the simulated environment, with a motion that includes translation, rotation, and zooming. Figures 6.10 and 6.11 show the results of the estimated intrinsic parameters (fx, fy, u, v) together with the ground truth values for each frame, and Figures 6.12 and 6.13 show the errors in the estimated position and rotation. In these figures, the method for the monocular camera is labeled Method A, and the method for the stereo camera is labeled Method B. It should be noted that Bujnak's method (Bujnak et al. 2010) cannot estimate the center of projection, so Figure 6.11 shows the results of Methods A and B only. These results confirm that Methods A and B can estimate the focal length more accurately than Bujnak's method and that Methods A and B can accurately estimate the center of projection. They also confirm that the accuracy of the estimated extrinsic camera parameters is drastically improved by Methods A and B.

FIGURE 6.10 Estimation results of focal length for each frame in free camera motion.

FIGURE 6.11 Estimation results of center of projection for each frame in free camera motion.

FIGURE 6.12 Estimated camera position errors for each frame in free camera motion.

FIGURE 6.13 Estimated camera rotation errors for each frame in free camera motion.

Table 6.1 shows the average errors for each camera parameter. Although the average reprojection error in Bujnak's method is small, the errors for each camera parameter are still large. This is due to the difficulty of estimating the parameters using only 2D–3D correspondences. In contrast, the average estimation errors for each camera parameter decrease in Methods A and B. These improvements are considered to be due to the accurate estimation of the intrinsic camera parameters; we consider that the multiple-frame information and the continuity constraint on the camera zooming were responsible. In particular, we can confirm that the translation errors are highly dependent on the estimation errors of the zoom factor. However, the processing times of Methods A and B are slower than that of Bujnak's method. In Methods A and B, the energy minimization process accounts for most of the processing time; in addition, in Method A, the minimization process is executed for three different initial values to avoid the local minimum problem.

TABLE 6.1 Comparison of Accuracy in the Case of Free Camera Motion (average focal length error, position error, rotation error, reprojection error, and processing time for Bujnak's method, Method A, and Method B).

6.5.2.2 Straight Camera Motion
In this case, the camera moves straight along the optical axis during camera zooming, so the optical axis is perpendicular to the fiducial marker plane. This condition cannot easily be handled by Bujnak's method. Figures 6.14 and 6.15 show the results of the estimated intrinsic camera parameters, and Figures 6.16 and 6.17 show the errors in the estimated position and rotation. Table 6.2 shows the average errors for each camera parameter. Although the reprojection error is small in Bujnak's method, the estimated camera parameters are inaccurate; this is due to the difficulty of estimating the camera parameters using only 2D–3D correspondences. These results show that Methods A and B can estimate accurate intrinsic and extrinsic camera parameters even under this difficult condition.

FIGURE 6.14 Estimation results of focal length for each frame in straight camera motion.

FIGURE 6.15 Estimation results of center of projection for each frame in straight camera motion.

FIGURE 6.16 Estimated camera position errors for each frame in straight camera motion.

FIGURE 6.17 Estimated camera rotation errors for each frame in straight camera motion.

TABLE 6.2 Comparison of Accuracy in the Case of Straight Camera Motion (average focal length error, position error, rotation error, reprojection error, and processing time for Bujnak's method, Method A, and Method B).

6.5.3 Qualitative Evaluation in Real Environment
The geometric registration results were compared with those of ARToolKit (Kato and Billinghurst 1999) and Bujnak's method (Bujnak et al. 2010). The camera parameter estimation process was executed for two video sequences, one a free camera motion sequence and the other a straight camera motion sequence. In these sequences, the image magnification resulting from the camera zooming changes dynamically. Figure 6.18 shows the results of the geometric registration, in which a virtual cube is overlaid on a Rubik's cube. We can confirm that the virtual cube is accurately overlaid in Methods A and B. In contrast, the results of ARToolKit and Bujnak's method involve geometric inconsistency; more specifically, there is a large inconsistency in the geometric registration result of Bujnak's method when the optical axis is perpendicular to the fiducial marker plane. These results show that Methods A and B can achieve accurate geometric registration using the estimated camera parameters even in such difficult conditions. Figures 6.19 and 6.20 show the estimated camera paths. Frustums represent the estimated camera positions and poses; the size of each frustum changes depending on the focal length.

FIGURE 6.18  Geometric registration results of each method. A virtual cube is overlaid on Rubik's cube in each frame. Results are shown for ARToolKit, Bujnak's method, Method A, and Method B, under both zoom and non-zoom conditions.

FIGURE 6.19  Estimated camera paths for free camera motion, for Bujnak's method, Method A, and Method B.

There is a large jitter in the estimated camera path of Bujnak's method. This figure confirms that the estimated camera paths of Methods A and B are smoother than those of Bujnak's method. We confirmed that Methods A and B can estimate the camera path with more stability than Bujnak's method.

FIGURE 6.20  Estimated camera paths for straight camera motion, for Bujnak's method, Method A, and Method B.

6.6  SUMMARY

In this chapter, the methods for estimating intrinsic and extrinsic camera parameters were introduced. In the monocular camera case, two additional energy terms are added to the conventional marker-based camera parameter estimation method: reprojection errors of tracked natural features and a temporal constraint on the zoom value. In this method, camera parameters are estimated by minimizing the energy function. On the other hand, in the stereo camera case, intrinsic and extrinsic camera parameter estimation of the zoomable camera is achieved using the reference camera. In this method, the optical lens movement is modeled as the focal length change by zooming, using a simple zoomable camera model. By using this model and the reference camera, intrinsic and extrinsic camera parameters can be estimated by solving a one-dimensional optimization problem. These methods can achieve accurate and stable camera parameter estimation. However, the current methods do not consider lens distortion, which must be considered when using wide-angle lenses.
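As a rough illustration of the monocular-case energy mentioned in the summary, the sketch below combines the three named ingredients (marker reprojection errors, natural-feature reprojection errors, and a temporal constraint on the zoom value) into a single score. The function name, the weights, and the quadratic forms are assumptions made for illustration only, not the chapter's actual formulation.

```python
import numpy as np

def registration_energy(marker_res, feature_res, f_now, f_prev,
                        w_feature=1.0, w_zoom=1.0):
    """Illustrative energy: marker reprojection residuals, plus residuals of
    tracked natural features, plus a temporal zoom-value constraint.
    All weights and term shapes are assumptions, not the published method."""
    e_marker = np.sum(np.square(marker_res))      # marker-based reprojection term
    e_feature = np.sum(np.square(feature_res))    # natural-feature reprojection term
    e_zoom = (f_now - f_prev) ** 2                # temporal constraint on zoom value
    return e_marker + w_feature * e_feature + w_zoom * e_zoom
```

Camera parameters would then be estimated by minimizing such an energy over the per-frame pose and focal length, for example with a nonlinear least-squares solver.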

REFERENCES

Abidi, M. A. and T. Chandra. 1995. A new efficient and direct solution for pose estimation using quadrangular targets: Algorithm and evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5), 534–538.
Bujnak, M., Z. Kukelova, and T. Pajdla. 2008. A general solution to the P4P problem for camera with unknown focal length. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, June 23–28, pp. 1–8.
Bujnak, M., Z. Kukelova, and T. Pajdla. 2010. New efficient solution to the absolute pose problem for camera with unknown focal length and radial distortion. Proceedings of the Asian Conference on Computer Vision, Queenstown, New Zealand, November 8–12, pp. 11–24.
Drummond, T. and R. Cipolla. 2002. Real-time visual tracking of complex structure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 932–946.
Fischler, M. A. and R. C. Bolles. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Hartley, R. and A. Zisserman. 2004. Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge University Press.
Hmam, H. and J. Kim. 2010. Optimal non-iterative pose estimation via convex relaxation. International Journal of Image and Vision Computing, 28(11), 1515–1523.
Kato, H. and M. Billinghurst. 1999. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. Proceedings of the International Workshop on Augmented Reality, San Francisco, CA, October 20–21, pp. 85–94.
Klette, R., K. Schluns, and A. Koschan. 1998. Computer Vision: Three-Dimensional Data from Images. New York: Springer.
Kukelova, Z., M. Bujnak, and T. Pajdla. 2013. Real-time solution to the absolute pose problem with unknown radial distortion and focal length. Proceedings of the International Conference on Computer Vision, Sydney, New South Wales, Australia, December 1–8, pp. 2816–2823.
Lepetit, V., F. Moreno-Noguer, and P. Fua. 2009. EPnP: An accurate O(n) solution to the PnP problem. International Journal of Computer Vision, 81(2), 155–166.
Li, H. 2006. A simple solution to the six-point two-view focal-length problem. Proceedings of the European Conference on Computer Vision, Graz, Austria, May 7–13, pp. 200–213.
Nakatani, H. and M. Okutomi. Calibration of a pan/tilt/zoom camera by a simple camera model. Technical Report of IEICE, PRMU, Kanagawa, Japan, pp. 65–72.
Petit, A., G. Caron, H. Uchiyama, and E. Marchand. 2011. Evaluation of model based tracking with TrakMark dataset. Proceedings of the International Workshop on AR/MR Registration, Tracking and Benchmarking, Basel, Switzerland, October 26.
Quan, L. and Z. Lan. 1999. Linear N-point camera pose determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8), 774–780.
Shi, J. and C. Tomasi. 1994. Good features to track. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, June 21–23, pp. 593–600.
Snavely, N., S. M. Seitz, and R. Szeliski. 2006. Photo tourism: Exploring photo collections in 3D. ACM Transactions on Graphics, 25(3), 835–846.
Stewenius, H., D. Nister, F. Kahl, and F. Schaffalitzky. 2005. A minimal solution for relative pose with unknown focal length. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 20–25, pp. 789–794.
Sturm, P. 1997. Self-calibration of a moving zoom-lens camera by pre-calibration. International Journal of Image and Vision Computing, 15, 583–589.
Taketomi, T., T. Sato, and N. Yokoya. 2011. Real-time and accurate extrinsic camera parameter estimation using feature landmark database for augmented reality. International Journal of Computers and Graphics, 35(4), 768–777.
Taketomi, T., K. Okada, G. Yamamoto, J. Miyazaki, and H. Kato. 2014. Camera pose estimation under dynamic intrinsic parameter change for augmented reality. International Journal of Computers and Graphics, 44, 11–19.
Triggs, B. 1999. Camera pose and calibration from 4 or 5 known 3D points. Proceedings of the International Conference on Computer Vision, Kerkyra, Greece, September 20–27, pp. 278–284.
Tsai, R. 1986. An efficient and accurate camera calibration technique for 3D machine vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, pp. 364–374.
Wu, Y. and Z. Hu. 2006. PnP problem revisited. Journal of Mathematical Imaging and Vision, 24(1), 131–141.
Zhang, Z. 2000. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330–1334.


7  Visual Tracking for Augmented Reality in Natural Environments

Suya You and Ulrich Neumann

CONTENTS
7.1 Introduction.............................................................................................152
7.2 Framework for Simultaneous Tracking and Recognition.......................152
    7.2.1 Offline Stage...................................................................................153
    7.2.2 Online Stage...................................................................................154
    7.2.3 Simultaneous Tracking and Recognition.......................................154
7.3 Camera Pose Tracking with Robust SFM...............................................156
    7.3.1 Structure from Motion Using Subtrack Optimization...................156
    7.3.2 System Components.......................................................................158
        7.3.2.1 Building the Point Cloud...........................................................158
        7.3.2.2 Extracting Keypoint Descriptors...............................................160
        7.3.2.3 Incremental Keypoint Matching................................................160
        7.3.2.4 Camera Pose Estimation............................................................161
        7.3.2.5 Incorporating Unmatched Keypoints........................................161
    7.3.3 Experiments....................................................................................162
7.4 Camera Pose Tracking with LiDAR Point Clouds..................................164
    7.4.1 Automatic Estimation of Initial Camera Pose...............................164
        7.4.1.1 Generate Synthetic Images of Point Clouds..............................165
        7.4.1.2 Extract Keypoint Features.........................................................165
        7.4.1.3 Estimate Camera Pose...............................................................165
    7.4.2 Camera Pose Refinement...............................................................166
    7.4.3 Experiments....................................................................................167
7.5 Summary and Conclusions......................................................................169
Acknowledgment...........................................................................................169
References......................................................................................................170

7.1  INTRODUCTION

Augmented Reality (AR) is the process of combining virtual and real objects into a single spatially coherent view. The camera's position and orientation, along with its internal parameters, provide the essential information needed to create augmented realities. Tracking, or camera pose determination, is therefore a primary technical challenge of AR and the subject of a large body of AR research and development work (Neumann and You, 1999; Azuma et al., 2001).

Visual tracking is a popular approach to AR pose determination. In its simplest form, this entails capturing a sequence of images and determining a camera's spatial pose (position and orientation) at each frame. In most cases, the environment is prepared with artificial markers that can be easily detected and tracked (Cho and Neumann, 2001; Claus and Fitzgibbon, 2004; Uchiyama and Marchand, 2012). The marker-based approach, however, is often impractical for use in wide-area environments. A more practical approach is to track naturally occurring elements of the environment (Platonov et al., 2006; Comport et al., 2006; Bleser et al., 2006; Wagner et al., 2008; Hsiao et al., 2010), possibly in combination with artificial markers (Jiang et al., 2004; Shibata et al., 2010; Uchiyama et al., 2011; Guan et al., 2012). In both cases, by determining a sufficiently large set of 2D–3D feature correspondences between an image and the model database, an accurate camera pose can be computed and used to render virtual elements for AR display. The advantage of visual tracking for AR, with either artificial markers or natural features, is the ability to capture images, estimate pose, identify visual content, and manage AR display all in one computing device with a camera and display.

This chapter focuses on the problem of robust visual tracking for AR in natural environments. Emphasis is placed on a novel tracking technique that combines feature matching, visual recognition, and camera pose tracking. A tracking via recognition strategy is presented that performs simultaneous tracking and recognition within a unified framework. Rather than functioning as a separate initialization step, recognition is an ongoing process that both aids and benefits from tracking. Within the same unified framework, two tracking systems are presented. The first system employs robust structure from motion (SFM) to build a sparse point cloud model of the target environment for visual recognition and pose tracking, while the second system employs dense 3D point cloud data captured with active sensors such as LiDAR (Light Detection and Ranging) scanners. Experiments show the advantages of this unified approach in comparison to traditional approaches that treat recognition and tracking as separate processes.

7.2  FRAMEWORK FOR SIMULTANEOUS TRACKING AND RECOGNITION

Object recognition and tracking are closely linked problems in computer vision. Both frequently employ similar low-level tasks such as feature selection, matching, and model fitting. Moreover, the two often play complementary roles within a larger system, with object recognition used to initialize or re-initialize tracking, and tracking supporting the recognition of a target's identity (Mooser et al., 2006, 2012).

Figure 7.1 illustrates the developed unified framework for simultaneous visual recognition and tracking. At a high level, there are two stages. The first, an offline stage, defines a set of visual features and associated virtual content. The second is an online stage that processes an input video stream, recognizes visual features, estimates camera pose, and renders final AR output for each frame. These two stages are connected through a feature database and a virtual object database.

FIGURE 7.1  Unified framework for simultaneous visual tracking and recognition. In the offline stage, image input is processed with SFM or LiDAR scanning to extract feature descriptors and 3D point clouds for the feature database, and virtual objects are created and stored in the virtual object database. In the online stage, video input is matched against the feature database to recognize features, compute camera poses, and render the associated virtual objects for AR display.

7.2.1  Offline Stage

The offline stage is responsible for defining a set of visible landmarks for visual recognition and tracking. At a high level, a landmark definition contains both appearance data and geometry data. Appearance data is used to recognize the features from a video input; geometric data is used to compute camera pose. The landmarks can consist of artificial markers, natural features, or both. Keypoints are extracted from the landmarks and represented using local descriptors that serve as feature definitions for recognition and tracking. The output of the feature definition component is stored in the feature database. This process of building the database represents a primary function of the offline stage.

Many methods can generate the features and 3D point cloud models needed for the feature database. Computer vision methods using stereo, depth cameras, and SFM can produce accurate but sparse point cloud models of the target scenes, and are often favored for smaller controllable workspaces. An alternative approach is to utilize active laser scanners such as LiDAR systems to quickly acquire dense 3D point cloud models in wide-area environments (Feiner et al., 2011; Pylvänäinen et al., 2012; Guan et al., 2013). New methods for constructing a feature database using both approaches are described in Sections 7.3 and 7.4.

All virtual content is also defined in the offline stage. This data typically consists of 2D or 3D models with rendering attributes that may include color, texture, animations, and interactive behavior. The construction and use of the annotation database is outside of this chapter's scope; however, its relationship to the feature database is made clear by its inclusion in the offline stage. Virtual annotations are stored in the annotation database, are often associated with a set of features in the feature database, and will only be rendered when those features are recognized and visible.

7.2.2  Online Stage

The online stage processes video input to control the appearance of rendered AR output. First, a matching component attempts to match visual features detected in each input frame to the feature database. This component thus serves in the key role of target recognition. Once a sufficient number of features are recognized, their geometry data are used to estimate the position and orientation of the camera. The camera pose component is responsible for tracking: it utilizes the geometric data in the feature database to achieve pose calculation and tracking, so that the extracted visual features and descriptors are used to incrementally recognize objects of interest and estimate camera pose. Keypoints that fail to directly match to the feature database are employed to determine camera pose using a technique called incremental matching, described in Section 7.2.3. The final component in the online stage retrieves the virtual objects associated with recognized targets and, using the current camera pose, renders them into the input image. This rendering produces the final output of AR applications.

7.2.3  Simultaneous Tracking and Recognition

A key goal of the tracking via recognition strategy is to tightly integrate recognition and tracking into a unified and symbiotic system. Rather than functioning solely as a separate initialization step, recognition is an ongoing process that both aids and benefits from tracking. The strategy exploits two key insights:

• If keypoints are reliably tracked from one frame to the next, then matching results from multiple frames can produce more confident object identification and more robust pose estimation.
• With sufficient keypoint matches to compute an object's pose, one can infer the locations of all other keypoint matches and use those points to aid the tracking process.

These two insights lead to a process called incremental keypoint matching. It begins with the feature database of learned keypoints and their descriptors. At each frame, a small subset of the large collection of tracked keypoints is selected for matching against the database, thus bounding the CPU processing time used for matching in a single frame. If an object is visible in the scene, the system can positively identify it and compute camera pose within a few frames. Once an object has been recognized, its tracked keypoints are continually matched against the database. There may be frames with too few matches to compute a pose. Rather than simply failing at these frames, keypoints that cannot be matched against the database are matched to prior-frame keypoints by back-projection with the current pose estimate: all unmatched features are back-projected using the computed pose, generating an additional set of correspondences across frames to maintain a stable camera pose through movements. Keypoints are tested to determine

Tracking these new points along with database-matched keypoints helps maintain accurate pose tracking through subsequent frames. however the projection matches ensure that tracking remains accurate and robust. The pose estimate is used to draw the white rectangle model of the book. These are used to produce stable pose estimates.2 illustrates the principle of incremental keypoint matching for simultaneous tracking and recognition. The scene contains a book cover viewed by a moving camera.2  Incremental keypoint matching for simultaneous tracking and recognition. If it does. (b) Camera pose is refined incrementally with back-projection matches. shown in Figure 7. In all frames. will not fit the pose estimates in future frames and these points are discarded. From this point forward. the detected dots on the book cover are the keypoints used for object identification and pose tracking. finds a few database matches that are used to estimate an initial camera pose.Visual Tracking for Augmented Reality in Natural Environments 155 whether the current pose estimate implies that they lie on the surface of the database object. (a) (b) (c) FIGURE 7. and those keypoint matches that are derived from the incremental backprojection algorithm are called projection matches. . its location in object space coordinates can be back-projected to generate a match.2c shows a frame with substantial object occlusion.2a. Figure 7. Figure 7. keypoint matches to the database are called database matches. estimated camera pose is used to compute the positions for new keypoints that have never been matched to the database. Thus. Keypoints that do not belong to the object. perhaps because they lie on an occluding object. In the event the system loses track of some of the originally matched keypoints. As the sequence continues. Figure 7. bold lines indicate database matches. and thin lines are projection matches. 
(c) Stable camera pose produced by combining the database matches and projection matches. it still tracks the object correctly using the newer matches. A feature database containing the book cover model and features was prepared offline.3b shows numerous keypoints matched by incremental back-projection. Only two of the original database matches remain. (a) Initial camera pose is estimated from the database matches. An early frame.
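The back-projection test just described can be sketched as follows: each unmatched keypoint's hypothesized object-space location is reprojected with the current pose and accepted as a projection match only if it lands close to the observed keypoint. The function names, the intrinsic matrix, and the pixel threshold below are illustrative assumptions, not the chapter's implementation.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of 3D point X under pose (R, t) with intrinsics K."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def projection_matches(K, R, t, candidates, max_err_px=2.0):
    """Keep tracked keypoints whose hypothesized 3D location on the database
    object reprojects close to the observed 2D position under the current pose.

    candidates: iterable of (uv, X); uv is the tracked 2D keypoint and X its
    back-projected object-space location from an earlier frame.
    """
    kept = []
    for uv, X in candidates:
        err = np.linalg.norm(project(K, R, t, np.asarray(X, float)) - np.asarray(uv, float))
        if err < max_err_px:        # consistent with the pose: a projection match
            kept.append((uv, X))    # points on occluders fail this test and are dropped
    return kept

# Toy check with an identity pose
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
matches = projection_matches(K, R, t, [
    ([320.0, 240.0], [0.0, 0.0, 2.0]),   # reprojects exactly onto the keypoint
    ([340.0, 240.0], [0.0, 0.0, 2.0]),   # 20 px off: rejected as an occluder
])
```

Points that repeatedly fail this test over several frames would be discarded, matching the behavior described for occluding objects.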

1. inaccuracies among the feature correspondences make the task far more challenging. Given an ordered sequence of captured images. Ordered sequences simplify the problem because consecutive frames are similar and easier to match with motion estimation approaches such as optical flow. In principle. A variant of Lucas–Kanade optical flow based on image pyramids is employed. SFM from video presents its own unique challenge because long sequences accumulate small errors that cause drift in the computed camera pose. Each keypoint projects along a ray in space.3  Illustration of hypothetical keypoint tracks for six camera locations. The optical flow . recover camera pose. the problem is straightforward given a sufficient number of feature correspondences between frames. the system includes two main components. we focus solely on images captured as a video sequence from a single camera. so there is no structure point valid for the entire track. Its output is a database of recognizable keypoints along with 3D descriptions of accompanying virtual objects. Both components depend on the ability to simultaneously determine scene structure and camera pose from input images. Experiments show that subtrack optimization algorithm produces a keypoint database that is both larger and more accurate than could be achieved using conventional SFM techniques.6 do have valid structure points.3 and k4. takes a video stream as input and builds a sparse point cloud model of the target environment using robust SFM.156 Fundamentals of Wearable Computers and Augmented Reality k1 k2 k3 k4 k5 k6 FIGURE 7. 7. the offline stage. However. an algorithm called subtrack optimization (Mooser. As illustrated in Figure 7.3  CAMERA POSE TRACKING WITH ROBUST SFM This section describes a tracking system for markerless AR based on the tracking via recognition framework.5. processes input frames sequentially to recognize the previously stored keypoints.3. however. 
SFM can be applied to any set of two or more images. The first component. In practice. Keypoints are extracted from the point clouds and represented as a set of local descriptors. However.2.1  Structure from Motion Using Subtrack Optimization In its general form. The six rays do not meet at a single point. The second part. To tackle this problem. and render associated virtual content. 7. subtracks k1. the online stage. 2009a) uses dynamic programming to detect incorrect correspondences and remove them from the output. keypoints can be extracted from the first frame and tracked using a standard optical flow technique. In our case. The goal of the optimization algorithm is to reliably perform this partitioning.

Partitions. would only partially address the problem.Visual Tracking for Augmented Reality in Natural Environments 157 process generates a set of keypoint tracks. k ) j j 2 (7. by definition. let kj and Pj be the k­ eypoint and camera.b be the subtrack spanning frames a to b inclusive.3. Keypoint tracks are often stable for a few frames. Over a long sequence. may be individually inconsistent. X. tracks can span only a few frames or several hundred. A simple solution identifies keypoint tracks that do not fit to a single structure point and remove them from the computation. however. at frame j and ka. respectively. and the accuracy of the resulting structure will suffer accordingly. that minimizes the equation is the structure point corresponding to ka. no matter how large or small. . Ideally. Because those rays do not meet at a common structure point. Simply splitting the track into fixed sized partitions.b is given by the error function E (ka. if consecutive partitions are consistent. The subtrack spanning frames 1–3 and frames 4–6 are consistent. It sets out to identify the longest possible subtracks that can be deemed consistent. A long keypoint track is generally stable over some portion of its lifetime and a more powerful approach will identify those sections and use them. then drift.b ) = min X 1 N b ∑ d(P X . the six frame tracks are. all keypoints in a given track correspond to the same 3D point in space. This idea is illustrated in Figure 7. Each track continues to grow until optical flow fails or the keypoint drifts out of view. The consistency of ka. however. in which case the keypoint track is deemed consistent. Moreover. and thus represent an optimal partitioning with both subtracks usable for pose computation. For keypoint track k. As a hypothetical camera moves from top to bottom. this seldom holds true.1) j=a where d is Euclidean distance in pixels N is the length of the subtrack The argument. and then become stable again. 
The motivation behind the subtrack optimization algorithm is to solve this partitioning problem optimally. inconsistent. In practice. Traditional outlier detection schemes such as RANSAC (Random Sample Consensus) may be used to this end.b. for example. each consisting of a keypoint correspondence spanning two or more consecutive frames. Favoring fewer. However. A method that is overly aggressive in partitioning a keypoint track will lose valuable information. Identifying the sets of frames during which a keypoint track remains stable is nontrivial. Each subtrack corresponds to a single structure point with its consistency determined by average reprojection error. a keypoint is tracked along a ray in space at each frame. longer subtracks is important because it ensures that they span as wide a baseline as possible. simply labeling entire tracks as an inlier or outlier ultimately discards useful data. it would be preferable to consider them as a whole.
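Equation 7.1 minimizes over the structure point X. A standard way to obtain that minimizer is linear (DLT) triangulation over all frames of the subtrack, followed by reprojection. The chapter does not specify the solver, so the numpy sketch below is illustrative and all names in it are assumptions.

```python
import numpy as np

def triangulate(Ps, uvs):
    """Linear (DLT) multi-view triangulation: a standard stand-in for the
    structure point X that minimizes Equation 7.1 over the subtrack."""
    rows = []
    for P, (u, v) in zip(Ps, uvs):
        rows.append(u * P[2] - P[0])    # u * (third row of P) - (first row of P)
        rows.append(v * P[2] - P[1])
    X = np.linalg.svd(np.array(rows))[2][-1]   # null vector of the stacked system
    return X[:3] / X[3]

def subtrack_error(Ps, uvs):
    """E(ka,b): mean squared reprojection error (pixels) of the best-fit
    structure point. Ps: 3x4 camera matrices for frames a..b; uvs: keypoints."""
    X = np.append(triangulate(Ps, uvs), 1.0)
    errs = []
    for P, uv in zip(Ps, uvs):
        x = P @ X
        errs.append(np.sum((x[:2] / x[2] - np.asarray(uv)) ** 2))
    return float(np.mean(errs))
```

A subtrack would then be kept only if it spans enough frames and its error falls below the consistency threshold discussed later in this section.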

The optimal partitioning p̂ of keypoint track k is defined in terms of a cost function

    C(p) = Σ_{ka,b in p} (δ + E(ka,b))        (7.2)

where δ is a constant penalty term ensuring that the optimization favors longer subtracks whenever possible. The number of possible partitions is exponential in the length of k, so a brute-force search would be intractable. As it turns out, however, a partitioning suited to our needs can be found in low-order polynomial time. The key idea is to define the cost function recursively as

    C(p̂_0) = 0
    C(p̂_1) = 0
    C(p̂_n) = min_{1 ≤ a ≤ n} [C(p̂_(a−1)) + δ + E(a, n)]        (7.3)

where p̂_n is the optimal partitioning of the track only up to frame n. This recursion can be computed efficiently from the bottom up using dynamic programming. See Mooser (2009) for a formal proof of its correctness and an analysis of its run time.

Although the final partition is optimal in that it minimizes Equation 7.2, it is not necessarily the case that all subtracks are consistent enough for pose calculation. After optimizing each keypoint track, those subtracks spanning at least three frames and having E(ka,b) < 1.0 are deemed consistent; all others are deemed inconsistent. Only the structure points corresponding to consistent subtracks are included in the final structure. This subtrack optimization process forms the basis for building the keypoint database during the offline training stage. It also serves an important role in the online stage, as described in the following section.

7.3.2  System Components

This section details the main components of the architecture shown in Figure 7.1 for the case where the tracking system employs SFM with subtrack optimization.

7.3.2.1  Building the Point Cloud

The keypoint database consists of a cloud of 3D points, each associated with several descriptor vectors. This data can be created by tracking keypoints through a video sequence acquired ahead of time. The subtrack optimization algorithm introduced earlier can determine reliable feature locations and incrementally build a complete 3D point cloud of the scene, given an estimate of the camera pose at each frame. Initially, however, no poses are known, so some method is needed to bootstrap the process. One way to achieve this is to select features tracked from the first frame to another early frame n and use them to fit a fundamental matrix,

which is fit using Levenberg–Marquardt optimization. Using known internal parameters, this defines a rigid transformation relating the two cameras. For every keypoint tracked from frame 1 to frame n, an initial structure point Xk is computed by linear triangulation to form an initial point cloud. The initial structure points are then used to compute poses for the intervening frames, 2 through (n − 1). Each camera pose is a rigid transformation having six degrees of freedom. The resulting short sequence of camera poses, typically 8–10 in our implementation, provides enough information to apply the subtrack optimization algorithm to each tracked keypoint. This partitions each track into one or more subtracks and determines which subtracks are inliers, with a 3D structure point computed for each inlier subtrack.

Subsequent frames are processed with optical flow applied to each keypoint to extend the corresponding track. For each new frame, the system initially assumes that all consistent subtracks in the prior frame are still consistent for the current frame. Since those subtracks have known 3D structure points, they are used to estimate the initial camera pose of the new frame. With an initial camera pose for the new frame, subtrack optimization is performed taking that new pose estimate into account. Each keypoint track is repartitioned into subtracks, with a new 3D structure point assigned to each new subtrack. So each frame estimates one camera pose followed by one subtrack optimization. This continues for the entire input sequence to produce a final 3D reconstruction and feature database.

Figure 7.4 shows a resulting point cloud model for a building sequence, rotated to provide an overhead view. The path of the camera in front of the building is computed and rendered on the left. The structure of the hedges is clearly visible, along with some keypoints arising from the brick walls and the lawn.

FIGURE 7.4  Point cloud model of a building along with camera poses produced with the proposed robust SFM approach.
The model includes some keypoints arising from the brick walls and the lawn. The path of the camera in front of the building is computed and rendered on the left.

7.3.2  Extracting Keypoint Descriptors

The point cloud, by itself, contains only geometric information, namely a 3D location for each structure point. To complete the keypoint database, each point must be associated with a visual descriptor that can be matched during the online stage. The descriptor uses the Walsh–Hadamard (WH) kernel projection to describe a keypoint, which is highly compact and discriminative. The WH descriptor takes rectangular image patches (typically 32 × 32 pixels) as input and uses the WH kernel transformation to reduce the patches to low-dimensional vectors. Given an image patch p and the ith WH kernel ui, the ith element of the kernel projection is given by the inner product uiTp, with lower dimensions encoding low frequency information and higher dimensions encoding high frequency information. Experimental trials revealed that 20 dimensions are sufficient to retain the most characteristic features of each patch and provide reliable matching results. The first WH kernel simply computes a sum of the patch's intensity values, which contains no discriminative information after normalization. The first kernel is thus discarded, so the 20-dimensional descriptor vector comprises WH kernels u2 through u21.

7.3.3  Incremental Keypoint Matching

Each frame of the input sequence contains a set of keypoints. As in the offline stage, these keypoints are initially generated by a detector and tracked using optical flow. Initially, only the 2D image locations of these keypoints are known. A descriptor is extracted at each keypoint using the 20-dimensional kernel projections, exactly as in the offline stage. Given the 20-dimensional descriptor for each keypoint, a Euclidean nearest neighbor search finds its match in a database of learned keypoints. Tracking hundreds of keypoints over hundreds of frames, the resulting database can grow quite large. To improve speed performance, the approximate nearest neighbor search algorithm called Best-Bin-First (BBF) (Lowe, 2004) is employed, which returns the exact nearest neighbor with high probability while only requiring time logarithmic in the database size. The matching process thus maintains efficiency by employing an approximate nearest neighbor search that is, at worst, logarithmic in the size of the database.

Using the BBF algorithm, the nearest and second nearest matches are retrieved and their matching distances are computed. A distance ratio filter is applied: a match is accepted only if the distance ratio between its nearest and second nearest matches is less than a predefined threshold. If both of these matches are associated with the same structure point, the match is accepted; otherwise, their matching distance ratio should be smaller than a predefined threshold for the match to be accepted. Accepted matches provide the 2D–3D correspondences for camera pose estimation, as described in the following section. However, even with fast approximate nearest-neighbor searches, the matching process is too time-consuming when applied to all keypoints in an image (typically 300–500).
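The nearest-neighbor search with the distance-ratio filter can be sketched as follows; for clarity this uses an exact brute-force search, where a real implementation would substitute the approximate BBF search over a k-d tree (all names are illustrative):

```python
import numpy as np

def match_descriptors(queries, database, ratio=0.8):
    """Nearest-neighbor descriptor matching with a distance-ratio filter.

    queries:  (M, D) array of descriptors from the current frame.
    database: (N, D) array of learned keypoint descriptors.
    Returns accepted (query_index, database_index) pairs.
    """
    matches = []
    for qi, q in enumerate(queries):
        d = np.linalg.norm(database - q, axis=1)  # Euclidean distances
        nearest, second = np.argsort(d)[:2]       # two nearest neighbors
        # Accept only if the nearest match is clearly better than the
        # second nearest (the distance-ratio test).
        if d[nearest] < ratio * d[second]:
            matches.append((qi, int(nearest)))
    return matches
```

The ratio threshold of 0.8 is an assumption for illustration; the chapter only states that a predefined threshold is used.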

In order to limit this computational cost in any frame, we adopt an incremental keypoint matching approach. Incremental keypoint matching only attempts to match a limited number of keypoints in each frame. The selected set varies each frame so the set of successful matches gradually accumulates. At the same time, successful matches are tracked from frame to frame. Some of these points are lost as they drift out of view or due to track failures. By selecting new keypoints for matching and tracking existing matches, incremental matching adds new matches at each frame, trying to maintain a match set large enough to compute a reliable pose. While a few matches per frame are seldom sufficient to estimate a robust pose, the number of tracked matches accumulates as the sequence continues. In our experiments, about 10 frames are sufficient for accumulating enough matches to recover a robust camera pose.

7.3.4  Camera Pose Estimation

Each successful database match produces a correspondence between a 2D image point and a 3D structure point. RANSAC is used to fit a camera pose, and outliers are removed. Keypoints are tracked through the sequence to enable pose recovery in subsequent frames.

7.3.5  Incorporating Unmatched Keypoints

The process described thus far only uses points whose 3D locations are known in advance and stored in the feature database. Most of the keypoints in the input image are never successfully matched against the database. An unmatched keypoint in a single frame represents a ray in 3D space, and thus can be associated with any point along that ray. If that same keypoint is tracked over several frames, each with a known pose, its location in 3D space can be estimated. In fact, such estimates are precisely the input to the subtrack optimization algorithm used during the offline stage. Thus, once a pose is recovered for at least 10 consecutive frames, the subtrack optimization algorithm is applied to all keypoint tracks that are not matched to the database. Outliers are removed, as these 3D points are deemed incorrect. The partitioned keypoints now have associated 3D locations and can be applied to the pose estimation of future frames. As the sequence continues, additional matched points may be computed at each frame. This results in two distinct sets of matches to compute pose, that is, database matches and projection matches.

The advantages of including projection matches in the pose estimation process are twofold. First, there are times that the camera views an area that was not included in the model building phase, in which case database matches alone are insufficient to recover pose. In such cases, pose is computed from projection matches since they persist without database matches. The second advantage is that even when many database matches are found, additional matches result in a smoother and more reliable pose. Residual errors exist in all keypoint position estimates. Typically, there are many times more projection matches than database matches, and a larger number of correspondences makes the final least-squares fit more reliable and smoothly varying. Incorporating these points into the pose estimation process can significantly improve the quality of the final results.
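Fitting a camera from accumulated 2D–3D correspondences (database and projection matches together) can be illustrated with a direct linear transform (DLT) least-squares fit of a 3 × 4 projection matrix; the chapter's system wraps such a solver in RANSAC, which is omitted here for brevity. A sketch, not the chapter's code:

```python
import numpy as np

def fit_projection_dlt(points3d, points2d):
    """Fit a 3x4 projection matrix from n >= 6 2D-3D correspondences
    by minimizing the algebraic reprojection error via SVD."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points3d, points2d):
        Xh = [X, Y, Z, 1.0]
        # Two constraints per point: v*(p3.X) = p2.X and u*(p3.X) = p1.X.
        rows.append([0.0] * 4 + [-c for c in Xh] + [v * c for c in Xh])
        rows.append(list(Xh) + [0.0] * 4 + [-u * c for c in Xh])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)

# Check: recover a known camera from noiseless correspondences.
rng = np.random.default_rng(0)
P_true = np.hstack([np.eye(3), np.array([[0.1], [-0.2], [2.0]])])
pts3d = rng.uniform(-1.0, 1.0, (10, 3)) + np.array([0.0, 0.0, 5.0])
proj = np.c_[pts3d, np.ones(10)] @ P_true.T
pts2d = proj[:, :2] / proj[:, 2:]
P = fit_projection_dlt(pts3d, pts2d)
P = P / P[2, 3] * P_true[2, 3]   # remove the arbitrary overall scale
```

More correspondences over-determine the system, which is exactly why the accumulated projection matches make the least-squares fit smoother and more reliable.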

7.3.6  Experiments

Experiments demonstrate various behaviors of the online and offline stages of the tracking system. The primary focus of all of these tests is to show that the robust SFM tracking approach makes a substantial, measurable difference in the end results (Mooser, 2009). Both stages were thus compared with and without subtrack optimization. Three test results are shown to demonstrate the system's performance with different scenes. In all test cases, one video sequence was captured for the offline stage and a separate, longer video was used for the online stage.

The first test, the Fuse Box sequence in Figure 7.5a, shows the exterior of an electrical fuse box in an industrial environment. The scene contains a mixture of planar and nonplanar surfaces. The second test, the A/C Motor sequence in Figure 7.5b, targets an irregularly shaped object. Although the ground surrounding the motor is flat, it is mostly covered in gravel and does not contain many easily identified features, making the camera tracking extremely challenging. The final case, the building sequence in Figure 7.5c, shows an outdoor building scene.

FIGURE 7.5  Sample tracking and augmentation scenes for all three test cases: (a) the fuse box is tracked from a variety of orientations, not all of which were covered in the training process; (b) the A/C motor is tracked through a nearly 180° rotation, while the keypoint dataset was built from only one side of the motor; and (c) the building scene contains both natural and man-made objects.

The building scene comprises both natural and man-made objects. The annotations are virtual labels showing the way to nearby points of interest.

Table 7.1 shows the RMS reprojection error of all database keypoints produced in the offline stages. In all tests, the average optimized error is lower than in the unoptimized cases. Moreover, when running the offline stage with no optimization, the keypoint database has far fewer total keypoints. The reason for this is that a keypoint track that drifts significantly cannot be fit to any single structure point. Without subtrack optimization, keypoint tracks are never partitioned; tracks are simply terminated when their error exceeds a threshold. Every track thus corresponds to a single structure point. If the track drifts significantly over the whole sequence, the structure point is poorly defined and produces a large reprojection error, and such a track does not contribute any points to the database. Subtrack optimization, however, may find multiple subtracks that each have valid structure points, all of which can go into the database.

TABLE 7.1  Offline Stage Error Measurement: average subtrack length and reprojected error, with and without subtrack optimization, for the Fuse Box, AC Motor, and Buildings sequences.

Table 7.2 shows the results of recognizing and tracking in the online stage. As in the offline stage, pose accuracy is measured by the average reprojection error of all keypoints in all frames. While there is significant error reduction with optimization, the absolute error is not as low as in the offline stage. This is largely due to residual errors in the keypoint database, as reflected in the results. Without subtrack optimization, pose refinement relies only on database matches and no projection matches are used.

TABLE 7.2  Online Stage Error Measurement: average inliers and reprojected error, with and without subtrack optimization, for the Fuse Box (600 frames), AC Motor (444 frames), and Buildings (350 frames) sequences. Without optimization, tracking failed partway through two of the sequences (after 206 and 269 frames).
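The reprojection error metric used in Tables 7.1 and 7.2 is simple to compute from an estimated pose; a small illustrative helper (names are assumptions, not the chapter's code):

```python
import numpy as np

def rms_reprojection_error(P, points3d, points2d):
    """RMS pixel distance between observed keypoints and the 3D structure
    points projected through the 3x4 camera matrix P."""
    Xh = np.c_[np.asarray(points3d), np.ones(len(points3d))]
    proj = Xh @ P.T
    proj = proj[:, :2] / proj[:, 2:]               # perspective divide
    residuals = np.linalg.norm(proj - np.asarray(points2d), axis=1)
    return float(np.sqrt(np.mean(residuals ** 2)))
```

Averaging this quantity over all frames of a sequence gives the per-sequence figures reported in the tables.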

Table 7.2 also compares the average number of inlier keypoints available with and without subtrack optimization. It shows that the optimization step greatly increases the total number of projection matches available for use in pose calculation. The ability to produce a larger set of keypoints is a significant advantage. Since the set of keypoints detected during the online stage are a subset of those found when building the model, more keypoints lead to a better chance of recognizing the objects and computing an accurate tracking camera pose during the online phase. Even the best matches, however, can have an error of a few pixels. The larger number of observations to estimate pose makes the computation more reliable and robust to individual errors, significantly improving the quality of the final results.

Note that all the tests involved moving the online camera to areas outside of the areas viewed during the offline stage. During those movements, the system is unable to generate database matches. Without the use of projection matches, camera pose and tracking fail immediately. Using projection matches, the system is able to continue tracking, although errors will accumulate in the absence of any visible database features. As database features become visible again, the accumulated tracking errors are corrected.

Figure 7.5 shows sample pose tracking and augmentation results for all three test scenes. Camera poses are accurately estimated so the virtual objects are well aligned with the real scenes.

7.4  CAMERA POSE TRACKING WITH LiDAR POINT CLOUDS

The SFM-based system described in Section 7.3 can automatically produce accurate, but relatively sparse, 3D point cloud models of varied scenes for use in visual tracking. This section describes a tracking system that utilizes active LiDAR to produce dense point cloud models for object recognition and camera pose tracking, especially in cluttered and occluded natural environments. The system accepts an input of video images and an offline-acquired 3D point cloud scene model. Figure 7.6 shows a portion of a 3D colored point cloud model (Los Angeles downtown) captured by a ground-based LiDAR system. The point cloud data represent the scene's appearance and geometry, where each point has both position and color. As in Section 7.3, appearance data is used to recognize features, geometric data is used to compute camera pose, and camera pose is estimated for rendering virtual content. This system shares many common components with the SFM-based system, but significant distinctions exist in keypoint selection, correspondences matching, and pose recovery, which are detailed in the following sections. The system consists of two major steps, automatic pose initiation and iterative pose refinement.

7.4.1  Automatic Estimation of Initial Camera Pose

Initial camera pose is automatically estimated in a two-step process. First, colored 3D point clouds are projected into 2D images from predefined virtual viewpoints to form a set of synthetic view images. These are registered in a common coordinate system. Keypoints are detected in the projected images, and distinguishable objects are identified to establish correspondences between the images and model.

FIGURE 7.6  Dense 3D point cloud model (a) captured by a ground LiDAR system and a camera image (b) of a corner in downtown Los Angeles.

In a second step, the camera pose for an input video frame is estimated by corresponding image keypoints and the back-projected keypoints.

7.4.1.1  Generate Synthetic Images of Point Clouds
A set of virtual viewpoints are arranged to face the major plane surfaces of the point cloud model, as shown in Figure 7.7a. These views uniformly sample viewing directions and logarithmically sample viewing distance. Experiments show that six viewing directions and three viewing distances are sufficient for facade scenes with one major plane surface. Once viewpoints are defined, the 3D point models are rendered onto each 2D image plane using ray casting and Z-buffers to handle occlusions. Figure 7.7b shows examples of synthetic images generated from the Los Angeles point cloud dataset shown in Figure 7.6.

7.4.1.2  Extract Keypoint Features
Keypoint features and their associated SIFT (Scale Invariant Feature Transform) visual descriptors are extracted in each synthetic view image. These keypoints are then projected back onto the point cloud data to obtain their 3D positions. The extracted features are reprojected onto the 3D point clouds by finding intersections with the first plane that is obtained through a plane segmentation method (Stamos and Allen, 2002). It is possible that the same feature is reprojected to different 3D coordinates from different synthetic views. Nearby feature points are therefore filtered so that proximate points with similar descriptors are merged into one feature. The final output is a set of 3D keypoint features with associated visual descriptors saved in the feature database for online matching and pose estimation.

7.4.1.3  Estimate Camera Pose
Given an input image, its keypoint features are extracted, their visual descriptors are computed, and they are matched against the 3D keypoints in the feature dataset. A modified RANSAC method is employed to estimate the camera pose and remove outliers. Rather than maximizing the number of inliers that are consensus to the pose hypothesis, we make modifications as follows. Each matched feature's surface normal is computed and used for clustering features.
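The normal-direction clustering and the N2 score of Equation 7.4 can be sketched as follows; the chapter does not specify the exact clustering procedure, so the greedy angular clustering and the 20° threshold below are assumptions for illustration:

```python
import numpy as np

def second_cluster_count(normals, angle_thresh_deg=20.0):
    """Cluster unit surface normals by angular proximity (greedy) and
    return N2, the inlier count of the second-largest cluster.

    Scoring pose hypotheses by N2 favors inlier sets that span at
    least two differently oriented planes.
    """
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    clusters = []  # list of [representative_normal, count]
    for n in normals:
        for cluster in clusters:
            if np.dot(n, cluster[0]) > cos_thresh:
                cluster[1] += 1
                break
        else:
            clusters.append([n, 1])
    sizes = sorted((c[1] for c in clusters), reverse=True)
    return sizes[1] if len(sizes) > 1 else 0
```

A hypothesis whose inliers all come from one facade scores N2 = 0 and is rejected in favor of a pose supported by at least two planes.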

FIGURE 7.7  (a) Virtual viewpoint arrangement and (b) synthetic images produced from a 3D point cloud.

Inliers are clustered according to their normal directions so that the inliers with close normal directions are grouped as the same cluster. Let N1 and N2 be the numbers of inliers for the largest two clusters. Among all the hypothesized poses, we want to maximize the value of N2:

    $[R \mid T] = \arg\max_{[R \mid T]} N_2$    (7.4)

This promotes solutions with inliers that lie in different planes. It avoids the condition where all features lie in a single plane, which makes the calculated pose unstable and sensitive to position errors.

7.4.2  Camera Pose Refinement

The estimated initial pose is iteratively refined by progressively incorporating more keypoint features into the camera pose estimation. Using the pose estimated in the previous iteration, additional feature correspondences are generated and used to produce a new, improved pose estimate.

In the first iteration, SIFT features are used, but Harris features are employed in the following iterations to improve processing speed. Harris features are less distinctive but much faster to compute. For each feature point, a normalized intensity histogram within an 8 × 8 patch is computed as a feature descriptor and used for matching. We search for corresponding points within a neighborhood of H × H pixels. Initially, H is set to 64 pixels. The search size is reduced by half in each iteration, down to 4 × 4 (16 pixels), as more accurate pose estimates are obtained.

The pose refinement is accomplished through an optimization process that minimizes an error function of feature descriptors derived from the input image and the projected images of point clouds:

    $E = \sum_i \left( s \cdot I_{3D}(i) - I_{2D}(i) \right)^2$    (7.5)

where $I_{3D}(i)$ and $I_{2D}(i)$ are descriptors for the ith feature on the projected image and the input image, respectively, and s is a scale factor compensating for reflectance or lighting effects. s takes the value that minimizes the error, so that Equation 7.5 is equivalent to

    $E = \sum_i \left( I_{2D}(i)^2 - \frac{\left( I_{3D}(i) \cdot I_{2D}(i) \right)^2}{I_{3D}(i)^2} \right)$    (7.6)

The final pose optimization is then to minimize this error:

    $[R \mid T] = \arg\min_{[R \mid T]} E$    (7.7)

7.4.3  Experiments

Experiments evaluate the system's performance with various real data. Reprojection error in the image domain is measured to evaluate the accuracy of camera pose estimation. Figure 7.8 demonstrates the behavior of iterative pose estimation and refinement in the tracking system. The left column of Figure 7.8 shows the rendered image of the 3D point cloud model from the estimated camera pose. The middle column shows the rendered point cloud image aligned with the camera image. A pixel-difference image is shown in the right column. The alignment accuracy depends on the accuracy of the camera pose estimate. Alignment errors are clearly reduced after each iteration. The number of iterations needed is usually small. Our experiments show that measured projection errors often remain constant after three iterations. This indicates that sufficient correspondences have been obtained after three iterations for computing a stable and accurate camera pose.
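Returning to Equations 7.5 and 7.6: the scale factor s has a closed-form optimum. Assuming s is chosen per feature as $s_i = (I_{3D}(i) \cdot I_{2D}(i)) / I_{3D}(i)^2$, the two equations give the same error, which can be checked numerically on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
I3 = rng.normal(size=(50, 64))   # descriptors from the projected model images
I2 = rng.normal(size=(50, 64))   # descriptors from the input video frame

# Equation 7.5 with the per-feature least-squares optimal scale s_i.
s = np.sum(I3 * I2, axis=1) / np.sum(I3 * I3, axis=1)
E_75 = np.sum((s[:, None] * I3 - I2) ** 2)

# Equation 7.6: the same error with s eliminated in closed form.
E_76 = np.sum(np.sum(I2 * I2, axis=1)
              - np.sum(I3 * I2, axis=1) ** 2 / np.sum(I3 * I3, axis=1))

print(np.isclose(E_75, E_76))   # True
```

Eliminating s this way reduces the pose refinement to an optimization over the six pose parameters [R | T] alone.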

FIGURE 7.8  Iterative estimation of camera pose: the left column shows the input 3D point cloud model rendered from the estimated camera pose, the middle column shows the model image aligned with the camera image, and the right column shows the pixel-difference image that illustrates the accuracy of pose estimation.

Figure 7.9 shows another example of tracking and augmentation results for a video image using a 3D point cloud model. An initial camera pose is automatically obtained from keypoint matches. After three iterations of pose refinement, accurate camera poses are estimated so the virtual models are well aligned with the real scenes.

FIGURE 7.9  A video image (a) and a 3D point cloud model (b) rendered with an initial camera pose. Final pose is estimated after three iterations (c), allowing accurate alignment of the 3D model with the image (d).

7.5  SUMMARY AND CONCLUSIONS

This chapter describes methods for visual tracking to support augmented realities in natural environments. These new approaches are based on a tracking via recognition strategy that employs simultaneous tracking and recognition. Integrated feature matching, visual recognition, and pose estimation provide the robust motion and camera pose tracking needed for natural settings. Scenes with arbitrary geometry can be captured, recognized, tracked, and augmented. Within the same framework, two tracking systems are presented. The first employs robust SFM to build a sparse point cloud model of the target environment. The second system addresses the use of dense point cloud data captured with a laser scanner. Experiments demonstrate the advantages of the integrated techniques in comparison to traditional approaches that treat recognition and tracking as separate processes.

ACKNOWLEDGMENT

Much of this work is the PhD research of members of the Computer Graphics and Immersive Technology (CGIT) lab at the University of Southern California. In particular, we relied on the works of Dr. Jonathan Mooser, Dr. Wei Guan, and Dr. Quan Wang.

We are also grateful for the current and former project sponsors, including the U.S. Army Research Office (ARO), the Office of Naval Research (ONR), the National Geospatial-Intelligence Agency (NGA), DARPA, NASA, Nokia, Airbus, and Korean Air Corp.

REFERENCES

Ababsa, F. and Mallem, M., 2004. Robust camera pose estimation using 2D fiducials tracking for real-time augmented reality systems. Proceedings of the 2004 ACM SIGGRAPH International Conference on Virtual Reality Continuum and Its Applications in Industry, Singapore, June 16–18, pp. 431–435.
Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., and MacIntyre, B., November/December 2001. Recent advances in augmented reality. IEEE Computer Graphics and Applications, 21(6): 34–47.
Bleser, G., Wuest, H., and Stricker, D., 2006. Online camera pose estimation in partially known and dynamic scenes. International Symposium on Mixed and Augmented Reality, Santa Barbara, CA, October 22–25, pp. 56–65.
Cho, Y. and Neumann, U., December 2001. Multi-ring fiducial systems for scalable fiducial-tracking augmented reality. PRESENCE: Teleoperators and Virtual Environments, 10(6): 599–612.
Claus, D. and Fitzgibbon, A., 2004. Reliable fiducial detection in natural scenes. Proceedings of European Conference on Computer Vision, Prague, May 11–14, pp. 469–480.
Comport, A., Marchand, E., Pressigout, M., and Chaumette, F., July 2006. Real-time markerless tracking for augmented reality: The virtual visual servoing framework. IEEE Transactions on Visualization and Computer Graphics, 12(4): 615–628.
Guan, W., You, S., and Neumann, U., 2011. Fast simultaneous tracking and recognition using incremental keypoint matching. International Symposium on 3D Data Processing, Visualization, and Transmission, pp. 1–10.
Guan, W., You, S., and Neumann, U., September 2012. Efficient matchings and mobile augmented reality. ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP), special issue on 3D Mobile Multimedia, 8(47): 1–15.
Guan, W., You, S., and Neumann, U., 2013. Estimation of camera pose with respect to terrestrial LiDAR data. IEEE Workshop on Applications of Computer Vision (WACV), Tampa, FL, January 15–17.
Hsiao, E., Collet, A., and Hebert, M., 2010. Making specific features less discriminative to improve point-based 3D object recognition. IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, June 13–18, pp. 2653–2660.
Jiang, B., You, S., and Neumann, U., 2000. Camera tracking for augmented reality media. IEEE International Conference on Multimedia and Expo, New York, July 30–August 2, pp. 1637–1640.
Korah, T. et al., 2011. Enabling large-scale outdoor mixed reality and augmented reality. International Symposium on Mixed and Augmented Reality, Basel, Switzerland, October 26–29.
Lowe, D., 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60: 91–110.
Mooser, J., You, S., and Neumann, U., 2006. Tricodes: A barcode-like fiducial design for augmented reality media. IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, July 9–12, pp. 1301–1304.
Mooser, J., You, S., and Neumann, U., 2009a. A dynamic programming approach to structure from motion in video. Asian Conference on Computer Vision, Xian, China, September 23–27.
Mooser, J., You, S., Neumann, U., and Wang, Q., 2009b. Applying robust structure from motion to markerless augmented reality. IEEE Workshop on Applications of Computer Vision (WACV), Snowbird, UT, December 7–8, pp. 1–8.
Neumann, U. and You, S., 1999. Natural feature tracking for augmented reality. IEEE Transactions on Multimedia, 1(1): 53–64.
Platonov, J., Heibel, H., Meier, P., and Grollmann, B., 2006. A mobile markerless AR system for maintenance and repair. International Symposium on Mixed and Augmented Reality, Santa Barbara, CA, October 22–25, pp. 105–108.
Pylvänäinen, T., Berclaz, J., Korah, T., Hedau, V., Aanjaneya, M., and Grzeszczuk, R., 2012. 3D city modeling from street-level data for augmented reality applications. International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), Zurich, Switzerland, October 13–15, pp. 238–245.
Stamos, I. and Allen, P., 2002. Geometry and texture recovery of scenes of large scale. Computer Vision and Image Understanding, 88(2): 94–118.
Uchiyama, H. and Marchand, E., February 2012. Object detection and pose tracking for augmented reality: Recent approaches. 18th Korea-Japan Joint Workshop on Frontiers of Computer Vision.
Kurata, T. et al., 2010. An intermediate report of Trakmark WG: International voluntary activities on establishing benchmark test schemes for AR/MR geometric registration and tracking methods. International Symposium on Mixed and Augmented Reality, Seoul, Korea, October 13–16, pp. 298–302.
Wagner, D., Reitmayr, G., Mulloni, A., Drummond, T., and Schmalstieg, D., 2008. Pose tracking from natural features on mobile phones. International Symposium on Mixed and Augmented Reality, Cambridge, U.K., September 15–18, pp. 125–134.


8  Urban Visual Modeling and Tracking

Jonathan Ventura and Tobias Höllerer

CONTENTS
8.1 Introduction
8.2 Outdoor Panoramic Capture
  8.2.1 Guidelines for Capture
8.3 Automatic 3D Modeling
  8.3.1 Image Extraction
  8.3.2 3D Reconstruction Pipeline
8.4 Semiautomatic Geo-Alignment
  8.4.1 Vertical Alignment
  8.4.2 Ground Plane Determination
  8.4.3 Map Alignment
8.5 Tracking the Model
  8.5.1 Image Representation
  8.5.2 Camera Model
  8.5.3 Point Correspondence Search
  8.5.4 Pose Update
  8.5.5 Success Metric
  8.5.6 Live Keyframe Sampling
8.6 Tracker Initialization
  8.6.1 Image-Based Method
  8.6.2 Feature-Based Method
  8.6.3 Sensor Integration
8.7 Server/Client System Design
  8.7.1 Server/Client System Overview
  8.7.2 Latency Analysis
8.8 Evaluation
  8.8.1 Speed
  8.8.2 Accuracy Tests with Differential GPS
  8.8.3 Augmentation Examples
8.9 Discussion
8.10 Further Reading
References

8.1  INTRODUCTION

This chapter explains how to digitally capture, model, and track large outdoor spaces so that they can be used as environments for mobile-augmented reality (AR) applications. The three-dimensional (3D) visual model of the environment is used as a database for image-based pose tracking with a handheld camera-equipped tablet. By detecting and tracking landmark features on the building facades, the system uses the surrounding buildings for accurate device positioning in the same way that printed markers are used indoors.

Device positioning is a common prerequisite for many AR applications. The global positioning system (GPS) provides ubiquitous device tracking from satellites, but does not guarantee enough accuracy on consumer-level devices for AR applications. Indoors, visual detection of flat, printed markers has proven to be a very successful method for accurate device positioning, at least for a small workspace. In larger spaces, external tracking systems allow for precise positioning by use of statically mounted cameras that observe objects moving in the space. Outdoors, however, we cannot require that the environment be covered in printed markers or surrounded by mounted cameras. This chapter presents an alternative approach that treats the built environment like an existing visual marker, except at a larger scale.

Visual modeling and tracking technology is based on the relationship between points in the scene and cameras that observe them. Researchers in the fields of photogrammetry and multiple-view geometry have studied the equations and principles governing these relationships extensively. Having images of the same point from multiple known camera positions allows us to determine the 3D location of the point, as depicted in Figure 8.1. Conversely, observing multiple known points in a single image allows us to determine the position of the camera, as depicted in Figure 8.2. The first case is useful for building a 3D model of an environment. The second case is useful for determining the location of a camera with respect to that model.

The process for preparing an outdoor environment for tracking usage in AR applications involves three basic steps. First, the area is captured in many photographs that cover the environment from all possible viewpoints. Second, from this collection of photographs, feature points are extracted and matched between images, and their 3D positions are precisely determined in an iterative process. Third, the 3D points are aligned with a map of building outlines to provide the reconstruction with a scale in meters and a global position and orientation. The resulting 3D reconstruction is stored on a server and transferred to the client device. After computing a tracker initialization on the server, the software on the mobile client device tracks 3D feature points in the live camera view to continuously determine the position and orientation of the device.

This chapter describes a system that applies these principles to model a large outdoor space and track the position of a camera-equipped mobile device moving in that space. Experimental analysis demonstrates that real-time localization with high accuracy can be achieved from models created using a small panoramic camera. The following sections provide detailed descriptions and evaluations of these system components. Sections 8.2 through 8.4 cover the outdoor modeling process. Sections 8.5 through 8.7 describe the outdoor tracking system. Section 8.8 provides quantitative evaluations of the system, Section 8.9 provides a discussion of the overall system design and performance, and Section 8.10 gives an annotated reference list for interested readers who would like to explore related work.

Urban Visual Modeling and Tracking

FIGURE 8.1  Triangulation of a 3D point from observations in two cameras. The distance between the cameras is called the baseline; the estimate is more accurate with a larger baseline.

FIGURE 8.2  Localization of a camera from three 3D point observations. The estimate is generally more accurate when the points are nearer to the camera.
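The two-view triangulation illustrated in Figure 8.1 can be sketched in a few lines of plain Python. The following uses the midpoint method: it finds the closest points on the two viewing rays and returns their midpoint. The function name and the ray-based interface are our own illustration, not code from the system described in this chapter:

```python
def triangulate_midpoint(c0, d0, c1, d1):
    """Midpoint triangulation of a 3D point from two viewing rays.

    c0, c1 are camera centers; d0, d1 are viewing directions.
    Solves for ray parameters s, t minimizing |(c0 + s*d0) - (c1 + t*d1)|,
    then returns the midpoint of the connecting segment.
    """
    r = [c1[i] - c0[i] for i in range(3)]
    a = sum(d0[i] * d0[i] for i in range(3))
    b = sum(d0[i] * d1[i] for i in range(3))
    c = sum(d1[i] * d1[i] for i in range(3))
    d = sum(d0[i] * r[i] for i in range(3))
    e = sum(d1[i] * r[i] for i in range(3))
    denom = a * c - b * b          # zero only for parallel rays
    s = (d * c - b * e) / denom
    t = (b * d - a * e) / denom
    p0 = [c0[i] + s * d0[i] for i in range(3)]
    p1 = [c1[i] + t * d1[i] for i in range(3)]
    return [(p0[i] + p1[i]) / 2.0 for i in range(3)]
```

With a 4 m baseline and a point 10 m in front of the cameras (the capture guideline discussed in Section 8.2.1), the two rays intersect cleanly; with noisy observations, the rays do not meet exactly and the midpoint serves as the estimate.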

Section 8.8 provides quantitative evaluations of the system, Section 8.9 provides a discussion of the overall system design and performance, and Section 8.10 gives an annotated reference list for interested readers who would like to explore related work.

8.2 OUTDOOR PANORAMIC CAPTURE

An easy way to capture many different viewpoints of a large environment is to use a panoramic or omnidirectional camera. Such a camera captures all viewing angles from one position in a single capture. Examples of such cameras include the professional-grade Point Grey Ladybug, which has six cameras, and the consumer-grade Ricoh Theta, which has just two cameras with very wide-angle lenses, placed back to back. A consumer-grade panorama camera is light enough to be held overhead in one's hand. Alternatively, if the camera can be remotely triggered, a tripod or monopod attached to a backpack serves as an easy mounting system.

8.2.1 Guidelines for Capture

By simply walking through an environment and capturing panoramas at regular intervals, the environment to be augmented is easily captured in images. However, some care should be taken during the capture process in order to ensure the success of the later reconstruction and augmentation steps. The main issue to consider is how many pictures to take, and where to take them. To answer this, the characteristics of vision-based 3D reconstruction should be taken into account.

Triangulation of 3D points depends on multiple observations of a point taken from images in different locations. The distance between two cameras is called the baseline, and a larger baseline gives a more accurate point triangulation. However, the limiting factor is the ability of the image points to be matched: if the images are too far apart, then the appearance of the object will change too much, which means that the images cannot be automatically matched together. This depends on both the angle of view and the change in scale. The scale-invariant feature transform (SIFT) descriptor (Lowe, 2004), which we use for matching, is reported to match well with up to 45° of out-of-plane rotation. In practice, however, the matching works better with smaller angles. A simple rule of thumb is that the optimal distance ratio is about 4/10, meaning that the pictures should be about 4 m apart if the buildings are 10 m away. This corresponds to having an angle of about 10° between the rays observing a point.

The appropriate interval to achieve the desired 4/10 ratio depends on the speed of motion and the distance to the buildings. For walking speed in building courtyards, an appropriate sampling rate is about two panoramas per second. When recording panoramic video, images are extracted at a fixed rate in order to have regularly spaced panoramas from the video to use in the reconstruction pipeline.

A second consideration is the expected distance between the offline-captured panorama images and the user's location during online use of the AR application. Experiments with the iPad 2 and the Point Grey Ladybug camera have shown that reasonable localization performance can be expected in a range within a quarter of the distance to the buildings from the offline panorama capture point (Ventura and Hollerer, 2012b).
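The 4/10 rule of thumb can be turned into a small capture-planning helper. This is a hypothetical sketch (the function name and defaults are ours); it simply applies the spacing ratio and the walking speed:

```python
def capture_plan(building_distance_m, walking_speed_mps, ratio=0.4):
    """Panorama spacing and capture interval from the 4/10 rule of thumb.

    building_distance_m: approximate distance to the building facades.
    walking_speed_mps:   speed of the person carrying the camera.
    Returns (spacing between panoramas in meters, seconds between captures).
    """
    spacing_m = ratio * building_distance_m      # e.g., 4 m at 10 m distance
    interval_s = spacing_m / walking_speed_mps   # time between captures
    return spacing_m, interval_s
```

For buildings 10 m away and a 1.4 m/s walk, this suggests one panorama every 4 m, that is, roughly one capture every 3 s; when frames are instead extracted from panoramic video at a fixed rate, the extraction rate can be chosen the same way.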

8.3 AUTOMATIC 3D MODELING

After canvassing the area to be modeled and collecting imagery from many viewpoints, the image collection is processed in an automatic 3D modeling pipeline. This pipeline takes the image sequences as input and outputs the estimated camera positions and 3D triangulated points. The collection of estimated points is called a point cloud. This section gives some details about how the pipeline works.

8.3.1 Image Extraction

There are several common panoramic image representations that could be used to store the image sequences. Mappings such as spherical and cylindrical projection offer a continuous representation of all camera rays in one image. However, they nonlinearly distort the perspective view, which would impact performance when matching to images from a normal perspective camera, as found on a typical mobile device.

Instead of using spherical or cylindrical projection, perspective views are extracted from each panorama, such that the collection of extracted views covers the entire visual field. This representation offers perspective views without distortion. A typical cube map used as an environment map in rendering uses six images arranged orthogonally, with 90° horizontal and vertical fields of view in each image. In practice, however, the low field of view in each image hinders the matching and reconstruction process.

To address this issue, perspective images with wider than 90° horizontal field of view are used. Eight perspective views per panorama are extracted, arranged at equal rotational increments about the vertical camera axis, to increase image matching performance by ensuring that all directions are covered in a view without severe perspective distortion. The top and bottom of the cube are omitted, since they generally have no usable texture. The faces provide overlapping views, which increases the likelihood of matching across perspective distortion. Figure 8.3 shows an image from the panorama camera and its extended cube map representation.

FIGURE 8.3  (a) An example panorama represented in spherical projection. (Continued)
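The view-extraction geometry described above can be sketched as follows: eight yaw directions spaced 45° apart about the vertical axis, plus a mapping from a viewing ray to equirectangular (spherical-projection) panorama coordinates for resampling. The coordinate conventions here are our own assumptions, not the system's:

```python
import math

def view_yaws(num_views=8):
    """Yaw angles (radians) for perspective views spaced equally about
    the vertical camera axis, as in the extended cube map above."""
    return [2.0 * math.pi * i / num_views for i in range(num_views)]

def ray_to_equirect(dx, dy, dz, width, height):
    """Map a unit viewing ray to pixel coordinates in an equirectangular
    panorama. Convention (assumed): +z forward, +x right, +y down;
    longitude zero maps to the center column of the image."""
    lon = math.atan2(dx, dz)                      # [-pi, pi)
    lat = math.asin(max(-1.0, min(1.0, dy)))      # [-pi/2, pi/2]
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (lat / math.pi + 0.5) * height
    return u, v
```

Extracting one perspective view then amounts to generating the ray for each output pixel from the view's yaw and field of view, and sampling the panorama at the returned (u, v).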

FIGURE 8.3 (Continued)  (b–g) Images extracted from the panorama using perspective projection. (Continued)

FIGURE 8.3 (Continued)  (h–i) Images extracted from the panorama using perspective projection.

8.3.2 3D Reconstruction Pipeline

After extraction, the perspective views from the panoramas are processed in an incremental structure-from-motion pipeline, which produces a 3D point cloud from image feature correspondences. This pipeline has four major steps: pair-wise panorama matching, match verification, reconstruction of an initial pair, and incremental addition of the remaining panoramas.

The SIFT detector and descriptor is used for feature matching (Lowe, 2004). If a linear camera path is assumed, without loop closures, panoramas are only matched to their neighbors in the sequence. Otherwise, exhaustive pair-wise matching is employed to test all possible correspondences. Matches are verified by finding the essential matrix (Nistér, 2004) relating two panoramas using a progressive sample consensus (PROSAC) loop (Chum and Matas, 2005). The relative rotation between perspective views in a single panorama is fixed, and only the rotation and translation between panoramas is estimated and refined.

After triangulating points using an initial panorama pair, panoramas are incrementally added, more points are triangulated, and bundle adjustment is performed. This is repeated until no more panoramas can be added.

8.4 SEMIAUTOMATIC GEO-ALIGNMENT

Image-based reconstruction by itself produces a metric reconstruction that is internally consistent. However, this process cannot recover the external orientation of the reconstruction, meaning the direction of gravity, the scale in meters, and the geographic positions of the cameras and 3D points. This external orientation is very useful for many kinds of AR applications. With a geo-aligned reconstruction it is possible to display geo-referenced information such as map data. Other applications which do not display geo-referenced information still benefit from geo-alignment, because it can be used to determine the device's height off the ground and the scale and orientation with which 3D models should be displayed.

To enable these benefits in our AR applications, the semiautomatic geo-alignment procedure described here is employed to determine the external orientation of the reconstruction.

8.4.1 Vertical Alignment

The first step of the alignment procedure is to determine the vertical orientation of the reconstruction. This is determined by two rotation angles that transform the reconstruction to make the negative Z-axis aligned with the direction of gravity. To automatically estimate this alignment, a common vertical vanishing point for all images is determined.

First, roughly vertical line segments are extracted from all images. Using only roughly vertical lines makes the assumption that the images are taken with a roughly upright orientation, and that there are sufficient upright structures having such lines in the images. The LSD line segment detector (Von Gioi et al., 2010) is applied, and all lines with a minimum length of 25% of the image diagonal and an orientation within 45° off-vertical are accepted.

Vertical vanishing point hypotheses are then generated by repeatedly sampling a pair of lines and finding their intersection point. Each hypothesis is tested against all lines to determine their angular errors with respect to the vanishing point hypothesis. The hypothesis with the greatest number of inliers is selected as the common vertical vanishing point for all images. Figure 8.4 shows an example of vertical lines found to be inliers in one image. After finding the common vertical vanishing point, the rotation which brings this point to vertical (0, 0, 1)T is determined and applied to the reconstruction.

FIGURE 8.4  Lines on the buildings (in white) are used to determine a common vanishing point and align the vertical axis of the reconstruction with the direction of gravity.
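The hypothesize-and-test vanishing point estimation described above can be sketched in plain Python. This is a simplified illustration (the 2D inlier test, thresholds, and names are our assumptions); lines are represented in homogeneous coordinates so that intersections are cross products:

```python
import math
import random

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_coeffs(seg):
    """Homogeneous line through a segment's two endpoints."""
    (x1, y1), (x2, y2) = seg
    return cross((x1, y1, 1.0), (x2, y2, 1.0))

def angular_error(seg, vp):
    """Angle between a segment and the direction from its midpoint to vp."""
    (x1, y1), (x2, y2) = seg
    mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    d1 = (x2 - x1, y2 - y1)
    if abs(vp[2]) > 1e-9:                        # finite vanishing point
        d2 = (vp[0] / vp[2] - mx, vp[1] / vp[2] - my)
    else:                                        # point at infinity
        d2 = (vp[0], vp[1])
    n = math.hypot(*d1) * math.hypot(*d2)
    if n < 1e-12:
        return 0.0                               # midpoint coincides with vp
    dot = abs(d1[0] * d2[0] + d1[1] * d2[1])
    return math.acos(max(-1.0, min(1.0, dot / n)))

def vertical_vp(segments, iters=200, thresh=math.radians(2.0), seed=0):
    """Sample line pairs, intersect them, and keep the hypothesis with
    the most inliers (a simplified sketch of the procedure above)."""
    rng = random.Random(seed)
    best_vp, best_inliers = None, -1
    for _ in range(iters):
        s1, s2 = rng.sample(segments, 2)
        vp = cross(line_coeffs(s1), line_coeffs(s2))
        if abs(vp[0]) + abs(vp[1]) + abs(vp[2]) < 1e-12:
            continue                             # degenerate pair
        inliers = sum(1 for s in segments if angular_error(s, vp) < thresh)
        if inliers > best_inliers:
            best_vp, best_inliers = vp, inliers
    return best_vp, best_inliers
```

For truly vertical segments the winning hypothesis is a point at infinity in the vertical direction, and non-vertical outliers are rejected by the angular test.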

8.4.2 Ground Plane Determination

Once the reconstruction is vertically aligned, the ground plane is determined by considering the Z-coordinate of all 3D points. Assuming that the reconstruction contains many points on the ground, the ground plane should be a peak in the histogram of Z values. Erroneous points in the reconstruction might lie under the ground, so the absolute minimum value should not be used as the ground height. Instead, the height of the ground is initialized to the 80th percentile Z value, near the lower end of the range. The ground height will then be manually tuned by inspecting the reconstruction visually and confirming that the estimated ground plane meets the bottom edges of buildings.

8.4.3 Map Alignment

Now there are four remaining degrees of freedom left: a rotation about the vertical axis, a translation on the X–Y ground plane, and the metric scaling of the reconstruction. An initialization for these remaining transformation parameters is determined manually by visually comparing an overhead, orthographic view of the reconstruction with a map of building outlines from the area, which can be freely downloaded from OpenStreetMap. A simple interactive tool renders the point cloud and building outlines together. The user interactively rotates, translates, and scales the reconstruction until the points roughly match the buildings.

After the user determines a rough initialization, automatic nonlinear optimization is applied to determine the best fit between 3D points and building walls. Each point is assigned to the nearest building wall according to the 2D point-line distance. If the point-line distance is greater than 4 m, or the projection of the point onto the line does not lie on the line segment, then the match is discarded. The rotation, translation, and scale parameters are iteratively updated to minimize the point-line distance of all matches, using the Huber loss function for robustness to outliers. The entire optimization procedure is repeated until convergence to find the best point-line assignment and 3D alignment.

An example reconstruction aligned to OpenStreetMap data is shown in Figure 8.5. The panoramas for this reconstruction were captured by walking in a straight line through the center of the Graz Hauptplatz courtyard while holding the Ricoh Theta camera overhead.

8.5 TRACKING THE MODEL

Given a 3D point cloud reconstruction of the scene, the mobile device's position is determined in real time by identifying and localizing feature points observed by the device's camera. The continuously operating tracker maintains in real time the pose of the mobile phone or tablet with respect to the model. The tracker takes as input a pose prior, the current camera image, and sensor readings, and outputs a pose posterior that estimates the current device position and orientation. The pose prior is provided by the previous tracking iteration. Tracker initialization, and reinitialization after failure, is provided by the procedures discussed in Section 8.6.
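The point-to-wall matching rule used in the map-alignment step above can be sketched in 2D. This is a simplification of the full optimization (the real system alternates assignment with a Huber-loss parameter update); function names and the combined nearest-wall/validity test are ours:

```python
import math

def point_segment_distance(p, a, b):
    """2D distance from point p to segment a-b, plus whether the
    orthogonal projection of p falls on the segment."""
    ax, ay = a; bx, by = b; px, py = p
    vx, vy = bx - ax, by - ay
    t = ((px - ax) * vx + (py - ay) * vy) / (vx * vx + vy * vy)
    on_segment = 0.0 <= t <= 1.0
    cx, cy = ax + t * vx, ay + t * vy       # foot of the perpendicular
    return math.hypot(px - cx, py - cy), on_segment

def assign_to_walls(points, walls, max_dist=4.0):
    """Assign each 2D point to its nearest wall segment, discarding a
    match when the distance exceeds max_dist (4 m in the text) or the
    projection leaves the segment. Returns {point index: wall index}."""
    matches = {}
    for i, p in enumerate(points):
        best = None
        for j, (a, b) in enumerate(walls):
            dist, on_seg = point_segment_distance(p, a, b)
            if on_seg and dist <= max_dist and (best is None or dist < best[0]):
                best = (dist, j)
        if best is not None:
            matches[i] = best[1]
    return matches
```

In the full procedure, this assignment and the similarity-transform update are repeated until the assignment stops changing.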

FIGURE 8.5  Point cloud reconstruction aligned to OpenStreetMap building data. The gray dots indicate panorama capture locations, and the black points indicate triangulated 3D points. (Plot axes: Easting and Northing, in meters.)

8.5.1 Image Representation

We refer to the images extracted from the panoramas as keyframes. These keyframes are extended by preparing an image pyramid, meaning that the image is repeatedly half-sampled. The stack of images at progressively smaller resolutions is stored and used to improve patch sampling during tracker operation.

8.5.2 Camera Model

Most mobile phones and tablets have a moderate field-of-view camera with low distortion; thus a simple pinhole camera model without a radial distortion term is sufficient to model it. This model has only one parameter, the focal length, which is determined in a precalibration step. The center of projection is assumed to be the center of the image. The same model is used for the panorama keyframes, which are generated by synthetic warping of the panorama images.

8.5.3 Point Correspondence Search

At each frame, the tracker projects patches from the database of images into the camera frame according to the pose prior, searches for features in a window around their expected positions, and then updates the pose using gradient descent to minimize the re-projection error. First, any points that are predicted to lie behind the camera or outside of the image are culled and not considered in further steps of the current tracking iteration.
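The camera model and the culling test just described can be sketched together: a pinhole projection with the principal point at the image center, returning nothing for points behind the camera or outside the image. This is an illustrative sketch, not the system's code:

```python
def project(point_cam, f, width, height):
    """Project a 3D point given in camera coordinates with a one-parameter
    pinhole model (focal length f in pixels, principal point at the image
    center). Returns (u, v), or None if the point is culled because it is
    behind the camera or falls outside the image."""
    x, y, z = point_cam
    if z <= 0.0:
        return None                   # behind the camera
    u = f * x / z + width / 2.0
    v = f * y / z + height / 2.0
    if not (0.0 <= u < width and 0.0 <= v < height):
        return None                   # outside the image
    return u, v
```

Points that survive this test proceed to the patch search described next.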

For each point that passes the culling test, an 8 × 8 pixel template patch is extracted from a keyframe that observes the projected point. This keyframe is called the source, and the camera image is called the target. The template patch is used to search in the target image for the point's current projected location. We search for this location by computing the normalized cross-correlation between the template patch and patches sampled from the target image at all locations on an 8 × 8 grid around the location given by the pose prior. The location with the best score is accepted if the score exceeds a threshold of 0.7, which was experimentally found to adequately separate correct and incorrect matches.

A perspective warp is used to compensate for the parallax between the source and the target. This perspective warp is determined by the 3D point, X, and its normal, n, where n·X + D = 0. If the target and source projection matrices are P = [I | 0] and Pi = [Ri | ti], respectively, and the plane is p = (n1, n2, n3, D)T, the 3 × 3 perspective warp is

Wi = Ksource (Ri + ti vT) Ktarget^-1

where

v = -D^-1 (n1, n2, n3)T

and Ksource and Ktarget are the intrinsic calibration matrices of the source and target images, respectively. The determinant of the warp |Wi| gives the amount of scaling between source and target. This warp is computed for all keyframes that observe the point, and the keyframe with warp scale closest to one is chosen as the source. This ensures the best resolution when sampling the template patch. The system also chooses the best level of the source image pyramid according to the resolution required when sampling the template patch.

To increase robustness to fast movements, the search for correspondence is performed over an image pyramid of four levels. Once a point is found at a higher pyramid level, we project its search location down to the zero pyramid level, giving a measured location xi for point Xi.

8.5.4 Pose Update

After searching for correspondences, the camera pose estimate is updated to fit the measurements. All correspondences found during patch search are used to update the camera pose, even if they were not successfully refined to the lowest pyramid level. Ten iterations of gradient descent over an M-estimator are used to minimize the re-projection error of all points:

e = Σi m(||yi - xi||^2)

where yi is the projected location of Xi using the current pose estimate, and m(u) is the Tukey loss function (Huber, 1981). The parameters of the Tukey loss function are recomputed to update the weights after each of the first five iterations.
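The normalized cross-correlation score used in the patch search above can be computed as follows. This is a plain-Python sketch for small patches (real implementations vectorize this over the whole search grid); patches are lists of rows of gray values:

```python
import math

def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equal-sized gray patches.
    Scores near 1 indicate a good match; the tracker described above
    accepts the best search location only if its score exceeds 0.7."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    da = [v - ma for v in a]                # zero-mean versions
    db = [v - mb for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0
```

Because the score is invariant to affine brightness changes, it tolerates the lighting differences between the offline keyframes and the live camera image.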

To ensure an acceptable frame rate, a limit Nmax is placed on the number of points Nattempted that the tracker can attempt to find in a single frame. The system first performs view frustum culling on all points, and then selects Nattempted ≤ Nmax points to search for. The point selection is performed by choosing some ordering of visible points and selecting the first Nmax points from the ordering to be tracked. The question is, which ordering ensures the best tracking performance? One commonly used approach is to randomly shuffle all the visible points at each frame. However, this can lead to pose jitter when the camera is not moving. The reason is that using different subsets of the points for tracking may result in slightly different poses found by the gradient descent minimization, because of slight errors in patch search.

A solution to this problem is to randomly order the points once at system startup. This provides a fixed, but random, ordering for the points at each frame. The result is that for a static or slowly moving camera, the tracker will reach a steady state where the same subset of points is used for tracking and pose update at each frame. Overall, this sampling procedure reduces pose jitter in comparison to sampling a new random ordering of points at each frame.

8.5.5 Success Metric

After the pose update, the number Nfound of points found to be inliers by the M-estimator is counted. This is an indicator of the success of the tracker, that is, whether the pose posterior matches the true pose of the camera. The system requires that at least 100 points have been successfully tracked (Nfound ≥ 100).

8.5.6 Live Keyframe Sampling

A second source of pose inaccuracy is poor feature correspondence. Errors in the patch search can prevent the pose update from correctly converging. Direct alignment of the mobile device camera image to the panorama keyframes can cause poor feature correspondence, which leads to inaccurate or jittery pose estimates. This is most likely because of the difference in imaging characteristics of the two cameras, such as focal length and sharpness. In contrast, the feature correspondence between two nearby images from the same camera is less noisy.

To address this problem, live keyframe sampling is incorporated into the tracking system. During tracker operation, keyframes are collected from the current video stream and added to the set of images used for patch projection. When a new keyframe is sampled, the inlier measurements are associated to the corresponding 3D points to be used for future patch projection. The tracker preferentially projects patches from the new keyframes, as these lead to more stable pose estimation. The decision of when to sample a new keyframe is based on the number Nold of points that are projected from a panorama keyframe in the current camera image, and the number Nnew of points projected from a new keyframe. When the ratio Nnew/Nold drops below 50%, or when the distance to the nearest new keyframe rises above 2 m, a new keyframe is sampled.
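The fixed-but-random ordering described above can be sketched as a small selector: the ordering is shuffled exactly once at startup, and each frame simply takes the first Nmax currently visible points in that ordering. Class and parameter names are our own:

```python
import random

class PointSelector:
    """Fixed-but-random point ordering to reduce pose jitter (sketch)."""

    def __init__(self, num_points, seed=0):
        self.order = list(range(num_points))
        random.Random(seed).shuffle(self.order)   # shuffled exactly once

    def select(self, visible, n_max):
        """Walk the fixed ordering, keeping the first n_max visible points."""
        visible = set(visible)
        picked = []
        for idx in self.order:
            if idx in visible:
                picked.append(idx)
                if len(picked) == n_max:
                    break
        return picked
```

Because the ordering never changes, a static camera sees the same visible set and therefore the same selected subset at every frame, which is exactly the steady state the text describes.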

8.6 TRACKER INITIALIZATION

The patch tracking method described in Section 8.5 is fast enough for real-time operation, but requires an initial position to start the iterative tracking procedure. The region of convergence for the tracker is too small to make it feasible to use the GPS and compass reading as an initialization (see Section 8.7.2). Instead, visual localization procedures are employed that are capable of determining the camera pose within a wide range of possible positions.

The task of visual localization is challenging in the outdoor case. This is because the range of possible views is large compared to an indoor setting. When tracking a desk, for example, the camera can reasonably be expected to move within a small range of distances from the desk surface. In a building courtyard or street-side setting, the camera could be far from the original point of capture, but the building would still be visible because of its size. This mandates localization strategies that are robust to large changes in perspective and scale.

This section presents two different methods for tracker initialization and reinitialization after tracking failure. The image-based method is fast enough to be computed in real time but is limited in range. The feature-based method offers a more robust solution to the visual localization problem, but requires significant computation as well as storage for the descriptor database. The way these methods are combined is explained in the system design overview given in Section 8.7.

8.6.1 Image-Based Method

The image-based localization method is relatively simple and is easily implemented. A cache of recently seen images is stored along with their known poses. When the tracking system fails and enters the lost state, the system matches the current image to the cache to find pose hypotheses. The image cache is generated during tracker operation and can be saved for reuse in future tracking sessions. During tracker operation, a tracked image is added to the cache when tracking is successful and the closest keyframe in the cache is more than 1 m different in position or 45° different in orientation.

To find image matches, a variant of the small blurry image (SBI) matching procedure is used (Klein and Murray, 2008). Images in the cache are down-sampled to the fifth image pyramid level (meaning that they are half-sampled four times). This same down-sampling is applied to the current query image from the camera. Then each cache image is compared to the query image using the normalized cross-correlation score (Gonzalez and Woods, 2007). The pose of the cache image with the highest correlation score is used as a pose prior to start the tracker at the next frame.

The SBI localization method is suitably fast even for a large number of cache images. However, it requires the query camera to be relatively close to a cache image. Beyond a small amount of translation or rotation, the query frame will not match to any cache image. Thus, unless a very dense coverage of cache images is acquired, this method is impractical for outdoor localization in a large space.
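The cache-insertion rule above (add a frame only when no cached view is within 1 m and 45° of the current pose) can be sketched as follows. Poses here are simplified to position plus yaw, which is our simplification of the full 6DOF comparison:

```python
import math

def should_cache(pose, cache, min_dist=1.0, min_angle_deg=45.0):
    """Decide whether a successfully tracked frame should be added to
    the image cache. pose and cache entries are (x, y, z, yaw_deg)
    tuples (a simplified stand-in for full 6DOF poses)."""
    x, y, z, yaw = pose
    for cx, cy, cz, cyaw in cache:
        dist = math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
        # Smallest absolute yaw difference, wrapped into [0, 180].
        dyaw = abs((yaw - cyaw + 180.0) % 360.0 - 180.0)
        if dist <= min_dist and dyaw <= min_angle_deg:
            return False      # a sufficiently similar view is already cached
    return True
```

This keeps the cache sparse while still covering new viewpoints, which is what makes the fast SBI comparison against every cache entry affordable.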

8.6.2 Feature-Based Method

Alternatively, feature matching can be used to extend the range of poses where localization can be achieved. Given a query frame, the system extracts features and searches for correspondences in the panorama keyframe database. Each feature from the query image is matched to its nearest neighbor in the set of all features in the database according to the Euclidean distance between SIFT descriptors. Approximate nearest-neighbor search is performed using a kd-tree for speed (Lowe, 2004). Then, a robust sampling procedure is used to find a subset of inlier correspondences that support a common pose estimate. The camera pose is robustly estimated using the PROSAC procedure (Chum and Matas, 2005) and the three-point absolute pose algorithm (Fischler and Bolles, 1981).

An alternative approach is to apply a document retrieval technique (Sivic and Zisserman, 2003). A vocabulary tree (Nistér and Stewenius, 2006) is used to hierarchically organize the descriptors so that each descriptor is identified by its cluster (or word). Given a query image, standard tf-idf weighted document matching is applied to order the keyframes by similarity (Sivic and Zisserman, 2003). The top K documents are then subjected to geometric pose verification to find a suitable match. For each top-ranked document, the nearest-neighbor matching and pose estimation procedure described earlier is performed, using only features from the single image that was retrieved. This image retrieval approach scales with database size better than performing nearest-neighbor matching against the entire database. However, there must exist in the database a single view that has enough visual overlap with the query for the procedure to work. Irschara et al. developed a method to increase the set of views in the database synthetically, which increases the range of the image retrieval technique (Irschara et al., 2009).

8.7 SERVER/CLIENT SYSTEM DESIGN

This section gives an explanation of how the various components described earlier are organized into a complete outdoor tracking system for AR applications.

8.7.1 Server/Client System Overview

The overall system design is illustrated in Figure 8.6. First, the omnidirectional video is processed in the offline reconstruction process to produce the 3D point cloud model with panorama keyframes and feature descriptors. The point cloud and the keyframes are copied to the mobile client device; however, the descriptors are not needed for tracking and thus do not need to be copied onto the client device. The feature-based localization component is computed on a remote server or computing cloud, where storage and computation are essentially unrestricted. The online tracking and image-based reinitialization components are computed directly on the mobile client device, as these are relatively lightweight operations that can be computed in real time on such restricted hardware. The system's preparation and its online operation procedure are described in more detail in the following.
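The tf-idf ranking step of the retrieval approach described above can be sketched over toy "visual word" lists. This is a simplification: a real system quantizes SIFT descriptors with the vocabulary tree and scores documents through inverted files rather than a direct loop:

```python
import math
from collections import Counter

def tfidf_rank(query_words, keyframe_words):
    """Rank keyframes against a query by tf-idf similarity over quantized
    visual words. keyframe_words: one word-id list per keyframe.
    Returns keyframe indices, most similar first."""
    n_docs = len(keyframe_words)
    # Document frequency, then inverse document frequency per word.
    df = Counter()
    for words in keyframe_words:
        df.update(set(words))
    idf = {w: math.log(n_docs / df[w]) for w in df}
    q = Counter(query_words)
    scores = []
    for i, words in enumerate(keyframe_words):
        tf = Counter(words)
        # Dot product of tf-idf weighted query and document vectors.
        score = sum(q[w] * tf[w] * idf.get(w, 0.0) ** 2 for w in q)
        scores.append((score, i))
    scores.sort(reverse=True)
    return [i for _, i in scores]
```

The top-K keyframes from this ranking would then go through the geometric pose verification described in the text.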

FIGURE 8.6  Tracking system overview with server/client design. [Diagram: omnidirectional video is processed by the offline reconstruction and stored on the server; keyframes and 3D points are copied to the client; on the mobile device, patch projection, tracking, and live keyframe sampling operate on the video stream, combined with 3DOF relative rotation from the orientation sensors and localization requests/responses exchanged with the server over the wireless network, producing a dynamic 6DOF absolute pose for the AR display.]

The tracking system runs in real time on the client device in the following loop. First, the system tries to track the model using the previous pose estimate. The incremental rotation estimate provided by the inertial sensors in the device is preapplied to the previous pose estimate to compensate for fast motion. If tracking fails, then the image cache is searched using the current camera image. Tracking is then tried again using the pose prior provided by the best match from the image cache. If this fails, then the system generates a localization query that is sent to the server over the wireless network. While the server processes the query, the system continues attempting to restart tracking using the image cache. When the query response is received, the computed pose is used to restart tracking.

8.7.2 Latency Analysis

Due to network communication time and server computation time, feature-based localization introduces latency between an image query and a pose response. During this time, the camera might be moved from its query position, introducing error in the localization pose estimate. Thus, the system needs some ability to handle an outdated pose estimate from the localization system.
the maximum translation would be about 1/3 m. Thus. This analysis suggests in general that the complete time for the localization query to be sent. We use a simplified analysis here by considering movement in one dimension. However.7. the maximum rotational pose error is qmax = 1. 8. the inertial sensors in the device are used to maintain an estimate of rotational movement. are too noisy to be used for estimating translation. Given a building that is 12 m away. For the translation case. A similar approach could be applied to estimate translational movement based on accelerometer readings. .1. This as well would be a limitation for localization. the maximum translation is tX /Z = 0.90. Fortunately. a rotational error of qerr degrees will cause a pixel offset of xerr pixels: xerr = f tan(qerr) where f is the focal length parameter of the camera’s intrinsic calibration matrix. the maximum translation tX depends on the distance Z to the observed object: xerr = ftX Z For the iPad 2 camera.03. The system uses an effective search radius of 4·23 = 32 pixels. This limit could be a problem if localization latency is 1 s or more. given the distance a fast-walking user could cover in 1 s. The continuous pose tracker uses a patch search method to find a point given a pose prior. such as the iPad 2. Assuming rotation around the Y-axis (vertical axis). to produce an estimate of the tracker convergence region. The maximum projection error can be used to find the maximum rotational pose error qmax. processed. the accelerometer found in typical consumer devices.3 Sensor Integration To overcome the problem of rotational movement during the localization latency period. The estimated difference in rotation between the localization query and response is preapplied to the localization response before attempting to initialize the tracker. This is because generally the distance to the buildings is such that small translational movements do not cause significant parallax in the image. 
translational error during the latency period is not an issue in larger environments such as typical urban scenes.188 Fundamentals of Wearable Computers and Augmented Reality The region of convergence of the tracker determines the amount of error in the pose prior that the system can tolerate.55°. and is run over an image pyramid to expand the search region. and the Apple iPad 2 camera used for testing has a focal length of f = 1179. This search occurs over a fixed region around the estimated point projection location. This establishes a maximum pixel error in the projected point location that will still lead to tracker convergence. Timing data from our experiments is given in Section 8. and returned—the localization latency—should be within 1 s. even over a brief period.

02 ms per point). even with a 3G cellular data connection. The model tested has 21 panoramas. However. In practice we have experienced localization times of 2–3 s for a larger model. The total tracking time per frame depends on the total number of points in the model Ntotal. a differential GPS receiver was attached to the iPad 2. and 6823 features. 3691 points. the processing time should be as short as possible to provide a smooth user experience.8. as determined in Section 8.8 EVALUATION This section reports on evaluations of several aspects of the system and shows that it provides sufficient tracking performance to support many kinds of geo-referenced mobile AR applications.2 Accuracy Tests with Differential GPS To test the absolute positional accuracy possible with the system.005 ms per point). The speed of online tracking on the client device was evaluated using an Apple iPad 2 tablet.26 GHz Quad-Core Intel Xeon and 8 GB RAM. Differential GPS receivers use measurements from GPS satellites as well as a correction signal from a nearby base station in order . 8. Feature-based tracking on the mobile device consists of three steps that constitute the majority of computation time per frame: point culling (0. and frame rates of 15–20 fps tracking are achievable. However.033 ms per point). typically the number of points tracked decreases at each successive pyramid search level. the average localization latency is about one and a half seconds. since the processing happens in the background. This gives an approximate tracking time per frame: ttrack = Ntotal · tcull + Ntrack · L(twarp + tsearch) With multithreading on the dual-core iPad 2. the processing time is approximately reduced by half. Average timings were recorded using an Apple Mac Pro with a 2. this gives a maximum tracking time of approximately 117 ms per frame. 1024 tracked points. patch warp (0.Urban Visual Modeling and Tracking 189 8. 
Transfer time typically takes 30–40 ms using either a wireless or 3G connection. For a model with 3691 points. the processing speed could be greatly improved by using GPU implementations of the feature extraction and pose estimation steps. However.7. so the actual tracking time in practice is lower. This means that the server does not have to respond in real time. the number of points tracked Ntrack. and 4 pyramid levels. 8. and patch search (0. and ideally within 1 s.1 Speed Localization queries are processed on a remote server while the mobile tracker continues running.8. The time to transfer a JPEG-compressed image from the device to the server is not a severe bottleneck. and the number of pyramid levels L. Overall. Most of the computation time is spent on SIFT feature extraction (900 ms) and PROSAC pose estimation (500 ms).2.
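The tracking-time estimate above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the per-point timings and model sizes are the ones reported in this section, the ideal division across cores mirrors the text's "approximately reduced by half" on the dual-core iPad 2, and the function name is invented.

```python
# Per-frame tracking-time model from this section:
#   t_track = N_total * t_cull + N_track * L * (t_warp + t_search)
# Per-point timings are in milliseconds.

def tracking_time_ms(n_total, n_track, levels,
                     t_cull=0.005, t_warp=0.02, t_search=0.033,
                     cores=2):
    """Approximate tracking time per frame (ms), with ideal multithreading."""
    single_core = n_total * t_cull + n_track * levels * (t_warp + t_search)
    return single_core / cores

# Model with 3691 points, 1024 tracked points, 4 pyramid levels:
t = tracking_time_ms(3691, 1024, 4)
print(round(t, 1))  # 117.8 -> matches the "approximately 117 ms" figure
```

Note that this is a worst-case budget: because fewer points survive at each successive pyramid level, real frames come in well under this bound, which is consistent with the reported 15–20 fps.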

to attain ground truth positional estimates with accuracy under 10 cm.

A test video with the differential GPS receiver was recorded in the Graz Hauptplatz while observing the Rathaus (City Hall). The semiautomatic alignment method described in this chapter was used to georegister the model with respect to building outlines from OpenStreetMap. Because the GPS receiver produces positional readings at a rate of 1 Hz, linear interpolation was used to up-sample the signal to 30 Hz. A comparison of the differential GPS track and the positional track created with our system is shown in Figure 8.7. The system achieved an average error of 0.72 m in the easting direction and 0.38 m in the northing direction. This shows that our system provides better accuracy than consumer GPS, which has an accuracy of about 3 m with a high-quality receiver.

FIGURE 8.7  Comparison of the camera position estimates from the visual tracking system with ground truth position estimates from the differential GPS receiver. (Plots show easting (m) and northing (m) over the frame sequence for the GPS and tracker estimates.)

8.8.3 Augmentation Examples

Several prototypes have been developed and tested to evaluate the use of our modeling and tracking system for AR applications. Example screen captures from these prototypes are shown in Figure 8.8.

The first prototype is a landscape design application. In a large courtyard on the UC Santa Barbara campus, the user can place virtual trees on the large grassy area between the buildings. As trees are placed, the user can move around to view how the trees would look from different angles. Using an assumed position of the sun, accurate shading and shadows are rendered to increase the realism of the rendering.

A second prototype tests the use of video game graphics. Here, a landing spaceship is rendered into another building courtyard on the UCSB campus at the spot on the ground where the user touches the screen. The panoramic reconstruction of this area was made from 37 panoramas taken with the Ricoh Theta camera. The resulting reconstruction contains 14,523 points; an overhead view of the point cloud is shown in Figure 8.8.

A third prototype was created to test architectural rendering. In this application, a reconstruction of a city street (Branch Street in Arroyo Grande, CA) was created by holding the panorama camera out on the sunroof of a car and driving down the street to capture
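The evaluation procedure described here—up-sampling the 1 Hz differential GPS track to the 30 Hz video rate by linear interpolation, then measuring per-axis average error—can be sketched in plain Python. The sample values below are made up for illustration; they are not the Graz measurements.

```python
# Sketch of the accuracy-evaluation procedure: interpolate the 1 Hz GPS
# signal to the video frame rate, then compute per-axis average error
# between the tracker and GPS positions.

def upsample_linear(samples, factor):
    """Linearly interpolate a sequence by an integer factor (e.g., 1 Hz -> 30 Hz)."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])
    return out

def mean_abs_error(xs, ys):
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

gps_easting_1hz = [0.0, 1.0, 2.0]                      # hypothetical readings (m)
gps_easting_30hz = upsample_linear(gps_easting_1hz, 30)
tracker_easting = [e + 0.7 for e in gps_easting_30hz]  # tracker offset by 0.7 m
print(round(mean_abs_error(tracker_easting, gps_easting_30hz), 2))  # 0.7
```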

the buildings on either side. Then, a user standing on the sidewalk can add architectural elements such as virtual lamps to the building facades by simply touching the screen at the points on the wall where they should be placed. Using simple rendering techniques such as shading and shadowing also helps to improve the perceived realism of the rendered graphics.

FIGURE 8.8  Example images of the tracking system in use with 3D models rendered over the camera image. (a) Synthetic trees planted in the grass. (b) A spaceship landing in the courtyard, rendered with lighting and shadow effects. (c) Virtual lamps affixed to the side of the building.

8.9 DISCUSSION

From these evaluations, it can be concluded that visual modeling and tracking offers a compelling solution to device pose estimation for mobile AR applications. The approach enables high-accuracy tracking at real-time rates with consumer hardware. Experience with the prototype applications suggests that the pose estimation is of sufficient quality to make objects appear to stick to surfaces, such that they seem truly attached to a wall or the ground.

The major limitation of this approach is that the system is generally restricted to operation from viewpoints where the scene is visually distinctive and able to be recognized by its appearance. For many scenes, this is not the case. In addition, many scenes contain repetitive textures, such as grids of windows, or little texture at all, such as texture-less building walls and the sky or the ground, that confuse the visual localization system and lead to system failure. One possible solution to these problems would be to further integrate other position and motion sensors, such as a GPS receiver, accelerometer, gyroscope, and compass, to complement the visual tracker.

The source code for the system described in this chapter is publicly available for download, testing, and further development at http://www.jventura.

8.10  FURTHER READING

In this final section, references are provided so that the interested reader can find more details about this work, as well as canonical references to learn more about this research area and other approaches to the problem. This reference list is not intended to be exhaustive, but is instead a starting point for further investigation.

The fundamentals of multiview geometry are discussed extensively in the essential textbook by Hartley and Zisserman (2004). One classic reference for camera pose estimation is that of Fischler and Bolles, who introduced a solution to the camera pose estimation problem as well as the Random Sample Consensus (RANSAC) method for finding a consistent set of observations from noisy data (Fischler and Bolles, 1981). PROSAC is a more efficient variant of RANSAC and is applied in this work (Chum and Matas, 2005).

The document-based approach to image retrieval was introduced by Sivic and Zisserman (2003) and expanded by others to include vocabulary trees (Nistér and Stewenius, 2006), geometric verification (Philbin et al., 2007), and virtual images (Irschara et al., 2009). Most modern methods rely on the SIFT method to detect feature points and kd-trees for approximate nearest-neighbor feature matching (Lowe, 2004). Many researchers have also investigated more scalable approaches to feature matching (Arth et al., 2009; Sattler et al., 2011; Li et al., 2010).

Camera-based tracking also has a long history of research. In the AR context, one canonical work is by Lowe, who used SIFT descriptors for initialization and tracking (Skrypnyk and Lowe, 2004). Alternatives to the point-based approach are possible, such as using a wireframe (Klein and Murray, 2006) or a textured 3D model (Reitmayr and Drummond, 2006). More recently, the landmark work of Klein and Murray introduced Parallel Tracking and Mapping (PTAM), where points are triangulated and tracked simultaneously in an efficient manner (Klein and Murray, 2007). The tracking method described in this chapter is adapted from this work, with modifications to handle cameras arranged in a panoramic rig. Researchers in AR systems have also considered approaches to outdoor pose tracking that use the camera in combination with other dedicated position and velocity sensors (Oskiper et al., 2012).

The 3D reconstruction pipeline is based on that of Snavely (2008). More details about the system described in this chapter can be found in our research papers (Ventura and Hollerer, 2011, 2012a,b) and Ventura's doctoral dissertation (Ventura, 2012).

REFERENCES

Arth, C., Wagner, D., Klopschitz, M., Irschara, A., and Schmalstieg, D. (2009). Wide area localization on mobile phones. In ISMAR'09: Proceedings of the 2009 Eighth IEEE International Symposium on Mixed and Augmented Reality (pp. 73–82). Washington, DC: IEEE Computer Society.

Chum, O. and Matas, J. (2005). Matching with PROSAC—progressive sample consensus. In CVPR 2005, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 1, pp. 220–226). Washington, DC: IEEE Computer Society.

Fischler, M. and Bolles, R. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.

Gonzalez, R. and Woods, R. (2008). Digital Image Processing. Upper Saddle River, NJ: Pearson/Prentice Hall.

Hartley, R. and Zisserman, A. (2004). Multiple View Geometry in Computer Vision. Cambridge, UK: Cambridge University Press.

Huber, P. (1981). Robust Statistics. New York: John Wiley & Sons.

Irschara, A., Zach, C., Frahm, J.-M., and Bischof, H. (2009). From structure-from-motion point clouds to fast location recognition. In CVPR 2009, IEEE Conference on Computer Vision and Pattern Recognition (pp. 2599–2606). Washington, DC: IEEE Computer Society.

Klein, G. and Murray, D. (2006). Full-3D edge tracking with a particle filter. In British Machine Vision Conference (BMVC'06). Manchester, U.K.: British Machine Vision Association.

Klein, G. and Murray, D. (2007). Parallel tracking and mapping for small AR workspaces. In ISMAR'07: Proceedings of the 2007 Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 225–234). Washington, DC: IEEE Computer Society.

Klein, G. and Murray, D. (2008). Improving the agility of keyframe-based SLAM. In ECCV'08: Proceedings of the 10th European Conference on Computer Vision: Part II (Vol. 5303 LNCS, pp. 802–815). Berlin, Germany: Springer-Verlag.

Li, Y., Snavely, N., and Huttenlocher, D. (2010). Location recognition using prioritized feature matching. In ECCV'10: Proceedings of the 11th European Conference on Computer Vision: Part II (pp. 791–804). Berlin, Germany: Springer-Verlag.

Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

Nistér, D. (2004). An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 756–777.

Nistér, D. and Stewenius, H. (2006). Scalable recognition with a vocabulary tree. In CVPR'06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 2161–2168). Washington, DC: IEEE Computer Society.

Oskiper, T., Samarasekera, S., and Kumar, R. (2012). Multi-sensor navigation algorithm using monocular camera, IMU and GPS for large scale augmented reality. In ISMAR'12: Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 71–80). Washington, DC: IEEE Computer Society.

Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. (2007). Object retrieval with large vocabularies and fast spatial matching. In CVPR'07, IEEE Conference on Computer Vision and Pattern Recognition, 2007 (pp. 1–8). Washington, DC: IEEE Computer Society.

Reitmayr, G. and Drummond, T. (2006). Going out: Robust model-based tracking for outdoor augmented reality. In ISMAR'06: Proceedings of the Fifth IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 109–118). Washington, DC: IEEE Computer Society.

Sattler, T., Leibe, B., and Kobbelt, L. (2011). Fast image-based localization using direct 2D-to-3D matching. In ICCV'11: Proceedings of the 2011 International Conference on Computer Vision. Washington, DC: IEEE Computer Society.

Sivic, J. and Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In ICCV'03: Proceedings of the Ninth IEEE International Conference on Computer Vision (Vol. 2, pp. 1470–1477). Washington, DC: IEEE Computer Society.

Skrypnyk, I. and Lowe, D. (2004). Scene modelling, recognition and tracking with invariant image features. In ISMAR'04: Proceedings of the Third IEEE/ACM International Symposium on Mixed and Augmented Reality (pp. 110–119). Washington, DC: IEEE Computer Society.

Snavely, N. (2008). Scene reconstruction and visualization from Internet photo collections. Doctoral dissertation, University of Washington, Seattle, WA.

Snavely, N., Seitz, S., and Szeliski, R. (2006). Photo tourism: Exploring photo collections in 3D. ACM Transactions on Graphics (TOG)—Proceedings of ACM SIGGRAPH 2006, 25(3), 835–846.

Ventura, J. (2012). Wide-Area Visual Modeling and Tracking for Mobile Augmented Reality (T. Hollerer, Ed.). Doctoral dissertation, Santa Barbara, CA: University of California.

Ventura, J. and Hollerer, T. (2011). Outdoor mobile localization from panoramic imagery. In ISMAR'11: Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality (pp. 247–248). Washington, DC: IEEE Computer Society.

Ventura, J. and Hollerer, T. (2012a). Wide-area scene mapping for mobile visual tracking. In ISMAR'12: Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 3–12). Washington, DC: IEEE Computer Society.

Ventura, J. and Hollerer, T. (2012b). Structure from motion in urban environments using upright panoramas. Virtual Reality, 17(2), 147–156.

Von Gioi, R., Jakubowicz, J., Morel, J.-M., and Randall, G. (2010). LSD: A fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4), 722–732.

9  Scalable Augmented Reality on Mobile Devices: Applications, Challenges, Methods, and Software

Xin Yang and K.T. Tim Cheng

CONTENTS
9.1 Applications
    9.1.1 Computer Vision-Based MAR Apps
    9.1.2 Sensor-Based MAR Apps
    9.1.3 MAR Apps Based on Hybrid Approaches
9.2 Challenges
9.3 Pipelines and Methods
    9.3.1 Visual Object Recognition and Tracking
        9.3.1.1 Marker-Based Methods
        9.3.1.2 Feature-Based Method
    9.3.2 Sensor-Based Recognition and Tracking
        9.3.2.1 Sensor-Based Object Tracking
    9.3.3 Hybrid-Based Recognition and Tracking
9.4 Software Development Toolkits
9.5 Conclusions
References

9.1 APPLICATIONS

In recent years, mobile devices such as smartphones and tablets have experienced phenomenal growth. Their computing power has grown enormously, and the integration of a wide range of sensors, for example, GPS, gyroscope, accelerometer, compass, etc., has significantly enriched these devices' functionalities. The connectivity of smartphones has also gone through rapid evolution. A variety of radios including cellular broadband, Wi-Fi, Bluetooth, and NFC available in today's smartphones enable users to communicate with other devices, interact with the Internet, and exchange their data with, and run their computing tasks in, the clouds. These mobile

handheld devices equipped with cameras, sensors, low-latency data networks, and powerful multicore application processors (APs) can now run very sophisticated augmented reality (AR) applications. There are already a very rich collection of mobile AR (MAR) apps for visual guidance in assembly, maintenance and training, contextual information-augmented personal navigation, multimedia-augmented advertising, interactive books, and mobile health. A scalable MAR system, which has the ability to identify objects and/or locations from a large database and track the pose of a mobile device with respect to the physical objects, can support a wide range of MAR apps. Such a system can enable applications such as nationwide multimedia-enhanced advertisements printed on papers, augmenting millions of book pages in a library, and recognizing millions of worldwide places of interest and providing navigation or contextual information. Broadly, scalable MAR apps can be categorized into three classes according to the underlying techniques they rely on: computer vision-based, sensor-based, and hybrid. In the following section, we highlight some exemplar vision-based MAR apps, illustrating the rich and exciting opportunities in these areas.

9.1.1 Computer Vision-Based MAR Apps

Vision-based MAR apps rely on images captured by mobile cameras, and use vision-based recognition and tracking algorithms to identify physical objects and further link them to appropriate virtual objects. Specifically, a conventional vision-based MAR pipeline consists of three main steps: (1) deriving a set of features from a captured image frame and matching it to a database to recognize the object (i.e., recognition); (2) tracking the recognized object from frame to frame by matching features of consecutive frames (i.e., tracking); and (3) building precise coordinate transforms between the current image frame and the recognized object in the database (i.e., pose estimation), for example, using RANdom SAmple Consensus (RANSAC) (Fischler et al., 1981) or PROSAC (Chum et al., 2005). Some details of vision-based MAR approaches will be presented in Section 9.3.1.

With rapid advances in real-time object recognition and tracking capability, vision-based MAR apps are becoming increasingly popular. To date, we have seen vision-based apps permeating into marketing (e.g., multimedia-augmented advertising), education (e.g., interactive books), social networks, search engines, and mobile health.

In marketing and education, vision-based MAR apps can significantly improve the user experiences by overlaying related multimedia information or digital operations (e.g., mouse click) on paper-based advertisements (such as newspapers, handbills, and movie posters) and books. Vision-based MAR technology can overlay extra 3D pictures, videos, and sounds related to the recognized object on the viewfinder in real time to provide the user an augmented sense that mixes reality and virtuality. For example, the user can see flowers that are not actually blooming when pointing the camera to a picture of bare branches, or can link handouts with augmented digital information. Picture the following app for reading a book or newspaper: through the phone camera and touchscreen, you could highlight any text or figure, which is automatically recognized and used for search on the Web. Such search results are then displayed on the viewfinder to facilitate the reading. Regarding the latter example,
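Step (3) of the pipeline above relies on robust estimators such as RANSAC to fit a geometric transform despite mismatched features. The sketch below illustrates only the RANSAC idea on a toy 2D-translation model (a real MAR system would fit a homography or full pose); all names and numbers are invented for illustration.

```python
# Minimal RANSAC sketch: repeatedly fit a model to a random minimal sample,
# count the matches that agree with it (inliers), and keep the best model.
import random

def ransac_translation(matches, iters=100, tol=0.5, seed=0):
    """matches: list of ((x1, y1), (x2, y2)) feature correspondences."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.choice(matches)   # minimal sample: 1 match
        dx, dy = x2 - x1, y2 - y1
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - dx) < tol
                   and abs(m[1][1] - m[0][1] - dy) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on all inliers (average of their offsets).
    n = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (dx, dy), n

# 8 correct matches shifted by (5, 3), plus 2 gross outliers:
good = [((i, 2 * i), (i + 5, 2 * i + 3)) for i in range(8)]
bad = [((0, 0), (40, -7)), ((1, 1), (-30, 12))]
(dx, dy), n_inliers = ransac_translation(good + bad)
print(dx, dy, n_inliers)  # 5.0 3.0 8
```

PROSAC improves on this loop by sampling the best-scoring matches first rather than uniformly at random.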

systems that facilitate the development of scalable mobile-augmented papers can be found in EMM (Yang et al., 2010), Mixpad (Yang et al., 2011a,b), FACT (Liao et al., 2012) from FXPAL, and Mobile Retriever (Liu et al., 2008).

Google Goggles is a MAR app that conducts searches based on pictures taken by the phone. Its visual search relies on real-time image recognition to identify objects in the picture as the starting point for search. Such a capability, often referred to as query-by-image, allows users to search for items without typing any text. For example, a user can take a picture of a famous landmark or a painting to search for information about it, a picture of a product's barcode or a book cover to search for online stores selling the product/book, a picture of a movie poster to view reviews or to find tickets at nearby theaters, or a picture of a restaurant menu in French for translation to English. For its image-based translation capability, the app recognizes printed text and uses optical character recognition (OCR) to produce a snippet and then translate it into another language.

A wide range of MAR apps for social mining heavily rely on face recognition technologies. For instance, SocialCamera from Viewdle allows smartphone users to take photos with the built-in camera and tag them automatically. It uses face recognition technology to identify people among the friends in your face database. With instant tagging, it will only take a few clicks to share tagged mobile photos with friends through Facebook, Line, MMS, or email.

There already exist several vision-based MAR apps for mobile health. These apps could recognize food objects on the camera viewfinder, analyze their calories and nutrition data, and then display the information using overlaid text and charts on the viewfinder. While object recognition for most food items is quite feasible, the key challenge for this kind of app is the ability to estimate object volume (the serving size for nutrition analysis). Accurate estimation would require knowing the distance between the phone camera and the object.

9.1.2 Sensor-Based MAR Apps

Sensor-based MAR apps rely on the input from sensors (e.g., GPS, NFC, compass, accelerometer, and gyroscope) to identify and track the geographical position of a mobile device and its orientation. Using this information, specific contextual information, such as the route to a destination, points of interest, nearby shops, etc., can be brought out from a database and overlaid on top of the real-world scene displayed on the viewfinder. With more types and more accurate sensors integrated into each new generation of smartphones, sensor-based MAR apps have become increasingly important and are gaining popularity, especially in the application area of personal navigation.

An exemplar navigation app for outdoor activity is Theodolite, which utilizes a phone display as a viewfinder to overlay GPS data (coordinates and elevation), compass heading, attitude, and time data. It serves as a compass, GPS, map, zoom camera, rangefinder, and inclinometer. The obvious uses for such an app include backcountry pursuits, surveying, landscaping, skiing, fishing or boating navigation, as well as search and rescue. Spyglass is a similar app for outdoor activities. It overlays positional information

using available sensors of the phone and could be used as a waypoints tool.

Other sensor-based MAR apps which serve for navigation focus on the scenario of car driving. Wikitude Drive is a MAR navigation system for which computer-generated driving instructions are drawn on top of the reality—the real road the user is driving on. Navigation thus takes place in real time in the smartphone's live camera image. This app solves a key problem that existing navigation systems have—the driver no longer needs to take his/her eyes off the road when looking at the navigation system. Other similar apps include Route 66 Maps + Navigation, which provides an amalgamation of comprehensive 3D maps and AR navigation to bring a fun and informative experience to drivers, and Follow Me, which can trace the user's exact route on the road, backed with real-time graphics and a virtual car that leads the user all the way to the destination.

In addition to navigation, sensor-based MAR is also applicable to educational applications. A famous example is Star Walk, an interactive astronomy guide offering AR of the sky with the actual sky outside. Using this app, with the help of the accelerometer and gyroscope sensors, users can align the physical sky to the sky shown on the display of the phone/tablet. As the user adjusts the orientation of the phone/tablet toward the sky, the sky shown on the display will move along to match the physical sky. This allows for pinpoint precision for tracking satellites, finding stars, or finding constellations, offering an attractive educational tool for students, part-time star gazers, or astronomers. Finding celestial objects becomes easy—simply moving the viewfinder over the object in the real-world view and clicking on it. The user can also search for an object—Jupiter, a satellite, or a specific constellation—to have it instantly come up on the viewfinder. Then the app will guide the user to adjust the orientation of the device to match it up with the object in the real sky.

9.1.3 MAR Apps Based on Hybrid Approaches

Sensor-based and vision-based methods have distinct advantages and disadvantages. Sensor-based methods can obtain geographical locations and the relative pose of a mobile device from built-in inertial sensors, requiring few complex calculations. Therefore, they are attractive for any mobile platform with limited computing and memory resources. However, their accuracy is usually low due to the low-cost inertial sensors used in mobile handhelds. On the other hand, vision-based methods can achieve much better accuracy at a cost of high computational and memory complexity. Many scalable MAR apps that demand both high accuracy and high efficiency employ hybrid methods which combine the complementary advantages of both approaches. InterSense (Naimark et al., 2002) is a good example, which uses a sensor-based inertial tracker to predict the potential positions of markers and then leverages vision-based image analysis for the candidate regions to refine the results. There are several other examples using magnetic and gyro sensors to stabilize the tracking systems (Jiang et al., 2004; Reitmayr et al., 2007; Ribo et al., 2002; Uchiyama et al., 2002).
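The predict-then-refine pattern described for InterSense-style hybrid trackers can be sketched as a toy one-dimensional filter: the inertial state proposes where the marker should be, and a (simulated) visual measurement in that candidate region corrects the estimate. The blending weight and all numbers below are invented for illustration and are not taken from any cited system.

```python
# Toy predict/refine step for a hybrid (inertial + visual) 1D tracker.

def hybrid_track(position, velocity, dt, visual_fix, weight=0.8):
    """One step: inertial prediction, then correction toward the visual fix.

    weight is the (hypothetical) trust placed in the visual measurement.
    """
    predicted = position + velocity * dt               # cheap inertial prediction
    refined = (1 - weight) * predicted + weight * visual_fix
    return refined

pos = 0.0
for step in range(3):
    # The inertial sensor drifts (velocity bias), but the visual fix at 0.5
    # repeatedly pulls the estimate back toward the true position.
    pos = hybrid_track(pos, velocity=1.0, dt=0.1, visual_fix=0.5)
print(round(pos, 3))
```

The same division of labor appears throughout this chapter: sensors provide a fast, rough prior; vision supplies the accurate correction.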

9.2 CHALLENGES

Despite advances in computer vision and signal processing algorithms as well as mobile hardware, scalable MAR remains very challenging due to the following reasons:

1. Although the performance of mobile CPUs has achieved greater than 30× improvement within a short period of recent 5 years (e.g., ARM quad-core Cortex A-15 in 2014 vs. ARM 11 single-core in 2009), today's mobile CPU cores are still not powerful enough to perform computationally intensive vision tasks such as sophisticated feature extraction and image recognition algorithms. The design objectives of modern mobile APs require more than just performance; priority is often given to other factors such as low power consumption and a small form factor. Graphics processing units (GPUs), which have been built into most APs, can help speed up processing via parallel computing (Cheng et al., 2013; Terriberry et al., 2008), but most feature extraction and recognition algorithms are designed to be executed sequentially and cannot fully utilize GPU capabilities.

2. Mobile devices have less memory and lower memory bandwidth than desktop systems. The memory of today's high-end smartphones, such as the Samsung Galaxy S5, is limited to 2 GB of SDRAM, and the memory size of mid- and entry-level phones is even smaller. This level of memory is not sufficient for performing local object recognition using a large database. For most algorithms, in order to realize efficient object recognition, the entire indexing structure of a database needs to be loaded and reside in main memory. The total amount of memory usage for an indexing structure usually grows linearly with the number of database images; for a database of a moderate size (e.g., tens of thousands of images) or a large size (e.g., millions of images), the indexing structure itself could easily exhaust memory resources.

3. For connection to data networks, today's mobile devices rely on a combination of mobile broadband networks including 3G, 3.5G, and 4G. While Wi-Fi is a built-in feature for almost all mobile devices, connection to high-bandwidth access points is still not available anyplace. Moreover, advanced mobile broadband networks still have limited availability in areas not having dense populations. These networks, while providing acceptable network access speed for most apps, cannot support real-time responses for apps demanding a large amount of data transfer, neither anytime nor anyplace.

Several scalable MAR systems employ the client-server model to handle large databases: sending the captured image or processed image data (e.g., image features) to a server (or a cloud) via the Internet, performing object recognition and pose estimation on the server side, and then sending the estimated pose and associated digital data back to the mobile device.

As mentioned in the previous section, it is very challenging to achieve both good accuracy and high efficiency. Sensor-based methods can achieve good efficiency, but their performance is often limited by
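The linear memory growth of an indexing structure can be made concrete with a back-of-the-envelope estimate. The per-image figures below (feature count and descriptor size) are hypothetical—e.g., 500 features per image with 128-byte SIFT-like descriptors—but they show why a million-image index cannot fit in a 2 GB phone.

```python
# Back-of-the-envelope estimate of index memory, which grows linearly
# with the number of database images.

def index_size_gb(n_images, features_per_image=500, bytes_per_feature=128):
    return n_images * features_per_image * bytes_per_feature / 1024**3

for n in (10_000, 1_000_000):
    print(n, round(index_size_gb(n), 2))
# 10000 0.6       -> a moderate database barely fits next to the OS and app
# 1000000 59.6    -> a large database far exceeds mobile RAM
```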

the low precision of sensors used in mobile devices, limiting their applicability for apps demanding high recognition rate and tracking accuracy. In addition, sensor-based approaches do not provide information about the objects in the camera picture; the presented information could only be related to the direction and position of the device, not to a specific object. On the other hand, vision-based methods usually require significant computation and memory space to process the image, which consists of a large number of pixels. A hybrid approach that integrates vision-based and sensor-based methods can potentially combine their complementary advantages. Designing a fusion solution that optimizes accuracy, efficiency, and robustness, however, is not a trivial task at all.

9.3  PIPELINES AND METHODS

Augmented reality links proper context with a real-world object/location and adjusts the pose of the virtual data so as to render it at a correct position and with a correct orientation on a real-world image. In order to do this, the system needs to know where the user is and what the user is looking at. More specifically, a MAR system needs to determine the location and orientation of the mobile device, identify objects/locations of interest, calculate the relative pose (location and orientation) of the device in real time, and then render virtual objects in the correct place.

Figure 9.1 illustrates a general pipeline of a scalable MAR system. Given the physical data (e.g., an image, the geographical location, the direction) captured from a mobile handheld device, the system identifies objects/locations of interest captured by the mobile device. The identification process is usually conducted by processing the captured physical data to generate a description delineating the real-world scene, matching the description to a large database, and estimating the geometric transformation between the object within a captured image and the recognized database object. The database object which best matches the feature of the captured object is considered as the recognized object, and an initial pose is generated. After recognition, the movement of the recognized object is tracked.

FIGURE 9.1  A general pipeline for scalable augmented reality on mobile devices. (Reality data—visual data, audio data, location, direction, altitude, pressure, temperature, lighting, etc.—feeds object/location identification against a cloud database; the identified result and augmented digital data feed pose tracking; the relative pose and associated data drive overlaying virtual data on reality to produce the final scene.)

The recognizer and the
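The identify-then-track loop of this general pipeline can be sketched in a few lines. Everything below is a stand-in for real components—the string-keyed "database" replaces feature matching, and the 1D pose replaces full 6-DoF pose estimation—so treat it as an illustration of the control flow only.

```python
# Toy sketch of the Figure 9.1 pipeline: identification, pose tracking,
# and overlay of the associated virtual data.

DATABASE = {"storefront": "show menu", "poster": "play trailer"}

def identify(frame_description):
    # Object/location identification: best database match (here: exact lookup).
    return frame_description if frame_description in DATABASE else None

def track_pose(prev_pose, motion):
    # Pose tracking: update device pose relative to the recognized object.
    return prev_pose + motion

def render(obj, pose):
    # Overlay the associated virtual data at the tracked pose.
    return f"{DATABASE[obj]} @ pose {pose}"

obj, pose = None, 0.0
frames = [("storefront", 0.0), ("storefront", 0.5), ("storefront", -0.2)]
for description, motion in frames:
    if obj is None:                 # recognizer runs only when nothing is tracked
        obj, pose = identify(description), 0.0
    else:                           # otherwise the cheaper tracker runs
        pose = track_pose(pose, motion)
    print(render(obj, pose))
```

The branch structure reflects the point made in this section: recognition against a large database is expensive, so it runs only when tracking is lost or a new object appears, while frame-to-frame tracking carries the rest of the load.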

computer vision. In the following section. The corresponding object of the best match in the database is then reported as the recognized object. a marker-based MAR system consists of three key components. and machine learning have developed a considerable number of object recognition and tracking methods. and tracking of the marker under various imaging conditions.Scalable Augmented Reality on Mobile Devices 201 tracker of AR are executed alternatively to complement each other: the recognizer is activated whenever the tracker fails or new objects/locations occur and the tracker bridges the recognized results and speeds up the process by avoiding unnecessary recognition task that is more computationally expensive than tracking. pattern recognition. These methods use a prebuilt database containing precomputed features for all objects of interests. Therefore. visual features extracted from consecutive frames are derived and matched to track the movement of the object from frame to frame.2: (1) marker detection in which regions that are likely to .1  Marker-Based Methods A good marker design should facilitate a quick and reliable detection. available local storage resources. This is because a system needs at least four pairs of corresponding points between two detected markers from two image frames to estimate the camera pose. an AR system can recognize the associated object and obtain the correct scale and pose of the camera relative to the physical object. Some of successful ones for AR rely on predefined markers that consist of easily detectable patterns.1. as shown in Figure 9. Typically. and tracking the marker using image analysis techniques. most existing MAR systems leverage blackand-white and square-shaped markers for recognition and tracking tasks. are not mutually exclusive and thus can be combined. Previous studies have also concluded that a square is the simplest and most suitable shape for a maker. 
the system extracts the same type of features from the image and then matches the feature to database features. identifying. For each image frame captured by a camera. 9. we describe the procedure pipeline for each category and give an overview of state-of-the-art methods for each step in the pipeline. methods based on visual object recognition and tracking are of special interests in MAR. identification. and performance requirement. Databases that contain objects for recognition and associated digital data can be stored either in local storage space or in the cloud.3. Marker-based methods and feature-based methods. Another category of approaches for visual object recognition and tracking is based on visual feature extraction and matching. It has been shown that black-and-white markers are much more robust to various photometric changes and background clutters than chromatic markers. depending on the size of databases. Researchers in image processing. For tracking the recognized object. By detecting. Four corner points of a square are sufficient for homography estimation and can be reliably detected as intersections of edge lines. In a marker-based MAR system.1  Visual Object Recognition and Tracking As a camera is a built-in component in most MAR systems. and in this section we focus our discussion on this type of markers.3. each object of interest is associated with a particular marker. having distinct advantages and limitations. 9.
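The recognizer–tracker alternation described in this section can be sketched as a simple control loop. This is a minimal illustration, not the chapter's implementation; the `recognize` and `track` callables are hypothetical stand-ins for the expensive recognizer and the cheap frame-to-frame tracker:

```python
def ar_loop(frames, recognize, track):
    """Alternate a slow recognizer with a fast tracker: the recognizer runs
    only when no target is known yet or the tracker has lost it."""
    target, pose = None, None
    results = []
    for frame in frames:
        if target is not None:
            pose = track(frame, target, pose)   # cheap, attempted every frame
        if target is None or pose is None:
            target, pose = recognize(frame)     # expensive, run on demand
        results.append((target, pose))
    return results
```

The key property is that `recognize` is invoked exactly on the first frame and after each tracking failure, so its heavy cost is amortized across many tracked frames.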

(2) marker identification, in which candidate marker regions are verified and recognized; and (3) marker tracking and pose estimation, in which four pairs of corner points are used to estimate the homography transformation.

9.3.1.1.1  Marker Detection

The goal of marker detection is to find the positions of markers, to delineate the boundary of each marker, and to localize the corner points of markers in an image. A basic marker detection procedure is illustrated by the blue blocks in Figure 9.2. First, an RGB image is converted into an intensity image. Second, edge detection is performed on the grayscale image to obtain a list of edge segments. Popular edge detection operators include the Canny edge operator (Canny 1986), the Sobel operator (Gonzalez et al. 1992), and the Laplacian operator (Haralick 1984). Line fitting is then conducted based on the detected edges, which detects intersections of two lines as potential corners. Finally, regions which are defined by four straight connecting lines and consist of four corners are considered as potential marker candidates.

Then, depending on the particular appearance of a marker, candidate markers are verified by some effective and fast-to-compute criteria. Candidates that do not pass the verification procedure are removed as false positives, to avoid unnecessary processing of nonmarker regions in the following identification and tracking steps. A simple yet effective verification scheme is based on the size of a candidate marker region, that is, rejecting small regions with a limited number of pixels, since small regions are either false positives or true markers that are too far away from the camera to achieve reliable pose estimation. Another fast verification criterion is based on the histogram of a region: as a marker consists of black-and-white colors, its histogram should be bipolar, so we could easily remove false positives which have a relatively uniform histogram. In addition, 2D barcode markers have a number of sharp edges inside a marker (i.e., edges between white and black cells). A heuristic yet efficient criterion is therefore to examine the frequency of intensity changes in two perpendicular directions: if the frequency is below a predefined threshold, the region is rejected as a false positive. With such criteria, a system could quickly reject obvious nonmarkers with high confidence.

FIGURE 9.2  Illustration of the marker-based MAR pipeline. [Diagram: a captured image passes through marker detection (color-to-gray conversion, edge detection, line fitting and corner detection, candidate marker detection, postverification), then marker identification (against marker templates, 2D barcodes, or imperceptible markers), and finally marker tracking and pose estimation, yielding the recognized object and camera pose.]
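Two of the verification criteria above (the size check and the bipolar-histogram check) are easy to sketch. The thresholds below are illustrative values, not taken from the chapter:

```python
import numpy as np

def verify_candidate(region, min_pixels=400, bipolar_ratio=0.8):
    """Cheap post-verification for a candidate marker region (grayscale, 0-255).

    Rejects regions that are too small for reliable pose estimation, and
    regions whose histogram is not bipolar: a black-and-white marker should
    concentrate most of its pixels near the dark and bright ends.
    """
    if region.size < min_pixels:
        return False                       # too small: false positive or too far
    hist, _ = np.histogram(region, bins=16, range=(0, 256))
    extremes = hist[:4].sum() + hist[-4:].sum()   # darkest + brightest quarter
    return extremes / region.size >= bipolar_ratio
```

A relatively uniform histogram (e.g., a gray wall) fails the bipolarity test, while a true black-and-white marker passes it.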

9.3.1.1.2  Marker Identification

A scalable MAR system utilizes a large database in which each element is linked to corresponding virtual data, digital interactions, etc. The goal of marker identification is to find a matched element from the database for a captured marker so that the system knows which virtual data should be overlaid on top of the current real-world scene. Techniques used for marker identification depend on the marker type. Two popular types of markers widely used for AR are template-based markers and 2D barcode markers.

Template markers are black-and-white markers which have a simple image inside a black border (as shown in Figure 9.3a). Identification of a template marker is typically based on template matching (Ashby et al. 1988), that is, comparing the marker image obtained from the detection process against a database consisting of all marker templates. Specifically, a marker region is first cropped from the captured image, rectified to be a square, and scaled to the same size as the marker templates in the database. After that, the rectified and scaled marker is rotated into four different orientations (0°, 90°, 180°, and 270°). For each orientation, the overall best match in the database with the highest similarity value to the rotated marker is identified. Several similarity metrics have been proposed for matching, such as the sum of squared differences (SSD) and mutual information. If the highest similarity value is greater than a threshold, the region is considered a recognized marker; otherwise, it is unrecognized and the system rejects it.

One major limitation of template matching is that a system needs to match a marker candidate region against all marker templates in the database. Matching each pair of marker images involves processing all corresponding pixels of the two images. To speed up the matching process, markers are often down-sampled to a small size, for example, 16 × 16 or 32 × 32, to reduce the number of pixels that need to be compared. Even so, the total runtime for template matching is nontrivial and grows linearly with the number of templates in the database. Moreover, the scalability and accuracy of a MAR system could be greatly restricted by the size of the quantized markers. For a quantized image size of 16 × 16 inside a marker, the maximum number of distinct patterns that can be generated is (16 × 16)² = 65,536 (each pixel can be either 1 or 0); that is, the upper bound on the number of distinct markers is limited to 65,536, prohibiting its usage for a large database. In practice, due to photometric changes, inaccuracy in the detection process, and other noise sources, the number of markers that can be correctly and reliably detected could be even smaller than this number. Due to these limitations, template markers are not suitable for scalable MAR apps which use a large database.

FIGURE 9.3  Illustration of 2D markers. (a) Template marker. Barcode markers: (b) QR code, (c) DataMatrix, and (d) PDF417.

2D barcode markers are markers consisting of frequently changing black-and-white data cells and possibly a border or other landmarks (as shown in Figure 9.3b through d). A system identifies a 2D barcode marker by decoding the information encoded in it. Typically, the decoding process is performed by sampling the pixel values at the calculated center of each cell and then resolving the cell values. The resolved cell value can be either interpreted as a binary number (i.e., 0 or 1) or linked to more information (e.g., ASCII characters) via a database. Popular 2D barcode standards include QR code (Information Technology 2006a), DataMatrix (Information Technology 2006b), and PDF417 (Information Technology 2006c), which were originally developed for logistics and tagging purposes but are also used for AR apps. In addition to these three standards, which will be briefly described in the following section, there are many other standards (e.g., MaxiCode, Aztec Code, SPARQCode) that might be used for tracking in some applications too.

QR code (Figure 9.3b) is a 2D barcode created by the Japanese corporation DensoWave in 1994. QR is the abbreviation for Quick Response, as the code is intended for high-speed decoding. A single QR code symbol can contain up to 7089 numeric characters, 4296 alphanumeric characters, 2953 bytes of binary data, or 1817 Kanji characters. The exact data capacity depends on the structure of the data to be encoded; this is due to the internal data compression algorithms used during coding. QR code is flexible and has a large storage capacity; therefore, QR codes are very suitable for large-scale MAR apps. QR code became popular for mobile tagging applications and is the de facto standard in Japan.

DataMatrix (Figure 9.3c) is another popular barcode marker, famous for marking small items such as electronic components. The DataMatrix can encode up to 3116 characters from the entire ASCII character set with extensions. The DataMatrix barcode is also used in mobile marketing under the name SemaCode.

PDF417 (Figure 9.3d) was developed in 1991 by Symbol (recently acquired by Motorola). A single PDF417 symbol can be considered as multiple linear barcode rows stacked above each other. The ratios of the widths of the bars (or spaces) to each other encode the information in a PDF417 symbol. A single PDF417 symbol can theoretically hold up to 1850 alphanumeric characters, 2710 digits, or 1108 bytes. For that reason, printing accuracy and a suitable printer resolution are important for high-quality PDF417 symbols.
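The rotate-and-compare template identification described earlier can be sketched in a few lines. Sizes and the acceptance threshold are illustrative; note that SSD is a dissimilarity measure, so lower values mean a better match:

```python
import numpy as np

def match_template(candidate, templates, threshold=0.15):
    """Match a rectified, scaled marker image against a template database.

    Tries the four 90-degree orientations of the candidate and returns
    (template_id, rotation_degrees) when the smallest normalized SSD falls
    below `threshold`; returns None when no template matches well enough.
    """
    best = (None, None, np.inf)
    for tid, tmpl in enumerate(templates):
        for rot in range(4):                       # 0, 90, 180, 270 degrees
            ssd = np.mean((np.rot90(candidate, rot) - tmpl) ** 2)
            if ssd < best[2]:
                best = (tid, rot * 90, ssd)
    return best[:2] if best[2] <= threshold else None
```

The nested loop makes the cost visible: runtime grows linearly with the number of templates, which is exactly the scalability limitation discussed above.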

This also makes PDF417 the least suitable for AR applications, where the marker is often under perspective transformation.

9.3.1.1.3  Marker Tracking

The main idea of AR is to present virtual objects in a real environment as if they were part of it. Virtual objects should move and change pose in accordance with the movement of the mobile camera. Tracking the camera pose (i.e., camera location and orientation) in real time is therefore required in order to render the virtual object at the right scale and perspective. Marker-based object tracking uses the four corners of a square marker, which can be reliably detected, for this purpose. The pose of a camera relative to a marker in the real scene can be uniquely determined from a minimum of four corresponding points between the marker in the real scene and the marker on the camera image plane. Note that the four points used for determining the camera pose need to be coplanar but noncollinear.

We define a transformation T between a camera and a marker as

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \sim T \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad T = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (9.1)$$

where [X Y Z 1]ᵀ is a homogeneous representation of a marker corner's coordinates in the earth coordinate system, [x y 1]ᵀ is its projected coordinates on the image plane, and ∼ denotes equality up to the homogeneous scale factor.

Once an initial camera pose is obtained, the system can keep tracking the marker on the image plane by constructing corner correspondences between consecutive frames and computing the transformation matrix between the two frames based on those correspondences. Thus, marker-based tracking only needs to detect the four corners of a marker and estimate the camera pose according to Equation 9.1.

9.3.1.1.4  Discussions

2D barcode identification directly decodes the information in a marker without demanding an enormous amount of computation for image matching. Therefore, the barcode marker-based method is time efficient and can provide real-time performance for many MAR apps. In addition, 2D barcode markers have a large storage capacity and thus can support applications which require high scalability. However, barcode markers need to be printed on or attached to objects beforehand for association with specific contents. Furthermore, they are visually obtrusive, and for some outdoor scenarios (e.g., landmarks) attaching markers to objects is not feasible. Moreover, marker-based methods are sensitive to occlusion. These limitations may lead to a poor user experience. Feature-based methods can overcome these limitations, while slower speed and greater memory usage could be their two major issues.
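Estimating the frame-to-frame transformation from the four corner correspondences can be done with a standard direct linear transform (DLT) homography fit. This is a generic textbook sketch in plain NumPy, not the chapter's specific algorithm:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src from four (or more)
    point correspondences via the direct linear transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the DLT system A h = 0.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of A (smallest singular vector) holds the 9 entries of H.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                     # fix the homogeneous scale
```

With exactly four noncollinear corners, the system has an (up to scale) unique solution, which is why a square marker's four corners suffice.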

9.3.1.2  Feature-Based Methods

Local features (an example is shown in Figure 9.4a) have been used for various computer vision apps, including object recognition, tracking, image registration, etc. Different from conventional global feature extraction, which generates a single feature vector for an entire image, local feature extraction generates a set of high-dimensional feature vectors for an image. In comparison with a global feature representation, it has been demonstrated that local features are more robust to various geometric and photometric transformations, occlusion, and background clutters. As a result, many existing MAR systems choose a local feature representation for object recognition and tracking, to ensure a more satisfactory user experience.

A typical flow for a local feature-based MAR system is as follows. In the offline phase, local feature extraction is performed for every database image, and an indexing structure which encodes the feature descriptors of all database images is constructed. In the recognition phase, local features of a captured image are first extracted, each of which is then used to query the database, using the indexing structure, to find a matching local feature. The database image which has the most matching features with the captured image is considered the recognized target. An initial camera pose is then estimated from the corresponding matches using RANSAC or PROSAC algorithms. In the tracking phase, local features between consecutive frames are compared, and the corresponding matches are used to track the movement of the camera between frames.

FIGURE 9.4  (a) An exemplar image overlaid with detected local features; a local feature is a high-dimensional vector X = [x1, x2, ..., xd]. (b) and (c) The discretized and cropped Gaussian second-order partial derivatives in the y-direction (Lyy) and the xy-direction (Lxy), respectively. (d) and (e) The SURF box-filter approximations (Dyy and Dxy) for Lyy and Lxy, respectively.

9.3.1.2.1  Local Feature Extraction

The efficiency, robustness, and distinctiveness of the local feature representation significantly affect the user experience and scalability of a MAR system. Local feature extraction typically consists of two steps: (1) interest point detection, also referred to as local feature detection, which selects a set of salient points in an image; and (2) interest point description, also referred to as local feature description, which transforms a small image patch around a feature point into a vector representation suitable for further processing. In the following, we review relevant work on interest point detection and description and present the latest advances for scalable MAR.

9.3.1.2.1.1  Interest Point Detection

An interest point detector is an operator which attributes a saliency score to each pixel of an image and then chooses a subset of pixels with locally maximal scores. A wide variety of interest point detectors exist in the literature; a thorough survey of local feature detectors can be found in Tuytelaars et al. (2008). A good detector should provide points that have the following properties: (1) repeatability (or robustness), that is, given two images of the same object under different imaging conditions, a high percentage of the points on the object can be detected in both images; (2) distinctiveness, that is, the neighborhood of a detected point should be sufficiently informative so that the point can be easily distinguished from other detected points; (3) efficiency, that is, detection in a new image should be sufficiently fast to support time-critical applications; and (4) quantity, that is, a typical image should contain a sufficient number of points to cover the target object, so that it can be recognized even under partial occlusion.

There is an enormous breadth and amount of results in this field; with limited space, we can only afford to review a small subset of representative results that are most relevant to the application of scalable MAR. Several high-quality feature detectors (Bay et al. 2006, Lowe 2004) have been developed with a primary focus on robustness and distinctiveness. However, the computational complexity of these detectors is usually very high, making them inefficient on a mobile device. Some lightweight detectors (Rosten et al. 2006) aim at high efficiency, targeting applications that demand real-time performance and/or mobile hardware platforms that have limited computing resources. However, the performance of these detectors is relatively poor, and they require pose verification to exclude false matches in the matching phase, which often incurs a nontrivial runtime. Some recent efforts, for example (Yang et al. 2012a), have been made to adapt these feature detection algorithms to mobile devices and optimize their performance and efficiency for MAR. Due to space limitations, we review only the most representative work in each category: the lightweight detector, the high-quality detector, and algorithm adaptation.

Lightweight detector: FAST. The FAST (features from accelerated segment test) detector, proposed by Rosten et al. (2008), has become popular recently due to its highly efficient processing pipeline. The basic idea of FAST is to compare the 16 pixels located on the boundary of a circle (radius 3) around a central point; each pixel is labeled with an integer from 1 to 16 clockwise. If the intensities of n (n ≥ threshold) consecutive pixels are all higher or all lower than that of the central pixel, then the central pixel is labeled as a potential feature point, and n is defined as the response value for the central pixel. The final set of feature points is determined after applying a nonmaximum suppression step (i.e., if the response value of a point is the local maximum within a small region, this point is considered a feature point). Since the FAST detector involves only a set of intensity comparisons with few arithmetic operations, it is highly efficient.
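The FAST segment test just described can be sketched directly. The circle offsets are the standard radius-3 Bresenham ring; the threshold `t` and run length `n` below are illustrative values:

```python
import numpy as np

# Offsets (dx, dy) of the 16 pixels on a circle of radius 3 around a point.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def fast_response(img, r, c, t=20, n=9):
    """FAST segment test at pixel (r, c): a corner needs at least `n`
    contiguous circle pixels all brighter than center+t or all darker than
    center-t. Returns the longest qualifying run (0 when not a corner)."""
    center = int(img[r, c])
    ring = [int(img[r + dy, c + dx]) for dx, dy in CIRCLE]
    best = 0
    for sign in (1, -1):                   # check brighter and darker arcs
        flags = [sign * (p - center) > t for p in ring]
        run = longest = 0
        for f in flags + flags:            # doubling the list handles wrap-around
            run = run + 1 if f else 0
            longest = max(longest, min(run, 16))
        best = max(best, longest)
    return best if best >= n else 0
```

Only comparisons and a short loop are involved, which is why FAST maps so well onto weak mobile CPUs.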

The FAST detector is not invariant to scale changes. To address this limitation, Rublee et al. (2011) proposed to build a scale pyramid of the image and detect FAST feature points at each level of the pyramid. In addition, FAST can produce large responses along edges, leading to lower repeatability and distinctiveness than high-quality detectors such as SIFT (Lowe 2004) and SURF (Bay et al. 2006, 2008). To address this limitation, Rublee et al. (2011) employed a Harris corner measure to order the FAST feature points and discard those with small responses to the Harris measure.

High-quality detector: SURF. The SURF (Speeded Up Robust Features) detector, proposed by Bay et al. (2006, 2008), is one of the most popular high-quality point detectors in the literature. It is scale-invariant and based on the determinant of the Hessian matrix H(X, σ):

$$H(X, \sigma) = \begin{bmatrix} L_{xx}(X, \sigma) & L_{xy}(X, \sigma) \\ L_{xy}(X, \sigma) & L_{yy}(X, \sigma) \end{bmatrix} \qquad (9.2)$$

where X = (x, y) is a pixel location in an image I, σ is a scale factor, and Lxx(X, σ) is the convolution of the image with the Gaussian second-order derivative in the x direction; similarly for Lyy and Lxy (see Figure 9.4b and c).

To speed up the process, a SURF detector approximates the Gaussian second-order partial derivatives with a combination of box filter responses (see Figure 9.4d and e), computed using the integral image technique (Simard et al. 1998). The approximated derivatives are denoted Dxx, Dxy, and Dyy, and accordingly the approximate Hessian determinant is

$$\det(H_{approx}) = D_{xx} D_{yy} - (0.9\, D_{xy})^2 \qquad (9.3)$$

A SURF detector computes Hessian determinant values for every image pixel over scales, using box filters of successively larger size, yielding a determinant pyramid for the entire image. To achieve scale invariance, it then applies a 3 × 3 × 3 local maximum extraction over the determinant pyramid to select interest point locations and their corresponding salient scales.

To achieve rotation invariance, SURF relies on gradient histograms to identify a dominant orientation for each detected point; an image patch around each point is rotated to its dominant orientation before a feature descriptor is computed. Specifically, the dominant orientation is computed as follows. First, the entire orientation space is quantized into N histogram bins, each of which represents a sliding orientation window covering an angle of π/3. Then SURF computes the gradient responses of every pixel in a circular neighborhood of an interest point. Based on the gradient orientation of a pixel, SURF maps it to the corresponding histogram bins and adds its gradient response to these bins. Finally, the bin with the largest responses is used to calculate the dominant orientation of the interest point.
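The box-filter approximation is cheap precisely because of the integral image: once the summed-area table is built, any rectangular sum costs four array lookups, independent of the box size. A minimal sketch of that mechanism:

```python
import numpy as np

def integral_image(img):
    """Summed-area table, padded with a zero top row and left column so that
    box sums need no boundary special cases."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from the integral image in four lookups --
    the trick that lets SURF evaluate ever-larger box filters at fixed cost."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```

The Dxx, Dyy, and Dxy responses are each a small weighted combination of such box sums, so the per-pixel cost stays constant across the scale pyramid.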

Algorithm adaptation: accelerating SURF on mobile devices. Although the FAST detector is more efficient than SURF, it cannot match SURF's robustness and distinctiveness; as a result, it usually fails to achieve satisfactory performance for MAR apps that demand high recognition accuracy against a large database and/or must handle content with large photometric/geometric changes. Compared to FAST, SURF point detection involves much more complex computations and is, in turn, much slower. There are several techniques for improving SURF's efficiency: exploiting coherency between consecutive frames (Ta et al. 2009), employing GPUs for parallel computing, or optimizing various aspects of the implementation (Terriberry et al. 2008).

The runtime limitation of SURF is further exacerbated when running a SURF detector on a mobile platform. Table 9.1 compares the runtime of a FAST detector and a SURF detector on a mobile device (Motorola Xoom1) and a laptop (Thinkpad T420). Running a FAST detector takes 170 ms on the Motorola Xoom1 and 40 ms on the i5-based laptop, a 4× speed gap; running a SURF detector takes 2156 and 143 ms, respectively, indicating a 15× speed gap.

TABLE 9.1
Comparison of FAST and SURF Detectors on a Mobile Device and a PC

  Detector        Mobile Device (ms)   PC (ms)   Speed Up
  FAST detector   170                  40        4×
  SURF detector   2156                 143       15×

An interesting solution proposed recently (Yang et al. 2012a) is to analyze the causes of a SURF detector's poor efficiency and large overhead on a mobile platform, and to propose a set of techniques to adapt the SURF algorithm to the mobile platform. Specifically, two mismatches between the computations used in the existing SURF algorithm and common mobile hardware platforms are identified as the sources of significant performance degradation:

• Mismatch between the data access pattern and the small cache size of a mobile platform. A SURF detector relies on an integral image and accesses it using sliding windows of successively larger size for different scales. But a 2D array is stored in memory (cache and DRAM) in a row-based fashion, not in a window-based fashion. As a result, the pixels in a single sliding window reside in multiple memory rows (illustrated in Figure 9.5a). The data cache of a mobile AP, typically 32 kB for today's devices, is too small to cache all the memory rows for the pixels involved in one sliding window, leading to cache misses and cache line replacements and, in turn, incurring expensive memory accesses.

• Mismatch between the huge amount of data-dependent branches in the algorithm and the high pipeline hazard penalty of the mobile platform. To identify a dominant orientation, a SURF detector analyzes gradient histograms.

During this analysis, every pixel around an interest point is mapped to its corresponding histogram bins via a set of branch operations, that is, If-then-Else expressions. The total number of pixels involved is huge; thus the entire process involves an enormous amount of data-dependent branch operations. However, the branch predictor and the out-of-order speculation of an ARM-based mobile CPU core are usually not as advanced as those of a laptop or desktop processor. Consequently, the process incurs high pipeline hazard penalties, yielding significant performance degradation.

To address the problem caused by the mismatch between SURF's data access pattern and the small cache size of a mobile CPU, a tiled SURF was proposed in Yang et al. (2013a), which divides an image into tiles (illustrated in Figure 9.5b) and performs point detection for each tile individually to exploit local spatial coherence and reduce external memory traffic. In the original SURF, a sliding window needs to access multiple DRAM rows, leading to frequent cache misses, while in tiled SURF, all the data required within a sliding window can be cached.

FIGURE 9.5  Illustration of data locality and access pattern in (a) the original SURF detector and (b) the tiled SURF. Each color represents data stored in a unique DRAM row.

To avoid pipeline hazard penalties, two solutions were proposed in Yang et al. (2013a) to remove data-dependent branch operations. The first solution is an alternative implementation: instead of using If-then-Else expressions, a lookup table stores the correlation between each orientation and its corresponding histogram bins. This solution does not change the functionality or the other computations, but trades memory for speed. The second solution replaces the original gradient histogram method with a branching-free orientation operator based on gradient moments (i.e., GMoment) (Rosin 1999). The gradient moment-based method may slightly degrade the robustness of a SURF detector but can greatly improve its speed on mobile platforms.

Tables 9.2 and 9.3 compare the runtime cost and the Phone-to-PC runtime ratio of the original SURF and the adapted SURF, respectively (Yang et al. 2012a).
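The lookup-table idea can be illustrated with a toy orientation-binning example. The bin layout (six 60° bins) and the table resolution (tenths of a degree) are illustrative choices, not values from the paper:

```python
N_BINS = 6                                  # six 60-degree orientation windows

# Precompute, for each quantized orientation (tenths of a degree),
# the histogram bin it falls into: one indexed load replaces a branch chain.
LUT = [int((a / 3600.0) * N_BINS) % N_BINS for a in range(3600)]

def bin_branchy(angle_deg):
    """Data-dependent branch chain: the pattern that stalls a mobile pipeline."""
    if angle_deg < 60:    return 0
    elif angle_deg < 120: return 1
    elif angle_deg < 180: return 2
    elif angle_deg < 240: return 3
    elif angle_deg < 300: return 4
    else:                 return 5

def bin_lut(angle_deg):
    """Branch-free variant: a single table lookup, trading memory for speed."""
    return LUT[int(angle_deg * 10) % 3600]
```

Both functions compute the same mapping; the lookup version simply moves the decision from hard-to-predict branches into a memory access whose cost is uniform.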

TABLE 9.2
Runtime Cost Comparison on Three Mobile Platforms

  Time (ms)                 Droid   Thunderbolt   Xoom1
  U-SURF                    1310    525           461
  U-SURF tiling             930     356           243
  O-SURF                    7700    2495          2156
  O-SURF lookup table       4264    1820          1178
  O-SURF GMoment            1516    613           519
  O-SURF Tiling + GMoment   1053    404           269

TABLE 9.3
Speed Ratio Comparison on Three Mobile Platforms

  Phone-to-PC Ratio (×)     Droid   Thunderbolt   Xoom1
  U-SURF                    20      8             7
  U-SURF tiling             14      7             4
  O-SURF                    54      17            15
  O-SURF lookup table       18      7             6
  O-SURF GMoment            19      8             7
  O-SURF Tiling + GMoment   13      7             3

The Phone-to-PC ratio, defined in Equation 9.4, is the runtime of a program on a mobile CPU divided by that on a desktop CPU, and reflects the speed gap between them:

$$\text{Phone-to-PC ratio} = \frac{\text{Runtime on mobile platform}}{\text{Runtime on x86-based PC}} \qquad (9.4)$$

The evaluation experiments were performed on three mobile devices: a Motorola Droid, which features an ARM Cortex-A8 processor; an HTC Thunderbolt, which uses a Scorpion processor; and a Motorola Xoom1, which uses a dual-core ARM Cortex-A9 processor.

The first two rows of Tables 9.2 and 9.3 compare the runtime cost and the Phone-to-PC ratio of upright SURF (U-SURF) without and with tiling. As expected, tiling can greatly reduce the runtime cost, by 29%–47%, on the three devices. It also reduces the Phone-to-PC ratio by 12.5%–42.9%. The reduction in the Phone-to-PC ratio indicates that the mismatch between the data access pattern and the small cache size of a mobile CPU causes more severe runtime degradation on mobile CPUs than on desktop CPUs; alleviating this problem is therefore critical for performance optimization when porting algorithms to a mobile CPU.

The third to fifth rows of Tables 9.2 and 9.3 compare the results of oriented SURF (O-SURF) with branch operations, O-SURF using a lookup table, and O-SURF using GMoment (Rosin 1999). The results show that using a lookup table or the GMoment method can greatly reduce the overall runtime and the Phone-to-PC ratio on all three platforms.

The reduction in the Phone-to-PC ratio further confirms that the branch hazard penalty has a much greater runtime impact on a mobile CPU than on a desktop CPU. Choosing proper implementations or algorithms to avoid such penalties is critical for a mobile task. The last rows of Tables 9.2 and 9.3 show the results of applying both adaptations to O-SURF: compared to the original SURF, the two adaptations reduce the runtime on the mobile platforms by 6×–8×. However, it is still not sufficiently fast for real-time applications running on a mobile device.

Motivated by the success of SURF, a further optimized version has been proposed in Terriberry et al. (2008) that takes advantage of the computational power available in current CUDA-enabled graphics cards. This GPU-SURF implementation has been reported to perform feature extraction for a 640 × 480 image at a frame rate of up to 20 Hz, thus making feature extraction a truly affordable processing step. However, most mobile GPU cores do not support CUDA, and thus porting an implementation from desktop-based GPUs to mobile GPUs remains a tedious task.

9.2.1.3.1.2 Local Feature Description
Once a set of interest points has been extracted from an image, their content needs to be encoded in descriptors that are suitable for matching. To date, the most popular choices for this step have been the SIFT descriptor and the SURF descriptor. SIFT and SURF have successfully demonstrated their good robustness and distinctiveness in a variety of computer vision applications. However, the computational complexity of SIFT is too high for real-time applications with tight time constraints. Although SURF accelerates SIFT by 2×–3×, its cost is still substantial for mobile devices. In addition, SIFT and SURF are high-dimensional real-value vectors which demand large storage space and high computing power for matching.

In the past decade, the booming development of real-time mobile apps has stimulated a rapid development of binary descriptors that are more compact and faster to compute than SURF-like features while maintaining a satisfactory feature quality. Notable work includes BRIEF (Calonder et al. 2010) and its variants rBRIEF (Rublee et al. 2011), BRISK (Leutenegger et al. 2011), FREAK (Alahi et al. 2012), and LDB (Yang et al. 2012b, 2014a,b). In the following section, we review three representative descriptors: SURF, BRIEF, and LDB.

SURF: Speeded Up Robust Features. The SURF descriptor aims to achieve robustness to lighting variations and small positional shifts by encoding the image information in a localized set of gradient statistics. Specifically, each image patch is divided into 4 × 4 grid cells. In each cell, SURF computes a set of summary statistics ∑dx, ∑dy, ∑|dx|, and ∑|dy|, resulting in a 64-dimensional descriptor. The first-order derivatives dx and dy can be calculated very efficiently using box filters and integral images.

BRIEF: Binary robust independent elementary features. The BRIEF descriptor, proposed by Calonder et al. (2010), primarily aims at high computational efficiency for construction and matching, and a small footprint for storage.
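Before turning to the binary descriptors, the 4 × 4-cell SURF layout described above can be sketched in a few lines. This is an illustrative NumPy sketch, not the chapter's implementation; it uses np.gradient as a stand-in for the box-filter/integral-image responses of real SURF:

```python
import numpy as np

def surf_like_descriptor(patch):
    """Sketch of the SURF layout: for each cell of a 4 x 4 grid over the
    patch, accumulate (sum dx, sum dy, sum |dx|, sum |dy|)."""
    dy, dx = np.gradient(patch.astype(float))  # stand-in for box-filter responses
    s = patch.shape[0] // 4
    desc = []
    for i in range(4):
        for j in range(4):
            cx = dx[i * s:(i + 1) * s, j * s:(j + 1) * s]
            cy = dy[i * s:(i + 1) * s, j * s:(j + 1) * s]
            desc += [cx.sum(), cy.sum(), np.abs(cx).sum(), np.abs(cy).sum()]
    return np.array(desc)

patch = np.random.rand(32, 32)
print(surf_like_descriptor(patch).shape)  # (64,)
```

The 16 cells times 4 statistics give the 64 dimensions mentioned in the text.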

The basic idea of BRIEF is to directly generate bit strings by simple binary tests comparing pixel intensities in an image patch. More specifically, a binary test τ is defined and performed on a patch p of size S × S as

    τ(p; x, y) = 1 if I(p, x) < I(p, y), and 0 otherwise    (9.5)

where I(p, x) is the pixel intensity at location x = (u, v)^T. Choosing a set of n_d (x, y) location pairs uniquely defines the binary test set and consequently leads to an n_d-dimensional bit string that corresponds to the decimal counterpart of

    ∑_{1≤i≤n_d} 2^(i−1) τ(p; x_i, y_i)    (9.6)

The spatial arrangement of the binary tests greatly affects the performance of the BRIEF descriptor. In Calonder et al. (2010), the authors experimented with five sampling geometries for determining the spatial arrangement. Experimental results demonstrate that tests randomly sampled from an isotropic Gaussian distribution, Gaussian(0, (1/25)S²), where the origin of the coordinate system is the center of the patch, give the highest recognition rate. However, the resulting BRIEF descriptors are very sensitive to noise. To increase stability and repeatability, the authors proposed to smooth the pixels of every pixel pair using Gaussian or box filters before performing the binary tests.

Binary descriptors such as BRIEF and a list of enhanced versions of BRIEF (Alahi et al. 2012; Leutenegger et al. 2011; Rublee et al. 2011) are very efficient to compute, to store, and to match (matching simply computes the Hamming distance between descriptors via XOR and bit-count operations). These runtime advantages make them particularly suitable for real-time applications and handheld devices. However, these binary descriptors utilize overly simplified information, that is, only the intensities of a subset of pixels within an image patch, and thus have low discriminative ability. The lack of distinctiveness incurs an enormous number of false matches when matching against a large database. Expensive postverification methods (e.g., RANSAC; Fischler et al. 1981) are then usually required to discover and validate matching consensus, increasing the runtime of the entire process.

LDB: Local difference binary. Local difference binary (LDB), a binary descriptor, achieves computational speed and robustness similar to BRIEF and other state-of-the-art binary descriptors, while offering greater distinctiveness. By construction, the tests of Equation 9.5 consider only the information at single pixels. Instead, LDB utilizes the average intensity Iavg and the first-order gradients, dx and dy, of grid cells within an image patch. The high quality of LDB is achieved through three schemes. First, the internal patterns of the image patch are captured through a set of binary tests, each of which compares the Iavg, dx, and dy of a pair of grid cells (illustrated in Figure 9.6a and b). The average intensity and gradients capture both the DC and AC components of a patch; therefore, they provide a more complete description than other binary descriptors.

Second, LDB employs a multiple gridding strategy to encode the structure at different spatial granularities (Figure 9.6c). Coarse-level grids can cancel out high-frequency noise while fine-level grids can capture detailed local patterns, thus enhancing distinctiveness. Third, LDB leverages a modified AdaBoost method (Yang et al. 2014b) to select a set of salient bits, optimizing the performance of LDB for a given descriptor length. The modified AdaBoost targets the fundamental goal of ideal binary descriptors: minimizing the distances between matches while maximizing them between mismatches. Computing LDB is very fast: relying on integral images, the average intensity and first-order gradients of each grid cell can be obtained with only four to eight add/subtract operations.

FIGURE 9.6 Illustration of LDB extraction. (a) An image patch is divided into 3 × 3 equal-sized grids. (b) Compute the intensity summation (I) and the gradients in the x and y directions (dx and dy) of each grid, and compare I, dx, and dy between every unique pair of grids. (c) Three-level gridding (with 2 × 2, 3 × 3, and 4 × 4 grids) is applied to capture information at different granularities.

9.2.1.3.2 Local Feature-Based Object Recognition
To recognize objects in a captured image, a system matches each feature descriptor of the captured image to the database features in order to find its nearest neighbor (NN). If a pair of NNs passes the verification criteria (i.e., the similarity between a feature and its NN is above a predetermined threshold, complying with a geometric model), this feature pair is considered a matched pair; otherwise, it is discarded as a false positive. The database object which has the most matched features to the captured image is considered the recognized object.

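The binary test of Equation 9.5 and the NN-matching step just described can be combined into a toy pipeline. This is a hedged sketch, not the BRIEF reference implementation: the sampling follows the Gaussian(0, (1/25)S²) geometry, and the distance threshold max_dist is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(7)
S, n_d = 32, 128
# (x, y) test locations sampled from an isotropic Gaussian centered on the
# patch with std S/5 (variance S^2/25), clipped to stay inside the patch.
pairs = np.clip(rng.normal(S / 2, S / 5, (n_d, 2, 2)), 0, S - 1).astype(int)

def brief(patch):
    """tau(p; x, y) = 1 if I(p, x) < I(p, y) else 0, for n_d location pairs."""
    return np.array([patch[x[0], x[1]] < patch[y[0], y[1]] for x, y in pairs],
                    dtype=np.uint8)

def nearest(query, database, max_dist=32):
    """Brute-force NN by Hamming distance; reject weak matches."""
    dists = [int(np.count_nonzero(query ^ d)) for d in database]
    best = int(np.argmin(dists))
    return best if dists[best] <= max_dist else None

patches = [rng.random((S, S)) for _ in range(50)]
db = [brief(p) for p in patches]
noisy = patches[10] + rng.normal(0, 0.01, (S, S))  # slightly corrupted view
print(nearest(brief(noisy), db))
```

In practice the bits are packed into machine words so that the Hamming distance reduces to XOR and popcount instructions, as noted above.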
Fast and accurate retrieval of the NN of a local feature from a large database is the key to efficient and accurate object recognition. Two popular techniques that have been commonly used for large-scale NN matching are locality sensitive hashing (LSH) and bag-of-words (BoW) matching.

LSH: Locality sensitive hashing. LSH (Gionis et al. 1999) is a widely used technique for approximate NN search. The key of LSH is the hash function, which maps similar descriptors into the same bucket of a hash table and different descriptors into different buckets. For binary features, the hash function can simply be a subset of bits from the original bit string: descriptors with a common sub-bit-string are cast into the same table bucket. The size of the subset, that is, the hash key size, determines the upper bound of the Hamming distance among descriptors within the same bucket. To find the NN of a query descriptor, we first retrieve its matching bucket and then check all the descriptors within the matched bucket using a brute-force search.

To improve the detection rate of NN search based on LSH, two techniques, namely multi-table and multi-probe, are usually used. The multi-table technique stores the database descriptors in several hash tables, each of which leverages a different hash function. In the query phase, the query descriptor is hashed into a bucket of every hash table, and all descriptors in each of these buckets are then further checked for matching. Multi-table improves the detection rate of NN search at the cost of higher memory usage, which is linearly proportional to the number of hash tables used, and longer matching time. Multi-probe examines both the bucket into which the query descriptor falls and its neighboring buckets. While multi-probe results in more matching checks of database descriptors, it allows a larger key size and in turn smaller buckets and fewer matches to check per bucket; it thus requires fewer hash tables and incurs lower memory usage.

BoW: Bag-of-words matching. BoW matching (Sivic et al. 2003), which is designed for general image-matching applications, is an effective strategy to reduce memory usage and support fast matching via a scalable indexing scheme such as an inverted file. BoW matching quantizes local image descriptors into visual words and then computes the image similarity by counting the frequency of co-occurrences of words. However, it completely ignores the spatial information, and hence may greatly degrade the accuracy.

In order to enhance the accuracy of BoW matching, several approaches have been proposed to compensate for the loss of spatial information. For example, geometric verification (Philbin et al. 2007) is a popular scheme which verifies local correspondences by checking their homography consistency. Wu et al. presented a bundled feature matching scheme (Wu et al. 2009) for partial-duplicate image detection; in their approach, sets of local features are bundled into groups by MSER-detected regions (Matas et al. 2002), and robust geometric constraints are then enforced within each group. Spatial pyramid matching (Lazebnik et al. 2006), which considers approximate global geometric correspondences, is another scheme to enforce geometric constraints for more accurate BoW matching.
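The multi-table LSH scheme for binary descriptors described above can be sketched as follows. This is an illustrative sketch: the descriptor length, key size, and number of tables are arbitrary example parameters, and each table keys on a random subset of bits.

```python
import numpy as np

rng = np.random.default_rng(1)
D, KEY_BITS, N_TABLES = 256, 16, 4
# Each table keys on its own random subset of bits (multi-table LSH).
subsets = [rng.choice(D, KEY_BITS, replace=False) for _ in range(N_TABLES)]

def build(db):
    """Index every descriptor into one bucket per table."""
    tables = [{} for _ in range(N_TABLES)]
    for i, desc in enumerate(db):
        for table, sub in zip(tables, subsets):
            table.setdefault(tuple(desc[sub]), []).append(i)
    return tables

def query(tables, db, q):
    """Union the buckets q falls into, then brute-force only those candidates."""
    cand = set()
    for table, sub in zip(tables, subsets):
        cand.update(table.get(tuple(q[sub]), []))
    return min(cand, key=lambda i: np.count_nonzero(db[i] != q), default=None)

db = rng.integers(0, 2, (1000, D), dtype=np.uint8)
tables = build(db)
q = db[42].copy(); q[:3] ^= 1  # near-duplicate: 3 bits flipped
print(query(tables, db, q))
```

A flipped bit only hides the true neighbor from a table whose key subset contains that bit, which is why several tables (or multi-probe) raise the detection rate.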

The scheme partitions the image into increasingly finer subregions and computes histograms of the local features found within each subregion. To compute the similarity between two images, the distances between the histograms at each spatial level are weighted and summed together. All these schemes yield more reliable local-region matches by enforcing various geometric constraints. However, these schemes are very computationally expensive; thus, when applying them to MAR, the recognition procedure is conducted on the server side or in the cloud, where abundant computing and memory resources are available.

9.2.1.3.3 Local Feature-Based Object Tracking
A typical flow of local feature-based object tracking is to find corresponding local features on consecutive frames and then estimate the homography transformation between image frames based on the local feature matches according to Equation 9.1. But different from marker-based tracking and pose estimation, which utilize only four reliable corner matches, the local feature-based method often generates a large number of correspondences, which inevitably could include some outliers. Selecting reliable matches from a large correspondence set is challenging, and existing solutions often rely on the RANSAC or PROSAC algorithms to solve this problem. The key idea of RANSAC and PROSAC is to iteratively estimate the parameters of a transformation model from a set of noisy feature correspondences so that a sufficient number of consensuses can be obtained. We refer readers to Fischler et al. (1981) and Chum et al. (2005) for details of the RANSAC and PROSAC algorithms.

The quality of the local features is essential for the accuracy of local feature matches. A large number of false positive matches resulting from low-quality features could lead to an enormous number of iterations in the RANSAC and PROSAC procedures, yielding an excessively long runtime.

9.2.2 Sensor-Based Recognition and Tracking
A sensor-based method typically leverages the GPS to identify the location of a mobile device and utilizes a compass (possibly in combination with other sensors) to determine the direction in which the device is heading. Based on the location and direction of a device, a MAR system can determine which virtual data should be associated with the current scene. After that, the device's motion is tracked based on motion sensors (also known as an Inertial Measurement Unit [IMU]). Since location recognition using the GPS is straightforward, in the following section we mainly focus on the common motion sensors used in today's smart mobile devices and on tracking algorithms based on these sensors.

9.2.2.1 Sensor-Based Object Tracking
With recent advances in microelectromechanical systems (MEMS) technology, IMUs are now commonplace in most smart mobile devices. The most common IMUs found in these smart devices today include accelerometers, magnetometers, and gyroscopes. These IMUs are used by mobile apps for tracking the movement of a mobile device and consequently enabling the device to interact with its surroundings. Each of these sensors provides a unique input to the overall tracking system.
9.1  Sensor-Based Object Tracking With recent advances in microelectromechanical systems (MEMS) technology.3  Local Feature-Based Object Tracking A typical flow of local feature-based object tracking is to find corresponding local features on consecutive frames and then estimate the homography transformation between image frames based on local feature matches according to Equation 9. the distance between histogram at each spatial level is weighted and summed together. These IMUs are used by mobile apps for tracking the movement of a mobile device and consequently enabling the device to interact with its surrounding.216 Fundamentals of Wearable Computers and Augmented Reality into increasingly finer subregions and computes histograms of local features found within each subregion. To compute the similarity between two images. IMUs are now commonplace in most smart mobile devices. The quality of local features is essential for the accuracy of local feature matches.3. But different from marker-based tracking and pose estimation. Based on the location and direction of a device. a MAR system could determine which virtual data should be associated with the current scene. The key idea of RANSAC and PROSAC is to iteratively estimate parameters of a transformation model from a set of noisy feature correspondences so that a sufficient number of consensuses can be obtained. which utilize only four reliable corner matches.

they get closer to. This structure can be extended to build a three-axis accelerometer for measuring the displacement along all three axes.3. These types of gyroscopes are relatively cheap to manufacture. hardware. all gyroscopes used in smartphones today still experience a small amount of bias.1.7b. The amount the device flexes is monitored by a set of fingers that are attached to a movable inertial mass and flex with the device.7a illustrates the structure of a one-axis MEMS capacitive accelerometer. which can yield more accurate results. The accelerometer data is acquired by measuring the force exerted on an object which is able to flex up or down. and bias of these sensors and then present tracking algorithms based on the sensor data. the bias itself is a rate. Although latest MEMS gyroscopes have smaller errors than the previous generations.3. There are several types of accelerometers and the type used in mobile devices is the capacitive accelerometer. and move further apart from. Given that the gyroscope measures a rate (change over time). Gyroscopes work off the principles of the Coriolis force. This view is somewhat simplified. which prove to be much more accurate. more advanced gyroscopes are being developed and integrated into new devices. If the device is rotated about the axis defined by the first set of springs.1. and z directions with respect to the surface plane of the mobile device. however.2 Gyroscope A three-axis gyroscope provides a 3D vector which measures the rotational (angular) velocity of a device around three axes of the device’s coordinate system. y. Figure 9. as a bias can also . The proximity of these fingers/plates can create a change in the measured capacitance between multiple fingers/plates. On the contrary. As these fingers/plates move. 9. and are usually implemented within an integrated circuit (IC) using a vibrating mass attached to a set of springs. which can be monitored to measure the displacement of the center inertial mass. 
These types of gyroscopes are relatively cheap to manufacture; however, they are often noisy and could introduce significant errors if their measurements are not modeled properly. For this reason, many advanced inertial navigation systems (INSs) today have begun using optical gyroscopes instead, which prove to be much more accurate, although at a higher cost. Although the latest MEMS gyroscopes have smaller errors than the previous generations, all gyroscopes used in smartphones today still experience a small amount of bias. A gyroscope bias can be envisioned as the rotational velocity observed by the device when it is not in motion. This view is somewhat simplified, however: given that the gyroscope measures a rate (change over time), the bias itself is a rate, and a bias can also occur when the device is moving. In addition, this bias is sensitive to several factors, including the temperature, and often varies randomly over time; it is thus difficult to compensate for. This bias is often estimated as a random variable by many filtering algorithms.
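A one-line integration shows why even a tiny constant bias matters; the numbers below are illustrative, not measured values:

```python
import numpy as np

dt, bias = 0.01, 0.02              # 100 Hz sampling; 0.02 rad/s constant bias
measured = np.zeros(1000) + bias   # device is stationary, yet the gyro reports the bias
angle = np.cumsum(measured) * dt   # naive integration of the angular rate
print(round(angle[-1], 3))         # 0.2 rad of spurious rotation after 10 s
```

The drift grows linearly with time, which is why filters estimate and subtract the bias rather than integrate the raw rate.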

FIGURE 9.7 (a) A typical 1D MEMS capacitive accelerometer and (b) a vibrating mass gyroscope.

9.2.2.1.3 Magnetometer
The magnetometer measures the strength of the earth's magnetic field, which is a vector pointing toward the magnetic north of the earth. The magnetometer found in most smart devices is primarily one of two possible types: a Hall effect magnetometer or an anisotropic magnetoresistive (AMR) magnetometer. The Hall effect magnetometers are the most common; they provide a voltage output in response to the measured field strength and can also sense polarity. The AMR magnetometers use a thin strip of a special kind of alloy that changes its resistance whenever there is a change in the surrounding magnetic field. AMR magnetometers usually yield much better accuracy; however, they are more expensive.

One primary source of a magnetometer's error is called the magnetometer bias. This bias is caused by the surrounding environment (external to the magnetometer itself) and can cause a wide range of errors in the magnetometer readings. The bias itself can be separated into one of two types. The first is called hard iron bias. This type of bias is primarily caused by devices which produce a magnetic field. The errors observed by a hard iron bias are constant offsets, usually applied to all axes of the magnetometer equally. This bias is not time or space varying and can be compensated for by simply adjusting the readings of the magnetometer by some constant value. The other type of bias commonly experienced by the magnetometer is called soft iron bias. This type of bias is caused by any distortions in the magnetic field surrounding the magnetometer; it can thus have many forms and is difficult to compensate for.

9.2.2.1.4 Kalman Filtering for Sensor-Based Tracking
The goal of tracking is to obtain the translation and orientation of a device in the 3D earth coordinate system. Each of the three sensors alone can provide the orientation of a mobile device. However, since each type of sensor data is quite noisy, relying on a single type of sensor cannot achieve accurate tracking. Many approaches apply a filtering-based method to fuse the three types of sensor data for a more reliable and precise tracking result. Once we get the orientation information, we can derive the gravity force components along the three axes of the mobile device and then subtract the gravity force from the accelerometer data to obtain the motion-induced acceleration force. By double integration of the motion-induced acceleration force, we can obtain the translation of the device in the 3D earth coordinate system.

In the following, we overview the most standard filtering method, Kalman filtering, which is also the algorithm implemented in the Android operating system for estimating a smartphone's orientation. For more advanced yet computationally expensive filters such as unscented Kalman filters or particle filters, please refer to Li et al. (2013) and Cheon and Kim (2007) for details.
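The gravity-subtraction and double-integration step described above can be sketched as follows. This is a minimal sketch assuming the orientation (here, device-to-earth rotation matrices R_eb) is already known; with real IMU data this naive integration drifts quickly:

```python
import numpy as np

def integrate_translation(accel, R_eb, dt, g=np.array([0.0, 0.0, 9.81])):
    """Subtract gravity in the earth frame, then double-integrate the
    motion-induced acceleration to get translation (drift-prone in practice)."""
    v = np.zeros(3)
    p = np.zeros(3)
    path = []
    for a_dev, R in zip(accel, R_eb):
        a_earth = R @ a_dev - g   # remove the gravity component
        v = v + a_earth * dt      # first integration: velocity
        p = p + v * dt            # second integration: position
        path.append(p.copy())
    return np.array(path)

# Stationary device lying flat: the accelerometer reads +g along device z.
acc = [np.array([0.0, 0.0, 9.81])] * 100
Rs = [np.eye(3)] * 100
print(np.allclose(integrate_translation(acc, Rs, 0.01), 0.0))  # True
```

Any orientation error leaves a residual gravity term that is integrated twice, which is one reason accurate orientation filtering matters so much for translation.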

The Kalman filtering process for orientation estimation can be broken down into two primary steps: the prediction step and the updating step. In the prediction step, the filter uses the gyroscope measurement to predict the dynamics of the device rotation. Accordingly, the state equation is defined using a seven-element state vector consisting of the quaternion q(t) (i.e., a four-element orientation vector) and the gyro-bias b(t):

    X(t) = [ q(t)   b(t) ]^T    (9.7)

We define an error angle vector δθ as a small rotation between the estimated and the true orientation of a device in the earth coordinate system. Similarly, an error bias vector δb is defined as the small difference between the estimated and the true bias of the device. Accordingly, the error state propagation model is given by

    [ δθ' ]   [ −[ω×]   −I_3×3 ] [ δθ ]   [ −I_3×3   0_3×3 ] [ n_ω ]
    [ δb' ] = [ 0_3×3    0_3×3 ] [ δb ] + [  0_3×3   I_3×3 ] [ n_b ]    (9.8)

where ω is the rotational velocity around the three axes, and n_ω and n_b model the gyroscope noise and bias, respectively. In most cases, n_ω is assumed to be an independent white Gaussian distribution along each axis of the gyroscope input; therefore, its expected value is given as E[n_ω] = 0_3×1. The gyroscope bias model is usually defined as b' = n_b, where n_b is an independent white Gaussian distribution along each axis.

The gyroscope measurements are thus integrated directly into the state transition equation and used to provide a predicted state estimate. The solution to this differential equation has the closed-form solution found in Trawny et al. (2005), which yields the following state transition matrix:

    Φ = [ Θ       Ψ     ]
        [ 0_3×3   I_3×3 ]    (9.9)

where

    Θ = I_3×3 − [ω̂×] · sin(|ω̂|Δt)/|ω̂| + [ω̂×]² · (1 − cos(|ω̂|Δt))/|ω̂|²

    Ψ = [ω̂×] · (1 − cos(|ω̂|Δt))/|ω̂|² − [ω̂×]² · (|ω̂|Δt − sin(|ω̂|Δt))/|ω̂|³ − I_3×3 · Δt

Note that in the state transition matrix, ω̂ = ω − b is treated as a system input which has already been bias corrected.
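The closed-form Θ and Ψ of Equation 9.9 translate directly into code. A minimal NumPy transcription (the function names and the test rate below are illustrative):

```python
import numpy as np

def skew(w):
    """Cross-product (skew-symmetric) matrix [w x]."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def transition(w_hat, dt):
    """State transition matrix Phi of Equation 9.9 for the error state."""
    I = np.eye(3)
    W = skew(w_hat)
    n = np.linalg.norm(w_hat)
    Theta = I - W * np.sin(n * dt) / n + W @ W * (1 - np.cos(n * dt)) / n**2
    Psi = (W * (1 - np.cos(n * dt)) / n**2
           - W @ W * (n * dt - np.sin(n * dt)) / n**3
           - I * dt)
    return np.block([[Theta, Psi], [np.zeros((3, 3)), np.eye(3)]])

# Sanity check: with delta-t = 0 the error state should not change.
Phi = transition(np.array([0.1, -0.2, 0.3]), 0.0)
print(np.allclose(Phi, np.eye(6)))  # True
```

A production implementation would also handle the |ω̂| → 0 limit with the small-angle series expansion rather than dividing by the norm.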
the state equation is defined using a seven-element state vector consisting of the quaternion q (t ) (i.7) – b(t ) ˜ We define an error angle vector δθ as a small rotation between the estimated and the true orientation of a device in the earth coordinate system.220 Fundamentals of Wearable Computers and Augmented Reality The Kalman filtering process for orientation estimation can be broken down into two primary steps: the prediction step and the updating step. Therefore. .

3  Hybrid-Based Recognition and Tracking At present.221 Scalable Augmented Reality on Mobile Devices In the updating phase. 9. and magnetometer measurements come directly from sampling the IMUs to revise the orientation estimation. 2007). the residual obtained in the measurement model is used to update the quaternion which in turn is used as the filter result. accelerometer. The basic idea behind these methods is to use GPS to identify the position . For instance. On the other hand. the filter combines the previously estimated state with the recorded accelerometer. Since this residual represents the error between the measurement vector and the predicted vector. This approximation is derived in Trawny et al. Each recorded measurement complies with a model which describes its relationship with the estimated states and noise errors of some measurements. The measurement residual is defined as r = zˆ − z.. z0 is a unit vector representation of north in the earth coordinate system. the processing power and memory capacity of mobile devices are still too limited for scalable MAR apps solely relying on sophisticated visual recognition and tracking methods. where zˆ is the input measurement. and GPS) usually lack sufficient accuracy. thus cannot provide satisfactory performance for recognition and tracking tasks. GPS positioning alone is insufficient for AR apps. the built-in sensors (e.g. but it can be combined with a visual tracking method to achieve a desired level of accuracy (Reitmayr et al. it is a close approximation of the error angle vector δθ. gyroscope. magnetometer. the measurement model is of the form: z = REB (q) ⋅ z 0 + nz (9. Specifically. 
(2005) and defined as r ≈ � �� REB (q) ⋅ z 0 ×�� � �δθ � 0 � � � + nz (9.10) where E[nz] = 0 E[ nz nzT ] = R REB (q) is a rotation matrix from the earth coordinate system to the predicted device coordinate system The rotation matrix is obtained using the propagated quaternion from the process model.3.12) � After the measurement update.11) � �δb � � � which gives a result for the measurement model H: H = � �� REB (q) ⋅ z 0 ×�� � 0 � (9. Several studies proposed to combine these vision-based and sensor-based methods.



(i.e., the location on earth) and this information is used to initialize the visual tracking system, which in turn gives the user’s local pose and the view direction. In
Naimark et al. (2002), the authors proposed to combine visual tracking and GPS
for outdoor building visualization. The user can place virtual models on Google
Earth and the app can retrieve and visualize them based on the user’s GPS location.
Another promising direction is to combine vision information with motion sensor
data (i.e., gyroscope, accelerometer, and magnetometer) to provide more accurate and efficient object tracking. The trend of integrating more sensors into mobile
devices has not stopped yet. For example, Google has just released a new mobile
platform, Tango, which integrates 6 Degree-of-Freedom motion sensors, depth sensors, and high-quality cameras. Amazon has announced their new Fire phone which
includes four cameras tucked into the front corners of the phone, in addition to
other motion sensors. Advances in mobile hardware offer the opportunities to gain
richer contextual information surrounding a mobile device and in turn open a door
for new approaches to best utilize all available multimodal information.

OpenCV is one of the most popular software development libraries for computer
vision tasks. A mobile version of OpenCV has been released for running on mobile
platforms (OpenCV for Android). Other mature libraries such as Eigen (Eigen main
page) or LAPACK for linear algebra (LAPACK—Linear Algebra PACKage) have also
become available for mobile platforms, even though the support and the optimization
level are still limited.
Qualcomm has released a mobile-optimized computer vision library, named
FastCV (FastCV main page), which includes the most frequently used vision processing functions and can be used for camera-based mobile apps. The CV functions offered by FastCV include gesture recognition, text recognition and tracking,
and face detection, tracking, and recognition. FastCV can run on most ARM-based
processors but is particularly tuned for Qualcomm’s Snapdragon processor (S2 and
above) and utilizes hardware acceleration to speed up some of the most computer-intensive vision functions.
Built on top of FastCV, Qualcomm further offers an MAR software development
kit (SDK), named Vuforia™ (Vuforia main page). Vuforia offers software functions
for app developers that can recognize and maintain a variety of 2D and 3D visual
targets, frame markers, text, and user interactions (e.g., interactions with a virtual
button). In addition, it provides APIs to easily render 3D graphics or video playback
on top of the real scene. To manage visual targets, Vuforia provides two ways to store
target databases: on a mobile device or on the cloud. Device databases do not require
network connectivity for the recognition, and thus can avoid the overhead for data
transfer and are free to use in mobile apps. However, due to the limited storage space
and computing power of mobile devices, device databases can only store a limited
number of targets; so far, the maximum number of targets that can be stored in a device database
is 100. Cloud databases are managed using either the Target Manager UI provided
by Qualcomm or the Vuforia Web Service API. They enable you to host over one
million targets on the cloud. The Vuforia cloud recognition service is an enterprise-class solution with various pricing plans determined by your app's total number of image recognitions per month. Generally speaking, the Vuforia development infrastructure facilitates, and significantly simplifies, the development of MAR apps.

The advancement of mobile technology, in terms of hardware computing power,
seamless connectivity to the cloud, and fast computer vision algorithms, has raised
AR into the mainstream of mobile apps. Following the widespread popularity of a
handful of killer MAR applications already commercially available, it is believed
that MAR will expand exponentially in the next few years. The advent of MAR will
have a profound and lasting impact on the way people use their smartphones and tablets. These emerging MAR apps will turn our everyday world into a fully interactive
digital experience, from which we can see, hear, feel, and even smell the information
in a different way. This emerging direction will push the industry toward truly ubiquitous computing and a technologically converged paradigm.
The scalability, accuracy, and efficiency of the underlying techniques (i.e., object
recognition and tracking) are key factors influencing user experience of MAR apps.
New algorithms in computer vision and pattern recognition, such as lightweight feature extraction, have been developed to provide efficiency and compactness on low-power mobile devices while maintaining sufficiently good accuracy. Several
efforts have also been made to analyze particular hardware limitations for executing existing recognition and tracking algorithms on mobile devices and to explore adaptation
techniques to address these limitations. In addition to advances in the development
of lightweight computer vision algorithms, a variety of sensors have been integrated
into modern smartphones, enabling location recognition (e.g., via GPS) and device
tracking (e.g., via gyroscope, accelerometer, and magnetometer) at little computational cost. However, due to the large noise of the low-cost sensors in today's
smartphones, the accuracy of location recognition and device tracking is usually
low and cannot meet the requirements of apps which demand high accuracy. Fusing
visual information with sensor data is a promising direction to achieve both high
accuracy and efficiency, and we shall see an increasing amount of research work
along this direction in the near future.

REFERENCES

Alahi, A., Ortiz, R., and Vandergheynst, P. 2012. FREAK: Fast retina keypoint. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition.
Ashby, F.G. and Perrin, N.A. 1988. Toward a unified theory of similarity and recognition.
Psychological Review, 95:124–150.
Bay, H., Ess, A., Tuytelaars T., and Gool, L.V. 2006. SURF: Speeded-up robust features. In
Proceedings of the European Conference on Computer Vision.
Bay, H., Ess, A., Tuytelaars, T., and Gool, L.V. June 2008. Speeded-up robust features.
Computer Vision and Image Understanding, 110(3):346–359.
Calonder, M., Lepetit, V., Strecha, C., and Fua, P. 2010. BRIEF: Binary robust independent
elementary features. In Proceedings of the European Conference on Computer Vision.


Fundamentals of Wearable Computers and Augmented Reality

Canny, J. 1986. A computational approach to edge detection. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 8(6):679–698.
Cheng, K.T., Yang, X., and Wang, Y.-C. July 7–9, 2013. Performance optimization of vision
apps on mobile application processor. International Conference on Systems, Signals and
Image Processing (IWSSIP), Bucharest, Romania.
Cheon, Y.J. and Kim, J.H. 2007. Unscented filtering in a unit quaternion space for spacecraft
attitude estimation. In Proceedings of the IEEE International Symposium on Industrial
Electronics, pp. 66–71.
Chum, O. and Matas, J. 2005. Matching with PROSAC—Progressive sample consensus. In
Proceedings of Computer Vision and Pattern Recognition, 1:220–226.
Eigen main page:
FastCV main page:
Fischler, M.A. and Bolles, R.C. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications
of the ACM, 24, 381–395.
Gionis, A., Indyk, P., and Motwani, R. 1999. Similarity search in high dimensions via hashing. In Proceedings of International Conference on Very Large Databases, 25:518–529.
Gonzalez, R. and Woods, R. 1992. Digital Image Processing, Addison Wesley: Reading, MA,
pp. 414–428.
Haralick, R. 1984. Digital step edges from zero crossing of second directional derivatives.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1):58–68.
Honkamaa, P., Siltanen, S., Jappinen, J., Woodward, C., and Korkalo, O. 2007. Interactive
Outdoor Mobile Augmentation using Markerless Tracking and GPS. Laval, France.
Information Technology—Automatic Identification and Data Capture Techniques—Data
Matrix Bar Code Symbology Specification. 2006a. ISO/IEC 24720:2006. International
Organization for Standardization.
Information Technology—Automatic Identification and Data Capture Techniques—QR
Code 2005 Barcode Symbology Specification. 2006b. ISO/IEC 18004. International
Organization for Standardization.
Information Technology—Automatic Identification and Data Capture Techniques—PDF417
Barcode Symbology Specification. 2006c. ISO/IEC 15438:2006. International
Organization for Standardization.
Jiang, B., Neumann, U., and Suya, Y. March 2004. A robust hybrid tracking system for outdoor
augmented reality. In Proceedings of Virtual Reality, pp. 3–275.
LAPACK—Linear Algebra PACKage:
Lazebnik, S., Schmid, C., and Ponce, J. 2006. Beyond bags of features: Spatial pyramid
matching for recognizing natural scene categories. In Proceedings of Computer Vision
and Recognition Recognition, pp. 2169–2178.
Leutenegger, S., Chli, M., and Siegwart, R. 2011. BRISK: Binary robust invariant scalable
keypoints. In Proceedings of the Computer Vision on Pattern Recognition.
Li, W.W.L, Iltis, R.A., and Win, M.Z. 2013. Integrated IMU and radiolocation-based navigation using a rao-blackwellized particle filter. In Proceedigns of the IEEE International
Conference on Acoustics, Speech and Signal Processing, pp. 5165–5169.
Liao, C.Y., Tang, H., Liu, Q., Chiu, P., and Chen, F. 2010. FACT: Fine-grained cross-media
interaction with documented via a portable hybrid paper-laptop interface. In ACM
Liu, X. and Doermann, D. 2008. Mobile retriever: Access to digital documents from their
physical source. International Journal of Document Analysis and Recognition 11(1):
pp. 19–27.
Lowe, D.G. 2004. Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision, 60(2):91–110.

Scalable Augmented Reality on Mobile Devices


Matas, J., Chum, O., Urban, M., and Pajdla, T. 2002. Robust wide baseline stereo from maximally stable extremal regions. In Proceedings of British Machine Vision Conference,
pp. 384–396.
Naimark, L. and Foxlin, E. 2002. Circular data matrix fiducial system and robust image
processing for a wearable vision-inertial self-tracker. In Proceedings of International
Symposium on Mixed and Augmented Reality, pp. 27–36.
OpenCV for Android:
Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. 2007. Object retrieval with large
vocabularies and fast spatial matching. Proceedings of Computer Vision and Pattern
Recognition, pp. 1–8.
Reitmayr, G. and Drummond, T.W. 2007. Initialization for visual tracking in urban environments. pp. 161–172.
Ribo, M., Lang, P., Ganster, H., Brandner, M., Stock, C., and Pinz, A. 2002. Hybrid tracking for outdoor augmented reality applications. Computer Graphics and Applications,
IEEE, 22(6):54–63, 178.
Rosin, P.L. 1999. Measuring corner properties. Journal of Computer Vision and Image
Understanding, 73(2):291–307.
Rosten, E. and Drummond, T. 2006. Machine learning for high speed corner detection. In
Proceedings of the European Conference on Computer Vision.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. 2011. ORB: An efficient alternative to
SIFT or SURF. In Proceedings of the International Conference on Computer Vision.
Simard, P., Bottou, L., Haffner, P., and LeCun, Y. 1998. Boxlets: A fast convolution algorithm for signal processing and neural networks. In Proceedings of Neural Information
Processing Systems (NIPS).
Sivic, J. and Zisserman, A. 2003. Video google: A text retrieval approach to object
matching in videos. Proceedings of International Conference on Computer Vision,
Ta, D.N., Chen, W.C., Gelfand, N., and Pulli, K. 2009. SURFTrac: Efficient tracking and
continuous object recognition using local feature descriptors. In Proceedings of the
Conference on Vision and Pattern Recognition.
Terriberry, T.B., French, L.M., and Helmsen, J. 2008. GPU accelerating speeded-up robust
features. In Proceedings of the 3D Data Processing, Visualization and Transmission.
Trawny, N. and Roumeliotis, S. 2005. Indirect Kalman Filter for 3D Attitude Estimation. Tech.
Rep. 2. Department of Computing Science and Engineering, University of Minnesota,
Minneapolis, MN.
Tuytelaars, T. and Mikolajczyk, K. 2008. Local invariant feature detectors: A survey. Journal
Foundations and Trends in Computer Graphics and Vision, 3:177–280.
Uchiyama, S., Takemoto, K., Satoh, K., Yamamoto, H., and Tamura, H. 2002. MR platform: A
basic body on which mixed reality applications are built. In Proceedings of International
Symposium on Mixed and Augmented Reality. Vol. 00, p. 246.
Vuforia main page:
Wu, Z., Ke, Q.F., Isard, M., and Sun, J. 2009. Bundling features for large scale partial-duplicate web image search. In Proceedings of Computer Vision and Pattern Recognition,
pp. 25–32.
Yang, X. and Cheng, K.T. June 2012a. Accelerating SURF detector on mobile devices. ACM
International Conference on Multimedia, Nara, Japan.
Yang, X. and Cheng, K.T. 2012b. LDB: An ultrafast feature for scalable augmented reality on
mobile device. In Proceedings of International Symposium on Mixed and Augmented
Reality, pp. 49–57.
Yang, X. and Cheng, K.T. 2014a. Local difference binary for ultrafast distinctive feature description. IEEE Transactions on Pattern Analysis and Machine Intelligence,


Fundamentals of Wearable Computers and Augmented Reality

Yang, X. and Cheng, K.T. 2014b. Learning optimized local difference binaries for scalable
augmented reality on mobile devices. IEEE Transactions on Visualization and Computer
Graphics, 20(6):852–865.
Yang, X., Liao, C.Y., and Liu, Q. 2012. MixPad: Augmenting paper with mice & keyboards
for bimanual, cross-media and fine-grained interaction with documents. In Proceedings
of ACM Multimedia, pp. 1145–1148.
Yang, X., Liao, C.Y., Liu, Q., and Cheng, K.T. 2011a. Minimum correspondence sets for
improving large-scale augmented paper. In Proceedings of International Conference on
Virtual Reality Continuum and Its Applications in Industry.
Yang, X., Liao, C.Y., Liu, Q., and Cheng, K.T. 2011b. Large-scale EMM identification with
geometry-constrained visual word correspondence voting. In Proceedings of ACM
International Conference on Multimedia Retrieval.

10 Haptic Augmented Reality: Taxonomy, Research Status, and Challenges

Seokhee Jeon, Seungmoon Choi, and Matthias Harders

10.1 Introduction
10.2 Taxonomies
10.2.1 Visuo-Haptic Reality–Virtuality Continuum
10.2.2 Artificial Recreation and Augmented Perception
10.2.3 Within- and Between-Property Augmentation
10.3 Components Required for Haptic AR
10.3.1 Interface for Haptic AR
10.3.2 Registration between Real and Virtual Stimuli
10.3.3 Rendering Algorithm for Augmentation
10.3.4 Models for Haptic AR
10.4 Stiffness Modulation
10.4.1 Haptic AR Interface
10.4.2 Stiffness Modulation in Single-Contact Interaction
10.4.3 Stiffness Modulation in Two-Contact Squeezing
10.5 Application: Palpating Virtual Inclusion in Phantom with Two Contacts
10.5.1 Rendering Algorithm
10.6 Friction Modulation
10.7 Open Research Topics
10.8 Conclusions
References

10.1 Introduction

This chapter introduces an emerging research field in augmented reality (AR) called haptic AR. Just as AR transforms a real space into a semi-virtual space by providing a user with mixed sensations of real and virtual objects, haptic AR does the same for the sense of touch: a user can touch a real object, a virtual object, or a real object augmented with virtual touch. Visual AR is a relatively mature technology and
is being applied to diverse practical applications such as surgical training, industrial
manufacturing, and entertainment (Azuma et al. 2001). In contrast, the technology for haptic AR is quite recent and poses a great number of new research problems
ranging from modeling to rendering in terms of both hardware and software.
Haptic AR promises great potential to enrich user interaction in various applications.
For example, suppose that a user is holding a pen-shaped magic tool in the hand, which
allows the user to touch and explore a virtual vase overlaid on a real table. In addition, the user may draw a picture on the table with the augmented feel of using a paintbrush on a smooth piece of paper, or of using a marker on a stiff whiteboard. In a more practical setting, medical students can practice cancer palpation skills by exploring a phantom body while trying to find virtual tumors that are rendered inside it. A consumer-targeted application can be found in online stores. Consumers can see clothes displayed
on the touchscreen of a tablet computer and feel their textures with bare fingers, for
which the textural and frictional properties of the touchscreen are modulated to those
of the clothes. Another prominent example is augmentation or guidance of motor skills
by means of external haptic (force or vibrotactile) feedback, for example, shared control or motor learning of complex skills such as driving and calligraphy. Creating such
haptic modulations belongs to the realm of haptic AR. Although we have a long way to
go in order to realize all the envisioned applications of haptic AR, some representative
examples that have been developed in recent years are shown in Figure 10.1.

FIGURE 10.1  Representative applications of haptic AR. (a) AR-based open surgery simulator. (From Harders, M. et al., IEEE Trans. Visual. Comput. Graph., 15, 138, 2009.) (b) Haptic AR breast tumor palpation system. (From Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1, 2014.) (c) Texture modeling and rendering based on contact acceleration data. (Reprinted from Romano, J.M. and Kuchenbecker, K.J., IEEE Trans. Haptics, 5, 109, 2011. With permission.) (d) Conceptual illustration of the haptic AR drawing example.



In this chapter, we first address three taxonomies for haptic AR based on a composite visuo-haptic reality–virtuality continuum, a functional aspect of haptic AR
applications, and the subject of augmentation (Section 10.2). A number of studies
related to haptic AR are reviewed and classified based on the three taxonomies.
Based on the review, associated research issues along with components needed for
a haptic AR system are elucidated in Section 10.3. Sections 10.4 through 10.6 introduce our approach to augmenting the stiffness and friction of real objects in interactions with one or two contact points. A discussion of open research issues for haptic AR is provided in Section 10.7, followed by brief conclusions in Section 10.8. We hope that this chapter will prompt more research interest in this exciting, largely unexplored area of haptic AR.

10.2 Taxonomies

10.2.1 Visuo-Haptic Reality–Virtuality Continuum
General concepts associated with AR, or more generally, mixed reality (MR) were
defined earlier by Milgram and Colquhoun Jr. (1999) using the reality–virtuality
continuum shown in Figure 10.2a. The continuum includes all possible combinations
of purely real and virtual environments, with the intermediate area corresponding to
MR. Whether an environment is closer to reality or virtuality depends on the amount
of overlay or augmentation that the computer system needs to perform; the more augmentation performed, the closer to virtuality. This criterion allows MR to be further
classified into AR (e.g., a heads-up display in an aircraft cockpit) and augmented
virtuality (e.g., a computer game employing a virtual dancer with the face image
of a famous actress). We note, however, that the current literature does not strictly distinguish between the two terms and often uses AR and MR interchangeably.
Extending the concept, we can define a similar reality–virtuality continuum for
the sense of touch and construct a visuo-haptic reality–virtuality continuum by compositing the two unimodal continua, as shown in Figure 10.2b. This continuum can be valuable for building a taxonomy of haptic MR. In Figure 10.2b, the whole visuo-haptic continuum is classified into nine categories, and each category is named in an
abbreviated form. The shaded regions belong to the realm of MR. In what follows, we
review the concepts and instances associated with each category, with more attention
to those of MR. Note that the continuum for touch includes all kinds of haptic feedback and does not depend on the specific types of haptic sensations (e.g., kinesthetic,
tactile, or thermal) or interaction paradigms (e.g., tool-mediated or bare-handed).
In the composite continuum, the left column has the three categories of haptic reality, vR-hR, vMR-hR, and vV-hR, where the corresponding environments provide only real haptic sensations. Among them, the simplest category is vR-hR,
which represents purely real environments without any synthetic stimuli. The other
end, vV-hR, refers to the conventional visual virtual environments with real touch,
for example, using a tangible prop to interact with virtual objects. Environments
between the two ends belong to vMR-hR, in which a user sees mixed objects but
still touches real objects. A typical example is the so-called tangible AR that has
been actively studied in the visual AR community. In tangible AR, a real prop held


FIGURE 10.2  Reality–virtuality continuum extended to encompass touch. (Figures taken
from Jeon, S. and Choi, S., Presence Teleop. Virt. Environ., 18, 387, 2009. With permission.)
(a) Original reality–virtuality continuum. (From Milgram, P. and Colquhoun, H. Jr., A taxonomy of real and virtual world display integration, in Mixed Reality—Merging Real and
Virtual Worlds, Y. Ohta and H. Tamura (eds.), Springer-Verlag, Berlin, Germany, 1999, pp. 1–16.)
(b) Composite visuo-haptic reality–virtuality continuum. (Jeon, S. and Choi, S., Presence
Teleop. Virt. Environ., 18, 387, 2009.) Shaded areas in the composite continuum represent the
realm of mixed reality.

in the hand is usually used as a tangible interface for visually mixed environments
(e.g., the MagicBook in Billinghurst et al. 2001), and its haptic property is regarded
unimportant for the applications. Another example is the projection augmented
model. A computer-generated image is projected onto a real physical model to create
a ­realistic-looking object, and the model can be touched by the bare hand (e.g., see
Bennett and Stevens 2006). Since the material property (e.g., texture) of the real
object may not agree with its visually augmented model, haptic properties are usually incorrectly displayed in this application.
The categories in the right column of the composite continuum, vR-hV, vMR-hV,
and vV-hV, are for haptic virtuality, corresponding to environments with only virtual
haptic sensations, and have received the most attention from the haptics research
community. Robot-assisted motor rehabilitation can be an example of vR-hV where synthetic haptic feedback is provided in a real visual environment, while an interactive virtual simulator is an instance of vV-hV where the sensory information of both
modalities is virtual. In the intermediate category, vMR-hV, purely virtual haptic
objects are placed in a visually mixed environment, and are rendered using a haptic interface on the basis of the conventional haptic rendering methods for virtual
objects. Earlier attempts in this category focused on how to integrate haptic rendering of virtual objects into the existing visual AR framework, and they identified
the precise registration between the haptic and the visual coordinate frame as a key
issue (Adcock et al. 2003, Vallino and Brown 1999). For this problem, Kim et al. (2006) applied an adaptive low-pass filter to reduce the trembling error of a low-cost vision-based tracker built on ARToolKit and upsampled the tracking data for use with 1 kHz haptic rendering. Bianchi et al. further improved the registration
accuracy via intensive calibration of a vision-based object tracker (Bianchi et  al.
2006a,b). Their latest work explored the potential of visuo-haptic AR technology
for medical training with their highly stable and accurate AR system (Harders et al.
2009). Ott et al. also applied an HMD-based visuo-haptic framework to training processes in industry and demonstrated its potential (Ott et al. 2007). In many applications, a half mirror has been used to construct a visuo-haptic framework because it provides better collocation of visual and haptic feedback, for example, ImmersiveTouch (Luciano et al. 2005), the Reachin Display (Reachin Technology), the PARIS display (Johnson et al. 2000), and SenseGraphics 3D-IW (SenseGraphics). Such frameworks have been applied, for instance, to cranial implant design (Scharver et al. 2004) and to an MR painting application (Sandor et al. 2007).
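The filtering-and-upsampling step reported by Kim et al. (2006) can be illustrated with a minimal sketch. The function name, filter coefficient, and rates below are our illustrative assumptions, not the published values: a first-order low-pass filter suppresses tracker trembling, and linear interpolation lifts the low-rate vision-tracking stream up to the 1 kHz haptic update rate.

```python
import numpy as np

def smooth_and_upsample(poses, t_in, rate_out=1000.0, alpha=0.2):
    """Low-pass filter jittery tracker samples, then resample them at the
    haptic update rate.

    poses: (N, D) array of tracked positions (e.g., from a ~30 Hz tracker)
    t_in:  (N,) timestamps of the tracker samples, in seconds
    """
    poses = np.asarray(poses, dtype=float)
    smoothed = np.empty_like(poses)
    smoothed[0] = poses[0]
    for i in range(1, len(poses)):
        # First-order exponential filter suppresses tracker trembling;
        # alpha trades smoothness against added latency.
        smoothed[i] = alpha * poses[i] + (1.0 - alpha) * smoothed[i - 1]
    # Linear interpolation fills in samples for the 1 kHz haptic loop.
    t_out = np.arange(t_in[0], t_in[-1], 1.0 / rate_out)
    up = np.column_stack(
        [np.interp(t_out, t_in, smoothed[:, d]) for d in range(poses.shape[1])]
    )
    return t_out, up
```

A smaller alpha gives a smoother but laggier pose; the adaptive variant in the cited work varies the cutoff with hand speed to limit that lag.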
The last categories for haptic MR, vR-hMR, vMR-hMR, and vV-hMR, with which
the rest of this chapter is concerned, lie in the middle column of the composite continuum. A common characteristic of haptic MR is that synthetic haptic signals that
are generated by a haptic interface modulate or augment stimuli that occur due to a
contact between a real object and a haptic interface medium, that is, a tool or a body
part. The VisHap system (Ye et al. 2003) is an instance of vR-hMR that provides
mixed haptic sensations in a real environment. In this system, some properties of a
virtual object (e.g., shape and stiffness) are rendered by a haptic device, while others
(e.g., texture and friction) are supplied by a real prop attached at the end-effector of
the device. Other examples in this category are the SmartTool (Nojima et al. 2002)
and SmartTouch systems (Kajimoto et al. 2004). They utilized various sensors (optical and electrical conductivity sensors) to capture real signals that could hardly be
perceived by the bare hand, transformed the signals into haptic information, and then
delivered them to the user in order to facilitate certain tasks (e.g., peeling the white off the yolk of an egg). The MicroTactus system (Yao et al. 2004) is another
example of vR-hMR, which detects and magnifies acceleration signals caused by
the interaction of a pen-type probe with a real object. The system was shown to
improve the performance of tissue boundary detection in arthroscopic surgical training. A similar pen-type haptic AR system, Ubi-Pen (Kyung and Lee 2009), embedded miniaturized texture and vibrotactile displays in the pen, adding realistic tactile
feedback for interaction with a touch screen in mobile devices.
On the other hand, environments in vV-hMR use synthetic visual stimuli. For example, Borst et al. investigated the utility of haptic MR in a visual virtual environment by adding synthetic force to a passive haptic response for a panel control task (Borst
and Volz 2005). Their results showed that mixed force feedback was better than synthetic force alone in terms of task performance and user preference. In vMR-hMR,
both modalities rely on mixed stimuli. Ha et al. installed a vibrator in a real tangible
prop to produce virtual vibrotactile sensations in addition to the real haptic information of the prop in a visually mixed environment (Ha et al. 2007). They demonstrated
that the virtual vibrotactile feedback enhances immersion for an AR-based handheld
game. Bayart et al. introduced a teleoperation framework where force measured at
the remote site is presented at the master side with additional virtual force and mixed
imagery (Bayart et al. 2007, 2008). In particular, they tried to modulate a certain real
haptic property with virtual force feedback for a hole-patching task and a painting
application, unlike most of the related studies introduced earlier.
Several remarks need to be made. First, the vast majority of related work, except
(Bayart et  al. 2008, Borst and Volz 2005, Nojima et  al. 2002), has used the term
haptic AR without distinguishing vMR-hV and hMR, although research issues associated with the two categories are fundamentally different. Second, haptic MR can
be further classified to haptic AR and haptic augmented virtuality using the same
criterion of visual MR. All of the research instances of hMR introduced earlier correspond to haptic AR, since little knowledge regarding an environment is managed
by the computer for haptic augmentation. However, despite its potential, attempts to develop systematic and general computational algorithms for haptic AR have been scant. An instance of haptic augmented virtuality can be found in haptic rendering systems
that use haptic signals captured from a real object (e.g., see Hoever et  al. 2009,
Okamura et al. 2001, Pai et al. 2001, Romano and Kuchenbecker 2011) in addition
to virtual object rendering, although such a concept has not been formalized before.
Third, although the taxonomy is defined for composite visuo-haptic configurations,
a unimodal case (e.g., no haptic or visual feedback) can also be mapped to the corresponding 1D continuum on the axes in Figure 10.2b.

10.2.2 Artificial Recreation and Augmented Perception
The taxonomy described in the previous section is based on the visuo-haptic reality–virtuality continuum, thereby elucidating the nature of the stimuli provided to users and the associated research issues. Also useful is a taxonomy that specifies the aims of augmentation. Hugues et al. (2011) defined two functional categories for visual AR, artificial recreation (or environment) and augmented perception, which can also be applied to the hMR categories in Figure 10.2. This is in line with the terms used by Bayart and Kheddar (2006): haptic enhancing and enhanced haptics, respectively.
In artificial recreation, haptic augmentation is used to provide a realistic presentation
of physical entities by exploiting the crucial advantage of AR, that is, more efficient
and realistic construction of an immersive environment, compared to VR. Artificial
recreation can be further classified into two sub-categories. It can be either for realistic reproduction of a specific physical environment, for example, the texture display
example of clothes described in Section 10.1, or for creating a nonexisting environment,
for example, the tumor palpation example in Jeon et al. (2012). The latter is a particularly
important area for haptic AR, since it maximizes the advantages of both VR and AR.



In contrast, augmented perception aims at utilizing touch as an additional channel
for transferring useful information that can assist decision-making. Since realism is
no longer a concern, the form of virtual haptic stimuli in this category significantly
varies depending on the target usage. For example, one of the simplest forms is
vibration alerts. Synthetic vibratory signals, while mixed with other haptic attributes of the environment, are a powerful means of conveying timing information,
for example, mobile phone alarms, driving hazard warnings (Chun et al. 2013), and
rhythmic guidance (Lee et al. 2012b). Recently, many researchers also tried to use
vibration for spatial information (e.g., Lee and Choi 2014, Sreng et al. 2008) and discrete categorical information, e.g., haptic icons (Rovers and van Essen 2004, Ternes and MacLean 2008).
Force feedback is another widely used form for augmentation in this category.
The most common example is virtual fixtures used for haptic guidance. They add
guiding or preventing forces to the operator’s movement while she/he performs a
motor task, in order to improve the safety, accuracy, and speed of task execution
(Abbott et  al. 2007). The term was originally coined in Rosenberg (1993), and it
has been applied to various areas, for example, a manual milling tool (Zoran and
Paradiso 2012), the SmartTool (Nojima et al. 2002), or surgical assistance systems
(Li et al. 2007).
There have also been attempts that faithfully follow the original meaning of
augmentation of reality. The aforementioned MicroTactus system (Yao et  al.
2004) is one example. Sometimes, augmentation is done by mapping nonhaptic
information into haptic cues for the purpose of data perceptualization, for example, color information mapped to tactile stimuli (Kajimoto et al. 2004). Another
interesting concept is diminished reality, which hides reality, for example, removing the surface haptic texture of a physical object (Ochiai et al. 2014). This concept of diminished reality can also be applied to hand tremor cancellation in
surgical operations (Gopal et  al. 2013, Mitchell et  al. 2007). Lastly, in a broad
sense, exoskeletal suits are also an example of augmentation through mixing real
and virtual force.

10.2.3  Within- and Between-Property Augmentation
Various physical properties, such as shape, stiffness, friction, viscosity, and surface
texture, contribute to haptic perception. Depending on the haptic AR scenario, some
object properties may remain intact while the rest may be subject to augmentation.
Here, the augmentation may occur within a property, for example, mixing real and
virtual stiffness for rendering harder virtual nodules inside a tissue phantom (Jeon
et al. 2012), or it may be between different properties, for example, adding virtual
stiffness to real surface textures (Yokokohji et  al. 1999) or vice versa (Borst and
Volz 2005).
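Within-property stiffness augmentation can be sketched with a simple linear model. The following is a Hooke's-law illustration of the idea under assumed single-axis contact, not the full rendering algorithm described in Section 10.4; the function and parameter names are ours.

```python
def augmentation_force(k_real, k_target, penetration):
    """Force (N) the haptic device must add so that a real surface of
    stiffness k_real (N/m) is perceived as having stiffness k_target.

    The real object already supplies k_real * penetration, so the device
    renders only the difference between target and real responses.

    penetration: tool displacement into the real surface (m)
    """
    if penetration <= 0.0:
        return 0.0  # tool not in contact: nothing to augment
    return (k_target - k_real) * penetration
```

For example, with a real phantom of 500 N/m and a target of 800 N/m, the device adds 3 N at 1 cm of penetration; a negative result means the device must actively cancel part of the real response, which is the harder case in practice.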
This distinction is particularly useful for gauging the degree, accuracy, and type
of registration needed for augmentation. Consequently, this taxonomy allows the
developer to quantify the amount of environment modeling necessary for registration in preprocessing and rendering steps. The next section further describes issues
and requirements for registration and environment modeling for haptic AR.



TABLE 10.1
Classification of Related Studies Using the Composite Taxonomy

Artificial Recreation
Borst and Volz (2005)
Jeon and Choi (2009)
Jeon and Choi (2011)
Jeon et al. (2012)
Jeon et al. (2011)
Jeon and Harders (2014)
Solanki and Raja (2010)
Gerling and Thomas (2005)
Kurita et al. (2009)
Hachisu et al. (2012)
Bayart et al. (2008)
Bayart et al. (2007)
Fukumoto and Sugimura (2001)
Iwata et al. (2001)
Minamizawa et al. (2007)
Park et al. (2011)
Ye et al. (2003)
Yokokohji et al. (1999)
Frey et al. (2006)
Parkes et al. (2009)
Ha et al. (2006)
Romano and Kuchenbecker (2011)

Augmented Perception
Abbott and Okamura (2003)
Bose et al. (1992)
Gopal et al. (2013)
Kajimoto et al. (2004)
Mitchell et al. (2007)
Nojima et al. (2002)
Ochiai et al. (2014)
Yao et al. (2004)
Yang et al. (2008)
Lee et al. (2012a)
Brewster and Brown (2004)
Brown and Kaaresoja (2006)
Kim and Kim (2012)
Kyung and Lee (2009)
Lee and Choi (2014)
Powell and O'Malley (2011)
Rosenberg (1993)
Spence and Ho (2008)
Zoran and Paradiso (2012)
Grosshauser and Hermann (2009)

Further, the last two taxonomies are combined to construct a composite taxonomy, and all relevant literature in the hMR category is classified using this taxonomy in Table 10.1. Note that most haptic AR systems have both within- and between-property characteristics to some degree. For clear classification, only the key augmentation features of each system were considered in Table 10.1.

10.3 Components Required for Haptic AR

10.3.1 Interface for Haptic AR
A haptic AR framework inherently involves interactions with real environments. Therefore, three systems (a haptic interface, a human operator, and a real environment) react to each other through an interaction tool, leading to the tridirectional interaction shown in Figure 10.3.
During interaction, the interaction tool is coupled with the three components,
and this coupling is the core for the realization of haptic AR, that is, merging the
real and the virtual. Through this coupled tool, relevant physical signals from


FIGURE 10.3  Tridirectional interaction in haptic AR.

both the real environment and the haptic interface are mixed and transmitted to
the user. Therefore, designing this feel-through tool is of substantial importance
in designing a haptic AR interface.
The feel-through can be either direct or indirect. Direct feel-through, analogous
to optical see-through in visual AR, transmits relevant physical signals directly to
the user via a mechanically coupled implement. In contrast, in indirect feel-through
(similar to video see-through), relevant physical signals are sensed, modeled, and
synthetically reconstructed for the user to feel, for example, in master–slave teleoperation. In direct feel-through, preserving the realism of a real environment and mixing real and virtual stimuli are relatively easy, but real signals must be compensated for with great care for augmentation. To this end, the system may need to employ very accurate estimation of the real response for active compensation, or special hardware for passive compensation, for example, a ball-bearing tip to remove friction (Jeon and Choi 2010) or a deformable tip to suppress real contact vibration (Hachisu et al. 2012). In contrast, in indirect feel-through, modulating real signals is easier since all the final stimuli are synthesized, but more sophisticated hardware is required for transparent rendering of virtual stimuli with high realism.
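As a concrete illustration of active compensation in direct feel-through, the sketch below re-targets the friction perceived through the tool: the device cancels an estimate of the real response and substitutes the desired one. The function, its names, and the pure Coulomb model are our illustrative simplifications, not the compensation scheme of Jeon and Choi (2010); mu_real is assumed to come from a prior identification step.

```python
import math

def friction_modulation_force(mu_real, mu_target, f_normal, v_tangential):
    """Tangential force (N) the device adds so that Coulomb friction with
    coefficient mu_real is perceived as having coefficient mu_target.

    f_normal:     contact normal force (N), >= 0
    v_tangential: tool sliding velocity along the surface (m/s)
    """
    if f_normal <= 0.0 or v_tangential == 0.0:
        return 0.0  # no contact or no sliding: nothing to modulate
    direction = -math.copysign(1.0, v_tangential)  # friction opposes motion
    f_real = mu_real * f_normal * direction        # estimated real friction
    f_target = mu_target * f_normal * direction    # desired perceived friction
    # Active compensation: cancel the real response, add the target one.
    return f_target - f_real
```

When mu_target is smaller than mu_real, the command force points along the motion, assisting the user so the surface feels more slippery; any error in the mu_real estimate passes straight through to the perceived friction, which is why accurate real-response estimation is stressed above.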
Different kinds of coupling may exist. Mechanical coupling is a typical case, for example, a force-feedback haptic stylus instrumented with a contact tip (Jeon and Choi 2011). Other forms, such as thermal and electrical coupling, are also possible depending on the target property. In between-property augmentation, the coupling may not be very tight; for example, only position data and timing may be shared (Borst and Volz 2005).
Haptic AR tools can come in many different forms. In addition to typical styli,
very thin sheath-type tools are also used, for example, sensors on one side and
actuators on the other side of a sheath (Nojima et al. 2002). Sometimes a real object
itself is a tool, for example, when both sensing and actuation modules are embedded
in a tangible marker (Ha et al. 2006).
A tool and coupling for haptic AR need to be very carefully designed. Each of
the three components involved in the interaction requires a proper attachment to the
tool, appropriate sensing and actuation capability, and eventually, all of these should
be compactly integrated into the tool in a way that it can be appropriately used by
a user. To this end, the form factors of the sensors, attachment joints, and actuation
parts should be carefully designed to maximize the reliability of sensing and actuation while maintaining a sufficient degree of freedom of movement.

10.3.2  Registration between Real and Virtual Stimuli
An AR system generally faces two registration problems between real and virtual
environments: spatial registration and temporal registration. Virtual and real stimuli
must be spatially and temporally aligned with each other with high accuracy and robustness.
In visual AR, proper alignment of virtual graphics (usually in 3D) on real video
streams has been a major research issue (Feng et al. 2008). Tracking an AR camera,
a user, and real objects and localizing them in a world coordinate frame are the core
technical problems (Harders et al. 2009).
In haptic AR, virtual and real haptic stimuli also have to be spatially and
temporally aligned, for example, adding a virtual force at the right position and
at the right moment. While sharing the same principle, registration in haptic AR
sometimes has different technical requirements. In many haptic AR scenarios,
an area of interest for touching is very small (even one or a couple of points),
and touch usually occurs via a tool. Therefore, large area tracking used in visual
AR is not necessary, and tracking can be simplified, for example, detecting the
moment and location of contact between a haptic tool and a real object using a
mechanical tracker. However, tracking data are directly used for haptic rendering in many cases, so the update rate and accuracy of tracking should be carefully considered.
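As a toy illustration of detecting the moment of contact from force readings (the thresholds and the hysteresis scheme below are our own assumptions, not the chapter's implementation), a detector with separate enter/exit thresholds avoids the contact state chattering when sensor noise hovers near a single threshold at the haptic update rate:

```python
# Sketch of force-based contact detection for haptic AR registration.
# Hypothetical thresholds in newtons; a real system would also compensate
# for tool weight and filter sensor noise before thresholding.

def make_contact_detector(enter_threshold=0.15, exit_threshold=0.05):
    """Return a stateful detector with hysteresis so noise between the
    two thresholds cannot toggle the contact state every frame."""
    state = {"in_contact": False}

    def update(force_magnitude):
        if state["in_contact"]:
            if force_magnitude < exit_threshold:
                state["in_contact"] = False
        elif force_magnitude > enter_threshold:
            state["in_contact"] = True
        return state["in_contact"]

    return update

detect = make_contact_detector()
# Force magnitudes sampled at successive haptic frames:
samples = [0.01, 0.08, 0.2, 0.12, 0.07, 0.04, 0.01]
states = [detect(f) for f in samples]
print(states)  # [False, False, True, True, True, False, False]
```

Values between the two thresholds (0.08, 0.12, 0.07 N) leave the current state unchanged, which is the point of the hysteresis.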
In addition to such basic position and timing registration, other forms of spatial
and temporal quantities related to the target augmentation property often require
adequate alignment. For example, in order to augment stiffness, the direction of
force for virtual stiffness must be aligned with the response force direction from
real stiffness. Another example is an AR pulse simulator where the frequency and
phase of a virtual heartbeat should match those of the real one. These alignments
can usually be achieved by acquiring the real quantity through sophisticated real-time
sensing and/or estimation modules and setting the corresponding virtual values
to them. Examining and designing such property-related registration is one of the
major research issues in developing haptic AR systems.
The requirements of this property-related registration largely depend on the application area, the target augmentation property, and the physical signals involved. However, the within/between-property taxonomy can provide some clues for judging what kinds of registration are needed and how accurate they must be, as the taxonomy gives the degree of association between virtual and real signals. In the case of within-property augmentation, mixing happens in a single property, and thus virtual signals related to the target property need to be exactly aligned with corresponding real signals for harmonious merging and smooth transition along the line between real and virtual. This needs very sophisticated registration, often with the estimation of real properties based on sensors and environment models (see Section 10.3.4 for examples). In between-property augmentation, different properties are usually treated separately, and virtual signals of one target property do not have to be closely associated with real signals of the other properties. Thus, the registration may be of lesser accuracy in this case.

10.3.3  Rendering Algorithm for Augmentation

A rendering frame of an AR system consists of (1) sensing the real environment, (2) real–virtual registration, (3) merging stimuli, and (4) displaying the stimuli. Steps 2 and 3 are the core parts for haptic AR, and the computational procedures in these steps largely depend on the categories of haptic AR (see Table 10.4 for how we have approached this issue). The following paragraphs overview the steps for haptic AR.

Step 1 prepares data for steps 2 and 3 by sensing variables from the real environment. Signal processing can also be applied to the sensor values.

Step 2 conducts a registration process based on the sensed data and pre-identified models (see Section 10.3.2), for example, property-related registration and contact detection between the tool and real objects. This step usually estimates the spatial and temporal state of the tool and the real environment and then conducts the registration. Depending on the result of this step, the system decides whether to proceed to step 3 or go back to step 1 in this frame.

Step 3 is dedicated to the actual calculation of virtual feedback (in direct feel-through) or mixed feedback (in indirect feel-through). For artificial recreation, this step simulates the behaviors of the properties involved in the rendering using physically based models. Augmented perception may need to derive the target signal based on purely sensed signals and/or using simpler rules, for example, doubling the amplitude of measured contact vibration (Yao et al. 2004). In addition, within-property augmentation often requires an estimation of the properties of a real object in order to compensate for or augment it; for instance, modulating the feel of a brush in the AR drawing example first needs the compensation of the real tension and friction of the manipulandum. This estimation can be done using a model already identified in a preprocessing step, by real-time estimation of the property using sensor values, or both (see Section 10.3.4 for more details). In between-property augmentation, this estimation process is not required in general, and providing virtual properties is simpler.

Step 4 sends commands to the haptic AR interface to display the feedback calculated in Step 3. Sometimes we need techniques for controlling the hardware for the precise delivery of stimuli.
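The four-step rendering frame can be sketched as a small control loop. The sense/register/merge/display callables, their names, and their return values below are placeholders of our own, not an API from this chapter:

```python
# Minimal skeleton of the four-step haptic AR rendering frame.
# All component names are hypothetical stand-ins for system-specific code.

def rendering_frame(sense, register, merge, display):
    """Run one frame: sense -> register -> merge -> display.
    If registration fails (e.g., no contact), skip steps 3 and 4."""
    raw = sense()                # Step 1: sensor readings (+ filtering)
    reg = register(raw)          # Step 2: spatial/temporal registration
    if reg is None:              # no contact: go back to step 1
        return None
    feedback = merge(raw, reg)   # Step 3: virtual or mixed feedback
    display(feedback)            # Step 4: command the haptic interface
    return feedback

# Toy usage: contact is detected, so a virtual component is added.
out = []
feedback = rendering_frame(
    sense=lambda: {"force": 1.0},
    register=lambda raw: {"contact": True} if raw["force"] > 0.1 else None,
    merge=lambda raw, reg: raw["force"] + 0.5,  # add virtual component
    display=out.append,
)
print(feedback, out)  # 1.5 [1.5]
```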

10.3.4  Models for Haptic AR

As aforementioned in Sections 10.3.2 and 10.3.3, haptic AR requires predefined models for three different purposes.

First, models are needed for simulating the responses of the signals associated with rendering properties, which is the same for haptic VR rendering. Such computational models have been extensively studied in haptics and virtual reality. In most cases, they include some degree of simplification to fulfill the real-time requirement of haptic rendering.

The second model is for real–virtual registration (Step 2 in Section 10.3.3). The most common example is the geometry model of real objects for contact and surface normal detection, which is usually built in preprocessing. Employing such a geometry model makes rendering simpler since conventional rendering algorithms for haptic VR can be readily applied. In addition to geometry models, property augmentation sometimes needs models for the estimation of real information. For example, targeting a system that can modulate stiffness and friction of a real object by systematically adding virtual stiffness and friction, we estimate local geometry and local deformation near the contact point based on a simplified environment model identified in preprocessing in order for stiffness direction registration.

The last model is for the estimation of real signals in order for modulation in Step 3 of the rendering (Section 10.3.3), which has the following requirements. The estimation should be perceptually accurate since the errors in estimation can be directly fed into the final stimuli. The estimation also often has challenging accuracy requirements while still preserving efficiency for real-time performance. In general, there are two approaches for this: open-loop model-based estimation and closed-loop sensor-based estimation. For properties such as stiffness and friction, estimating physical responses has been extensively studied in robotics and mechatronics for the purpose of environment modeling and/or compensation. One of the research issues is how to adapt those techniques for use in haptic AR.

However, acquiring and using such models should be minimized in order to fully utilize the general advantage of AR: efficient construction of a realistic and immersive environment without extensive modeling. Balancing the amount of modeling and the complexity of the rendering algorithm is important. The identification process should also be feasible for the application. For example, very quick identification is mandatory for scenarios in which real objects for interaction frequently change. Furthermore, using the same hardware for both identification and rendering is preferred for the usability of the system.

Each category in Table 10.1 has different requirements for models. Systems in the artificial recreation category may need more sophisticated models for both simulation and estimation, while those in the augmenting perception category may suffice with simpler models for simulation. Systems in the within-property category may have to use very accurate registration and estimation models, while merging between properties may not need models for registration and estimation. For summary, Table 10.2 outlines the rendering and registration characteristics of the categories in the two taxonomies.

In the following sections, we introduce example algorithms for haptic AR.
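A minimal contrast of the two estimation styles just named, open-loop model-based versus closed-loop sensor-based, can be written down directly; all constants below are toy values of ours, not identified parameters from this chapter:

```python
# Toy contrast of the two estimation approaches for a real elastic force:
# open loop trusts a stiffness model identified in preprocessing, closed
# loop trusts the current (filtered) force sensor reading.

def model_based_estimate(x, k_identified=120.0):
    """Open loop: predict the real force from the identified model."""
    return k_identified * x

def sensor_based_estimate(filtered_reading):
    """Closed loop: take the noise-filtered sensor value directly."""
    return filtered_reading

x = 0.01             # current indentation in meters
reading = 1.26       # current filtered force reading in newtons
f_model = model_based_estimate(x)          # biased if the model drifted
f_sensor = sensor_based_estimate(reading)  # noisy but unbiased
print(f_model, f_sensor)  # 1.2 1.26
```

The discrepancy between the two estimates (here 0.06 N) is exactly the kind of error that, per the text, feeds directly into the final stimuli.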

TABLE 10.2
Characteristics of the Categories

Category: Within Property
• Registration: position and timing registration as well as property-related registration needed.
• Rendering includes estimation and compensation of real signals and merging of them with virtual signals.

Category: Between Property
• Registration: only basic position and timing registration needed.
• Rendering: algorithms for haptic VR can be applied.

Category: Artificial Recreation
• Models required: models for physics simulation; sometimes models for registration and compensation.

Category: Augmented Perception
• Models required: models for registration and compensation.

Category: Direct Feel-Through
• Rendering: real-time compensation of real property needed.

Category: Indirect Feel-Through
• Rendering: transparent haptic rendering algorithm and interface needed.

10.4  STIFFNESS MODULATION

We initiated our endeavor toward haptic AR with the augmentation or modulation of real object stiffness, which is one of the most important properties for rendering the shape and hardness of an object. This topic can be categorized into artificial recreation and within-property augmentation. We aim at providing a user with augmented stiffness by adding virtual force feedback when interacting with real objects. We took two steps for this goal. The first step was single-point interaction supporting typical exploratory patterns, such as tapping, stroking, or contour following (Section 10.4.2). The second step extended the first system to two-point manipulation, focusing on grasping and squeezing (Section 10.4.3).

Our framework considers elastic objects with moderate stiffness for interaction. In addition, homogeneous dynamic material responses are assumed for real objects. Objects made of plastic (e.g., clay), brittle (e.g., glass), or high-stiffness (e.g., steel) materials are out of scope due to either complex material behavior or the performance limitations of current haptic devices. Our augmentation methods emphasize minimizing the need for prior knowledge and preprocessing, while preserving plausible perceptual quality. Our system requires a minimal amount of prior information, such as the dynamics model of a real object and the geometric model of a real object used for registration; only models for the objects of interest, not the entire environment, are required. This preserves a crucial advantage of AR, which potentially leads to greater simplicity in application development. We summarize a series of our major results on stiffness modulation (Jeon and Choi 2008, 2009, 2010, 2011; Jeon and Harders 2012) in the following sections.

10.4.1  Haptic AR Interface

We constructed a haptic AR interface using two general impedance-type haptic interfaces (Geomagic, Inc., PHANToM Premium model 1.5), each of which has a custom-designed tool for interaction with a real object (see Figure 10.4). The tool is instrumented with a 3D force/torque sensor (ATI Industrial Automation, model Nano17) attached between the tool tip and the gimbal joints at the last link of the PHANToM. This allows the system to measure the reaction force from a real object that is equal to the sum of the force from the haptic interface and the force from the user's hand.

FIGURE 10.4  Haptic AR interface. (Reprinted from Jeon, S. and Harders, M., Extending haptic augmented reality: Modulating stiffness during two-point squeezing, in Proceedings of the Haptics Symposium, 2012, pp. 141–146. With permission.)

10.4.2  Stiffness Modulation in Single-Contact Interaction

Suppose that a user indents a real object with a probing tool. This makes the object deform, and the user feels a reaction force. As shown in Figure 10.5, two force components, the force that the haptic device exerts to the tool, fd(t), and the force from the user's hand, fh(t), deform the object surface and result in a reaction force fr(t), such that

  fr(t) = −{fh(t) + fd(t)}.  (10.1)

The reaction force fr(t) during contact can be decomposed into two orthogonal force components, as shown in Figure 10.5:

  fr(t) = frn(t) + frt(t),  (10.2)

where frn(t) is the result of object elasticity in the normal direction and frt(t) is the frictional tangential force.

FIGURE 10.5  Variables for single-contact stiffness modulation. (Reprinted from Jeon, S. and Choi, S., Presence Teleop. Virt. Environ., 20, 337, 2011. With permission.)

Let the apparent stiffness of the object at time t be k(t). This is the stiffness that the user perceives when no additional virtual force is rendered. The goal of stiffness augmentation is to systematically change the stiffness that the user perceives k(t) to a desired stiffness k̄(t) by providing virtual force to the user. Let x(t) be the displacement caused by the elastic force component, which represents the distance between the haptic interface tool position, p(t), and the original nondeformed position pc(t) of a contacted particle on the object surface. If we denote the unit vector in the direction of frn(t) by un(t) and the target modulation stiffness by k̄(t), the force that a user should feel is

  f̄h(t) = k̄(t)x(t)un(t).  (10.3)

Using (10.1) and (10.3), the force that the haptic device needs to exert is

  f̄d(t) = −fr(t) − k̄(t)x(t)un(t).  (10.4)

This equation indicates the tasks that a stiffness modulation algorithm has to do in every loop: (1) detection of the contact between the haptic tool and the real object for spatial and temporal registration, (2) measurement of the reaction force fr(t), (3) estimation of the direction un(t) and magnitude x(t) of the resulting deformation for stiffness augmentation, and (4) control of the device-rendered force fd(t) to produce the desired force f̄d(t). The following paragraphs describe how we address these four steps.

Before augmentation, we carry out two preprocessing steps. We first identify the friction and deformation dynamics of a real object and use them later during rendering to estimate the unknown variables for merging real and virtual forces. In the first step, the friction between the real object and the tool tip is identified using the Dahl friction model (Jeon and Choi 2011). The original Dahl model is transformed to an equivalent discrete-time difference equation, as described in Mahvash and Okamura (2006). It also includes a velocity-dependent term to cope with viscous friction. The procedure for friction identification adapts the divide-and-conquer strategy by performing identification separately for the presliding and the sliding regime, which decouples the nonlinear identification problem into two linear problems. Data for lateral displacement, velocity, and friction force are collected during manual stroking and are then divided into presliding and sliding bins according to the lateral displacement. The data bins for the presliding regime are used to identify the parameters that define behavior at almost zero velocity, while the others are used for Coulomb and viscous parameters. See Jeon and Choi (2011) for details.

The second preprocessing step is for identifying the deformation dynamics of the real object. We use the Hunt–Crossley model (Hunt and Crossley 1975) to account for nonlinearity. The model determines the response force magnitude given displacement x(t) and velocity ẋ(t) by

  f(t) = k(x(t))^m + b(x(t))^m ẋ(t),  (10.5)

where k and b are stiffness and damping constants and m is a constant exponent (usually 1 < m < 2). For identification, data triples consisting of displacement, velocity, and reaction force magnitude are collected through repeated presses and releases of a deformable sample in the normal direction. The data are passed to a recursive least-squares algorithm for an iterative estimation of the Hunt–Crossley model parameters (Haddadi and Hashtrudi-Zaad 2008).

For rendering, the following computational process is executed in every haptic rendering frame. In Step 1, we use force sensor readings for contact detection since the entire geometry of the real environment is not available. A collision is regarded to have occurred when forces sensed during interaction exceed a threshold. To increase the accuracy, we developed algorithms to suppress noise, as well as to compensate for the weight and dynamic effects of the tool. Step 2 is also simply done with the force sensor attached to the probing tool.

Step 3 is the key process for stiffness modulation. In Step 3, two variables, the deformation direction un(t) and the magnitude of the deformation x(t), are estimated. The former is derived as follows. Equation 10.2 indicates that the response force fr(t) consists of two perpendicular force components: frn(t) and frt(t). The magnitude of frt(t) is estimated using the identified Dahl model. Its direction is derived from the tangent vector at the current contact point p(t), which is found by projecting Δp(t) onto un(t−Δt) and subtracting it from Δp(t). Since un(t) is the unit vector of frn(t), un(t) becomes

  un(t) = (fr(t) − frt(t)) / ‖fr(t) − frt(t)‖.  (10.6)

The only unknown variable in (10.6) is frt(t). The next part is the estimation of x(t). The assumption of material homogeneity allows us to directly approximate it from the inverse of the Hunt–Crossley model identified previously. Finally, using the estimated un(t) and x(t), f̄d(t) is calculated using (10.4) and sent to the haptic AR interface.

In Jeon and Choi (2011), we assessed the physical performance of each component and the perceptual performance of the final rendering result using various real samples. In particular, the perceptual quality of modulated stiffness evaluated in a psychophysical experiment showed that rendering errors were less than the human discriminability of stiffness. This demonstrates that our system can provide perceptually convincing stiffness modulation.
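A minimal numerical sketch of the per-frame computation in (10.3), (10.4), and (10.6), assuming the tangential friction force and the displacement have already been estimated; the vector helpers and all numbers are ours, not the chapter's implementation:

```python
# One frame of single-contact stiffness modulation (toy values):
# estimate the normal direction by removing the estimated friction
# component from the measured reaction force (10.6), then compute the
# device force (10.4).
import math

def sub(a, b): return tuple(ai - bi for ai, bi in zip(a, b))
def scale(s, a): return tuple(s * ai for ai in a)
def norm(a): return math.sqrt(sum(ai * ai for ai in a))

def modulation_force(f_r, f_rt, x, k_bar):
    """f_r: measured reaction force; f_rt: estimated tangential friction
    (Dahl model); x: estimated displacement; k_bar: target stiffness."""
    elastic = sub(f_r, f_rt)                     # f_r - f_rt
    u_n = scale(1.0 / norm(elastic), elastic)    # (10.6)
    f_h_bar = scale(k_bar * x, u_n)              # (10.3)
    return sub(scale(-1.0, f_r), f_h_bar)        # (10.4): -f_r - k_bar*x*u_n

f_r = (0.0, 0.0, 2.0)    # N, reaction force along the surface normal (z)
f_rt = (0.0, 0.0, 0.0)   # frictionless toy case
f_d_bar = modulation_force(f_r, f_rt, x=0.01, k_bar=300.0)
print(f_d_bar)  # z component -5.0: cancel the 2 N reaction, add 3 N virtual
```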

10.4.3  Stiffness Modulation in Two-Contact Squeezing

After confirming the potential of the concept, we moved to a more challenging scenario: stiffness modulation in two-contact squeezing (Jeon and Harders 2012). We developed new algorithms to provide stiffness augmentation while grasping and squeezing an object with two probing tools. In this system, an additional force, the object weight, fw(t) in Figure 10.6, is involved. During lifting, we assume that the object is fully lifted from the ground and that the contact points do not change without slip.

When the user applies forces fh.*(t) to hold and squeeze the object (* is either 1 or 2 depending on the contact point) and the haptic interfaces exert forces fd.*(t) for modulation, weight forces fw.*(t) are also present at the two contact points. At each contact point, these three force components deform the object and make reaction force fr.*(t):

  fr.*(t) = fh.*(t) + fd.*(t) + fw.*(t).  (10.7)

fr.*(t) can be further decomposed into the pure weight fw.*(t) and a force component in the squeezing direction fsqz.*(t). Then, (10.7) can be rewritten as

  fr.*(t) = fsqz.*(t) + fw.*(t).  (10.8)

Since the displacement and the force along the squeezing direction contribute to stiffness perception, the force component of interest is fsqz.*(t):

  fsqz.*(t) = fh.*(t) + fd.*(t).  (10.9)

FIGURE 10.6  Variables for two-contact stiffness modulation. (Reprinted from Jeon, S. and Harders, M., Extending haptic augmented reality: Modulating stiffness during two-point squeezing, in Proceedings of the Haptics Symposium, 2012, pp. 141–146. With permission.)

To make the user feel the desired stiffness k̄(t), the squeezing force at each contact point should be

  f̄sqz.*(t) = k̄(t)x*(t)u*(t),  (10.10)

where x*(t) represents the displacement along the squeezing direction and u*(t) is the unit vector toward the direction of that deformation. Combining (10.9) and (10.10) results in the virtual force for the haptic interfaces to render for the desired augmentation:

  f̄d.*(t) = fsqz.*(t) − k̄(t)x*(t)u*(t).  (10.11)

Equation 10.11 indicates that we need to estimate the displacement x*(t), the deformation direction u*(t), and the squeezing force fsqz.*(t) at each contact point. The known variables are the reaction forces fr.*(t) and the tool tip positions p*(t). To this end, the following three observations about an object held in the steady state are utilized. First, each squeezing force falls on the line connecting the two contact locations. Second, the magnitudes of fsqz.1(t) and fsqz.2(t) are the same, but the directions are the opposite (fsqz.1(t) = −fsqz.2(t)). Third, the total weight of the object is equal to the sum of the two reaction force vectors.

The first and second observations provide the directions of fsqz.*(t) (= u*(t), along p1(t)p2(t) or p2(t)p1(t); also see l(t) in Figure 10.6). The magnitude of fsqz.*(t) is determined as follows. The sum of the reaction forces along the l(t) direction, fr↓sqz(t), includes not only the two squeezing forces, but also the weight:

  fr↓sqz(t) = |fr.1(t) · ul(t)| + |fr.2(t) · ul(t)|.  (10.12)

fsqz(t) can then be calculated by subtracting the effect of the weight along l(t) from fr↓sqz(t):

  fsqz(t) = fr↓sqz(t) − fw↓sqz(t),  (10.13)

where fw↓sqz(t) can be derived based on the third observation such that fw↓sqz(t) = |(fr.1(t) + fr.2(t)) · ul(t)|. Thus, the magnitudes of the two squeezing forces follow from the second observation:

  fsqz.1(t) = fsqz.2(t) = 0.5 fsqz(t).  (10.14)
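A toy sketch of the squeeze-force extraction in (10.12) through (10.15), deliberately simplified to contacts aligned with the x axis so that the projections reduce to scalars; the function, its sign conventions, and all numbers are our own simplification, not the chapter's full derivation for arbitrary orientations:

```python
# Toy two-contact squeeze extraction along the x axis (our simplification).
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def squeeze_state(p1, p2, f_r1, f_r2, d0):
    u_l = (1.0, 0.0, 0.0)  # line direction p1 -> p2, fixed in this toy setup
    fr_sqz = abs(dot(f_r1, u_l)) + abs(dot(f_r2, u_l))             # (10.12)
    fw_sqz = abs(dot(tuple(a + b for a, b in zip(f_r1, f_r2)), u_l))
    f_sqz = fr_sqz - fw_sqz                                        # (10.13)
    d = abs(p2[0] - p1[0])         # current distance between tool tips
    x = 0.5 * (d0 - d)             # (10.15), homogeneity assumption
    return 0.5 * f_sqz, x          # (10.14): per-contact magnitude

f_per_contact, x = squeeze_state(
    p1=(0.0, 0.0, 0.0), p2=(0.09, 0.0, 0.0),
    f_r1=(2.0, 0.0, 0.0), f_r2=(-2.0, 0.0, 0.0),  # pure squeeze, no weight
    d0=0.10)
print(f_per_contact, round(x, 6))  # 2.0 0.005
```

With equal and opposite reaction forces, the weight term vanishes and each contact carries half of the extracted squeezing force.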

Steps for the estimation of the displacement x*(t) in (10.11) are as follows. Let the distance between the two initial contact points on the nondeformed surface (pc.1 and pc.2 in Figure 10.6) be d0. It is constant over time due to the no-slip assumption. Assuming homogeneity, x1(t) is equal to x2(t), and the displacements can be derived by

  x1(t) = x2(t) = 0.5(d0 − d(t)),  (10.15)

where d(t) is ‖p1(t)p2(t)‖. All the unknown variables are now estimated, and the final virtual force can be calculated using (10.11).

In Jeon and Harders (2012), we also evaluated the system performance through simulations and a psychophysical experiment. Overall, the evaluation indicated that our system can provide physically and perceptually sound stiffness augmentation.

10.5  APPLICATION: PALPATING VIRTUAL INCLUSION IN PHANTOM WITH TWO CONTACTS

This section introduces an example of the applications of our stiffness modulation framework, taken from Jeon and Harders (2014). We developed algorithms for rendering a stiffer inclusion in a physical tissue phantom during manipulations at more than one contact location. The goal of the system is to render forces that give an illusion of a harder inclusion in the mock-up. In addition, the system has further been integrated with a visual AR framework (Harders et al. 2009). We used the visual system to display information related to haptic augmentation, such as the force vectors involved in the algorithm. To our knowledge, this is among the first systems that can augment both visual and haptic sensations. Figure 10.7 shows exemplar snapshots.

FIGURE 10.7  Example snapshot of visuo-haptic augmentation. Examples with increased stiffness (virtual forces oppose squeezing) and decreased stiffness (virtual forces assist squeezing) are shown on left and right, respectively. Reaction force (dark gray arrow), weight (gray arrow), and haptic device force (light gray arrow) are depicted.

The basic concept is depicted in Figure 10.8. In Figure 10.8, fR.*(t) are the reaction forces from the real environment to which the system adds virtual force feedback fT.*(t) stemming from the simulated tumor

with the consideration of the mutual effects between the contacts. The final combined forces

  fH.*(t) = fR.*(t) + fT.*(t)  (10.16)

enable a user to feel augmented sensations of the stiffer inclusion. The hardware setup we used is the same as the one shown in Figure 10.4. Thus, estimating and simulating fT.*(t) in real time is the key for creating a sound illusion.

FIGURE 10.8  Overall configuration of rendering stiffer inclusion in real mock-up. (Reprinted from Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1, 2014. With permission.)

A two-step, measurement-based approach is taken to model the dynamic behavior of the inclusion. First, a contact dynamics model representing the pure response of the inclusion is identified using the data captured during palpating a physical mock-up. Then, another dynamics model is constructed to capture the movement characteristics of the inclusion in response to external forces. Both models are then used in rendering to determine fT.*(t) in real time. The procedures are detailed in the following paragraphs.

The first preprocessing step is for identifying the overall contact force resulting purely from an inclusion (inclusion-only case) as a function of the distance between the inclusion and the contact point. Our approach is to extract the difference between the responses of a sample with a stiffer inclusion (inclusion-embedded) and a sample without it (no-inclusion). To this end, we first identify the Hunt–Crossley model using the no-inclusion sample. This model is denoted by f = HNT(x, ẋ). We use the same identification procedure described in Section 10.4.2. Then, we obtain the data from the inclusion-embedded sample by manually poking along a line from pTs to pT0 (see Figure 10.9 for the involved quantities). This time, we also record the position changes of pT using a position tracking system (TrackIR, NaturalPoint, Inc.). This gives us the state vector when palpating the tumor-embedded model (xTE, ẋTE, fTE). As depicted in Figure 10.9, the force fTE(t) can be decomposed into fR(t) and fT(t). Since f = HNT(x, ẋ) represents the magnitude of fR(t), the magnitude of fT(t) can be obtained by passing all data pairs (xTE, ẋTE) to HNT(x, ẋ) and by computing differences using

  fT(xTE, ẋTE) = fTE − HNT(xTE, ẋTE).  (10.17)
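The model-difference step in (10.17) can be sketched directly, with a Hunt–Crossley stand-in for HNT whose parameters are toy values of ours rather than identified ones:

```python
# Sketch of (10.17): subtract the no-inclusion Hunt-Crossley response from
# forces measured on the inclusion-embedded sample to isolate the
# inclusion-only force. Parameters k, b, m are illustrative toy values.
def hunt_crossley(x, x_dot, k=800.0, b=20.0, m=1.5):
    # f = k*x^m + b*x^m*x_dot, the deformation model form used in (10.5)
    return k * x**m + b * (x**m) * x_dot

def inclusion_force(x_te, x_dot_te, f_te):
    """(10.17): f_T = f_TE - H_NT(x_TE, x_dot_TE)."""
    return f_te - hunt_crossley(x_te, x_dot_te)

# At 5 mm indentation the embedded sample resists harder than plain silicone:
f_t = inclusion_force(x_te=0.005, x_dot_te=0.0, f_te=0.5)
print(round(f_t, 4))  # 0.2172, the extra force attributed to the inclusion
```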

FIGURE 10.9  Variables for inclusion model identification. (Reprinted from Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1, 2014. With permission.)

Therefore, fT(t) can be expressed as a function of the distance between the inclusion and the tool tip. Let the distance from pH(t) to pT(t) be lHT(t), and let the initial distance from pTs to pT0 be l0. Then, the difference, l(t) = l0 − lHT(t), becomes a relative displacement toward the inclusion. By using the data triples (l, l̇, fT), a new response model with respect to l(t) can be derived, which is denoted as HT(l, l̇). This represents the inclusion-only force response at the single contact point pTs.

In the second step, the inclusion movement in response to external forces is characterized. Nonlinear changes of d(t) with respect to an external force fT(t) can be approximated using again the Hunt–Crossley model. After determining d(t) using a position tracker and fT(t) using our rendering algorithm described in the next subsection, vector triples (d, ḋ, fT) are employed to identify three Hunt–Crossley models for the three Cartesian directions, denoted by Gx(dx, ḋx), Gy(dy, ḋy), and Gz(dz, ḋz).

10.5.1  Rendering Algorithm

Rendering begins with making a contact with the no-inclusion model, poking in the direction of pT. Forces from multiple contacts deform the model as shown in Figure 10.10 and displace the contact points from pHs.* to pH.*(t) and the inclusion from pT0 to pT(t). The force causing this movement is the same as the inclusion response at the user's hand, fT.*(t), and the direction of fT.*(t) is from the inclusion position to the tool tip, such that

  fT.*(t) / ‖fT.*(t)‖ = (pH.*(t) − pT(t)) / ‖pH.*(t) − pT(t)‖.  (10.18)

Equation 10.18 indicates that the unknown values, fT.*(t) and eventually pT(t), should be approximated during the rendering. fT.*(t) is derived based on HT. To this end, we first scale the current indentation distance to match those during the recording:

  l*(t) = (l0,* − lHT,*(t)) l0 / l0,*.  (10.19)
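A small sketch of the per-contact quantities in (10.18) and (10.19), the unit direction from the tracked inclusion to the tool tip and the indentation rescaled to the reference depth; all positions and lengths are toy values of ours:

```python
# Toy computation of the inclusion-force direction (10.18) and the
# linearly normalized indentation length (10.19).
import math

def direction(p_h, p_t):
    """(10.18): unit vector from inclusion position to tool tip."""
    v = tuple(h - t for h, t in zip(p_h, p_t))
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def scaled_indentation(l0_star, l_ht, l0):
    """(10.19): l_*(t) = (l0_* - l_HT(t)) * l0 / l0_*."""
    return (l0_star - l_ht) * l0 / l0_star

u = direction(p_h=(0.0, 0.0, 0.05), p_t=(0.0, 0.0, 0.02))
l_star = scaled_indentation(l0_star=0.04, l_ht=0.03, l0=0.05)
print(u, round(l_star, 4))  # (0.0, 0.0, 1.0) 0.0125
```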

FIGURE 10.10  Variables for inclusion augmentation rendering. (Reprinted from Jeon, S. and Harders, M., IEEE Trans. Haptics, 99, 1, 2014. With permission.)

By this, we can obtain a linearly normalized indentation length along pH.*pT with respect to the reference deformation. Then, fT.*(t) is approximated by fT.*(t) = HT(l*(t), l̇*(t)), and its direction is determined using (10.18). We take a similar approach for the update of d(t). Taking the inverse of Gx,y,z allows us to approximate d(t) by

  di(t) = [ Σ*=1..n fT,*,i(t) / (k + b ḋi(t)) ]^(1/m),  i = x, y, z,  (10.20)

where n is the number of contact points and m is the exponential parameter in the Hunt–Crossley model. Finally, fH.*(t) is determined using (10.16), which is directly sent to the haptic AR interface.

In Jeon and Harders (2014), we compared the simulation results of our algorithm with actual measurement data recorded from eight different real mock-ups via various interaction methods. Overall, the force simulation errors were less than the force perception thresholds in most cases. Inclusion movements and the mutual effects between contacts are captured and simulated with reasonable accuracy.

10.6  FRICTION MODULATION

Our next target was the modulation of surface friction (Jeon et al. 2011). Here, we introduce simple and effective algorithms for estimating and altering the inherent friction between a tool tip and a surface to a desired friction. We also use the same hardware setup for friction modulation. The specific goal of this work is to alter the real friction force freal(t) such that a user perceives a modulated friction force ftarg(t) that mimics the response of a surface made of a different desired target material when the user strokes the real surface with a tool.
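The modulation principle reduces to a per-frame subtraction, fmod = ftarg − freal, as in (10.21); in the sketch below a Coulomb-style stand-in (our assumption) replaces the identified Dahl model of the target surface:

```python
# Per-frame friction modulation sketch. The Coulomb-style target model is
# a hypothetical stand-in for the identified Dahl model of the chapter.
def target_friction(normal_force, mu_target=0.6):
    # Stand-in for the target-surface friction model.
    return mu_target * normal_force

def modulation_force(f_targ, f_real):
    """(10.21): f_mod = f_targ - f_real."""
    return f_targ - f_real

f_targ = target_friction(normal_force=2.0)  # want a grippier surface
f_real = 0.4                                # measured on the slick base
f_mod = modulation_force(f_targ, f_real)
print(round(f_mod, 3))  # 0.8, the friction force the device must add
```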

As illustrated in Figure 10.11, this is done by adding a virtual modulation friction force f_mod(t) to the real friction force:

f_mod(t) = f_targ(t) − f_real(t).   (10.21)

Thus, the task reduces to: (1) simulation of the desired friction response f_targ(t) and (2) measurement of the real friction force f_real(t).

For the simulation of the desired friction force f_targ(t) during rendering, we identify the modified Dahl model describing the friction of a target surface. For the Dahl model parameter identification, a user repeatedly strokes the target surface with the probe tip attached to the PHANToM. The identification procedure is the same as that given in Section 10.4. The model is then used to calculate f_targ(t) using the tool tip position and velocity and the normal contact force during augmented rendering. f_real(t) can be easily derived from force sensor readings after a noise reduction process. Given the real friction and the target friction, the appropriate modulation force that needs to be rendered by the device is finally computed using (10.21). The modulation force is sent to the haptic interface for force control.

We tested the accuracy of our friction identification and modulation algorithms with four distinctive surfaces (Jeon et al. 2011). The results showed that regardless of the base surface, the friction was modulated to a target surface without perceptually significant errors.

FIGURE 10.11  Variables for friction modulation: real friction f_real, virtual modulation friction f_mod, and target friction f_targ. (Reprinted from Jeon, S. et al., Extensions to haptic augmented reality: Modulating friction and weight, in Proceedings of the IEEE World Haptics Conference (WHC), pp. 227–232, 2011. With permission.)

10.7  OPEN RESEARCH TOPICS

In spite of our and other groups' endeavor for haptic AR, this field is still young and immature, awaiting persistent research on many intriguing and challenging topics. Our work regarding stiffness modulation has focused on homogeneous soft real objects, meaning that the material characteristics of the objects are identical regardless of contact point. However, most natural deformable objects exhibit inhomogeneity. Such objects show much more complicated deformation and friction behaviors.
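The per-sample modulation loop implied by (10.21), simulate the target friction from the identified model, measure the real friction, and command the difference, can be sketched as follows. The sketch uses the classic Dahl model as a stand-in for the chapter's modified Dahl model, and all parameter values are illustrative assumptions:

```python
def dahl_step(z, v, dt, sigma0, fc):
    """One Euler step of the classic Dahl friction model:
    dz/dt = v - (sigma0*|v|/fc)*z, friction f = sigma0*z.
    (A stand-in for the chapter's modified Dahl model.)"""
    z = z + dt * (v - (sigma0 * abs(v) / fc) * z)
    return z, sigma0 * z

def modulation_force(f_targ, f_real):
    """Equation (10.21): f_mod = f_targ - f_real."""
    return f_targ - f_real

# One control tick with assumed values
sigma0, fc, dt = 1e4, 0.6, 0.001   # bristle stiffness, Coulomb level, step
z = 0.0                            # internal Dahl state
v = 0.05                           # measured tool-tip sliding velocity (m/s)
z, f_targ = dahl_step(z, v, dt, sigma0, fc)
f_real = 0.2                       # measured real friction after noise reduction
f_mod = modulation_force(f_targ, f_real)  # sent to the haptic interface
```

In a real system this loop would run at the haptic servo rate, with fc scaled by the sensed normal force, which the simplified model above omits.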

Approaches that are based on more in-depth contact mechanics are necessary for appropriate modeling and augmentation. This has been one direction of our research, with an initial result that allows the user to model the shape of a soft object using a haptic interface without the need for other devices (Yim and Choi 2012).

Another important topic is that for multi-finger interaction. Our work has used a handheld tool for the exploration of real objects. This must be extended to those which allow for the use of bare hands, or at least very similar cases such as thin thimbles enclosing fingertips. This functionality can greatly improve the realism of AR applications, for example, palpation training on a real phantom that includes virtual organs and lumps. This functionality requires very complicated haptic interfaces that provide multiple, independent forces with a very large degrees of freedom (see Barbagli et al. 2005), as well as very sophisticated deformable body rendering algorithms that take into account the interplay between multiple contacts. To this end, we have begun to examine the feasibility of sensing not only contact force but also contact pressure in a compact device and its utility for haptic AR (Kim et al. 2014).

Regarding material properties, we need methods to augment friction, texture, viscosity, and temperature. Such extension will enlarge the application area of haptic AR by the great extent. Friction is expected to be relatively easier in both modeling and rendering for haptic AR, as long as deformation is properly handled. Temperature modulation is likely to be more challenging, especially due to the difficulty of integrating a temperature display to the fingertip that touches real objects.

The last critical topic we wish to mention is texture. Texture is one of the most salient material properties and determines the identifying tactual characteristics of an object (Katz 1925). As such, a great amount of research has been devoted to haptic perception and rendering of surface texture. Texture is also one of the most complex issues because of the multiple perceptual dimensions involved in texture perception; all of surface microgeometry and material's elasticity, viscosity, and friction play an important role (Hollins et al. 1993, 2000). See Choi and Tan (2004a,b, 2007) for a review of texture perception relevant to haptic rendering. Research effort on this topic is still ongoing even for haptic VR. See Fritz and Barner (1996), Campion and Hayward (2007) for passive rendering of virtual textures, and Guruswamy et al. (2011), Lang and Andrews (2011), and Romano and Kuchenbecker (2011) for various models. All of these studies pertained to haptic VR rendering. Among these, the work of Kuchenbecker and her colleagues has the best feasibility for application to haptic AR; they have developed a high-quality texture rendering system that overlays artificial vibrations on a touchscreen to deliver the textures of real samples (Romano and Kuchenbecker 2011) and an open database of textures (Culbertson et al. 2014). This research can be a cornerstone for the modeling and augmentation of real textures.

10.8  CONCLUSIONS

This chapter overviewed the emerging AR paradigm for the sense of touch. We first outlined the conceptual, functional, and technical aspects of this new paradigm with three taxonomies and thorough review of existing literature. Then, we moved to recent attempts for realizing haptic AR, where hardware and algorithms for augmenting the stiffness and friction of a real object were detailed.

These frameworks are applied to medical training of palpation, where stiffer virtual inclusions are rendered in a real tissue mock-up. Lastly, we elucidate several challenges and future research topics in this research area. We hope that our endeavor introduced in this chapter will pave the way to more diverse and mature researches in the exciting field of haptic AR.

REFERENCES

Abbott, J. J. and A. M. Okamura. 2003. Virtual fixture architectures for telemanipulation. Proceedings of the IEEE International Conference on Robotics and Automation, Taipei, Taiwan, pp. 2798–2805.
Abbott, J. J., P. Marayong, and A. M. Okamura. 2007. Haptic virtual fixtures for robot-assisted manipulation. In Robotics Research, eds. S. Thrun, R. Brooks, and H. Durrant-Whyte, pp. 49–64. Springer-Verlag: Berlin, Germany.
Adcock, M., M. Hutchins, and C. Gunn. 2003. Augmented reality haptics: Using ARToolKit for display of haptic applications. Proceedings of Augmented Reality Toolkit Workshop, Tokyo, Japan, pp. 1–2.
Azuma, R., Y. Baillot, R. Behringer, S. Feiner, S. Julier, and B. MacIntyre. 2001. Recent advances in augmented reality. IEEE Computer Graphics & Applications 21 (6):34–47.
Barbagli, F., D. Prattichizzo, and K. Salisbury. 2005. A multirate approach to haptic interaction with deformable objects single and multipoint contacts. International Journal of Robotics Research 24 (9):703–716.
Bayart, B., A. Drif, A. Kheddar, and J.-Y. Didier. 2007. Visuo-haptic blending applied to a tele-touch-diagnosis application. Lecture Notes on Computer Science (Virtual Reality, HCII 2007) 4563:617–626.
Bayart, B., J.-Y. Didier, and A. Kheddar. 2008. Force feedback virtual painting on real objects: A paradigm of augmented reality haptics. Lecture Notes in Computer Science (EuroHaptics 2008) 5024:776–785.
Bayart, B. and A. Kheddar. 2006. Haptic augmented reality taxonomy: Haptic enhancing and enhanced haptics. Proceedings of EuroHaptics, Paris, France, pp. 641–644.
Bennett, E. and B. Stevens. 2006. The effect that the visual and haptic problems associated with touching a projection augmented model have on object-presence. Presence: Teleoperators and Virtual Environments 15 (4):419–437.
Bianchi, G., B. Knoerlein, G. Szekely, and M. Harders. 2006. High precision augmented reality haptics. Proceedings of EuroHaptics, Paris, France, pp. 169–168.
Bianchi, G., C. Jung, B. Knoerlein, G. Szekely, and M. Harders. 2006. High-fidelity visuohaptic interaction with virtual objects in multi-modal AR systems. Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality, Santa Barbara, CA, USA, pp. 187–196.
Billinghurst, M., H. Kato, and I. Poupyrev. 2001. The MagicBook—Moving seamlessly between reality and virtuality. IEEE Computer Graphics & Applications 21 (3):6–8.
Borst, C. W. and R. A. Volz. 2005. Evaluation of a haptic mixed reality system for interactions with a virtual control panel. Presence: Teleoperators and Virtual Environments 14 (6):677–696.
Bose, B., A. K. Kalra, S. Thukral, A. Sood, S. K. Guha, and S. Anand. 1992. Tremor compensation for robotics assisted microsurgery. Proceedings of the 14th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Paris, France, October 29–November 1, 1992, pp. 1067–1068.

II. Tan. Transportation Research Part F: Traffic Psychology and Behaviour 21:231–241. S. and H. 121–122. Lecture Notes in Computer Science (Edutainment 2007) 4469:152–161. T. Tan. and T. G. Effects of collision detection algorithm. Y. Tan.. Hachisu. Feel who’s talking: Using tactons for mobile phone alerts. Y. Grosshauser. 1996. J. 2004b. 2008. I. Ryu. Kumar. I. interaction and display: A review of ten years of ISMAR. M. Houston.. Lecture Notes in Computer Science (ICAT 2006) 4282:207–216. Presence: Teleoperators and Virtual Environments 14 (4):463–481. TX. pp. Woo. J.. P. Toward realistic haptic rendering of surface textures. Chun. Augmentation of material property by modulating vibration resulting from tapping. Kuchenbecker. 2005. J. pp. P. 310–312. Active click: Tactile feedback for touch panels. 2004. and G. Z. and T. Stochastic models for haptic texture. Seo. 2007. IEEE Transactions on Instrumentation and Measurement 60 (1):93–103. J. Augmented. Feng.-S. S. Montréal. Presence: Teleoperators and Virtual Environment 15 (5):570–587. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. S. and H.. pp.. L. J. Perceived instability of virtual haptic texture. Campion. Brown. Usability test of immersion for augmented reality based product design. J. Park. . Duh. G. 2007. and L. J. Hoogen. Hermann. Fukushima. Kajimoto. 2001. On the synthesis of haptic textures. Choi.. IEEE Transactions on Robotics 24 (3):527–536. and S. M. G. and M. 2012.. W. J. pp. S. Proceedings of International Conference on Advanced Electronic Systems. and H. Canada. Woo. Proceeding of the Annual SIGCHI Conference on Human Factors in Computing Systems. and T. H. V. Delgado. MA. Gerling. Augmented haptics—An interactive feedback system for musicians. R. IIR filter models of haptic vibration textures. and K. Physical interaction with a virtual knee joint—The 9 DOF haptic display of the Munich knee joint simulator. 
IEEE Computer Graphics & Applications (Special Issue on Haptic Rendering—Beyond Visual Computing) 24 (2):40–47. Efficacy of haptic blind spot warnings applied through a steering wheel or a seatbelt. Human Factors 47 (3):670–681. Chang. 604–609. 319–325. and W. Presence: Teleoperators and Virtual Environments 16 (3):263–278. Cambridge. Gopal. E. 2007. Perceived instability of haptic virtual texture. pp. 2009. Tremor acquisition and reduction for robotic surgical applications. 2011. Proceedings of the IEEE Haptics Symposium. Choi. L. Z. Kim. pp.252 Fundamentals of Wearable Computers and Augmented Reality Brewster. 193–202. III. Thomas. Kaaresoja. Fukumoto. and K. pp. Tan. pulsating tactile feedback facilitates simulator training of clinical breast examinations. WA. Culbertson. Guruswamy. and A. Dunedin. India. One hundred data-driven haptic texture models and open-source methods for rendering on 3D objects. Sugimura. Lecture Notes in Computer Science (EuroHaptics 2012) 7282:173–180. Riener. Lee. S. Enhancing immersiveness in AR-based product design. H. Lecture Notes in Computer Science (HAID 2012) 5763:100–108. L. Fritz. Billinghurst. Choi. 2006. Choi. J. Burgkart. H. Sato. S. 34–44. Frey. Experimental studies. Trends in augmented reality tracking. B. L. 2006. M. M. Lang. S. Perceived instability of virtual haptic texture. Proceedings of SPIE’s International Symposium on Intelligent Systems and Advanced Manufacturing— Telemanipulator and Telepresence Technologies III.. Z. and R. New Zealand. 2013. Brown. and H. S. Presence: Teleoperators and Virtual Environment 13 (4):395–415. Kumar. J. Z. Hayward. 15–23. 2013. 2006. T. Ha. Seattle. and V. Han. Ha. Effect of update rate. UK. Tactons: Structured tactile messages for non-visual information display. 2005. Pilani. Barner. Proceedings of the IEEE/ACM International Symposium of Mixed and Augmented Reality. T. Z. M. Choi. Proceedings of the Australasian User Interface Conference. S. T. and W. and W. Boston.. Lee. 2014. 
and H. 2004a. Bachhal.

J. M. Chicago. IEEE Transactions on Visualization and Computer Graphics 15 (1):138–149. A. Z. Bianchi.. Jeon. Proceedings of the Haptics Symposium. Qiu. Proceedings of the IEEE International Conference on Intelligent Robots and Systems. S. R.. 273–280. Choi. Jeon. B. Proceedings of the ACM Symposium on Immersive Projection Technology. Stiffness modulation for haptic augmented reality: Extension to 3D interaction. Kim. Modulating real object stiffness for haptic augmented reality.-C. S. D. 2000. Cha. R. Calibration. Kim. P. pp. Choi. IEEE Transactions on Haptics 5 (1):77–84. and M. Fuchs. Perceptual dimensions of tactile surfaced texture: A multidimensional scaling analysis.. Hollins. and D. ed. 2006. J. Hugues. pp. 981–986. CA. N.. and W. Hillsdale. 2011. K. and G. 1993. pp. S. Project FEELEX: Adding haptic surface to graphics. Los Angeles. Kajimoto. Johnson. Kim. and M. A novel test-bed for immersive and interactive broadcasting production using augmented reality and haptics. Jeon.-Y. S. Jeon. Data-driven haptic rendering-from viscous fluids to visco-elastic solids. S. 2004. Jeon. 2012. Canada. SmartTouch: Electric skin to touch the untouchable. Choi. G. 2014. pp. 2008. and K.. DeFanti. S. 2012. Choi. IL. Mahalik. and S. Kim. D. S. 469–476. S. Hunt. Proceedings of the Haptics Symposium. pp. S. Turkey. Hollins.. IEEE Computer Graphics & Applications 24 (1):36–43. Kawakami. H.. Tachi. Nannipieri. Inami. Waltham. Eom. J. Perception & Psychophysics 54:697–705. and S. D. S. and M. and S. IEICE Transactions on Information and Systems E89-D (1):106–110. Choi. and M. Jeon. 2000. Harders. Hashtrudi-Zaad. 2001. Contact force decomposition using tactile information for haptic augmented reality. Metzger. Rendering virtual tumors in real tissue mock-ups using haptic augmented reality. S. H. The World of Touch. Developing the PARIS: Using the CAVE to prototype a new VR display. M. J. 2008. and B. registration. 2012. 
Presence: Teleoperators and Virtual Environments 18 (5):387–408. 2011. S. Young. O. H. Vancouver. IEEE Transactions on Haptics 5 (1):14–20. Individual differences in perceptual space for tactile textures: Evidence from multidimensional scaling. 2009. 1242–1247. Kim. Knoerlein. Faldowski. Dawe. S. Harders. Jeon. 2011.. Choi.. R. 2014. F. 2009. 1925. and synchronization for high precision augmented reality haptics. New augmented reality taxonomy: Technologies and features of augmented environment. Harders. C. Istanbul. Thongrong.. Extending haptic augmented reality: Modulating stiffness during two-point squeezing. Lecture Notes on Computer Science (EuroHaptics 2008) 5024:609–618.. Rao. 227–232. Vibrotactile rendering for a traveling vibrotactile wave based on a haptic processor. J. and O. A. G. B. Harders. 141–146. Pape. 1975. S. G. pp. Young. Harders. . Haptic augmented reality: Taxonomy and an example of stiffness modulation. P. Katz. Proceedings of the IEEE/RSJ International Conference on Robots and Systems. T. MA. Haptic tumor augmentation: Exploring multi-point interaction. Proceedings of the IEEE World Haptics Conference (WHC). Sandin.. Yano. K.Haptic Augmented Reality 253 Haddadi. ASME Journal of Applied Mechanics 42:440–445. Springer-Verlag: Berlin. Ahn. Extensions to haptic augmented reality: Modulating friction and weight. A new method for online parameter estimation of hunt-crossley environment dynamic models. S. H. pp. Real stiffness augmentation for haptic augmented reality. M. Presence: Teleoperators and Virtual Environments 20 (4):337–370. 47–63. and M. G. and R. Nice. Chung. IEEE Transactions on Haptics 2:15–27. Iwata. S. Nakaizumi. NJ: Lawrence Erlbaum Associates. Karlof. Crossley. 2009. Proceedings of ACM SIGGRAPH. and S. Harders. Coefficient of restitution interpreted as damping in vibroimpact. Kosa. Szekely. Szekely. K. In Handbook of Augmented Reality. and J. Germany. France. Ryu. and F. M. Jeon. and S. 
IEEE Transactions on Haptics 99 (Preprints):1–1. Bensmäi. 2010. N. Kawamura. Perception & Psychophysics 62 (8):1534–1544. Hoever. P. Furht. and F. Choi. and F.

R. Proceedings of the IEEE International Conference on Robotics and Automation. 297–300. 2001. and R. P. Nagata. J. Minamizawa. Reality-based models for vibration feedback in virtual environments. and S. Proceedings of the International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI). FL. Japan Mitchell.. Vibrotactile guidance for drumming learning: Method and perceptual assessment. S. B.-U. 11–20. Kyoto. Diminished haptics: Towards digital transformation of real world textures. Germany. Haptic augmented reality interface using the real force response of an object. D. Lee. Dawe. 2007. and J. T. Mahvash. Nojima. Kazanzides. H. Sweden.254 Fundamentals of Wearable Computers and Augmented Reality Kurita. H. Kawakami.. pp. D.. 2007. C. Richmond. 83–86. Stockholm. 2005. O. Kim. Koo. Wearable haptic display to present gravity sensation. TX. The technical trend of the exoskeleton robot system for human power assistance. Measurement-based modeling of contact forces and textures for haptic rendering. K. FL. G. Japan. Proceedings of IEEE Haptics Symposium. Luciano. pp. Proceedings of the IEEE Haptics Symposium. W. Lecture Notes on Computer Science (Eurohaptics 2014. J. 2014. I. and M. Tactile effect design and evaluation for virtual buttons on a mobile device Touchscreen. and C. T. Scanning physical interaction behavior of 3D objects. Tachi. Li. Park. pp. Vancouver. Proceedings of International Conference on Human-Computer Interaction. Lee. J. International Journal of Precision Engineering and Manufacturing 13 (8):1491–1497. 3268–3273. K. Rome. Ikeda.. M. Y. Haptic feedback in mixed-reality environment. Friction compensation for a force-feedback telerobotic system. Taylor. J. I. Choi. 133–138.. Proceedings of the Annual Conference on ACM Computer Graphics and Interactive Techniques. J. Proceedings of the ACM Virtual Reality Software and Technology.. Sekiguchi. pp. A taxonomy of real and virtual world display integration. 
Proceedings of the World Haptics Conference. H. T.. M. Kim. pp. Tachi. pp. M. IEEE Transactions on Robotics 23 (1):4–19. Y. Iordachita. and K. Inami. and G. James. Tsukuba. Hager. A. Lloyd. J. Ubi-Pen: A haptic interface with texture and vibrotactile display. J. 2012. NV. 2009. Okamura. 2009. A. Los Angeles. K. 2011. Springer-Verlag: Berlin. Andrews.. 2007. Dennerlein. Ott. Rekimoto. T. P. H. Han. 2011. 2001. Italy. and F. Milgram. Guidance methods for bimanual timing tasks. and A. K. 2012a. IEEE/ASME Transactions on Mechatronics 6 (3):245–252. ed. 409–417. Tamura. 1999. and S.  Taylor. Handa. D. E. L. Choi. 87–96. pp. Tamaki. Pai. A. Lee. Proceedings of the IEEE Virtual Reality Conference. Kajimoto. Canada. CA. Kyung. Okamura. Yau. and S. M. R. Thalmann. N. Colquhoun. Lee. Houston.-Y. Hong. M. H. Jr. Hwang. L. 2007. Orlando. Development and application of a new steady-hand manipulator for retinal surgery. Joung. Design of the ImmersiveTouch™: A  high-performance haptic augmented virtual reality system. S. The Visual Computer: International Journal of Computer Graphics 23 (9):843–849.. 623–629. 2002. Han. IEEE Computer Graphics and Applications 29 (1):24–32. M. Lang. P. pp. The SmartTool: A system for augmented reality of haptics. 2014. and J. Hoshi. and H. Ogasawara. 1–16. and S. pp.. K. Banerjee. L. Spatial motion constraints using virtual fixtures generated by anatomy. Vexo. and R.. . M. Ochiai. 147–152. M. Orlando. Proceedings of the IEEE International Conference on Robotics and Automation. IEEE Transactions on Visualization and Computer Graphics 17 (3):380–391.. K. J. van den Doel. A. Ishii. and M. Cutkosky.. Florea. G. Kapoor. by Y. J. Sa. 67–72. Part I) LNCS 8618: pp. In Mixed Reality—Merging Real and Virtual Worlds. and S. Takasaki. pp. T. Las Vegas. 2006. Lang. and S. Choi. D.

Proceedings of the IEEE World Haptics Conference. J. S. pp. 2004. Ternes. Proceedings of the World Haptics Conference. Y. 2010. O’Malley. Proceedings of the IEEE Virtual Reality Annual International Symposium. and S. Haptics in augmented reality. S. Lecture Notes on Computer Science (EuroHaptics 2008) 5024:199–208. Japan. Yokokohji. . Brown. 3425–3431. 2008. 498–503. E. and A. pp. Proceedings of the IEEE Haptics Symposium. Romano. Studies in Health Technology and Informatics 142:244–246. Lecuyer. Dong. and J. Creating realistic virtual textures from contact acceleration data. N. Canada. and C. A. V. D. Virtual fixtures: Perceptual tools for telerobotic manipulation. Proceedings of the IEEE Conference on Biomedical Engineering and Sciences. Uchiyama. Choi. Lecture Notes on Computer Science (EuroHaptics 2008) 5024:589–598. Forrest. Ye. A. A mixed reality simulator for feline abdominal palpation training in veterinary medicine. 2015. Yao.. Presence: Teleoperators and Virtual Environments 8 (4):412–434. Communications of the ACM 47 (8):32–38. Zoran. 2011. 2011. pp. SoIanki. Tactile and multisensory spatial warning signals for drivers. Designing large sets of haptic icons with rhythm. Vallino. Using vibration patterns to provide impact position information in haptic manipulation of virtual objects. J. Visuo-haptic systems: Half-mirrors considered harmful. K. Proceedings of the Institution of Mechanical Engineers. The FreeD—A handheld digital milling device for craft and fabrication. A review of exoskeleton-type systems and their key technologies. and T.reachin. Ellis. D. C. and H. Haptic based augmented reality simulator for training clinical breast examination. and Y. Baillie. 3–4. Part C: Journal of Mechanical Engineering Science 222 (8):1599–1612. M. Florence. Proceedings of the IEEE International Conference on Multimedia Computing and Systems. E. http://www. and J. 1999. pp. C. pp. 479–484. Accessed March 4. 195–200. Corso. C.-Y. Scharver. Ho. L. and K. Chen. 
Johnson. Kanade. R. Proceedings of the ACM Symposium on User Interface Software and Technology. Canada. 2004.. VisHap: Augmented reality combining haptics and vision. Leigh. J. WYSIWYF display: A visual/haptic interface to virtual environment. Kuala Lumpur. pp. 76–82.sensegraphics. van Essen. Rosenberg. 3D-IW. and V. Sreng. 427– 432. 2004. Hayward. R. Zhang. 2015.. L. L. 2012. Vancouver. M. 2008. N. 292–297. and C. Reachin Technology. IEEE Transactions on Haptics 5 (2):109–119.. and R. Istanbul. and S. J. MacLean. Hollis. I. D. SenseGraphics. Accessed on March 4. Design and evaluation of hapticons for enriched instant messaging. and H. Y. Kuchenbecker. 1993. Reachin Display. Rovers. Designing cranial implants in a haptic augmented reality environment. Powell. Raja. Evenhouse. Paradiso. Andriot.. Lecture Notes on Computer Science (MICCAI) 3217:89–96. Proceedings of Eurohaptics. Efficacy of shared-control guidance paradigms for robot-mediated training. Proceedings of the IEEE International Conference on Systems. Okamura. 2009. R. Zhang. G. M. A. Tsukuba. Yim. Toronto. Spence. B. pp. G. A tactile magnification instrument for minimally invasive surgery. Man and Cybernetics.. H. Germany. and C. Italy. Sandor. J. A. 2007. IEEE Transactions on Haptics 1 (2):121–129. 2003. J. 1999. 2012. Yang. Washington. 2008. http://www. 2008. R. Shape modeling of soft real objects using force-feedback haptic interface. 265–269. and M.Haptic Augmented Reality 255 Parkes.. pp.. Turkey. Yamamoto. Munich. Malaysia. C. and K.


Section III

Augmented Reality


11 Location-Based Mixed and Augmented Reality Storytelling

Ronald Azuma

CONTENTS
11.1 Motivation ........................................................................................... 259
11.2 Reinforcing .......................................................................................... 261
11.3 Reskinning ........................................................................................... 265
11.4 Remembering ....................................................................................... 269
11.5 Conclusion ........................................................................................... 272
References ..................................................................................................... 274

11.1 MOTIVATION

One of the ultimate uses of mixed reality (MR) and augmented reality (AR) will be to enable new forms of storytelling that enable virtual content to be connected in meaningful ways to particular locations, whether those are places, people, or objects. By AR, I refer to experiences that superimpose or composite virtual content in 3D space directly over the real world, in real time (Azuma, 1997). However, this chapter also includes a broader range of MR experiences that blend real and virtual in some manner but may not require precise alignment between the two (Milgram and Kishino, 1994). Initially, AR applications focused on professional usages that aided perception of a task that needed to be done in a complex 3D environment, such as medical surgeries or the maintenance and assembly of equipment. This focus was logical, because in the early days of AR, the equipment required was so specialized and expensive that only professional applications seemed economically viable. Later, access to MR and AR technologies became democratized through marker and image-based tracking via cameras attached to desktop and laptop computers, smartphones, and tablets. This enabled almost everyone to run certain forms of MR and AR on devices that they already bought for other purposes besides AR. Today, we see a variety of MR and AR applications that target the mass market for advertising, entertainment, and educational purposes. In the future, these experiences will advance to the point of establishing new forms of media that rely upon the combination of real and virtual to tell stories in new and compelling ways. In traditional media, such as books, TV, radio, movies, and video games, the content is entirely virtual and disconnected from the location where the content is experienced.

MR and AR storytelling experiences will offer new ways to tell stories, in different ways than traditional media, with new advantages and disadvantages compared against established media.

Storytelling is fundamentally important. While almost everyone enjoys good stories as a form of entertainment, the importance of storytelling runs much deeper than that. Telling a story is an important method of education and instruction. Stories can contain lessons, codified bits of wisdom that are passed on in a memorable and enjoyable form. Technological developments that make the story clearer and more memorable can aid retention and understanding, and any advancements in media technology that enable people to tell stories in new and potentially more compelling ways can have profound impact. Therefore, I firmly believe that, in the long run, one of the ultimate uses of MR and AR technologies will be as a new form of location-based media that enables new storytelling experiences.

The goal of this chapter is to discuss location-based MR and AR storytelling. The bulk of this chapter provides an overview of various experiences or concepts that provide a glimmer of the potential inherent here. While I attempt to discuss a representative sample of approaches in this area, this is not a comprehensive survey, so it does not cover all previous work in this field. I provide a hypothesis of why this might be a powerful new form of media, and specify three approaches for achieving this potential. To conclude the chapter, I discuss both a fundamental challenge limiting this new type of media, along with what an ultimate payoff might be.

A wide variety of platforms and systems run the AR and MR storytelling experiences that I cover in this chapter. There is no single platform or system that dominates, partially because AR and MR technologies are still evolving and also because the creators build or adapt custom systems to fit their particular experiences. A system that runs inside the controlled environment of a museum can be very different from mobile systems that bike riders carry with them as they ride through a city. This lack of standard platforms increases the challenge of telling stories with AR and MR technologies, requiring storytellers to also become familiar with the capabilities and limitations of the underlying technologies.

MR and AR storytelling is a particular subset of a much broader area of location-based experiences that include ARGs (alternate reality games), pervasive games, crossmedia and trans-media experiences, puzzle hunts, and performance art. This chapter focuses on MR and AR systems that explicitly attempt to tell a story to the participants, rather than ones where the focus is on playing a game, solving a puzzle, or providing entertainment or an artistic experience. For a broader discussion of these other types of experiences, please see Benford and Giannachi (2011), Harrigan and Wardrip-Fruin (2007), and Montola et al. (2009). This chapter also focuses on location-based storytelling experiences that generally occur outside a participant's home and, in many cases, operate only at specific sites.

I do not cover the case of augmenting a real book with 3D virtual content to supplement the story that exists in the traditional book. A commercial example of an AR book is Anomaly (Brittenham and Haberlin, 2012). The reason this chapter does not cover AR books is my hypothesis about what will make AR storytelling compelling: the combination of real and virtual must be meaningful and powerful, where the core of the experience requires both the real and virtual components.

If the experience is based on reality by itself, with little contributed by the augmentations, then the augmentation part is only a novelty and it will not be a viable new form of media. Conversely, if the core of the experience comes solely from virtual content, then there is no point in using AR. Many AR experiences fall into the latter case. An experience that augments a book or movie poster with virtual augmentations derives all its power from purely virtual content. In the case of books, DVD cases, and movie posters, what is compelling about reality is not the book, DVD, or poster. It is the virtual content represented or embodied by those objects, not the physical objects. The compelling content resides in the ideas in the books and the movies themselves. Reality then becomes a backdrop that forms the context of the experience, and perhaps part of the user interface, but reality is not a core part of the content.

Therefore, I hypothesize that there are at least three approaches for AR storytelling where both real and virtual form critical parts of the experience:

• Reinforcing
• Reskinning
• Remembering

In the next three sections, I discuss these three in more detail and describe examples and concepts of each approach.

11.2 REINFORCING

In reinforcing, the AR storytelling strategy is to select a real environment, whether that is an object, person, or location, that is inherently compelling by itself. Then, the AR augmentations attempt to complement the power that is inherent in reality itself to form a new type of experience that is more compelling than either the virtual content or reality by themselves.

Let me provide a conceptual example. Let's assume the goal of an experience is to educate a participant about the Battle of Gettysburg. A student wishing to learn about that battle could watch a 1993 movie, called Gettysburg, which had star actors, superb cinematography, a compelling soundtrack, thousands of Civil War reenactors, and parts of it were even filmed on the site of the battle.

Gettysburg is also a real location. If you are so inclined, you can travel there and see the battlefield yourself. And if you do this, you will see large grassy fields, stone fences, and many monuments. You will not see any reenactments of the battle or other virtual recreations that take you back to that fateful time in 1863. Yet, if you know why that spot is important in American history, then simply being there, in person, on the actual spot where the event happened, without augmentation, is a powerful experience. I remember standing at the spot of Pickett's Charge, on the Union side, and feeling overcome by emotion. (For the reader unfamiliar with the American Civil War, Pickett's Charge was the culmination of the Battle of Gettysburg. The Confederates lost both Gettysburg and Vicksburg, on July 3 and 4, 1863. These two events are generally considered to be the turning point of that war. Because the Union won the American Civil War, the United States became one unified, indivisible country rather than separated into two or more countries.)

262 Fundamentals of Wearable Computers and Augmented Reality

My favorite example of this strategy is 110 Stories, which was designed by Brian August (August, 110Stories—What's Your Story?). This experience runs on a mobile phone. If you are near or in Manhattan, the application uses the compass and tilt sensors in the phone to render an outline of where the Twin Towers in the World Trade Center should be, against the New York City skyline (Figure 11.1).

To me, there are two design decisions that make 110 Stories particularly poignant. The first was the choice of how to render the buildings. Even on mobile phones, we have graphics hardware that could render a detailed, perhaps nearly photorealistic representation of the Twin Towers. But the application does not do that. Instead, it renders only the outline of the buildings, as if sketched against the skyline with a grease pencil. While this reduces the realism, I believe this makes the experience more effective, because it matches the message that this experience tries to send: that the towers are no longer there. And they are supposed to be there. The second aspect is that after this application augments Manhattan with the outline of the towers, it invites the participant to not only capture and upload that augmented image, but to also submit a brief story. Why did the participant choose to take this picture? What does it mean to the photographer? If you go to the 110 Stories website, you will see many stories that participants have chosen to record and share. One that I have never forgotten tells how the author grew up in Manhattan, and his brother told him that as long as he could see the Twin Towers, he would never be lost because they told him which direction was south. They would always guide him home. And the story concluded by stating that his brother was now forever at home, and that he missed him, with the implication being that his brother was in the Twin Towers on September 11.

FIGURE 11.1  110 Stories.
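The placement that 110 Stories performs, using the phone's compass and tilt sensors to draw an outline where a fixed landmark should appear, can be sketched with a little spherical geometry. This is a minimal illustration of the general technique, not the app's actual code; the tower coordinates, field of view, and screen width below are assumed values for the example.

```python
import math

# Approximate site of the Twin Towers (assumed value for illustration).
TOWERS_LAT, TOWERS_LON = 40.7118, -74.0131

def bearing_deg(lat1, lon1, lat2, lon2):
    """Compass bearing in degrees (clockwise from north) from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0

def screen_x(target_bearing, heading, fov_deg, width):
    """Horizontal pixel at which the target appears, given the phone's compass heading."""
    # Signed angular offset of the target from the view direction, in [-180, 180).
    offset = ((target_bearing - heading + 180.0) % 360.0) - 180.0
    return width / 2.0 + (offset / (fov_deg / 2.0)) * (width / 2.0)

# A phone south of the site, pointed straight at it, draws the outline mid-screen.
b = bearing_deg(40.70, -74.0131, TOWERS_LAT, TOWERS_LON)
x = screen_x(b, heading=b, fov_deg=60.0, width=640)  # -> 320.0
```

The tilt (pitch) sensor is used in exactly the same way for the vertical axis, which is how the outline stays anchored to the skyline as the user sweeps the phone around.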

In the Voices of Oakland experience, participants experience stories about people who are buried in Oakland Cemetery, the oldest cemetery in Atlanta (Dow et al., 2005). Because this is a historic cemetery, they took care to develop an experience that was appropriately serious in tone and respectful of the cemetery, the relatives of the occupants, and all other stakeholders. The creators could not modify anything about the environment, nor add markers or other elements to aid tracking. Due to limitations in tracking, a Wizard of Oz approach was taken, where another person selected the content to be played based upon the participant's context and actions on the control device. The virtual augmentations consisted of audio played when the participants were at the correct spots. Professional actors provided the narration of the character voices. The core experience is a singular linear story that takes participants along a designated path through the cemetery, but with options to explore additional content if the participant chooses.

At Columbia University, Steven Feiner's group developed a technique called Situated Documentaries, in which MR techniques were used to enable a broad range of virtual content to be seen in the context of the real locations where events actually happened. They built a location-based experience where a participant would walk around the Columbia campus to see audio, videos, images, 3D augmentations, web pages, and other media describing past events (Höllerer et al., 1999), where these virtual materials were tied to the real locations where the events happened. Their experiences told stories about the history of the Columbia University campus, describing a student revolt, how students used tunnels to occupy buildings that were guarded by police, and how one campus building was the former location of the Bloomingdale Asylum. In Columbia's example, the creators describe the historic locations as imbued with an aura that can make such experiences compelling.

Commercial versions of the Situated Documentaries technique now exist. The Streetmuseum Londinium app from the Museum of London provides a historical view of London in Roman times (Museum of London, Londinium App). The Streetmuseum application enables images or videos to appear to be augmented over real backgrounds in the participant's surrounding environment. As the participant walks to designated sites in the city, he or she can experience audio, video, and imagery based on archaeological finds from that time period (Meyers, 2011). A similar effort was started by PhillyHistory.org, a repository of historical images of Philadelphia. In April 2011, they released a mobile application that uses Layar as the platform for augmenting a user's view of the city with historical photos aligned with the real background (PhillyHistory.org, 2011). Hayes (2011) provides further information about these and other commercial projects.

Dow Day is an AR experience in which participants role play journalists in the year 1967, investigating student protests at the University of Wisconsin–Madison against the Dow Chemical company for manufacturing napalm used in the Vietnam War (Squire et al., 2007). Dow Day uses the augmented reality and interactive storytelling (ARIS) platform. While set up as a game that requires participants to learn how journalists perform their job, it also serves to engage them in learning about this specific historical era and events through personal stories and testimonies.

As a final example, I will discuss The Westwood Experience, a location-based MR experience that I worked on (Azuma).

The Westwood Experience, by Nokia Research Center Hollywood, was an experiment testing a variety of MR effects to enhance a location-based, linear story that the participants experienced as they walked the streets of Westwood (Wither et al., 2010). We conducted this experience in December 2009 and January 2010 in a part of Westwood, CA, that is south of the UCLA campus. In brief, the participants assembled in a theater where they met an actor portraying the honorary Mayor of Westwood. They were given mobile phones and earphones and left the theater on their own, guided by clues to specific points in the town. First, they experienced effects that visualized the town in the year 1949, attempting to turn the clock back to that time period. Then they heard a story of a striking young woman that the protagonist met, loved, and lost. They experienced this story at the locations where these events were supposed to have happened: at the café where they met, a jewelry store where he bought a ring, the building where they spent the night together, and the last spot where he saw her as she disappeared in a taxi.

The payoff, and the reason why this experience was located in Westwood, comes at the end, when our protagonist informs us that now he wants us to meet the woman he just told us about in his story. In contrast to Voices of Oakland, which the participants knew would be in a cemetery, this experience surprised most of the participants by ending in a cemetery, where they discover that the woman in the story was Norma Jeane Baker, better known to the world by her other name: Marilyn Monroe. The woman he talked about was a real person. They will meet her, the real person, and the way they will meet her is not by seeing a video or image of her on their mobile phones, nor even a 3D virtual model augmented in space, but by visiting Westwood Village Memorial Park Cemetery. The power comes when the participants realize how this is going to happen. The narration at this point sets the expectation of how the power of this place will affect their thoughts and behaviors:

She became what she said she would, a movie star… In the end, Norma came back to Westwood too. She's not alone, but among many others, some who lived their lives as publicly as she did, many of their names once as familiar as hers. She's between engagements now, 'resting' as actors sometime say. I come to see her every so often, as many others do. I'm taking you to her now. I know you'll be mindful of the customs proper to the place we find her.

They were guided to a specific crypt. We used a Situated Documentaries technique at this spot, showing newsreel footage of her funeral. In certain sequences from the newsreel, the participants can see a clear, meaningful and one-to-one correlation between the crypts they see in the surrounding environment and the footage that plays on the mobile device. Combined with the somber music that was composed specifically for this spot, the effect was a powerful and poignant coda to this experience. By experiencing a story about her prior to visiting her crypt, they were left to contemplate her life before becoming a movie star, and to wonder if the story they just experienced might have been real. The emotions that participants reported feeling at this spot were different than if they simply read a tourist guidebook and then visited the cemetery.

Many AR storytelling experiences that rely on the reinforcing strategy use the technique of connecting the story to the past. The Archeoguide project (Vlahakis et al., 2002) was an early effort to develop a platform for augmenting archaeological sites. Being able to increase the range of stories that can be told through new AR and MR effects will require advancements in our ability to track and augment historic outdoor environments.

Reinforcing as a strategy has strengths and weaknesses. On the positive side, the experience does not rely solely upon the virtual content by itself; the real world does some of the work of providing the meaningful experience. It may be easier to design and build virtual content that complements reality rather than virtual content that must shoulder the entire burden of being compelling by itself. On the negative side, the experience is tied to a specific location; it cannot be experienced at any arbitrary location, but rather only the one it was designed for. A person wishing to participate in 110 Stories must travel to Manhattan. A specific experience does not scale. However, many different experiences might be built for different locations around the world. Furthermore, the story must be tied to the characteristics of the real location. One cannot tell any arbitrary story and expect reinforcing to work. Since reality itself is compelling on its own, the story must complement the reality that exists at the chosen site. In The Westwood Experience, we walked around Westwood with the writer, and he wrote the story to incorporate real elements in the town, such as a jewelry store. And we were very aware that our experience ended in a real cemetery, a place of real people and real endings. A story that was disrespectful of that reality or that provided experiences inappropriate to that location would at best fail to harness the power of that real place, and at worst would be offensive. Reinforcing requires the story to appropriately complement reality. However, I believe there are examples demonstrating that this strategy can succeed.

11.3 RESKINNING

In reskinning, the strategy is to remake reality to suit the purposes of the story you wish to tell. Unlike reinforcing, there may not be anything particularly special or evocative about the real location; instead, most of the power from the experience must now come from the virtual content and how it adapts and exploits the real world to fit that virtual content. Reality is either something that the creators specifically set up and then augment, or the experience is designed to recharacterize whatever real surroundings exist, or reward the participant for finding locations that work well for the experience. Experiences need not be tied to one particular location, which means experiences based on reskinning can potentially scale to operate in most arbitrary real locations.

Rainbows End is a Hugo-award winning science fiction book written by Vinge (2006) that provides one ultimate concept of reskinning. In this book, nearly perfect AR is ubiquitously available to people who can operate the latest wearable computing systems, which use displays embedded in contact lenses and tracking provided by a vast infrastructure of smart motes that permeate almost all inhabited locations. Within this world, there are Belief Circles, which are persistent virtual worlds that are linked to and overlaid upon real locations. Each Belief Circle has a particular theme, such as a fantasy world set in medieval times.

When a user chooses to subscribe to a Belief Circle, he or she sees the surrounding world changed to fit the theme. For example, in a medieval Belief Circle, nearby real buildings might appear to be castles and huts, and people on bicycles might instead appear to be knights on horseback. A Belief Circle has a large group of people who subscribe to it and create custom content, and when others view that content, the creators can receive micropayments. We can view a Belief Circle as a persistent, co-located virtual world that links directly, one to one, to our real world and uses the principle of reskinning to change reality to fit the needs of the virtual content and experience.

Unlike the world of Rainbows End, we do not currently have ubiquitous tracking and sensing, so there are few examples of reskinning, and those often rely upon real environments that were specially created to support the needs of the story. Two examples are AR Façade and Half Real.

Façade is an interactive story experience where the participant plays the role of a dinner guest visiting a couple whose marriage is just about to break apart (Mateas and Stern, 2007). Façade by itself is a virtual environment that runs on a PC and monitor. Façade supports free text entry so that the participant can type in anything to converse with the two virtual characters while walking around freely in the virtual environment. The experience is not a linear story. Depending on what the participant does or says, various story beats are triggered and experienced. For example, walking up to or commenting on a particular object or picture will trigger certain narrative sequences. Researchers built an AR version, called AR Façade, in which they built a real set that replicated the apartment that is the setting of this experience (Figure 11.2), and participants wore a wearable AR system and walked around the real set to see virtual representations of the couple (Dow et al., 2008). The goal was to provide the participants a greater sense of actually occupying a real apartment and interacting more naturally with the apartment and its virtual inhabitants. Instead of typing in what they would say, participants now simply said what they wanted. Rather than relying upon voice recognition, human operators working behind the scenes then typed in what the participants said into the system. The evaluation did not directly attempt to measure whether AR Façade was more engaging or compelling than Façade by itself. However, there was evidence that AR Façade did affect some participants emotionally. Some chose to quit early rather than participate in an experience that was an uncomfortable social situation that they were expected to take an active role in.

FIGURE 11.2  The real set of the apartment in AR Façade.

Others became highly engaged, showing visible signs of surprise and emotional connection, such as running to follow one of the virtual characters when she leaves.

Half Real is a theatrical murder mystery play that uses spatial AR to merge real actors and a physical set with virtual content and to engage the audience with interactive situations where the audience members vote on how an investigation proceeds (Marner et al., 2012). The real set is painted white so that projective textures can change the appearance of the set during the performance. Actors were actively tracked so that virtual labels could be attached to them. Each audience member had a ZigZag handheld controller to vote when prompted. The creators had to work out numerous system issues to provide the reliability, robustness, and transportability required of a professional stage production. Half Real completed a tour in South Australia and subsequently played for a 3-week, sold out run in Melbourne. Future possibilities include using the augmentations to change the appearances of the actors themselves, rather than just the set and the backgrounds.

Applying the reskinning technique outside of controlled real environments, such as those of AR Façade and Half Real, requires AR systems that can detect and understand the real world. Kinect Fusion exploits the depth-sensing capabilities of the Kinect to build a volumetric representation of a real environment, enabling a system to track off of that space and augment it more realistically, such as reskinning a real environment with a virtual grid pattern, with correct occlusions (Newcombe et al., 2011). This system can also handle dynamic changes in the environment. For example, the Scavenger Bot demonstration that Intel showed at consumer electronics show (CES) 2014 showed a system that could scan a previously unknown tabletop environment and then change its appearance by applying a different skin upon the environment (Figure 11.3). Such a system represents a step along the directions needed for AR and MR systems to more commonly enable the reskinning technique.

FIGURE 11.3  The Intel Scavenger Bot demonstration at CES 2014.
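The "correct occlusions" that a depth reconstruction such as Kinect Fusion makes possible come down to a per-pixel depth comparison: a virtual pixel is drawn only where the virtual surface is closer to the camera than the reconstructed real surface. A toy sketch of that compositing rule (the array layout and values are assumed for illustration; this is not the actual Kinect Fusion pipeline):

```python
import numpy as np

def composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth):
    """Overlay virtual content on a real image, hiding it behind closer real surfaces.

    real_depth: per-pixel distance to the reconstructed real scene (meters).
    virt_depth: per-pixel distance to the rendered virtual scene; 0 means no
                virtual content at that pixel.
    """
    # Virtual content is visible only where it exists and is in front of reality.
    visible = (virt_depth > 0) & (virt_depth < real_depth)
    out = real_rgb.copy()
    out[visible] = virt_rgb[visible]
    return out

# 2x2 example: the virtual layer wins only where it is in front of the real scene.
real_rgb = np.zeros((2, 2, 3), dtype=np.uint8)            # black real image
virt_rgb = np.full((2, 2, 3), 255, dtype=np.uint8)        # white virtual layer
real_depth = np.array([[2.0, 2.0], [2.0, 2.0]])
virt_depth = np.array([[1.0, 3.0], [0.0, 1.5]])
out = composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth)
```

Without the reconstructed `real_depth`, a naive overlay would always paint the virtual grid on top of everything, which is exactly the artifact that breaks the reskinning illusion.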

While we are now seeing systems that can detect and model the real world, these models generally lack semantic understanding. True reskinning will require systems that can detect and recognize the semantic characteristics of the environment and objects.

Reskinning relies most on the power of the experience coming from the virtual content, rather than the real environment, so a key strategy may be to exploit virtual content that participants are already familiar with. Such leveraging is the basis of many crossmedia or trans-media approaches, and it can be quite successful. It has an advantage in that the audience already finds the virtual content compelling. When this content is created by professional storytellers and audiences who have already read the books, seen the movies, or otherwise experienced the virtual content, then a new experience that leverages that same content is not starting from scratch. One example of this is the Wizarding World of Harry Potter at the Universal Studios Orlando theme park in Florida. Since most visitors are already familiar with the Harry Potter books or films, when they walk through that area of the theme park and experience the attractions and shops there, they draw from their memories and previous knowledge of this fantasy world. The Wizarding World of Harry Potter was sufficiently popular that Universal expanded it in the summer of 2014. While this is not explicitly an example of AR or MR storytelling, it is an example of this leveraging strategy.

The University of Central Florida provided an example of reskinning the interior of a museum to better engage visitors with the exhibits. In the MR Sea Creatures experience in the Orlando Science Center, visitors saw the museum interior transformed to be underneath the sea, and skeletons of ancient sea creatures on display then came to life (Hughes et al., 2005). Visitors navigate a virtual rover vehicle to collect specimens around the museum. At the end, they see an animation of one dinosaur grabbing a pterodactyl out of the air and holding it in its mouth, which then transforms back to the real world where the visitors see the real fossil of that dinosaur with the pterodactyl in its mouth.

The Aphasia House project is an exciting new application of MR storytelling which enables patients suffering from traumatic brain injury to tell their own personal stories to therapists, not for the purpose of entertainment, but as a critical part of guiding a doctor in determining how to treat a patient (Stapleton, 2014). People suffering from aphasia are impaired in their ability to communicate due to severe brain injuries. They may lose the ability to speak, read, or write. Preliminary results from this project indicate that immersive storytelling in an MR environment may enable some patients to reconnect with their abilities to tell stories, and a doctor involved in this project testifies that this breakthrough would not have been possible in a purely virtual environment or without the augmentations provided in the MR environment. What appeared to be critical was building a real environment (a kitchen) that could be augmented in a variety of ways to elucidate familiar previous experiences from the patient: making coffee, eating a bagel, touching countertops, and doing that in a multimodal way so that he felt, heard, and smelled familiar sensations. This is an example of applying reskinning to evoke stories out of a patient in the pursuit of a serious goal: helping patients recover their own abilities to communicate.

Alice's Adventures in New Media was an early AR narrative experiment that leveraged the world of Alice in Wonderland, written by Lewis Carroll (Moreno et al., 2001).

In this system, a participant sat at a table and saw three other characters from the book. The participant could interact with the characters by performing various actions such as serving and sipping tea, which affected the narrative snippets.

At CES 2014, Intel ran a series of AR demonstrations based upon the steampunk fantasy world of Leviathan, written by Westerfeld (2009). The world of Leviathan is set in an alternate Earth, where mankind discovered genetic engineering very early. Therefore, during World War I, in some countries a biological revolution supplanted the industrial revolution, and people chose to fabricate new types of living things to suit their purposes. For example, the Leviathan itself is an enormous flying airship in the form of a whale, replacing dirigibles. In our demonstrations, we brought virtual representations of the Leviathan and other creatures inspired by the book into the real environment, both during the Intel CEO's keynote presentation and in the Intel booth on the CES show floor (Figure 11.4). I was part of a large team of people who created and ran these demonstrations. These AR demonstrations were intended to inspire visitors about the potential for AR storytelling that used this leveraging strategy (Azuma, Leviathan at CES 2014). While these demonstrations did not tell stories by themselves, they served as an inspiration of how this leveraging strategy could result in compelling new storytelling media when applied through the reskinning strategy of AR and MR storytelling.

FIGURE 11.4  A Leviathan demonstration in the Intel booth at CES 2014.

11.4 REMEMBERING

In remembering, the AR storytelling strategy is to draw upon memories and retell those stories, generally at the particular place where those memories and stories happened. The belief is that combining the memories and stories with the actual real location can result in a new experience that is more powerful than the real location by itself, or the virtual content by itself. For example, I could revisit the site of my wedding ceremony and see the gazebo where that occurred.

While I have photos and videos of that event, communicating my personal story of that day and what that meant to me might be done in a more powerful manner as an AR or MR experience, merging that virtual content with the actual location where my wedding occurred.

The strategy of remembering is similar to reinforcing, but there are some differences. The locations used in the reinforcing approach have particular meanings and power that most people agree upon and know, and that constrains the experiences based on reinforcing to conform to that meaning. For example, the site of the Battle of Gettysburg draws its power from a specific event. While interpretations might vary, the meaning is shared and agreed to by almost all participants. Remembering, in contrast, is generally more personal and individual. Even at the same location, the potential stories and memories can vary greatly. For example, Sproul Plaza on the campus of the University of California, Berkeley, could be home to a wide variety of memories and stories. One person might remember participating in the Free Speech Movement at that spot, while another knows it as the place where he first met his future spouse, and yet another has memories of Pet Hugs sessions where students could hug therapy dogs to reduce their stress.

Even when divorced from a particular location, memories and viewpoints by themselves can make compelling experiences. Three Angry Men (MacIntyre et al., 2003) and its successor, Four Angry Men, are experimental AR narrative demonstrations that enable participants to access and experience the memories and thoughts of characters in a narrative. Inspired by the drama Twelve Angry Men, written by Reginald Rose, these experiences place the participant in the viewpoint of a jury member deliberating on a case. When seated at a particular chair at a table, the participant sees the other jurors from the perspective of the juror who is sitting in the chair he or she is occupying (Figure 11.5). With this approach, the participant not only hears what the other jurors say and what his or her persona is saying, but he or she also hears the inner thoughts of the character sitting in that chair. For example, one juror with liberal leanings sees another African-American juror as a potential ally but the third juror as prejudiced. The participant is free at any time to switch seats. When he or she does so, the deliberation continues but the participant now hears and sees things from a different juror's perspective and hears that juror's inner thoughts. When the participant moves to the seat of the prejudiced juror, the entire experience changes.

FIGURE 11.5  Three Angry Men, seeing two other jurors from one point of view.
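One way to structure this kind of viewpoint-dependent narrative is to key each line of dialog (and each juror's appearance and inner monologue) by the seat the participant currently occupies, so that switching seats re-selects every asset for the same deliberation beat. The sketch below is hypothetical: the seat names and lines are invented for illustration and are not taken from the actual Three Angry Men demonstration.

```python
# Each deliberation beat is stored per (speaker, listening seat), so the same
# moment in the story plays back differently depending on whose perspective
# the participant occupies -- the Rashomon effect as a lookup table.
BEAT = {
    # (speaker, listener_seat) -> how that speech is experienced from that seat
    ("juror_prejudiced", "juror_liberal"):    "loud, unreasonable rant",
    ("juror_prejudiced", "juror_prejudiced"): "reasonable, if frustrated, argument",
}

INNER_THOUGHTS = {
    # Inner monologue heard only while sitting in that juror's chair.
    "juror_liberal":    "Maybe the third juror cannot be reasoned with.",
    "juror_prejudiced": "Why will they not listen to common sense?",
}

def experience(speaker, seat):
    """Return (heard line, inner thoughts) for a participant in the given seat."""
    line = BEAT.get((speaker, seat), "neutral delivery")
    return line, INNER_THOUGHTS[seat]

# Moving from one seat to another changes the entire experience:
from_liberal_seat = experience("juror_prejudiced", "juror_liberal")
from_prejudiced_seat = experience("juror_prejudiced", "juror_prejudiced")
```

Because the deliberation state itself is shared and only the lookup key changes, the story can continue seamlessly when the participant switches chairs mid-scene.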

While the initial juror heard the prejudiced juror as loud and unreasonable, the prejudiced juror hears himself as reasonable, if a bit frustrated. To the prejudiced juror, the African-American juror's appearance and behaviors transform to conform to his biases. Even the appearances of the jurors change depending on the viewpoint. Three Angry Men provides an example of how AR storytelling could communicate, at a first-person level, how stories and memories change based on personal perspectives and biases, which has been called the Rashomon effect after Akira Kurosawa's film.

REXplorer was a system that encouraged participants to explore and learn about the historic town of Regensburg, Germany, and its well preserved medieval city center through MR storytelling techniques (Ballagas et al., 2008). Although REXplorer is primarily a game that asks participants to go on quests to specific locations within the city center, it motivates these quests through virtual characters, ghosts who used to inhabit the town, who have requests to make of the participants and stories to tell them. The participants learn about these characters and their stories and perform tasks such as carrying a love letter to another character inhabiting a different location in town. By performing these tasks, participants indirectly explore the historical city center, with the goal of learning history in a more entertaining and enjoyable manner. Some participants found that using stories in this manner injected life into a historical tour that otherwise might have been dry and boring.

Rider Spoke is a location-based experience in which bike riders were encouraged to record personal stories and memories associated with particular locations, at the spot where those occurred (Benford and Giannachi, 2011). Rider Spoke was conducted in 10 cities across the world. In this experience, participants were not just passive consumers of content but active generators, contributing their own personal stories. The virtual content consisted of the audio recordings. Riders could add recordings only in spots that did not already have content associated with that location, ensuring that each location had unique content. The system provoked the participants to leave significant and evocative memories, as if mapping diary entries to specific spots in the city. For example, one instruction asked a participant to find a spot that his or her father would like and to talk about that.

You Get Me was a 2008 experience in which participants selected one of eight young people to hear his or her stories and, perhaps, make a connection to that person (Blast Theory, 2008). The eight young people had communication and tracking equipment and walked around a park. Participants went to computer terminals at the Royal Opera House, about 5 miles away from the park, to select one of the young people and then explored the park virtually. As the participant moved in the virtual representation of the park, they heard stories relating to that person: the personal geography of how that park maps to the chosen young person. For example, one person arranged stories around a swimming pool in which she nearly drowned. Each person has a key question that he or she wanted help in answering. The stories give clues for answering that person's question. The participant can then track down the person in the park and attempt to answer the question. If the person thinks the answer is insufficient, he or she can reject it and force the participant to explore more of the personal geography and stories. But if the person finds the answer intriguing, he or she can invite the participant for a private chat or phone call. And in the conclusion, the person takes a photo of himself or herself with the park in the background and sends it to the participant.
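Rider Spoke's rule that a recording may be added only where no content already exists reduces to a minimum-distance check against previously stored recordings. A minimal sketch of that mechanism, using the haversine great-circle distance (the 50 m spacing is an assumed value for illustration, not Blast Theory's actual parameter):

```python
import math

EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

class RecordingStore:
    """Accepts a new recording only if no existing one is within min_spacing_m."""

    def __init__(self, min_spacing_m=50.0):
        self.min_spacing_m = min_spacing_m
        self.recordings = []  # list of (lat, lon, audio_ref)

    def try_add(self, lat, lon, audio_ref):
        for rlat, rlon, _ in self.recordings:
            if haversine_m(lat, lon, rlat, rlon) < self.min_spacing_m:
                return False  # this spot already has content
        self.recordings.append((lat, lon, audio_ref))
        return True

store = RecordingStore()
ok1 = store.try_add(51.5000, -0.1000, "story_a.ogg")  # empty spot: accepted
ok2 = store.try_add(51.5001, -0.1000, "story_b.ogg")  # roughly 11 m away: rejected
ok3 = store.try_add(51.5100, -0.1000, "story_c.ogg")  # over 1 km away: accepted
```

The same distance test, run in reverse against the listener's GPS position, is what triggers playback of a stored recording when a rider reaches the spot where it was left.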

Why would someone get off the couch and instead participate in these new location-based media? The answer is that AR and MR storytelling experiences must become compelling enough to convince participants that this effort is worthwhile. AR and MR storytelling might leverage such situations. money. Advancing the technology of moving pictures from those early days into the art form of cinema that we know today required progress on many fronts. It takes little effort to turn on the TV and watch a show on the DVR. what our goals and aspirations are. One of the most important challenges in AR and MR storytelling is motivating people to make the necessary effort to participate in these location-based media. Initially.W. and comfortable couches. to one of eight real people with real concerns and real stories. in terms of time.272 Fundamentals of Wearable Computers and Augmented Reality and sends it to the participant. augmenting . via MR storytelling techniques. Some stories might be of interest only to your family or a close circle of friends. overlaid on a real park that each person uniquely maps as his or her own personal geography. Griffith. go to theme parks. see a movie. It reminds me of an early phase of the development of motion pictures. In AR and MR storytelling. Sergei Eisenstein. our ability to implement storytelling experiences based on the remembering approach is constrained. etc. Those experiences are attractive enough that people willingly spend the extra effort to participate in those. we do not yet have the equivalents of the early pioneers in cinema. I feel that as a form of media. it is still very early in its development. and business models. such as Buster Keaton. In comparison. 11. which requires effort and costs resources. connecting the participant in an intimate manner. aimed at a mass market audience. and D. as this chapter has discussed. 
but the potential is there to enable individuals to create and make available their own personal stories for others to see in the context where they occurred. But that does not make them any less important. the stories that are the most important to us or to others are the ones that represent ourselves. These future pioneers will need to overcome some of the core challenges of this new form of media while simultaneously unlocking its potential. one can watch a film or see a TV show almost anywhere. Since we do not have ubiquitous tracking systems with the desired accuracies needed to build indoor/outdoor AR and MR experiences. see a sporting event in a stadium. or only to specialized audiences. game consoles. Not all stories have to be written by professional storytellers. visit a museum.5 CONCLUSION AR storytelling is still in an early. not just in technology. where some of the first movies featured footage of moving trains. design. travel to distant sites on vacation. exploratory phase. people still leave home to go to a movie theater. While there have been many initial experiments. These experiences generally require participants to leave their homes and travel to particular locations or venues. Sometimes. but also in art. or play a video game in one’s house. You Get Me is a compelling experience. and where we came from. who we are. etc. Despite our TV sets. showing what the technology could do.

Now. political. While compelling. such as film. how you view the world and make decisions. AR and MR storytelling experiences have the potential to change how we view the world. or any other dimension. and to in turn change our belief systems and values. The people who knew about these events and who chose to attend at the specific locations and dates were rewarded with access to locations that the general public normally cannot enter. I now view the world differently than I did prior to this incident.Location-Based Mixed and Augmented Reality Storytelling 273 those experiences that are already proven to draw people out of their homes. We know that traditional media. I look forward to this day. I am sensitive to the locations of ramps. As the medium develops. plays. I look forward to such experiences being sufficient by themselves to attract participants. 2013). and the Lilly Belle caboose car on one of the railroad trains. . • He now requires a powered wheelchair to travel anywhere. I can give an example of the desired impact through something that happened to me through real life experience: • A friend of mine. such as that of my friend. These locations included the Club 33 private club. compelling. and if this impact is powerful enough that it actually changes your own belief system. But if an experience can change me. elevators. have this power and there are examples in each where people have found those stories memorable. and life altering. This provided a series of experiences that Disney fans could participate in. A more general approach toward achieving compelling experiences will be to realize the potential inherent in the medium to see the world around you through the eyes. to make us see the world from a different perspective. I would not think twice about curbs or stairs or other things that are insurmountable obstacles to my friend. that exploit the new potentials in this form of media. because I have traveled with him to many events. Before. 
an ultimate expression of the potential of AR and MR storytelling is if it can cause you to view the world in a different way. in a way similar to what I just described. When we have equivalent examples in AR and MR storytelling. This different perspective can be cultural. What would be a payoff that would make people eager to participate in ­location-based experiences? The Walt Disney Company provided an example in a 2013 Alternate Reality Game called The Optimist (Andersen. and mindset of another person. culminating in an elaborate puzzle hunt that took place at the 2013 D23 Expo and in Disneyland. had a stroke. and other items that provide wheelchair access. historical. For Disney fans. and books. that is proof that experience is compelling. this is a specific approach that requires special locations and does not generalize or scale to most situations. viewpoint. To me. then we will know that it has matured sufficiently to stand equally with established media. social. visiting these locations provided highly desirable and special experiences. Walt Disney’s apartment above the fire station on Main Street. who worked on several projects with me. ones they would remember and forever cherish.

REFERENCES

110Stories. 110Stories—What's your story? http://110stories.com (accessed May 2, 2014).
Andersen, M. 2013. "The Optimist" draws fans into fictionalized Disney history. Wired, July 23.
Azuma, R. 1997. A survey of augmented reality. Presence: Teleoperators and Virtual Environments 6(4): 355–385.
Azuma, R. 2014. Leviathan at CES 2014. http://ronaldazuma.com (accessed May 12, 2014).
Ballagas, R., A. Kuntze, and S. Walz. 2008. Gaming tourism: Lessons from evaluating REXplorer, a pervasive game for tourists. In Pervasive Computing 2008, Sydney, New South Wales, Australia, May 19–22, pp. 244–261.
Benford, S. and G. Giannachi. 2011. Performing Mixed Reality. Cambridge, MA: MIT Press.
Blast Theory. 2008. You Get Me. http://www.blasttheory.co.uk (accessed May 5, 2014).
Brittenham, S. 2012. Anomaly. Anomaly Publishing.
Cultural Heritage Informatics Initiative. http://chi.anthropology.msu.edu (accessed May 5, 2014).
Dow, S., J. Lee, C. Oezbek, B. MacIntyre, M. Mateas, and M. Gandy. 2005. Exploring spatial narratives and mixed reality experiences in Oakland cemetery. In Proceedings of the 2005 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, June 15–17, pp. 51–60.
Dow, S., M. Mateas, and A. Stern. 2008. Styles of play in immersive and interactive story: Case studies from a gallery installation of AR Façade. In Proceedings of the 2008 International Conference on Advances in Computer Entertainment Technology, Yokohama, Japan, December 3–5, pp. 373–380.
Harrigan, P. and N. Wardrip-Fruin, eds. 2007. Second Person: Role-Playing and Story in Games and Playable Media. Cambridge, MA: MIT Press.
Höllerer, T., S. Feiner, and J. Pavlik. 1999. Situated documentaries: Embedding multimedia presentations in the real world. In Proceedings of the 3rd IEEE International Symposium on Wearable Computers 1999, San Francisco, CA, October 18–19, pp. 79–86.
Hughes, C., C. Stapleton, D. Hughes, and E. Smith. 2005. Mixed reality in education, entertainment and training. IEEE Computer Graphics and Applications 25(6): 24–30.
MacIntyre, B., M. Gandy, J. D. Bolter et al. 2003. Three angry men: An augmented-reality experiment in point-of-view drama. In Proceedings of the First International Conference on Technologies for Interactive Digital Storytelling and Entertainment, Darmstadt, Germany, March 24–26.
Marner, M., B. Thomas et al. 2012. Exploring interactivity and augmented reality in theater: A case study of Half Real. In IEEE International Symposium on Mixed and Augmented Reality 2012, Arts, Media and Humanities Proceedings, Atlanta, GA, November 5–8, pp. 81–86.
Mateas, M. and A. Stern. 2007. Writing Façade: A case study in procedural authorship. In Second Person: Role-Playing and Story in Games and Playable Media, P. Harrigan and N. Wardrip-Fruin (eds.). Cambridge, MA: MIT Press, pp. 183–207.
Milgram, P. and F. Kishino. 1994. A taxonomy of mixed reality visual displays. IEICE Transactions on Information Systems E77-D(12): 1321–1329.
Montola, M., J. Stenros, and A. Waern. 2009. Pervasive Games: Theory and Design. Burlington, MA: Morgan Kaufmann Publishers.
Moreno, E., B. MacIntyre, and J. D. Bolter. 2001. Alice's adventures in new media: An exploration of interactive narratives in augmented reality. In CAST01, Bonn, Germany, September 21–22, pp. 149–152.
Museum of London. 2011. Londinium App: Revealing Londinium under London: New AR app. http://www.museumoflondon.org.uk/.../Streetmuseum-Londinium/home (accessed May 5, 2014).
Newcombe, R., S. Izadi, O. Hilliges et al. 2011. KinectFusion: Real-time dense surface mapping and tracking. In Proceedings of IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2011, Basel, Switzerland, October 26–29, pp. 127–136.
Nokia Research Center Hollywood. The Westwood Experience by Nokia Research Center Hollywood (accessed May 5, 2014).
PhillyHistory.org. 2011. Implementing Mobile Augmented Reality Technology for Viewing Historic Images. An Azavea and City of Philadelphia Department of Records White Paper. http://www.azavea.com (accessed June 16, 2014).
Squire, K., M. Jan, J. Matthews et al. 2007. Wherever you go, there you are: Place-based augmented reality games for learning. In The Educational Design and Use of Simulation Computer Games. Rotterdam, The Netherlands: Sense Publishing, pp. 265–296.
Stapleton, C. and C. Hughes. Developing stories that heal—A collaboration between Simiosys and the Aphasia House. http://simiosys.com (accessed May 5, 2014).
Transmedia futures: Situated documentary via augmented reality. In IEEE International Symposium on Mixed and Augmented Reality 2012, Arts, Media and Humanities Proceedings, Atlanta, GA, November 5–8.
Vinge, V. 2006. Rainbows End. New York: Tor Doherty Associates.
Vlahakis, V., J. Karigiannis, N. Ioannidis et al. 2002. Archeoguide: An augmented reality guide for archaeological sites. IEEE Computer Graphics and Applications 22(5): 52–60.
Westerfeld, S. 2009. Leviathan. New York: Simon Pulse.
Wither, J., R. Allen, V. Samanta et al. 2010. The Westwood experience: Connecting story to locations via mixed reality. In IEEE International Symposium on Mixed and Augmented Reality 2010, Arts, Media and Humanities Proceedings, Seoul, Korea, October 13–16, pp. 39–46.


12
Dimensions of Spatial Sound and Interface Styles of Audio Augmented Reality: Whereware, Wearware, and Everyware

Michael Cohen

CONTENTS
12.1 Introduction and Overview
    12.1.1 Auditory Dimensions
    12.1.2 Source to Sink Chain
    12.1.3 Spatial Sound
        12.1.3.1 Directionalization and Localization
        12.1.3.2 Spatial Reverberation
        12.1.3.3 Distance Effects
    12.1.4 Spatial Audio Augmented Reality
12.2 Whereware: Spatial Dimensions
    12.2.1 Position = Location and Orientation
    12.2.2 Changing Pose = Translation and Rotation
    12.2.3 Dynamic Responsiveness
    12.2.4 Head Tracking
12.3 Wearware and Everyware: Source and Sink Dimensions
    12.3.1 Mobile and Wearable Auditory Interfaces
        12.3.1.1 Capabilities
        12.3.1.2 Form Factors
        12.3.1.4 Stereotelephony
        12.3.1.5 Broadband Wireless Network Connectivity: 4G, ABC, MIMO, and SDR
    12.3.2 Special Displays
        12.3.2.1 Binaural Hearing Aids
        12.3.2.2 Parametric Ultrasonics
    12.3.3 Distributed Spatial Sound
    12.3.4 Information Furniture
12.4 Concluding Remarks
References

12.1  INTRODUCTION AND OVERVIEW

Time is the core of multimedia. Sound and audio, usually with coordinated multimodal input and output, reify information by giving it a realistic manifestation. Auditory displays deserve full citizenship in the user interface. Modern applications are synchronous: dynamic (interactive, runtime), realtime (updates reflected immediately), and online (networked). This chapter reviews spatial sound in the context of interactive multimedia and virtual reality (VR) and augmented reality (AR), including psychoacoustic bases of spatial hearing and techniques for creating and displaying augmented audio. The theory and practice of spatial sound are surveyed, including synthesis, directionalization, spatialization, radiation, and reception. Whereware is explained as a class of location- and position-sensitive interfaces, including spatial sound and augmented audio. Wearware and everyware refer respectively to portability of personal sound terminals, such as contemporary smartphones and tablets, and pervasiveness of public interfaces, such as speaker arrays, as explained in the following chapter. VR and AR (Barfield and Furness III 1995, Durlach and Mavor 1995, Shilling and Shinn-Cunningham 2002, Stanney 2002) are especially leveraged by such distributed capabilities.

12.1.1  Auditory Dimensions

This section surveys spatial and nonspatial dimensions of sound, including careful use of the often muddled jargon. A common technique in combination of real and virtual audio is to model soundscapes as compositions of sound sources and cascaded filters that process them: not just modulation of location of virtual sound sources, but parameterization of the entire source-sink cascade, a complete rendering of which models sources, space, and sinks, including allowing designation of multiple sinks for a single user. (The term sink, denoting the dual of a source, is used instead of listener to distinguish it from an actual human.) Rendering of virtual sound sources with respect to sinks in a modeled space is parameterized by such characteristics as position, intensity, distance fall-off and filtering, reflections and reverberation, and diffraction around obstacles.

12.1.2  Source to Sink Chain

To begin, stages in an audio AR (AAR) system are reviewed. Generally, signals are received, captured, then recorded or buffered, processed, and finally transmitted or rendered. Figure 12.1 illustrates an entire chain.
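The received-processed-rendered staging described above can be sketched as a cascade of per-source gain stages mixed into a sink signal. The following Python sketch is purely illustrative; its function names (db_to_gain, attenuate, mix) and the simple 1/r spreading loss are assumptions for demonstration, not this chapter's API:

```python
import math

def db_to_gain(level_db):
    # Convert a level in dB to a linear amplitude gain: +20 dB = 10x amplitude.
    return 10 ** (level_db / 20.0)

def attenuate(samples, gain):
    # Gain is a coefficient that modulates (multiplies) a raw signal sequence.
    return [gain * s for s in samples]

def mix(*tracks):
    # A sink hears the superposition (sample-wise sum) of all processed sources.
    return [sum(vals) for vals in zip(*tracks)]

# Two dry (anechoic) sources, each attenuated by a distance-dependent
# spreading loss (1/r), then mixed at a single sink.
voice = [0.5, -0.5, 0.25, -0.25]
chime = [0.1, 0.1, -0.1, -0.1]
near = attenuate(voice, 1.0 / 1.0)   # 1 m away
far = attenuate(chime, 1.0 / 4.0)    # 4 m away: 1/4 amplitude, about -12 dB
rendered = mix(near, far)
```

A real renderer would replace the scalar gains with the direction-, distance-, and environment-dependent filters discussed below, but the cascade structure is the same.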

Sources
    Object generation: environmental sounds; auditory icons, earcons; voice; music
    Nonspatial (anechoic: "dry") sources: sampling (microphones); additive and subtractive synthesis; AM and FM; physical modeling; granular synthesis; speech synthesis (including TTS); nonlinear wave-shaping; waveguide synthesis; hybrid algorithms
    Network- or cloud-served streaming
    Parameters: location (direction [heading] and distance); directivity; directional tone color; mute/muzzle (solo); motion

Space (medium)
    Radiation/propagation
    Sound field modeling: spreading loss and distance attenuation; atmospheric effects (humidity, temperature); refraction; transmission loss (air absorption); propagation delay; obstructions and occlusion, diffraction; reflection (echoes) and scattering; reverberation

Sinks
    Reception and directional synthesis
    Auralization: location and direction (orientation); panning, HRTF processing; deafen/muffle (attend); Doppler effects
    Display: earphones, headphones, and headsets; bone conduction; nearphones; loudspeakers; speaker arrays (5.1, 7.1, 10.2, 22.2, WFS, HOA, etc.)

Location, orientation, and position tracking
    Sensors: ultrasonic or acoustic; magnetic; optical, infrared; GPS/WAAS; gyroscopic, accelerometric

FIGURE 12.1  Overview of virtual acoustics and augmented audio.

To create a veridical auditory illusion, the various elements—sensor (i.e., a microphone when a sound source is not otherwise provided), room or space, head, torso, and pinnae of the modeled listener, and loudspeakers—should all be considered.

A speaker's voice, a musical instrument tone, or another acoustic event (the source) causes small fluctuations of pressure above (compression, a.k.a. condensation) and below (rarefaction) atmospheric pressure. These variations can be sensed by a microphone, transducing acoustic energy into electrical. Such measurement, expressed as a voltage signal, is discretely sampled (in time) and quantized (in amplitude) by an audio interface ADC (analog–digital converter), encoding it uniformly (as in LPCM, linear pulse code modulation) or nonlinearly (as in μ- or A-law representations).

Audio sources might alternatively be provided as recorded material, stored as computer files, or synthesized or streamed in real time. Synthesis techniques include those listed below the Sources block at the left of Figure 12.1. Networked streams are typically remote teleconferees' voices or internet- or cloud-served music or other material. Recordings might need to be decompressed, for example, from MP3 or AAC (.m4a). Decoded audio data has a flat PCM encoding, such as that encapsulated by WAV or AIFF files, in which audio signals are represented as sequences of amplitudes at a constant sampling rate.

These audio signals are filtered by a computer's DSP (digital signal processing). Digital amplification or attenuation is accomplished by adjusting linear gain, a coefficient which modulates (multiplies) a raw signal sequence, scaling the envelope of a notional pulse train for source exposure or sink sensitivity. Balancing or panning a stereo signal involves coupled gain adjustments to a left–right signal pair. Spectrum-based adjustments—such as equalization and aural enhancement—are also possible, typically by specifying frequency-band-specific gain. Such sweetening might include spectral extension—with subharmonic or overtone synthesis—as well as time-based enrichment—echoes and reverberation.
and therefore also corresponds to the windowed and averaged sum of squares of a sample sequence. as the sequence is processed with filters that model the effect of each listener’s head and torso (Kapralos et al. as in architectural walk-throughs and simulations of concert halls (Kleiner 2011). proportional to the square of the RMS (root mean square. sources are parameterized by spatial position. since area is unknown or unspecified). since human perception of loudness (as measured.1. Finally. Such mediation also includes propagation and attenuation models. In VR and AR applications. the standard deviation of a centered signal) pressure or voltage. other objects in a scene around which sound must diffract or bend. which induce pressure waves that interact with each human listener’s real environment before entering ears to be apprehended as auditory events. such as megaphone effects. describe deviation from omnidirectional radiation patterns by emphasizing certain directions. 2008). Such audio systems project only lateral arrangement of captured and mixed sources. as the apparent direction from which sound emanates is controlled by panning. This embedding specifies or implies interaction of the sources with a simulated environment. Simple source models project sound equally in all directions. corresponding to the time-averaged square of a linear signal’s running value. parameterized like sources with positions. cascaded in a data chain or aggregated into a monolithic filter. and loudspeakers. the multichannel PCM stream can be fed into a DAC (digital–analog converter). for instance. For a digital signal. or more typically a simplified or abstracted model. shifting the balance of a sound source . intensity is equivalent to power (proportional to an arbitrary indeterminate factor. in which energy equivalently corresponds to intensity (via Parseval’s theorem. Auralization emphasizes o­ torealistic acoustic rendering. 
as well as occlusions and obstructions (Funkhouser et al. and sent through analog amplifiers to loudspeakers. Sources and sinks are embedded in a space. 12. Power is energy per time. The level associated with a sound’s subjective loudness or volume (which is not to be confused here with the separate idea of 3D spatial extent) is proportional to the logarithm of the intensity.3 Spatial Sound In stereo reproduction systems.280 Fundamentals of Wearable Computers and Augmented Reality might include spectral extension—with subharmonic or overtone synthesis—as well as time-based enrichment—echoes and reverberation. smoothed out with reconstruction LPFs (low-pass filters). where the voltage-encoded signal corresponds to excursion of speaker diaphragms. reverberation. on a sone scale) is predicted by an approximately logarithmic compression of measured intensity. All of these effects are modeled as digital signal processes. In the frequency domain. which may be a realistic model of a room or building. including location and orientation as well as other attributes. Intensity is power per area. including echoic reflections off surfaces. 2002). At the end of the simulation the signal is received by sinks. but more complicated models. squares of the amplitudes of the Fourier components comprise a power spectrum. which is basically a restatement of conservation of energy). statistically approximable ambience. headphones. sound comes only from left and right transducers: earphones.

inside-the-head localization). The volume of space becomes part of the work and a strong sense of sculpting sound in three dimensions becomes apparent. In the words of the Audium’s director. By applying psychoacoustic effects with DSP. a cross-­coupled dual mixing variable resistor. Suzuki et al. The Audium is a unique spatial sound theater (Loy 1985. or. but vertical up–down (heave) and longitudinal back–forth (surge) qualities as well. Live performance of works gives a human. A melodic line acquires a starting point. a speed. and any combination. located basically only between loudspeakers and only at distances farther from the listener than the plane of the speakers. or harmony) is a fundamental element of music. Rhythmic ideas take on new qualities when speed and direction are enhanced by controlled movement. spatial sound is a sonic analog of 3D graphics. pantophonic (360° × ±90°. Cyberspatial sound projects audio media into acoustic space by manipulating signals so that they assume virtual or projected positions. as in antiphonal or polychoral music. Pulkki et al. Begault 2004. mapping them from zero space (source channels) into multidimensional space (listeners’ perceptual spaces). One of its literally motivating principles is that space (like rhythm. immersing listeners in soundscapes. such as that composed in . diagonal. melody. horizontal. Panning by cross-fading intensity yields lateralized images. layers unfold. overlap. Harmonic tensions between different locations in space open up unusual timbres. ­circular. periphonic (360°. It takes theater-in-the-round and turns it inside-out.Dimensions of Spatial Sound and Interface Styles 281 between channels with a pan pot (short for panoramic potentiometer). and a point to conclusion. But this technique yields images that are diffuse. engineers and scientists are developing ways of generating sound fields (Tohyama et al. 1995) and fully 3D sound imagery (Blauert 1997. 
and entwine to reveal a rich audio tapestry. Areas in space become launching sites and meeting stations for converging sound lines. or spatialized in a 3D. but more sophisticated processes can make virtual sources directionalized in a 2D. Shaff 2002). only between the ears (for intracranial. estimability of the position in space of virtual sources. Stan Shaff The most direct way of implementing spatial sound is by simply distributing real sources in space. Spatial audio involves technology that allows virtual sound sources to have not only lateral left–right (sway) attributes (as in a conventional stereo mix). a determined pathway. 2011. or IHL. Rumsey 2006. a specially constructed venue featuring music steered among its 176 speakers in an intimate (49 seats) setting in San Francisco. and that controlled rendering of sound in space creates a kinetic perception that must be a part of a composer’s musical vocabulary. interactive element to the Audium’s spatial electronic orchestra. 2011). Augmenting a sound system with spatial attributes unfolds extended acoustic dimensions. if headphones are used. perhaps with nonlinear cross-tapers to preserve total power across a distribution. Sounds are choreographed through their movement and intensity on multiple trajectories through space. 4π steradian solid angle) soundscape. 2π radians circumferentially) flat soundscape. Such virtual positions enable auditory localization. As each melodic line travels. degenerately spatialized 1D. Melodic convolutions can be physically felt as they flow along spatial planes—vertical. Gilkey and Anderson 1997.

TABLE 12. etc. A sound diffuser or spatializer.). in German: Kunstkopf ). triangle.1 Dimensions of Musical Sound Frequency content Pitch and register: tone. for cyberspatial capabilities (Cohen et al. his nephew Giovanni Gabrieli. acoustic space synthesis.2—as well as control of extra dimensions (Cohen and Wenzel 1995) shown in Figure 12. decay. Carlile 1996. equalization. release (musical note shape) Temporal envelope. square. and sweetening Spectral profile. Therefore. including envelope and moments (center frequency) Spectrotemporal pattern (evolving spectrum).282 Fundamentals of Wearable Computers and Augmented Reality Venice during the Renaissance by Andrea Gabrieli. and transporting a listener into another space. directivity. and environmental characteristics such as envelopment. including tremolo (AM) Timing Duration Tempo. sawtooth. repetition rate Duty cycle Rhythm and cadence. There are two paradigms for AR and VR perspectives: projecting simulated sources into a listener’s space. texture. and Claudio Monteverdi. Alternatively. the rest of this chapter concentrates on DSP synthesis of spatial cues. a multidimensional mixer. creates the impression that sound is coming from different sources and different places. the captured binaural signals from which are presentable to listeners. waveshaping. 1999). including syncopation Spatial position: location and orientation Direction: azimuth. sustain.2.1 and spatial dimensions presented later in Table 12. spatial sound can be bluntly captured by a gimbaled dummy head positioned around a fixed speaker or by physically moving sources and speakers around a mannequin (alternatively spelled manikin. Jot 1999. Rumsey 2001) allows dynamic. a dummy head. physically associates each source with a loudspeaker. rectification. However. harmony. statically placed or perhaps moved around mechanically. elevation Distance (range) Directivity: attitude and focus . 
arbitrary placement and movement of multiple sources in soundscapes—including musical sound characteristics outlined in Table 12. such implementations are awkward and not portable. just as one would hear in person. including apparent extent. Fully parameterized spatial audio (Begault 1994. tone color LTAS (long-term average spectrum) Dynamics Intensity/volume/loudness SNR (signal-to-noise ratio) Envelope: attack. melody. orientation. Such a literal approach to electroacoustic spatial sound. vibrato (FM) Waveform (sinusoid. tone color.

a function of the interaural distance.* a consequence of the head acoustic shadow. Audio displays based on such technology exploit human ability to quickly and preattentively (unconsciously) localize and segregate sound sources. whereas ILD implicitly suggests subjective sensation. flop): (roll) Yaw (whirl. Spatial hearing can be stimulated by assigning each phantom source a virtual position with respect to each sink and simulating auditory positional cues. These perceptual cues can be captured by frequency-domain anatomical or head-related transfer functions (HRTFs) or equivalently as time-domain head-related impulse responses (HRIRs). out: retreat (drag) ⤡ forth (fore).1  Directionalization and Localization Binaural localization cues include interaural time (phase) difference (ITD). in: advance (thrust) Up: ascend (lift) ↕ down: descend (weight) y φ Barrel roll ψ Azimuth θ Pitch (tumble. such as inside a musical instrument. In AR applications. twist): pan z About Axis Rotation Elevation Along Axis Climb/dive x Left/right y CW/CCW z Perpendicular to Plane Sagittal (median) Frontal (coronal) Horizontal (transverse) In Plane Sagittal (median) Frontal Horizontal (transverse) Simulations can be rendered of otherwise impractical or impossible situations.283 Dimensions of Spatial Sound and Interface Styles TABLE 12. . virtual position is chosen to align with real-world features or directions. since level is logarithmic. as intensity has dimensions of power per area and units W/m 2. Pose) Dynamic (Gesture) Location (Displacement) Scalar Translation Camera Motion Lateral (transverse width or breadth) Frontal (longitudinal depth) Abscissa X Ordinate Y Sway: track (crab) Surge: dolly Vertical (height) Altitude Z Heave: boom (crane) Orientation or Attitude Directions (Force) Left↔right x Back (aft). interaural intensity difference (IID). 12. and their generalization as binaural frequency-dependent attenuation.1. Including Cinematographic Gestures Position Static (Posture. 
in dB. measured for * IID is also known as ILD.3. IID emphasizes objective measurement.3. flip): tilt Roll (bank. as illustrated in Figure 12.2 Physically Spatial Dimensions: Taxonomy of Positional Degrees of Freedom.

The bumps, folds, cavities, and ridges of a pinna cause superposition of direct and reflected sound, a direction-dependent interference. This cancellation and reinforcement results in comb filtering, so called because of its spectral modification, heard as tonal coloration, which manifests as characteristic notches and peaks in a frequency plot (Ballou 1991, Watkinson 2001). For each direction, a left-right stereo pair of these HRTF earprints can be captured. These static perceptual cues are fragile in the presence of conflicting dynamic cues (Martens 2003), so the earprint selection is often parameterized by head tracking.

FIGURE 12.2  Subjective spatial attributes: spatial impression comprises source attributes (position: azimuth, distance, elevation; dimensions: width, depth, height), environment attributes (focus, diffuseness, intimacy; width, depth, height), envelopment (width, depth, height), and perceived dimensions (width, depth, height). (Extended from taxonomy by R. Mason as reported in Rumsey, F., Spatial Audio, Focal Press, Waltham, MA, 2001; Smalley, D., Spectro-morphology and structuring processes, in S. Emmerson (ed.), The Language of Electroacoustic Music, Macmillan-Palgrave, Cambridge, MA, p. 45, 1986; and Smalley, D., Organised Sound, 2(3), 107–126, 1997.)

Cyberspatial sound can be generated by driving input signals through these filters in a digital signal processor. Using filters such as those of the MIT KEMAR database, a spatial sound signal processor implements digital filters, the output of which can be converted to analog signals, amplified, and presented to speakers. Such systems process arbitrary audio signals—including voices, sound effects, and music—with functions that place signals within the perceptual three-space of each listener, creating psychoacoustic localization effects by expanding an originally monaural signal into a binaural or multichannel signal with spatial cues. These algorithms are deployed in spatial sound engines such as DirectX's DirectSound and some implementations of OpenAL. Depending upon the application, adequate periphonic directionalization can be achieved by modulating only phase and intensity—just ITD (delay) and IID (as in balance panning)—without heavier DSP, as predicted by the duplex theory, a simple perceptual model of spatial hearing direction estimation. For instance, the Java Sound Spatial Audio library works this way, and allows original stereo tracks to be so projected, since the HRTF-based filter requirement of a monophonic channel is waived.

HRTFs generalize and subsume ITDs and IIDs: time delays are encoded in the phase spectrum, and IID corresponds to the relative power of a filter pair. Primarily based on the low-frequency content of a sound signal, time difference cues are registered at the starts and ends of sounds (onsets and offsets). Lag in binaural arrival of a planar wavefront is estimated as τ = r(θ + sin θ)/C, where r is the assumed radius of a head, θ is the bearing of a far-field source, and C is the speed of sound, as illustrated by Figure 12.3. (This model is a simplification: heads are not spherical, and ears do not symmetrically straddle the diameter, but are slightly rearward, at around 100° and 260°.) The radius of a typical adult head is about 10 cm. ITD is usable (without phase ambiguity from spatial aliasing) for wavelengths up to the Nyquist spacing of the distance between the ears, corresponding to about 1700 Hz. ITDs and IIDs do not specify a unique location, as there are an infinite number of locations along curves of equal distance from the ears having particular ITDs and IIDs, the so-called cone of confusion. For example, anywhere on the median plane, ITD and IID vanish.

FIGURE 12.3  Woodworth's formula for interaural time delay (ITD): a frequency-independent, far-field, ray-tracing model of a rigid, spherical head. (Diagram shows the source, ipsilateral and contralateral ears, and the extra path length rθ + r sin θ to the contralateral ear.)

12.3.2  Spatial Reverberation

A dry or free-field model includes no notion of a virtual room, and hence such signals carry no echoes. Artificial spatial reverberation is a wet* technique for simulating acoustic information used by listeners of sounds in interior environments, which affects interaural coherence. A spatial reverberation system creates an acoustic environment by simulating echoes consistent with the reflection, reverberation, and absorption of a room or space (parameterized by positions of sources and sinks).

* In similar colloquialisms, reverberant or anechoic spaces are said to be either "live" or "dead," high- or low-frequency signals are said to be "bright" or "dark," and active or inactive signals are said to be "hot" or "cold."
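Woodworth's formula above, together with duplex-theory panning (delay plus attenuation only), can be sketched as follows. The 6 dB head-shadow attenuation and the sample rate are illustrative assumptions, not values from this chapter:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
HEAD_RADIUS = 0.10      # m: the chapter's ~10 cm typical adult head

def woodworth_itd(theta):
    """tau = r*(theta + sin(theta))/C for a far-field source at bearing theta (radians)."""
    return HEAD_RADIUS * (theta + math.sin(theta)) / SPEED_OF_SOUND

def directionalize(mono, theta, fs=8000):
    """Crude duplex-theory directionalization: expand a monaural signal into a
    stereo pair using only ITD (a delay) and IID (an attenuation), as in balance
    panning; the contralateral level drop is an assumed illustrative figure."""
    lag = int(round(abs(woodworth_itd(theta)) * fs))
    shadow = 10.0 ** (-6.0 * abs(math.sin(theta)) / 20.0)  # head-shadow gain
    near = list(mono)
    far = [0.0] * lag + [s * shadow for s in mono[:len(mono) - lag]]
    return (far, near) if theta >= 0 else (near, far)  # positive theta: right

# On the median plane (theta = 0), ITD and IID vanish:
median_itd = woodworth_itd(0.0)
# At 90 degrees, tau = 0.1*(pi/2 + 1)/343, about 750 microseconds:
lateral_itd = woodworth_itd(math.pi / 2)
```

Note the cone-of-confusion ambiguity described in the text: any bearing with the same θ + sin θ and shadow values would produce an identical output pair.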

Early reflections are the particular echoes generated by each source, and the reverberation tail, which is more diffuse and usually statistically described, forms the ambience of the listening environment.

The frequency-domain equivalent of a time-based signal is a transfer function, a complex representation of frequency-dependent attenuation and delay or phase shift. The Fourier transform (FT) of an anatomical impulse response—which typically captures effects not only of the head itself but also of the pinnae and torso, but is nevertheless referred to as a head-related impulse response—is its frequency-domain equivalent, the anatomical transfer function (imprecisely known as the head-related transfer function, or HRTF). As the anatomical impulse response can be convolved (⊗) with an impulse response that captures reflections, the space or room impulse response (RIR), or the HRTF can equivalently be cascaded with (multiplied by, ×) the room transfer function (RTF), the binaural space impulse response and its corresponding binaural space transfer function capture both the dry directionalization and the ambience for full spatialization—location, orientation, and presence—as illustrated by Figure 12.4.

FIGURE 12.4  Spatial sound synthesis: An impulse response is a time-domain representation of a system's reaction to a kick, a momentary introduction of energy, like the trace of an echogram of a clap. (Diagram relates HRIR ⊗ RIR, the binaural space impulse response in the time domain, to HRTF × RTF, the binaural space transfer function in the frequency domain, via the FT and inverse FT, with reflection, reverberation, and absorption modeled between source and sink, and ADC/DAC conversion around the convolution.)

Spatial texture is associated with the perception of the interaction of sound with its environment, particularly the interval between the arrival of direct sound and the first few (early) reflections. The soundstage impression of the space in which sound is perceived is related to presence. Spaciousness, envelopment, and immersion, along with the perception of environmental characteristics—such as liveness, size, resonance, clarity and definition, and shape—are correlated with indirect sound. As explained in the following paragraphs, there are two modeled classes of generated echoes: early reflections, which are discretely generated (delayed, discrete, with frequency-dependent amplitude), and late reflections comprising reverberation. Direct sound and discrete early reflections are cascaded with filters for ambience to yield spatial reverberation, parameterized by the placement of virtual sources and sinks.

Early reflections: Representing specific echoes, early reflections—off the floor, walls, and ceiling—provide source position-dependent auditory images of virtual sources to each sink.
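The equivalence described above, convolution in the time domain versus multiplication in the frequency domain, can be checked numerically. A toy sketch with 4-point signals and a naive DFT (the impulse responses are arbitrary illustrative values):

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def circular_convolve(h, g):
    N = len(h)
    return [sum(h[m] * g[(n - m) % N] for m in range(N)) for n in range(N)]

hrir = [1.0, 0.5, 0.0, 0.0]   # toy anatomical (head-related) impulse response
rir  = [1.0, 0.0, 0.25, 0.0]  # toy room impulse response

time_domain = circular_convolve(hrir, rir)             # HRIR (*) RIR
freq_domain = idft([H * R for H, R in
                    zip(dft(hrir), dft(rir))])         # IFT of HRTF x RTF
assert all(abs(t - f.real) < 1e-9
           for t, f in zip(time_domain, freq_domain))  # same binaural space IR
```

Production renderers exploit the same identity in reverse, performing long convolutions as blockwise spectral multiplications.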

An image-model or ray-tracing algorithm can be used to simulate the timing and direction of these individual reflections, which are then processed (usually digitally by feed-forward tapped delay lines, like shift registers) as if they were separate sources (Kendall et al. 1986). Each spatialized audio source, incident or reflected, requires its own DSP channel, represented by a finite impulse response (FIR) filter.

Late reverberation: Late field reverberation typically represents the source position-independent ambience of a space. Reverberant implementations of spatial sound sometimes employ a recursive (feed-backward, or autoregressive), or infinite impulse response (IIR), section to yield global reverberation effects of exponentially increasing density and decaying amplitude. Reverberation time is often characterized by RT60, the time for sound to decay 60 dB, to one millionth the power or one thousandth the amplitude. Warmth describes reverberation time at low frequencies; brilliance refers to reverberation time at high frequencies.

Combined spatialization: A filter that combines early reflections and late field reverberation—sometimes called TDR, for tapped delay + recirculation (not to be confused with time domain reflectometry, a technique for capturing echograms)—captures cues to perceived sound direction, sound distance, and room characteristics. The ratio of direct to indirect sound energy is an important distance cue (as reviewed in Section 12.2.3). A combination of nonrecursive and recursive filters allows a spatial reverberation system to accept descriptions of characteristics such as room dimensions and absorption, as well as time-varying source and sink positions. Given an audio stream and a specification of source and sink motion in a modeled room, a spatial reverberator generates the sound field arriving at one's ears. Spatialized audio is rendered into a soundscape in which listeners localize objects by perceiving both virtual sources and the simulated environment.

12.4  Spatial Audio Augmented Reality

The goal of AR and augmented virtuality is to integrate external and internal data, mixing real information with virtual so that a composite display facilitates decision-making processes or enriches environments and experience. Like electronic musical instruments producing tones that are both sampled and synthesized, mixed reality and mixed virtuality systems combine naturally captured (or transmitted) and artificially generated information. For example, Yamaha Vocaloid uses granular synthesis, compiling synthetic voices from processed snippets of human singing recordings. It gives musical voice to Hatsune Miku, a humanoid persona and virtual idol, portrayed as a CG 16-year-old girl with long turquoise pigtails, who sometimes performs, as a hologram floating over a stage setting, with real-life performers such as Lady Gaga.

Audio reinforcement, as in PA (public address) systems, is a kind of AAR. Commercial systems such as the Audyssey MultEQ are designed to analyze site-specific reflections and adjust rendered audio to compensate for unwanted distortions. As a different sort of example, the concept illustrated by Figure 12.5 exploits wireless capture by mobile phones: a dummy head equipped with (upside-down) binaurally arranged mobile phones, simultaneously calling a dual voice line, enables portable stereotelephonic telepresence.
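The RT60 characterization above directly fixes the feedback gain of a recirculating (IIR) delay: each pass through a delay of d seconds must shave off 60·d/RT60 dB, giving an amplitude gain of 10^(−3d/RT60). A minimal sketch (delay and RT60 values are illustrative):

```python
def comb_feedback_gain(delay_s, rt60_s):
    """Feedback gain so a recirculating delay loop decays 60 dB in rt60_s seconds:
    per-pass attenuation is 60 * delay/rt60 dB, i.e. gain = 10^(-3*delay/rt60)."""
    return 10.0 ** (-3.0 * delay_s / rt60_s)

def comb_filter(x, delay_samples, gain):
    """Feed-backward (recursive) comb: y[n] = x[n] + gain * y[n - delay]."""
    y = []
    for n, s in enumerate(x):
        feedback = gain * y[n - delay_samples] if n >= delay_samples else 0.0
        y.append(s + feedback)
    return y

g = comb_feedback_gain(delay_s=0.030, rt60_s=1.5)  # ~0.871 for a 30 ms loop
tail = comb_filter([1.0] + [0.0] * 99, 10, g)      # exponentially decaying echoes
```

Classic reverberators sum several such combs with incommensurate delays (plus allpass sections) to build up the exponentially densifying, decaying late field; this sketch shows only a single branch.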

By transmitting the sound mixture heard by a user, a remote listener can experience the same acoustic panorama.

FIGURE 12.5  Poor person's mobile stereotelephony: a pair of inverted mobile phones, deployed as a microphone array attached to a mannequin, simultaneously calling a dual voice line, realizes wireless binaural telepresence.

Visual AR has two classes of head-up, see-through composited display: optical and video. Optical see-through displays project augmenting information on transparent surfaces such as eyeglass lenses through which a scene is viewed naturally. Video see-through displays capture a scene via cameras in front of the eyes, compositing augmenting graphics in a frame buffer before graphic display. The distinction is akin to that between reflex cameras with real analog images visible in the viewfinder and mirrorless cameras with digital live preview rastering. Analogously, AAR can display sound intended to mix with naturally apprehended events, or explicitly capture ambient sounds and mix them with synthesized audio before display via acoustically opaque earwear such as circumaural headphones and earbuds. Occluding the ear canal (as is done by the Etymōtic® Research ER·4 MicroPro™ earphones) eases the mixing of virtual and real sounds by passively canceling outside noise and actively controlling signal levels, but listeners find the occlusion effect annoying, wherein intensity increases at low frequencies, especially when chewing or swallowing. Alternative hear-through semi-immersive implementations of AAR use open headphones to eliminate the occlusion effect and simplify design, since microphones are not required (Martin et al. 2009). A mobile AAR system (Mynatt et al. 1998, Härmä et al. 2004, Rozier et al., Sukan et al. 2010) extends such ideas.

12.2  WHEREWARE: SPATIAL DIMENSIONS

12.2.1  Position = Location and Orientation, Changing Pose = Translation and Rotation

Location-based entertainment (LBE) usually describes theme park-style installations, but locative media and location-based services (LBS) typically refer to site-specific information delivered to mobile terminals. Position is the combination of location and orientation and, as suggested by its cognate pose, comprises rotation as well as translation. Location-based and -aware services nominally use translational location to specify a subject's place, rectangularly representable as latitude, longitude, and altitude or x, y, and z (sway, surge, and heave). Such services do not necessarily require orientation information, but position-based services are explicitly parameterized by angular bearing (from which can be derived, for example, a course or direction to follow) as well as place. Orientation in three-space is commonly described as roll, pitch (or elevation), and yaw (or azimuth). These spatial dimensions are summarized in Table 12.2.

Location can be tracked by GPS-like systems (including the Russian GLONASS). Emerging interior techniques using indoor GPS or based on acoustic, optical, or electromagnetic tracking promise the same kind of localizability indoors (Ficco et al. 2014). Orientation is capturable by sensors such as gyroscopes and magnetometers. Position can be cross-referenced to GIS (geographic information systems) data. Hyperlocality encourages the use of georeferences: web databases stuffed with geographic coordinates, electronic maps, and geospatial data usable by AR systems, along with RTLS (real-time locating systems), to mash-up navigation and social networking.

Since audition is omnidirectional but direction-dependent, it is especially receptive to orientation parameterization. Cyberspatial sound is naturally deployed to designate the position of sources relative to sinks, especially in real-time communication interfaces such as AR applications, directionalizing and spatializing media streams so that the relative arrangement of sources and sinks corresponds to actual or notional circumstances.

12.2.2  Whereware for Augmented Reality

Some services use real-time location information derived from GPS. Placelessness of purely virtual information cripples applications for LBS; whereware suggests using hyperlocal georeferences to allow applications location awareness. AR systems need full position information, including orientation, to align composited layers, using trackers or machine vision techniques such as fiducials, markers, and optical feature tracking. As elaborated in the following sections, whence- and whitherware suggest the potential of position awareness to enhance navigation and situation awareness. Combining literal direction effects and metaphorical distance effects in whence- and whitherware applications invites over-saturation of interface channels, encouraging interface strategies such as audio windowing, narrowcasting, and multipresence, described in the following chapter.
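Aligning a spatialized source with a real-world feature reduces to combining the two position components just described: the bearing from the listener's location to the source, minus the listener's yaw. A minimal sketch, using a local flat x/y frame with +y as north (a simplifying assumption; real systems would work from geodetic coordinates and sensor-fused heading):

```python
import math

def bearing_deg(listener_xy, source_xy):
    """Compass-style bearing from listener to source: degrees clockwise from
    north (+y), so east (+x) is 90."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

def relative_azimuth_deg(listener_xy, listener_yaw_deg, source_xy):
    """Azimuth of the source in the listener's head frame: 0 is dead ahead,
    90 is to the right. This is the angle handed to the directionalizer."""
    return (bearing_deg(listener_xy, source_xy) - listener_yaw_deg) % 360.0

# A beacon due east of a listener facing north appears at 90 degrees (right);
# when the listener turns to face east, the beacon moves dead ahead.
facing_north = relative_azimuth_deg((0.0, 0.0), 0.0, (10.0, 0.0))
facing_east = relative_azimuth_deg((0.0, 0.0), 90.0, (10.0, 0.0))
```

Re-evaluating this per head-tracker update is what keeps virtual beacons world-stabilized rather than head-stabilized.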

magnetometers (electronic compasses)—and dead reckoning (path integration), combined with some kind of sensor fusion, to infer position. A receiver needs (geometric, photometric, acoustic, etc.) calibration with the real world to align overlaid objects and scenes. Issues include static (geometric) error and drift, rendering registration error, and dynamic error (time lag and jitter), all somewhat mitigated by a forgiving user or a nonliteral user interface, allowing plausible discrepancies within bounds of suspended disbelief.

Whereware denotes position-aware applications, including LBS and AR applications. Whenceware (from whence, meaning from where) denotes location-aware applications that reference an origin; whitherware (from whither, meaning to where) denotes location-aware applications referencing a destination (Cohen and Villegas 2011). Whence- and whitherware navigation systems, primed with hyperlocal geotags of locations, can auditorily display sonic beacons: landmarks, warnings of situated hazards, and come-hithers beckoning travelers to goals or checkpoints. Such applications of spatial sound (Loomis et al. 1990, Holland et al. 2002, May 2004) can improve situation awareness.*

A real-time streamed voice channel might be processed as part of its display to express the speaker's position. For example, whenceware-enhanced voicemail systems could directionalize playback so that each displayed message apparently comes from its sender's location. Looser mappings are also possible: a virtual source location need not correspond to the geographic location of a sender, but could be mapped into the individualized space of a sink. Important messages might come from a direction in front of a recipient, while less critical voicemail comes from behind. Time-tagged notifications could be projected to clock-associated azimuths, so that, for instance, a three o'clock appointment reminder could come from the right, or a six o'clock message from behind. In polyphonic soundscapes with multiple audio channels, spatial sound can enhance the cocktail party effect, enhancing discriminability and speech intelligibility and allowing listeners to hear out a particular channel from the cacophony.

Spatial sound can also be used to enhance the driving experience. Through onboard devices such as navigation systems, telematic services based on mobile or vehicular communications infrastructure give drivers access to weather and traffic advisories, as well as information on nearby restaurants, fuel, and entertainment facilities. As illustrated by Figure 12.6, localized audio sources can be aligned with real-world locations across various frames-of-reference for increased situation awareness and safety, consistent with and reinforcing one's natural sense of direction.

* Legend suggests a couple of gruesome examples: The mythical Sirens sang so enchantingly that sailors were lured to shipwreck, running aground on the bird-women's island. When Jason wanted to guide the Argonauts past them, he had his crew stuff their ears with beeswax (passive attenuation) while Orpheus played his lyre, drowning out (masking) the otherwise irresistible singing of the sea-nymphs. In a later fabled era, the Pied Piper was hired by the town of Hamelin to clear a rat infestation. He played a musical pipe to lure the rats into a river, where they drowned. However, the town neglected to "pay the piper," so he played enticingly again to lead his deadbeat patrons' children out of town, whereupon they disappeared.
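The clock-associated azimuth mapping mentioned above (three o'clock from the right, six o'clock from behind) is just a 30°-per-hour rotation. A hypothetical helper, not from the chapter:

```python
def clock_azimuth_deg(hour):
    """Map a 12-hour clock position to an egocentric azimuth: 12 o'clock dead
    ahead (0 deg), 3 o'clock to the right (90), 6 behind (180), 9 left (270)."""
    return (hour % 12) * 30.0

three_oclock = clock_azimuth_deg(3)    # appointment reminder from the right
six_oclock = clock_azimuth_deg(6)      # message from behind
noon = clock_azimuth_deg(12)           # dead ahead
```

The returned azimuth would then be fed to a directionalizer in the sink's head frame, since these notification positions are metaphorical rather than georeferenced.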

FIGURE 12.6  Back-seat driver: localized beacons for vehicular situation awareness. (Diagram shows a vehicle surrounded by localized channels: compass, goal, junction/milestone/checkpoint, accident, traffic jam, location-based services, door ajar, blind-spot traffic, other vehicles, sonar parking assist, mobile channel, land line, and home.)

12.2.3  Distance Effects

In spatial auditory displays, distance effects are often exaggerated. Although geometric computer graphic models are almost always finally represented in rectilinear 2D or 3D Cartesian or rectangular, Euclidian (x, y) or (x, y, z), coordinates, navigation data are typically originally GPS-derived geographic latitude-longitude-altitude coordinates, and spatial sound is almost always modeled using spherical coordinates, distinguishing direction (azimuth θ and elevation φ) and distance (radius or range ϱ). These frames-of-reference are naturally coextensive.

To reify the notion of a source projecting audible sound to a sink—a spatial sound receiver represented by an avatar in a virtual environment or the subject of an AAR projection—it is usual to display the direction directly but to take some liberties by scaling the intensity. The direction of virtual sources can be simply literal, but to make a source audible it must be granted a practical intensity (Fouad 2004) besides a figurative loudness. Extraordinary sounds like the explosion of Krakatoa (a volcanic island in the Indonesian archipelago) in 1883 can be audible hundreds of kilometers away, but ordinary sounds such as speech and music rarely exceed 100 dB SPL and cannot usually be heard much more than a kilometer away. As in the visual domain (McAllister 1993), the mechanism for judging distances is a combination of mono (monaural) and stereo (binaural) and stationary and dynamic cues.
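Converting between the coextensive frames-of-reference just described is routine bookkeeping. A minimal sketch, assuming a local frame with +y ahead, +x right, +z up, azimuth measured clockwise from ahead, and elevation up from the horizontal plane (conventions vary by system):

```python
import math

def cartesian_to_spherical(x, y, z):
    """Return (range rho, azimuth theta in degrees clockwise from +y,
    elevation phi in degrees above the horizontal plane)."""
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0.0:
        return 0.0, 0.0, 0.0  # degenerate: source at the origin
    theta = math.degrees(math.atan2(x, y)) % 360.0
    phi = math.degrees(math.asin(z / rho))
    return rho, theta, phi

# A source ahead-right on the horizontal plane:
rho, theta, phi = cartesian_to_spherical(1.0, 1.0, 0.0)
```

A spatial sound renderer would consume (θ, φ) for directionalization and ϱ for the distance effects enumerated below.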

Apparent distance of a sound source (Bronkhorst and Houtgast 1999, Moore and King 1999) is determined mainly (Villegas and Cohen 2010) by

Overall sound level: The intensity of a point source varies as the reciprocal of its squared distance (Mershon and King 1975). Simple models might use this inverse-square free-field spherical intensity attenuation, equivalent to linear gain scaling as a simple inverse, with level falling off at −6 dB per doubling of range. The gain is usually clamped or railed within a certain near-field distance, and often disregarded beyond a certain range, where it is assumed to be negligible. In virtual environments, distance effects are often exaggerated, especially at intimate, near-field whisper ranges, as estimation of range sharpens if level control is driven by models that roll off more rapidly than the physical inverse 1/d amplitude law of spherical waves.

Interaural level differences: Closer sources present a larger ILD (Brungart and Rabinowitz 1999). Virtual sources can be brought even closer to a listener's head by adjusting ILD to extreme values without requiring the level to be increased as much as would normally occur at such close range (Martens 2001).

Source dullness (sharpness): Distant sources are duller due to the high-frequency absorbent effects of air (Coleman 1968, Malham 2001). Nature acts like an LPF.

Direct-to-reverberant energy ratio: In environments with reflecting surfaces, sources far from a listener yield roughly the same reverberation level, whereas direct sound level attenuates approximately according to the aforementioned −6 dB per distance doubling (Zahorik et al. 2005).

Reverberation: Distance perception is affected by lateral reflections (Nielsen 1992).

Familiarity with environment and sound source: Distance estimation is worse for unfamiliar sound sources (Coleman 1962).

Head orientation: Range estimations are better when source direction is nearly aligned with the interaural axis (Holt and Thurlow 1969).

If a sound source or sink is moving, temporal intensity variation and Doppler shift (pitch modulation) also contribute to distance estimation (Zakarauskas and Cynader 1991) (see Figure 12.7).

12.2.4  Stereotelephony

Stereotelephony means putting multichannel audio into communication networks, using stereo effects in telephones and spatial sound in groupware. As audio codecs improve and networks quicken for internet telephony, high-definition real-time audio becomes asymptotically

Broadband, with a high sampling rate for full spectrum
Broadly dynamically ranged, carrying appropriate bit depth for expressive intensity modulation

Clear (with minimal noise), high-fidelity, transparent—uncolored by artifacts of sampling, quantization, codecs, or network transmission—and therefore almost indistinguishable from in-person events
Persistent, always available (24⋅7 or 365: around the clock, around the world)
Multichannel, 360° × ±90°, with wide streams for polyphony (2 for stereo, 6 for "5.1," etc.)

FIGURE 12.7  Virtual sound projection: The full volume area (dfva) is represented by the dashed circle, within which the sound is heard at full volume, pegged at −0 dB. The extent (de) of the exposure refers to the nimbus in which a focus-modulated sink can hear a nimbus-modulated source; beyond the full volume area, gain falls off for distance attenuation (0, −3, −6 dB, ...), clamped to −∞ past the extent. (From Greenhalgh, C. and Benford, S., ACM Transactions on Computer-Human Interaction, 2(3), 239–261, 1995; and Benford, S. et al., Presence: Teleoperators and Virtual Environments, 4, 364–386, 1995.) Such limits are a little like clipping planes in graphics renderers that cull beyond the sides of viewing frusta.

Stereotelephony encourages directional and spatial sound and can be used seamlessly by AAR systems. Even though a conferencing media server might be called a voice bridge in acknowledgment of its tuning for speech, it can still be applied to other kinds of audio sources: radio-quality music, sound effects (SFX), and auditory icons (Blattner et al. 1989, McGookin et al. 2009) can all be streamed, though differentiated treatment seems to be indicated (Seo et al. 2010). Spatialization can be done in a peer-to-peer style (P2P), by multimedia processors directly exchanging streams at the edge of a network (Cohen and Győrbiró 2009), possibly with near-field range manipulation, or in a client-server style, letting network servers push preprocessed data at thin clients, as in cloud-based audio engines extending the functionality of a PBX (private branch exchange): for instance, directly spatializing predetermined clips or loops (locally stored or synthesized), as well as spatializing monophonic streams (such as telephonic channels) and mixing soundscapes with network-delivered multichannel streams (such as stereo music).
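The gain profile of Figure 12.7 (full-volume clamp, rolloff, nimbus cutoff) can be sketched directly. The rolloff rate is parameterized because, as noted in Section 12.2.3, virtual environments often exaggerate it beyond the physical −6 dB per doubling:

```python
import math

def source_gain_db(distance, full_volume_radius, extent,
                   rolloff_db_per_doubling=6.0):
    """Nimbus-style gain for a virtual source: clamped at 0 dB inside the full
    volume area (dfva), rolled off with distance out to the extent (de), and
    silent (-inf dB) beyond it."""
    if distance <= full_volume_radius:
        return 0.0                       # clamped: heard at full volume
    if distance > extent:
        return float("-inf")             # outside the nimbus: culled
    doublings = math.log2(distance / full_volume_radius)
    return -rolloff_db_per_doubling * doublings

inside = source_gain_db(0.5, 1.0, 16.0)      # within dfva: 0 dB
doubled = source_gain_db(2.0, 1.0, 16.0)     # one doubling: -6 dB
outside = source_gain_db(20.0, 1.0, 16.0)    # beyond extent: -inf
```

This mirrors the clamped inverse-distance models found in game-audio engines; the clipping-plane analogy in the caption corresponds to the hard cutoff at the extent.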

A standard will emerge to send location and position information upon connecting an ordinary phone call. POTS (plain old telephone service) could be configured to convey such information as well as to support stereotelephony (Alam et al. 2009). Such metadata can already be carried by non-POTS systems, such as VOIP and SIP systems, which distinguish signaling side channels from real-time media streams, including voice chat for MMORPG (massively multiplayer online role-playing games) and other conferencing systems.

12.3  WEARWARE AND EVERYWARE: SOURCE AND SINK DIMENSIONS

The proliferation of auditory display configurations is a challenge and opportunity for audio engineers and sound and user experience designers, reflecting increasingly affordable high-performance equipment and more discriminating users. The reproduction apparatus imposes restrictions and complications on the integration of augmented spatial sound systems. Rapidly adopted technology has caused several new words to enter the vernacular. For example, prosumer catches the sense of a class of product and user reconciling the amateur/professional dichotomy; it also has the alternative meaning of a combination of producer and consumer, in accordance with bottom-up crowd sourcing. The digital 4C ["foresee"] convergence is the confluence of communication devices, computing, consumer electronics, and content. As summarized by Table 12.3, such integration enables ubicomp (ubiquitous computing), the smooth interaction of devices at different scales, including hybrid AAR leveraging heterogeneous hardware. Stationary loudspeakers; handheld, mobile devices; and eartop headwear span a continuum of form factors, as cycles and bandwidth flirt with human sensitivity.

TABLE 12.3
Saturated: Distributed and Pervasive, Continuous and Networked, Transparent or Invisible (Spatial Hierarchy of Ubicomp or Ambient Intimacy)

  Smart spaces
  Cooperative buildings and smart homes
  Roomware (software for rooms) and reactive rooms
  Media spaces
  Spatially immersive displays
  Information furniture
  Networked appliances
  Handheld, portable, nomadic, mobile, and wireless devices
  Wearable computers
  Computational clothing (smart clothes)

12.3.1  Capabilities

12.3.1.1  Mobile and Wearable Auditory Interfaces

The dream motivating wireless technology is anytime, anywhere communications. Mobile communication offers unique requirements and prospects because of interesting form factors (weight, size, style, fashion, design, personality, interface), restricted bandwidth, susceptibility to noise (less robust network), and social potential (universality). Wireless computing has gone beyond laptops, notebooks, clamshells, ultrabooks, netbooks, PDAs, palmtops, touchpad/handheld computers, smartphones, slates, and tablets (as well as smartbooks and handheld gaming devices) to include wearable and intimate systems and smart clothing. Design, personality, and uniqueness are important characteristics of such wearable devices, perhaps especially for younger users. AAR mobile browsers are naturally extended by voice interfaces to allow audio dialogs and mixed-initiative conversations: recognition of multitouch gestures, spoken input (via ASR: automatic speech recognition), and synthesized speech (via TTS: text-to-speech), whose modern algorithms have outgrown the drunken Scandinavian robot accents (so to speak) of the recent past. Besides the personal sound display systems illustrated by Figure 12.8, an ultimate consequence of wearware could be talking clothing, like that imagined by Figure 12.9.

12.3.1.2  Form Factors

Personal audio interfaces—including intimate, wearable, handheld, mobile (like a smartphone), nomadic (like a tablet), and portable (like a laptop)—represent one end of a spectrum, the other end of which is marked by social displays. These endpoints delimit a continuum of useful interfaces, corresponding to the vertical dimension in Figure 12.10, as outlined by Table 12.4. Auditory display form factors can be ordered according to degree of intimacy along this private-public dimension. Mobile terminals such as smartphones are portable and repositionable, but require extension to deliver stereo sound. Stereo earphones, headphones, and headsets arrange eartop transducers straddling one's head at the ears: in (as with earbuds), on (as with supra-aural headphones), or over (as with circumaural headphones). Headphone-like displays, although somewhat cumbersome, allow greater freedom of movement while maintaining individually controlled audio display, including near-field effects such as whispering. Such intimate sounds in one's sacred space can evoke the so-called ASMR (autonomous sensory meridian response), a euphoric tingling sensation sometimes compared to that from binaural beats. Interaural cross-correlation (IACC) is related to the diffuseness or "solidness" of an image and affects the perceived spatial extent of a source, the auditory source width (ASW).*

* Polarity inversion can also be used as a crude way to broaden stereo imagery. By inverting one side of a stereo pair, IACC is bluntly subdued, and the resultant soundscape is widened.
Headphones that block external sounds are especially well suited to supporting active noise cancellation (ANC), which adds a polarity-inverted image of ambient noise to displayed signals.
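The polarity-inversion principle behind ANC can be sketched in its idealized form; real systems must adaptively filter the reference microphone signal to compensate for latency and the acoustic path, which this toy omits:

```python
def anti_noise(ambient):
    """ANC adds a polarity-inverted image of the ambient noise to the display."""
    return [-s for s in ambient]

def at_eardrum(program, ambient_leakage):
    """What reaches the ear: program signal + ambient leakage + anti-noise.
    With perfect inversion and alignment, the leakage cancels exactly."""
    cancel = anti_noise(ambient_leakage)
    return [p + a + n for p, a, n in zip(program, ambient_leakage, cancel)]

program = [0.2, -0.1, 0.3]
leakage = [0.05, 0.07, -0.02]
heard = at_eardrum(program, leakage)  # only the program signal survives
```

The same inversion applied to one channel of a stereo pair, rather than to captured noise, yields the crude image-widening trick noted in the footnote above.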

FIGURE 12.8  Personal sound displays: monophonic (one speaker); monotic or monaural (one ear); diotic (one signal to both ears); stereophonic (two speakers); dichotic–biphonic (separate channels); dichotic–binaural (mannequin microphones); loudspeaker binaural (crosstalk cancellation); nearphones, earspeakers (stereo speakers close to ears); pseudophonic (cross-connected dichotic). (Extended from Streicher, R. and Everest, F.A., The New Stereo Soundbook, 3rd edn., Audio Engineering Associates, Pasadena, CA, 2006. From Bauck, J.L. and Cooper, D.H., Journal of the Audio Engineering Society, 44(9), 683–705, 1996; and Marui, A. and Martens, W.L., Spatial character and quality assessment of selected stereophonic image enhancements for headphone playback of popular music, in AES: Audio Engineering Society Convention (120th Convention), Paris, France, 2006.)

By preconditioning stereo signals, speaker crosstalk can be reduced. A special implementation of this technique is called transaural. Such measures can be obviated by placing the speakers near the ears: nearphones for unencumbered binaural sound. Pseudophonic arrangements allow a striking demonstration of the suggestiveness of head-turning directionalization, as front–back and even up–down disambiguation is flipped, even if the subject can see the source.
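The preconditioning behind transaural reproduction can be sketched at a single frequency as a 2×2 matrix inversion. The acoustic gains in H below are invented illustrative values (not measured transfer functions), with ipsilateral paths stronger than contralateral leakage.

```python
# Single-frequency crosstalk-cancellation sketch (complex gains, pure Python).
# H[ear][speaker]: assumed speaker-to-ear transfer matrix at one frequency.
H = [[1.0 + 0.0j, 0.35 - 0.10j],
     [0.35 - 0.10j, 1.0 + 0.0j]]

def invert2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

binaural = [0.8 + 0.2j, 0.1 - 0.3j]        # desired signals at the two ears
speakers = matvec(invert2x2(H), binaural)  # preconditioned speaker feeds
ears = matvec(H, speakers)                 # what the acoustic paths deliver
print(ears)                                # matches the desired binaural pair
```

In practice the transfer matrix varies with frequency and listener position, so cancellation filters are computed per frequency band and are sensitive to head movement, which is why placing the speakers near the ears sidesteps the problem.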

FIGURE 12.9  Introducing…interactivewear—spatial sound wearware. (Copyright The New Yorker Collection from cartoonbank.com. All rights reserved.)

Earbuds, such as Bluetooth earpieces, evoke that worn by Communications Officer Lt. Uhura in the original classic Star Trek science fiction TV series. Because they occlude the ear canal, 20 dB of passive cancellation or noise exclusion is possible. Bone conduction headphones usually conduct acoustic energy through the mastoid portion of the temporal bone behind the ears; the Panasonic Open Ear Bone Conduction Wireless Headphones use earpiece speakers placed in front of the ears, using vibration through one's upper cheekbones. Such bone knockers do not block ambient sounds and can extend vibrotactile low-frequency stimulation, which is also combinable with ANC (Heller 2012). Nearphones (a.k.a. earspeakers) straddle the head without touching it, but are close enough to the ears to minimize cross-talk. Even more intimate than an article of clothing is a digit in an orifice: Figure 12.11 illustrates utilizing sound transmission through finger bones.

TABLE 12.4
Audio and Visual Displays along Private↔Public Continua

Proxemic | Context/Architecture | Audio Display | Visual Display
Intimate (personal, private) | individual: headset | eartop: headphones, ear buds | eyetop: HWDs (headworn displays), HMDs (head-mounted displays)
Personal | chair | nearphones, earspeakers | smartphone, tablet, laptop display, desktop monitor, fishtank VR
Interpersonal (familiar) | couch or bench | stereo loudspeakers (e.g., stereo dipole, transaural™) | HDTV
Multipersonal (social) | home theater, vehicle | surround sound, speaker array (e.g., VBAP, WFS, Ambisonics) | spatially immersive display (e.g., Cabin, Cave™, reality center, NEC VisionDome)
Public | club, concert arena, stadium, theater | loudspeakers (e.g., HOA), public address | large-screen display (e.g., IMAX), projection

Sound bells are parabolic or hyperbolic ceiling-mounted speaker domes which focus acoustic beams, suitable for semiprivate audition in public spaces such as galleries and museums. Parametric speakers use ultrasonic microspeaker arrays (described later in Section 12.3.2.2).

FIGURE 12.10  Augmented reality location diffusion taxonomy: augmented reality refers to extension of users' real environments by synthetic (virtual) content. Diffusion refers to degree of concurrent usage (potentially everyone, massively multiuser, multiuser, single use). Location refers to how and where such AR systems might be actually used (omnipresent, mobile, location-based, stationary). The Synthesis axis (reality, augmented reality, augmented virtuality, virtuality) is an original MR–MV continuum. (Adapted and extended from Brull, W. et al., Computer Graphics Animation, 28(4), 40–48, 2008; and from Milgram, P. and Kishino, F., IEICE Transactions on Information and Systems, E77-D(12), 1321–1329, 1994.)

FIGURE 12.11  Whisper wearable terminal: By sticking a finger in one's ear, a user can hear sound conducted through finger bones. When fingers on opposite hands, driven by respective sides of a stereophonic signal, are inserted into both ears, a kind of binaural sound can be displayed. The channels are not so sharply separated compared with ordinary stereophonic sound, but the display has a unique surround feel provided by the combination of aerial and bone conduction. (From Fukumoto, M., A finger-ring shaped wearable HANDset based on bone-conduction, in ISWC: Proceedings of the International Symposium on Wearable Computing, pp. 10–13, 2005.)

Multichannel loudspeaker arrays are least invasive and are best suited for situations where listeners are confined to a predetermined space (the sweet spot or stereo seat) and delivered sound is independent of each listener's position. Speaker array systems allow multiple listeners to enjoy a shared environmental display. Loudspeaker arrays can be configured for various kinds of deployments:

Stereo loudspeakers often bestride visual media such as televisions, computer monitors, and projection screens. With crosstalk cancellation or compensation (CTC or XTC), preconditioning of a stereo signal to minimize unavoidable leakage from each speaker to the contralateral ear, such arrangements can emulate binaural displays.

Discrete arrangements are used in theatrical or AV home theater surround sound, in 5.1, 7.1, 10.2, 22.2, etc. configurations (Holman 2008). The cardinal number to the right of the decimal point in such designations counts the number of bass channels: signals from LFE (low frequency effects) channels are combined with lower frequency bands from other input channels by a bass management system which drives separate woofers. Since low frequency sounds are not very directionalizable, arrangement of woofers is usually independent of other speakers.

Line arrays can be driven for beam-forming, using amplitude- and phase-modulation to produce directional lobes of emphasis via constructive and destructive interference.
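The lobe-forming arithmetic of a phased line array can be sketched with a delay-and-sum array factor; the element count, spacing, and frequency below are arbitrary illustrative choices, not values from the chapter.

```python
import cmath, math

def array_factor(n, d, wavelength, steer_deg, look_deg):
    """Normalized response of an n-element line array with spacing d (meters),
    phased so its main lobe points steer_deg away from broadside."""
    k = 2 * math.pi / wavelength
    delta = math.sin(math.radians(look_deg)) - math.sin(math.radians(steer_deg))
    # Per-element phase progression; constructive interference when delta == 0.
    total = sum(cmath.exp(1j * k * i * d * delta) for i in range(n))
    return abs(total) / n

# 16 drivers 5 cm apart at 1 kHz (wavelength ~0.343 m), steered 20 deg off axis:
on_axis = array_factor(16, 0.05, 0.343, 20, 20)    # main lobe: 1.0
off_axis = array_factor(16, 0.05, 0.343, 20, -40)  # destructive: much smaller
print(on_axis, off_axis)
```

Amplitude weighting (shading) of the same sum controls sidelobe levels, which is the amplitude-modulation half of the technique described above.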

VBAP, Vector Base Amplitude Panning, uses only gain adjustment (no DSP) across surrounding speakers for sound diffusion (Pulkki 1997). DirAC, for Directional Audio Coding, is based on spatial frequency sampling (Pulkki et al. 2009). Ambisonics and HOA, Higher-Order Ambisonics, are based on spherical harmonics and reprojections of a sound field captured by tetrahedral microphone arrays. Wavefield (numeric solution) and wavefront (analytic solution) synthesis systems can be thought of as a complement of HOA (Kim and Choi 2013): WFS systems use densely distributed DSP-driven speaker arrays to reconstruct, by Huygens' Principle, complex wavefronts, within the sweet spot of which localization of virtual sources does not depend upon or change with listener position (Rabenstein and Spors 2008; Melchior et al. 2010). Virtual surround systems such as DTS Headphone:X are intended to emulate with headphone display the externalized spatial effect of speaker arrays. DSP can also be applied to signals presented via speaker arrays to enhance spatiality, such as manipulations of source elevation (Kim et al. 2010; Tanno et al. 2014). Barco Auro-3D® and Dolby Atmos® are contemporary theatrical standards. Recently such public venue development has literally elevated, with deployment not only of speakers at ear level but also extension to height speakers and ceiling-mounted voice of God zenith speakers.

12.3.1.3 Dynamic Responsiveness

Spatial sound and spatial reverberation systems have varying degrees of responsiveness, according to immediacy of the audio stream and dynamism of the sources and sinks, orderable into a hierarchy of parameterizability:

Static non-realtime: ordinary stereo mix, with fixed soundscape; binaural recording; spatial music, including musique concrète and acousmatic sound
Dynamic non-realtime: recorded stereo mix with animated soundscape
Static realtime: ordinary teleconference with fixed locations
Dynamic realtime: VR or AR chatspace with moving avatars, sources, and sinks
Dynamic realtime with head tracking for AAR applications

Wikitude-style contents, in which crowd-sourced augmentation is prepared offline in advance, are less flexible for AAR than real-time, dynamically generated, network-delivered audio streams.

12.3.1.4 Head Tracking

Since virtual sources do not actually exist, but are simulated by multiple channels of audio, a spatialization system must be aware of the position of a listener's ears to faithfully create a desired sound image.
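The gain-only panning that VBAP performs can be sketched for a single two-speaker (2D) pair: the source direction is expressed as a combination of the speaker unit vectors, and the resulting gains are power-normalized. The ±30° speaker angles below are an illustrative choice, not from the chapter.

```python
import math

def vbap_2d(source_deg, left_deg, right_deg):
    """Two-speaker VBAP: solve [l r] g = s for the gain pair, power-normalize."""
    def unit(deg):
        return (math.cos(math.radians(deg)), math.sin(math.radians(deg)))
    (l1, l2) = unit(left_deg)
    (r1, r2) = unit(right_deg)
    (s1, s2) = unit(source_deg)
    det = l1 * r2 - l2 * r1          # nonzero for non-collinear speakers
    gl = (s1 * r2 - s2 * r1) / det   # Cramer's rule for the 2x2 system
    gr = (l1 * s2 - l2 * s1) / det
    norm = math.sqrt(gl * gl + gr * gr)
    return gl / norm, gr / norm

# A phantom source midway between speakers at +/-30 degrees gets equal gains:
gl, gr = vbap_2d(0, 30, -30)
print(gl, gr)  # both ~0.707: equal-power panning
```

A source pointing exactly at one speaker yields gains of 1 and 0, so the pairwise formulation degrades gracefully to discrete assignment; full VBAP generalizes this to speaker triplets in 3D.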

11 (a.2 Special Displays 12.k. digital broadcasting. dynamically amplify conditioned signals. This feature is important for robust localization and soundscape stabilization.Dimensions of Spatial Sound and Interface Styles 301 ? or ? FIGURE 12. suggesting proliferation of persistent sessions. etc. allowing sources to stay anchored in an environment as one’s head is turned. Anticipated features include wireless technology ­integration—linking global systems with local. security.4). high-resolution video transmission. including 3D VR and AR interfaces.3.3. and ZigBee (IEEE 802. such as IEEE 802. and such affliction is expected to worsen in the future. MIMO. along with smart antennas. perform frequency filtering across separate bands. A  ­catchphrase for fourth-generation mobile is always best connected. Metropolitan area network (MAN) systems such as LTE. reconstruct full-band signals from separated bands. can continuously detect orientation of a listener’s head.2. Modern hearing aids are a kind of wearable computer: they capture and digitize signals.1.3.1  Binaural Hearing Aids About 5% of the world’s population suffers from significant hearing loss. including natural audition.).12. A  sophisticated system synthesizes a soundscape in which sound source images respect each user’s head orientation in real-time (Gamper 2014). thanks to multiple-input multiple-output (MIMO) architectures which optimize antenna usage.5 Broadband Wireless Network Connectivity: 4G.15. like intercoms to close friends and relations or intimate colleagues. as illustrated by Figure 12. especially beam-forming phased arrays. adjusting selection of filters for input signals. ABC. and SDR Exponentially increasing speed of roaming communication surpasses that predicted by Moore’s law for computation. uses intentional and unintentional head turning to disambiguate front↔back confusion.16). 12. Wi-Fi) and Bluetooth—SDR (cognitive or software-defined radio). suppress noise. 
and seems even more inevitable. will make theoretical bandwidth and practical throughput across the composite het net (heterogeneous network) even broader and higher. set of headphones or a head-mounted display.12  Head tracking: Active listening. and advanced multimedia mobile communications (IPV6. 12.a. and resynthesize amplified analog signals . WiMAX (IEEE 802.

which are finally transduced back into sound. Contemporary models feature modes that allow selection from among several programs according to the circumstantial environment (conversation in noise, phone call, television, etc.). Since microphones and speakers are collocated in such devices, intense signals can cause howling, which limits amplification range—the so-called GBF, gain before feedback. Binaural hearing aids can use the cross-channels to address that problem, including detection stages in the processing chain so that when oscillation is on one side but not in the contralateral device at the same frequency, it can be identified as feedback and suppressed (Hamacher et al. 2005). Consumer products such as GN ReSound Linx and Starkey Halo use Bluetooth-connected smartphone interfaces to make directional and spectral hearing aid adjustments.

12.3.2.2 Parametric Ultrasonics

Ultrasound has wavelengths shorter than audible sound and can be aimed in beams tighter than those formed by normal loudspeakers. Such displays create audible signals through propagation distortion: nonlinear effects on air of ultrasonic signals (nominally above around 20 kHz, around 40–60 kHz in current practice). They have been researched for decades as parametric acoustic arrays, but only in the last decade have practical systems become commercially available (Ciglar 2010), including Holosonic Research Lab's Audio Spotlight™ and American Technology's HyperSonic Sound System. They work by modulating an audio source onto an ultrasonic carrier, which is then amplified and projected into the air by ultrasonic transducers. Audible sound is generated in the air, not at the speakers. The product of two ultrasonic signals, a reference carrier and the combination of the carrier and a variable audio source, decomposes into bands through dispersion in the air:

  cos(c) × sin(c + s) = ½ sin(2c + s) + ½ sin(s).

As in a Theremin, the first (sum) intermodulation term is inaudible, but the second (difference) is not. The highly directionalized sound beams are steerable through their focus and controllable spreading (reportedly as low as 3°), and can be bounced off surfaces, allowing personal sound in otherwise quiet spaces such as libraries. If technical issues regarding such systems' lower-frequency response, below 200–400 Hz (as there is an inherent 12 dB/octave high-pass slope, a consequence of the way that ultrasound demodulates into the audible range), and concerns about health hazards (as the inaudible sounds can be very intense, in the range of 140–150 dB SPL) can be allayed, ultrasonic-based audio displays could be as flexible as analogous light-based visual displays.

12.3.2.3 Distributed Spatial Sound

Another exciting field of application of audio technologies is distributed spatial sound. Smartphones and other audio terminals could be used not only as remote controls, but also as distributed displays: via a kind of body area network (BAN), ad hoc or spontaneous loudspeaker arrays embedded among collocated users and helping to overcome the limited power of individual devices. Similarly, mobile spatial audio could be used more extensively for artistic purposes.
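The self-demodulation identity behind parametric arrays can be verified numerically. The sketch below samples arbitrary carrier and signal phases and confirms that cos(c)·sin(c+s) = ½·sin(2c+s) + ½·sin(s) to machine precision; the ½·sin(s) difference term is what survives as audible sound, while the 2c+s term stays ultrasonic.

```python
import math

# Numeric check of the product-to-sum identity used in parametric-array
# demodulation (c: carrier phase, s: audio-signal phase, in radians).
worst = 0.0
for i in range(1000):
    c = 0.00701 * i          # arbitrary phase samples
    s = 0.00313 * i
    lhs = math.cos(c) * math.sin(c + s)
    rhs = 0.5 * math.sin(2 * c + s) + 0.5 * math.sin(s)
    worst = max(worst, abs(lhs - rhs))
print(worst)  # tiny (~1e-16): the identity holds to machine precision
```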

For example, SoundDelta (Mariette et al. 2010) was a multiuser AAR environment that used ambisonic zones in a way that resembled cells in a mobile telephony network. Participants heard an interactive spatial soundscape while walking around a public area such as a town square or park.

12.3.2.4 Information Furniture

Internet appliances can be outfitted with spatial sound capability. Besides home theater, multimedia baths can be configured for extravagant surround sound, as well as dance floors. Massage chairs (such as those made by Inada) can synchronize shiatsu with music. Commercially available Pioneer BodySonic™ configurations embed speakers in the headrest and seat of lounge chairs and sofas, to display visceral vibration that can augment audio-channel information.

A swivel chair can also be deployed as an instance of multimodal information furniture. The input modality is orientation tracking, which dynamically selects transfer functions used to spatialize audio in a stable (rotation-invariant) soundscape. As an audio output modality, nearphones straddling a headrest present unencumbered binaural sound with soundscape stabilization for auditory image localization, rendering spatial audio via headphones and directionalizing audio using dynamically selected transfer functions determined by chair rotation (Cohen et al. 2007). For haptic output, servomotors can twist motorized chairs under networked control, distributing torque across the internet to direct the attention of seated subjects, orienting users (like a dark ride amusement park attraction), or nudging them in a particular direction (Cohen 2003) (see Figure 12.13).

FIGURE 12.13  Information furniture: a pivot (swivel) chair with servomotor deployed as a rotary motion platform and I/O device, shown along with its digital analog. Nearphones straddling the headrest provide unencumbered binaural display. (a) Rotary motion platform (developed with Mechtec, www.mechtec.). (b) Mixed reality simulation compositing panoramic imagery into dynamic cg: a simulated simulator. (Model by Daisuke Kaneko.)
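The rotation-invariant soundscape used by both the tracked chair and head-tracked displays reduces to re-rendering each source at its world azimuth minus the tracked orientation on every update. A minimal sketch (angles in degrees, wrapped to ±180°; the function name is illustrative):

```python
def stabilized_azimuth(source_world_deg, orientation_deg):
    """Listener-relative bearing that keeps a source fixed in the world frame:
    re-render at (world azimuth - tracked yaw), wrapped to (-180, 180]."""
    return (source_world_deg - orientation_deg + 180.0) % 360.0 - 180.0

# A source due east (90 deg) while the chair (or head) turns to face it:
bearings = [stabilized_azimuth(90, yaw) for yaw in (0, 45, 90)]
print(bearings)  # [90.0, 45.0, 0.0]: the image stays put as the listener turns
```

In a real system the compensated bearing would select or interpolate the head-related (or chair-rotation-determined) transfer function for each rendering frame.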

12.4 CONCLUDING REMARKS

AR can leverage fixed-mobile convergence (FMC), the integration of wireline and wireless interfaces. Wearware—wearable computers and mobile, nomadic, and portable networked communication devices—and internet appliances and ubicomp interfaces allow heterogeneous multimodal displays. Hybrid configurations will emerge, such as loudspeaker arrays in conjunction with eartop displays (via BANs) and ad hoc arrangements of mobile phone speakers, for personalized soundscapes including multilingual narration. For contemporary instance, the Sony IMAX Personal Sound Display allows individual binaural channels on top of the theatrical six-channel system. Indoor sensors and outdoor GPS-like navigation systems will fuse into seamless tracking that leverages user position to enhance applications and location-based services: whereware. The dichotomy between mobile and LBS is resolved with mobile ambient transmedial interfaces that span both personal mobile devices and shared locative public resources, including AAR: everyware. The next chapter surveys various applications of such capabilities, and considers more closely the possibilities of such fluid frames of reference.

REFERENCES

Alam, S., M. Cohen, J. Villegas, and A. Ashir (2009). Narrowcasting in SIP: Articulated privacy control. In S. Ahson and M. Ilyas (eds.), SIP Handbook: Services, Technologies, and Security of Session Initiation Protocol, Chapter 14, pp. 323–345. CRC Press/Taylor & Francis.
Ballou, G. (ed.) (2004). Handbook for Sound Engineers (3rd edn.). Focal Press.
Barfield, W. and T. A. Furness III (eds.) (1995). Virtual Environments and Advanced Interface Design. Oxford University Press.
Bauck, J. L. and D. H. Cooper (1996, July/August). Generalized transaural stereo and applications. J. Aud. Eng. Soc. 44(9), 683–705. www.aes.org/e-lib/browse.cfm?elib=7888.
Begault, D. R. (1994). 3-D Sound for Virtual Reality and Multimedia. Academic Press.
Benford, S., J. Bowers, L. E. Fahlén, C. Greenhalgh, J. Mariani, and T. Rodde (1995). Networked virtual reality and cooperative work. Presence: Teleoperators and Virtual Environments 4(4), 364–386.
Blattner, M. M., D. A. Sumikawa, and R. M. Greenberg (1989). Earcons and icons: Their structure and common design principles. Human Computer Interaction 4(1), 11–44.
Blauert, J. (1997). Spatial Hearing: The Psychophysics of Human Sound Localization (revised edn.). MIT Press.
Bronkhorst, A. W. and T. Houtgast (1999, February 11). Auditory distance perception in rooms. Nature 397(6719), 517–520. DOI: 10.1038/17374.
Brull, W. et al. (2008, July/August). Towards next-gen mobile AR games. Computer Graphics Animation 28(4), 40–48.
Brungart, D. S. and W. M. Rabinowitz (1999). Auditory localization of nearby sources. Head-related transfer functions. The Journal of the Acoustical Society of America 106(3), 1465–1479. DOI: 10.1121/1.427180.
Carlile, S. (ed.) (1996). Virtual Auditory Space: Generation and Applications.
Ciglar, M. (2010, June). An ultrasound based instrument generating audible and tactile sound. In Proc. NIME: New Instruments for Music Expression, Sydney, New South Wales, Australia.
Rumsey, F. (ed.) (2006). Spatial Sound Techniques, Part 1: Virtual and Binaural Audio Technologies. Audio Engineering Society.

. Hybrid indoor and outdoor location services for new generation mobile terminals. Soc. June). (2003). Hattori. Ficco. J. 2915–2929. www. Villegas (2011. The design of multidimensional sound interfaces. Aizu-Wakamatsu. and A. virtual reality. A finger-ring shaped wearable HANDset based on bone-conduction..) (1995). K. 44(2). October). Aalto University. 511–516. August). H. I. T.1121/1. Barfield and T. K. Tsingos. Cyberspatial audio technology. and U. Virtual Reality—Science and Technological Challenges. 143–158. and future trends. S. Paik. Chapter 8. Failure to localize the source distance of an unfamiliar sound. pp. Martens (1999. National Research Council. and D. Coleman. Greenhalgh. PhD thesis. September). M. pp. Greenebaum. Iida. Journal of the Acoustical Society of Japan 20(6).) (1997). Fukumoto. Enabling technologies for audio augmented reality systems. (2014). Palmieri. Mavor (eds.1911132. 297–311. Zao. Lorho (2004.. J. T. D.  Brungart.) (2004). and R. Chalupper. In ISWC: Proceedings of the International Symposium on Wearable Computing. Puder. and A. Japan. and J. U. 291–346.2915. Eggers. Kato. Suzuki. November).1928121.and whitherware: Augmented audio reality for position-aware services. A. Cabrera. M.Dimensions of Spatial Sound and Interface Styles 305 Cohen. Soc. Gamper. M.1155/ASP. Spatialization with stereo loudspeakers. Lokki. and Y. D. V. A. In ISVRI: Proceedings of the International Symposium on Virtual Reality Innovations. 618–639. (1962). H. See Greenebaum and Barzel (2004). Anderson (eds. In T. Binaural and Spatial Hearing in Real and Virtual Environments. pp. Wellesley. MA: A K Peters. http://urn. Audio Anecdotes II. Cohen. Hamacher. Miyazaki. SIGGRAPH Course Notes. Virtual Environments and Advanced Interface Design. Espoo. 631–632. and S. Mobile narrowcasting spatial sound. 34(3). and W. February). M. Am. Cohen. Dual rôle of frequency spectrum in determination of auditory distance.aip. Cohen. Rass (2005. 271–285. I. Coleman. and E. and Y. 
Kornagel. and T. Massive: A collaborative virtual environment for ­teleconferencing. (Special Issue: Mediated Reality) 15(2). challenges. M. Durlach. Augmented reality audio for mobile and wearable appliances. (1968). Győrbiró (2009. October). (2005. Japan. T. CIT: Seventh International Conference on Computer and Information Technology. E. isvri2011. 239–261. H. C. NJ: Lawrence Erlbaum Associates. pp. Jakka. J. Journal of the Audio Engineering Society 52(6).aes.cfm?elib=13010. Karjalainen.2005. J. http://scitation. Tikander. March). P. International Journal of Human-Computer Interaction. N. Iwaya (eds. Sounds good to me!: Computational sound for graphics. Cohen. Härmä. Am. M. In W. The Internet Chair. N. Gilkey. and J. Control of navigable panoramic imagery with information furniture: Chair-driven 2. J. M. Cohen. PUC: Personal and Ubiquitous Computing 18(2). H. Finland. National Academy Press. 10–13. Signal processing in high-end hearing aids: State of the art. Oxford University jasa/44/2/10. J. IWPASH: Proceedings of the International Workshop on the Principles and Applications of Spatial Hearing. M. Furness III (eds. and G. D. Funkhouser. Acoust.. Singapore. and interactive systems. L. Benford (1995. January).). Fischer. F. H. R. Jot (2002. Fouad..5D steering through multistandpoint QTVR Panoramas with automatic window dilation.. From whereware to whence. Herder. 345–346. Doi. Mahway. DOI: 10. In Y. 389–395. Wei (eds. November). and N.1121/1. Proc. J. K. M. Castiglione (2014. Acoust. ACM Transactions on Computer—Human Interaction 2(3). D. Wenzel (1995). R. M. Hiipakka.). DOI: Barzel (eds.). M. Mine (2007. (2004). EURASIP Journal on Applied Signal Processing 2005.

doi. Gedenryd ( R. and O. (2001. 337–344. Ludwig. Computer Music Journal 9(2). The Journal of the Acoustical Society of America 88(4).. Ikeda. In Proceedings of ICAD: International Conference on Auditory Display. www. Journal of the Audio Engineering Society 62(5). 24–29. December). http://press. Computer Music Journal 25(4). Martens. L. A. Milios (2008.. (2003. . W. Thurlow (1969. G. and H. M. 41– C. Katz. December). Active localization of virtual sounds. and J.). M. In AES: Audio Engineering Society Convention (120th Convention). About audium—A conversation with Stanley Shaff. Real-time spatial processing of sounds for music. www. W. (2012). How we localize sound. L. W. G.K. and A.html. D. 31–38. Princeton University Press. Wiley. Holland. Physics Today 52(11). Freed. W. London.princeton. G. F.220. U. K. J. Guillerminet (2010. 212–233. and J. Jot. D. Schaik (2009). Holt. G. L. PUC: Personal and Ubiquitous Computing 6(4). Stereo Computer Graphics and Other True 3D Technologies. Multimedia Systems 7(1). M. 197–207. 1016–1027. Mariette. September). October). Kim. A. (2008). Cicinelli (1990.. D. Jin. Kendall. Spatial character and quality assessment of selected stereophonic image enhancements for headphone playback of popular music. (1999). S. Martens. (2001. Journal of the Audio Engineering Society 57(12). Martens. AudioGPS: Spatial audio navigation with a minimal attention interface.pdf. C. (1985). DOI: 10. Virtual audio systems. Loomis. Jenkin. (1993). Chapter 10. http://dx. Heller.. In AES: Audio Engineering Society Convention. Kleiner. Surround Sound: Up and Running (2nd edn. May). 1757–1763. Image model reverberation from recirculating delays.. M. J. Kim. and M. R. W. Psychophysical calibration for controlling the range of a virtual sound source: Multidimensional complexity in spatial auditory display. 55–69. Subject orientation and judgment of distance of a sound source.306 Fundamentals of Wearable Computers and Augmented Reality Hartmann. V.). 
SoundDelta: A study of audio augmented reality using WiFi-distributed ambisonic cell rendering. (1999. (2011). T. Perceptual evaluation of filters controlling source direction: Customized and generalized HRTFs for binaural synthesis. Oxford. S. Andersen and L.24. and W. F. Why You Hear What You Hear. Martin. http://legacy. and W. Marui. L..html. Qvortrup (eds. Choi (2013). 527–549. Reproducing virtually elevated sound via a ­conventional home-theater audio system. G. and R. S. November). Hebert. E. multimedia and interactive human–computer interfaces. Wayfinding. Karstens (1986). The Journal of the Acoustical Society of America 46(6).wiley. www. In P.html. Boussetta.-M. pp. B. Presence: Teleoperators and Virtual Environments 17(6).spa. J. pp.­ whyyouhearwhatyouhear. In Proceedings of the 128 Audio Engineering Society Convention. Princeton University Press. Acoustical Science and Technology 24(5).1162/01489260152815279. John Ross Publishing. E.-H. http:// (2004). Toward reality equivalence in spatial sound diffusion.aip. London: Springer. Acoustics and Audio Technology (3rd edn.1250/ast. M. May). Psychoacoustic evaluation of systems for delivering spatialized augmented-reality audio. Martens (2014). R.mitpressjournals. Holman. B. Kapralos.: Elsevier/ Focal Press. pres/17/6. Virtual Applications: Applications with Virtual Inhabited 3D Worlds. M. N. D. 253–259. December). J. Paris. D. Malham. New York. May. Winter). Martens (2006. ships and augmented reality. Sound Visualization and Manipulation.). McAllister. Morse.aalto. 1584–1585. and E.-W. 220–232. December).

159–172. Leonardo 35(3). Htoon.) (2006). Back. A. M. S. Rabenstein. In Proceeding of CHI: Conference on Computer– Human Interaction. and T. and S.). Emmerson (ed. R.. M. Spatialization with multiple loudspeakers. D. pp. Ellis ( M. http://dx. (1997. In HAID: Proceedings of the International Conference on Haptic and Audio Interaction Design. . Audio aura: Lightweight audio augmented reality. Brungart. 41–50. Pulkki. Baer. Auditory distance perception in different rooms. Spatial Audio. Spors (2008). J. Seo. Chapter “Virtual auditory displays. Springer. Audio Engineering Society. Taipei. San Francisco. Republic of China. In Audio Engineering Society (92nd Convention). Rocchesso (2011). www. and D. Rumsey. H. Spectromorphology: Explaining sound-shapes. D. http://dx. Spors (2010. 409– Part I. Auditory perception: The near and far of sound localization. (2002).. IWPASH: Proceedings of the International Workshop on the Principles and Applications of Spatial Hearing. Pihlajamäki ( Chapter 53. and Applications. August). R361–R363. and L. Pulkki. Mershon. D. and P. Germany. Nielsen. K. December). and Y. 139–184. 1321–1329. V. (1992. P. eproceedings. Rozier. pp. S.worldscinet. In S. pp. Ahonen. Huang (eds. F. V. R. (ed. Rumsey. DOI: 10. Shilling. Perception & Psychophysics 18(6). Iida. E. Smalley. Chapter 5. Wang (2010.doi.). IEICE Transactions on Information and Systems E77-D(12).1145/ ace2010. (1986). D. ntpu. J. and J. and S. Mahway. 65–92. Directional audio coding—Perception-based reproduction of spatial sound. March).doi. F. King (1975). California. Handbook of Speech Processing. T. V. In U. 248. Shinn-Cunningham (2002). The Language of Electroacoustic Music. (1997). November). Benesty. www. November).-D. King (1999). M. pp. Pulkki.1145/274644. pp. DAFX: Digital Audio Effects (2nd edn. Kishino (1994. M. In Proceedings of ICAD: International Conference on Auditory Display.).­ mitpressjournals. Priego (2009). April). November).. Pulkki. 
Sound field reproduction.-V.1017/S1355771897009059.1007/978-3-642-04076-4_5. Organised Sound 2(2). Audio bubbles: Employing nonspeech audio to support tourist wayfinding. (2001). Smalley. ACM Press/Addison-Wesley. Current Biology 9(10). See Greenebaum and Barzel (2004). J. Focal Press. Spatial Sound Techniques. R. S. Human Factors and Ergonomics. B. Dresden. Spatial effects. Hear and there: An augmented reality system of linked audio. D. Ahrens.Dimensions of Spatial Sound and Interface Styles 307 and B. Cabrera (eds. D.  Suzuki. Cambridge. Zimmermann. California. (2004. Melchior. A taxonomy of mixed reality visual displays. V. June).. T.html. and C. Donath (2004. DOI: 10. J. In J. Karahalios. Brewster. Kato. Handbook of Virtual Environments: Design. Spectro-morphology and structuring processes. B. M. 107–126. K. Mynatt. Spatial audio reproduction: From theory to ­production. Milgram. 1095–1114. Japan. Wiley. R. Los Angeles. and A. Moore. Intensity and reverberation as factors in the auditory perception of egocentric distance..cfm?elib=6826.aes. H. Vilkamo. M. Part 2: Multichannel Audio Technologies. Sondhi. 456–466.). In Proceedings of ACE: International Conference on Advances in Computer Entertainment Technology. and J. and D. H. Lokki. In Y. Springer-Verlag. K..). 0056. J. F. and F. D. Want. New Jersey: Lawrence Erlbaum Associates. In AES: Audio Engineering Society Convention (129th Convention). Shaff. “Spatializer”: A web-based position audio toolkit. R. Virtual source positioning using vector base amplitude panning. Journal of the Audio Engineering Society 45(6).” pp.274720. Lokki. AUDIUM: Sound-sculptured space. Implementation.274720. Zao. April). 566–573. E. Zölzer (ed. Massachusetts: Macmillan-Palgrave.

K. J. Presence: Teleoperators and Virtual Environments 1(1). http://doi.1142/7674. and Applications. (2001). Seoul vrcai2010. Sadalgi.1866240.. Principles and Applications of Spatial Hearing.. Watkinson. Cynader (1991). O. P. 401–402. Armonica: A collaborative sonic environment.acm. and Y.). San Francisco. Video. Hearing Research 52(5). Sukan. Pasadena. Y. California: Audio Engineering Associates. X. Cohen (2010. Hrir: Modulating range in headphone-reproduced spatial audio. P.308 Fundamentals of Wearable Computers and Augmented Reality Stanney. 80–107. Kato (eds. Localization in virtual acoustic displays.1866240. June). org/10. Suzuki. Entrena. Wenzel. Li. M. D. Paper 8281. Iida. and F. 233–244. S. Academic Press. Villegas.. Bronkhorst (2005. and S. Everest (2006). and M. M. Acta Acustica United with Acustica 91(3). R. M. K. J. Convergence in Broadcast and Communications Media: The Fundamentals of Audio. and J.. J. Brungart.. (2002). Human Factors and Ergonomics. Zahorik. Auditory distance perception in humans: A summary of past and present research. (1992). . Aural intensity for a moving source. and M.1145/1866218.1145/1866218. Streicher. Implementation. Y. DOI: and A. A. D. Handbook of Virtual Environments: Design. The Nature and Technology of Acoustic Space. Oda. and H. Singapore: www. Data Processing and Communications Technologies. The New Stereo Soundbook (3rd edn. In VRCAI: Proceedings of the Ninth International Conference on Virtual-Reality Continuum and Its Applications in Industry. Zakarauskas. S. Mahway. H. Qi. Shi. D. Cabrera. Feiner (2010).) (2011). W. Tohyama.​ 10. Tanno. K. A. E. A 3-d sound creation system using horizontally arranged loudspeakers. Ando (1995). In ACM Symposium on User Interface Software and Technology. California. Saji. 409–420. Suzuki. Iwaya. Huang (2010). M. pp. New Jersey: Lawrence Erlbaum Associates. M. Brungart. In AES: Audio Engineering Society Convention (129th Convention).worldscientific. December). Focal Press.

13
Applications of Audio Augmented Reality: Wearware, Everyware, Anyware, and Awareware

Michael Cohen and Julián Villegas

CONTENTS
13.1 Introduction and Overview ........................................... 309
13.2 Applications ..................................................................... 310
     13.2.1 Navigation and Location-Awareness Systems ...... 311
     13.2.2 Assistive Technology for Visually Impaired ......... 312
     13.2.3 Synesthetic Telepresence ...................................... 313
     13.2.4 Security and Scene Analysis ................................. 313
     13.2.5 Motion Coaching via Sonification ........................ 314
     13.2.6 Situated Games ..................................................... 314
     13.2.7 Entertainment and Spatial Music .......................... 314
13.3 Anyware and Awareware ................................................ 315
     13.3.1 Audio Windowing ................................................. 315
     13.3.2 Narrowcasting ....................................................... 315
     13.3.3 Multipresence ....................................................... 317
     13.3.4 Layered Soundscapes ........................................... 319
13.4 Challenges ....................................................................... 320
     13.4.1 Capture and Synthesis .......................................... 321
     13.4.2 Performance ......................................................... 321
     13.4.3 Authoring Standards ............................................ 322
13.5 Concluding Remarks ...................................................... 323
References ............................................................................... 324

13.1  INTRODUCTION AND OVERVIEW

The previous chapter outlined the psychoacoustic theory behind cyberspatial sound, recapitulated in Figure 13.1, and the idea of audio augmented reality (AAR), including review of its various form factors, interaction styles, and display configurations to realize AAR. This chapter considers application domains, particularly those featuring spatial sound. Whereware was described as a class of location- and position-aware interfaces.

Consideration of (individual) wearware and (ubicomp) everyware is continued from the previous chapter. Two more "…ware" terms are introduced: anyware here refers to multipresence audio windowing interfaces that use narrowcasting to selectively enable composited sources and soundscape layers, and awareware automatically adjusts such narrowcasting, maintaining a model of user receptiveness in order to modulate and distribute privacy and attention across overlaid soundscapes, in the context of mobile ambient transmedial interfaces that integrate personal and public resources. Utility, professional, and leisure application areas are surveyed, including multimodal augmented reality (AR) interfaces featuring spatial sound.

310 Fundamentals of Wearable Computers and Augmented Reality

FIGURE 13.1  Schematic summary of binaural effects: ITD (interaural time delay or difference), IID (interaural intensity difference), and head shadow (frequency-dependent binaural attenuation). Ipsilateral signals at the ear closer to a source are stronger and earlier; contralateral signals at the ear away from a source are weaker and later.

13.2  APPLICATIONS

Users of visual AR systems can sometimes be subject to bewildering displays of information, making it difficult to identify and avoid hazards. Researchers are exploring alternative ways to present information via complementary sensory modalities, especially audition.

Historically, adoption of novel visual technologies precedes that of audio technologies: current mobile technology already offers display resolution finer than human visual acuity at normal viewing distances, and 3D visual displays are deployed on mobile platforms. Miniaturization of components will allow creation of devices small enough to be worn over the ear and controlled by facial or tongue movements, and such devices will naturally include spatialization options for audio reproduction. Considering such trends, we can expect a great increase in the number of applications using wearable spatial audio technologies. In this section we survey a broad variety of applications of AAR, not only those that explicitly attempt to desaturate the visual channel.
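The binaural cues summarized in Figure 13.1 can be roughly quantified. The following is a minimal sketch (not from this chapter) of the ITD cue using the classic spherical-head (Woodworth) approximation; the head radius is an assumed average value.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C
HEAD_RADIUS = 0.0875     # m, assumed average adult head radius

def itd_woodworth(azimuth_deg: float) -> float:
    """Interaural time difference (seconds) for a far source at the given
    azimuth (0 = straight ahead, 90 = directly to one side), per the
    spherical-head approximation: ITD = (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

print(round(itd_woodworth(0.0) * 1000, 2))    # 0.0 (ms): no delay for a frontal source
print(round(itd_woodworth(90.0) * 1000, 2))   # 0.66 (ms): near the human maximum
```

The monotonic growth of ITD with azimuth is what makes it a usable lateralization cue at low frequencies, complementing the IID and head-shadow cues at high frequencies.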

Systems like those outlined in the last chapter are being deployed in utility, professional, and recreational domains, considered here in that order. An exhaustive review is beyond the scope of this chapter, but a few examples illustrate such systems.

13.2.1  Navigation and Location-Awareness Systems

Spatialized sound and music can be deployed to aid way-finding (Loomis et al. 1990; Bederson 1995; Jones et al. 2008). Commercial products featuring location-aware audio rendering, such as Audio Conexus GPS tour guide systems, are currently used in tourism. Such products trigger contents display as a vehicle enters a designated zone. It is easy to extend such capabilities to enact preprogrammed radio plays: shouts, explosions, and spatialized sound effects can enliven narration of a tourist guide, at a historic battleground, for example, as illustrated in Figure 13.2.

FIGURE 13.2  Localized beacons and directionalization for vehicular way-finding and way-showing.

A relaxed understanding of wearability admits consideration of vehicles as a kind of wearable computer. For simple example, Intelligent Traffic System (ITS) information can be spatialized using in-vehicle loudspeakers (Takao 2003), so that a virtual sound source can guide a driver around a corner. We extended some of Takao's ideas in GABRIEL (Villegas and Cohen 2010), retrofitting a vehicle with location-aware announcements (tourist information, navigation instructions, and traffic advisories), delivered via wireless headphones for passengers and bone-conduction headphones for the driver. Besides using landmarks to trigger audio stream delivery, our prototype used geolocated virtual sources that reprojected with updates in the position and course of the vehicle, as seen in Figure 13.3. These ideas can be applied not only to smart or connected vehicles but also to mobile pedestrian interfaces featuring visual navigation enhanced with spatial sound displayed via headphones (Sanuki et al. 2012).

FIGURE 13.3  System architecture of GABRIEL: vehicular GPS information updates the soundscape.

13.2.2  Assistive Technology for Visually Impaired

Another rich field in binaural wearable computing is in assistive technologies for the visually impaired (Edwards 2011; Katz et al. 2012). Electronic travel aids (ETAs) are tools for the blind with which obstacles and paths are displayed haptically or acoustically (Terven et al. 2014). The goal is to provide users of wearable computers with complementary cues for way-finding (or way-showing) and situation awareness, including both static contents and dynamic streams, allowing users to hear what they (cannot) see. One representative system (Bujacz et al. 2012) uses personalized head-related impulse responses (HRIRs) for azimuth and elevation cues, presenting distance cues as an inverse function of pitch and amplitude and letting sound duration represent the size of objects.

Mapping camera-captured optical frequencies to speaker-displayed audible tones (Foner 1997) induces synesthetic experience. For instance, the Eyeborg augmentation device maps colors into sounds, even for the colorblind (Harbisson 2008). Note that the idea of synesthesia has been generalized here, beyond its original sense of a subject experiencing directly apprehended stimulation in another sensory modality, to apply to sensation invoked by the display of a mediating system mapping stimuli cross-modally (White and Feiner 2011).

Other researchers have concentrated on easing interaction between users of wearable computers and the devices themselves. Motivated by the laborious process of navigating through series of menus in small screens, menu items can be presented via spatiotemporally multiplexed speech enhanced with spatial audio (Ikei et al. 2006), achieving almost perfect recognition when four items are displayed at 60° intervals in front of the user.

Such electronically assistive tools are still controversial in the blind community, as some find these systems piteous or unnecessary, especially compared to traditional white cane or guide dog approaches.

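Both the vehicular (GABRIEL) and pedestrian way-finding interfaces above reduce to the same geometric step: reprojecting a geolocated virtual source into the traveler's frame of reference as position and course change. A minimal sketch, under an assumed local flat-earth approximation (this is an illustration, not the actual GABRIEL code):

```python
import math

def relative_azimuth(lat, lon, heading_deg, src_lat, src_lon):
    """Bearing of a geolocated source relative to the traveler's heading
    (0 = dead ahead, 90 = to the right), using an equirectangular
    approximation that is adequate for nearby landmarks."""
    # convert the lat/lon offset to local north/east meters
    d_north = (src_lat - lat) * 111_320.0                       # m per degree latitude
    d_east = (src_lon - lon) * 111_320.0 * math.cos(math.radians(lat))
    bearing = math.degrees(math.atan2(d_east, d_north))         # compass bearing to source
    return (bearing - heading_deg) % 360.0

# traveler at the origin heading north; landmark due east renders to the right
print(round(relative_azimuth(0.0, 0.0, 0.0, 0.0, 0.001), 1))   # 90.0
```

Re-running this on every GPS update is what makes the virtual beacons appear anchored to the world rather than to the vehicle.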
Indeed, many blind users of such systems regard the resultant experience as more akin to sensory amputation than to sensory augmentation, compared with traditional aids whose applications date back more than a century (Scientific American 1880).

13.2.3  Synesthetic Telepresence

Mixed reality audio interfaces are not limited to first-person experiences: second-person teleoperated vehicles can exploit AAR displays. For instance, a remotely controlled telerobot (Tachi 2009) or UAV (unmanned aerial vehicle, or drone) exploring a nuclear power plant might be equipped with sensors such as directional dosimeters (Geiger counters) as well as binocular cameras and binaural microphones, sharing its egocentric perspective for telepresent experience. The most appropriate telepresence mapping might rely on crossing modal boundaries, since a one-to-one, unimodal mapping of sensor data to displayed media is not compulsory: mediation of experience can substitute or include cross-modal stimulation. Important data might be sonified as well as visualized: an auditory rendering of radioactive hot-spots could be delivered to human pilots, overlaid on the drone's naturally captured soundscape, as directionalized audio sources aligned with the actual hazardous environment. Such multimodal displays support what could be called synesthetic telepresence.

13.2.4  Security and Scene Analysis

Although spatial audio is usually thought of as a display technique, its principles can be applied to sound capture and auditory scene analysis (Bregman 1990). Blind signal or source separation (BSS) is the separation of a set of source signals from a set of mixed signals. Human ability to localize auditory events can be augmented through hookups from expanded microphone arrays.

Modern wearable spatial audio technologies have made a fruitful contribution to gunshot position detection. Individual Gunshot Detector technologies are passive acoustic systems that detect and localize sources of small arms fire. Examples include the Soldier Wearable Shooter Detection System, developed by QinetiQ North America, and the Shoulder-Worn Acoustic Targeting System, developed by Raytheon BBN Technologies. Such detection systems use several microphones to capture supersonic gunshots, then analyze the recordings to determine bullet trajectory, speed, and caliber (Duckworth et al. 2001), taking into account models of ballistics and acoustic radiation for calculating projectile deceleration. They are tuned to detect the crack bang of a shot, recognizing audio signatures comprising the muzzle blast from rifle fire and shockwave of a bullet while screening out other acoustic events. These systems display estimated position of a sniper aurally as well as visually: incoming shot announcements are transmitted to an earpiece while a wrist display provides range, azimuth, and elevation coordinates of the shooter position. GPS-enabled devices can coordinate operations between multiple units.

Another example of spatial sound capture in the field is the use of portable microphone arrays for reconnaissance and surveillance. A biologically inspired technique processes portable microphone array signals to extract acoustic features.

Features such as spectral content, interaural time delays or differences (ITDs) and interaural intensity differences (IIDs), and pitch periodicity are used to identify and locate acoustic sources in the environment, as many mammals do (Handel 1989; Deligeorges et al. 2011). BSS is the parsing of admixtures, that is, the analysis of a soundscape. Beyond security, such a system could be used for characterizing animal behavior, acoustic data logging, and underwater acoustic monitoring, to mention a few applications.

13.2.5  Motion Coaching via Sonification

Sonification is the rendering of non-auditory data as sound. Audio infoviz (information visualization) can be applied, for example, to teaching dance (Grosshauser et al. 2012). In such systems, sensors (e.g., accelerometers, gyroscopes, and pressure sensors) are worn or used by a dancer to sonify performance, so a teacher can detect problems in a performance in a way that might be difficult without auditory feedback. Sonification displays benefit from spatialization: an event can trigger positional change in music being played (e.g., left limping triggers an increase of music level on that channel). Such monitoring could work in combination with audio players that alter music to reflect motion of rehabilitation patients learning, for example, how to walk with a prosthetic limb, employing sonifications somewhat like the squeaky shoes that delight toddlers by chorusing their footfalls. The game industry is also interested in multimodal systems enriching impressions of motion and inducing motion changes via synchronous sound effects (Akiyama et al. 2012) and using vibration to reinforce auditory events (Chi et al. 2008).

13.2.6  Situated Games

Situated or location-based games (Gaye et al. 2003; Magerkurth et al. 2005) are a fertile domain for AAR (Cater et al. 2007). Location-aware games can use spatial audio to increase engagement of players (Paterson et al. 2010). Platforms to build such kinds of interactions are being developed, such as MARA (Mobile Augmented Reality Audio), a system that allows playback and recording of binaural sounds tracking user position (Peltola et al. 2007).

13.2.7  Entertainment and Spatial Music

Spatial sound can be used to enhance musical expression and experience. decibel 151 (Stewart et al. 2009) was an art installation and music interface that used spatial audio technology and ideas of social networking to turn individuals into walking soundtracks as they moved around each other in a shared real space and listened to each other in a shared virtual space. Some speciality publishers include spatial music in their catalog, but spatial music is still an underexploited capability of distributed media in general, including popular encodings such as MP3 and AAC (Quackenbush and Herre 2005; Breebaart and Faller 2007; Breebaart et al. 2008). Radio dramas such as the enactment of Stephen King's Mist (King 2007) are enlivened by binaural effects.
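The limping example above, raising the music level on the affected channel, can be sketched as a simple mapping from per-foot load sensors to stereo gains. The function and its scaling are illustrative assumptions, not a published design:

```python
def sonify_gait(left_load: float, right_load: float, base_gain: float = 1.0):
    """Per-channel gains for a stereo music stream: the under-loaded
    (limping) side gets a louder channel, cueing the listener to
    rebalance. Loads are normalized step forces in [0, 1]."""
    total = (left_load + right_load) or 1.0
    # imbalance in [-1, 1]: positive when the left side bears less weight
    imbalance = (right_load - left_load) / total
    left_gain = base_gain * (1.0 + max(imbalance, 0.0))
    right_gain = base_gain * (1.0 + max(-imbalance, 0.0))
    return left_gain, right_gain

print(sonify_gait(0.5, 0.5))   # (1.0, 1.0): symmetric gait, no emphasis
print(sonify_gait(0.2, 0.8))   # left limp: left channel boosted (about 1.6), right stays 1.0
```

Driving such a mapping from worn accelerometers or pressure sensors turns the music player itself into the coaching display.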

Virtual concerts can be optionally presented with perspective, so that listening position coincides with virtual camera position. Interactive music browsing can leverage cyberspatial capability, and hypermedia-encoded musical experiences, such as that suggested by the IEEE 1599 Music Encoding and Interaction standard (Baggi and Haus 2009), represent inviting opportunities for spatial sound.

13.3  ANYWARE AND AWAREWARE

Applications featuring AAR are not mutually exclusive. Multitasking presents an over-tempting invitation to abuse, or at least a potential for sensory overload. As dilating network bandwidth allows increasingly polyphonic soundscapes, augmented audio listeners might be overwhelmed with stimuli. Interface paradigms such as audio windowing (Cohen and Ludwig 1991a), narrowcasting (Alam et al. 2009; Fernando et al. 2009), and panoramic or practically panoramic displays (Cohen and Győrbiró 2008) can soften such information overload.

13.3.1  Audio Windowing

Audio windowing, in analogy to graphical windowing user interfaces, treats soundscapes as articulated elements in a composite display (Begault 1994). In graphical user interfaces (GUIs), application windows can be rearranged on a desktop, minimized, maximized, and reordered; audio windowing similarly allows such configuration of spatial soundscapes, including AAR. Soundscapes, analogous to layers in graphical applications, can be combined simply by summing, although in practice some scaling (amplification and attenuation), normalization, equalization, or other conditioning might yield more articulate results. For instance, interior soundscapes might have reverberation applied, to better distinguish them from outdoor scenes. Head-tracking systems can anchor such soundscapes so that they remain fixed in a user's environment, such as relative to a television (Algazi et al. 2005; Algazi and Duda 2011).

By using an audio windowing system as a mixing board, users and applications can set parameters corresponding to source and sink positions to realize a distributed sound diffuser, a multidimensional pan pot. (As defined in the last chapter, a sink is the dual of a source, used instead of listener to distinguish it from an actual human.) For instance, a panoramic potentiometer can control the balance of a channel in a conventional (left–right stereo) mix. Audio mixing techniques can be deployed in user interfaces for acoustic diffusion (Gibson 2005), including allowing designation of multiple sinks for a single user.

13.3.2  Narrowcasting

As mentioned in the last chapter, narrowcasting (Cohen and Ludwig 1991b; Cohen 2000)—by way of analogy with broad-, uni-, any-, and multicasting—is an idiom for limiting media streams, to make a composited soundscape manageable: some sources might be muzzled or muted and some sinks might be narrowcastingly muffled or deafened, formalized by the expressions shown in Figure 13.5.

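Combining soundscape layers "simply by summing," with optional per-layer scaling as described in the audio windowing discussion above, can be sketched as follows (an illustrative helper, with short sample lists standing in for audio buffers):

```python
def composite(layers, gains):
    """Audio-windowing composite: soundscape layers are combined by
    summing sample-wise, each scaled by a per-layer gain (missing
    gain defaults to 1.0). Layers may have different lengths."""
    n = max(len(samples) for samples in layers.values())
    mix = [0.0] * n
    for name, samples in layers.items():
        g = gains.get(name, 1.0)
        for i, s in enumerate(samples):
            mix[i] += g * s
    return mix

layers = {"navigation": [0.25, 0.25], "music": [0.5, -0.5]}
print(composite(layers, {"music": 0.5}))  # [0.5, 0.0]
```

Attenuating the "music" gain here is exactly the kind of conditioning that keeps a composited soundscape articulate; muting a layer is the degenerate case of gain 0.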
Privacy has two interpretations, the first association being that of avoiding leaks of confidential information, protecting secrets. But a second interpretation means freedom from disturbance, in the sense of not being bothered by irrelevance or interruption. Narrowcasting operations manage privacy in both senses, allowing users to distribute, ration, and control privacy, attention, and presence. Anyware models are separate but combinable scenes, allowing a user to have selective attendance across multiple spaces. Advanced floor control for chat spaces and conferences is outlined in Table 13.1: sources and sinks are symmetric duals, filtering duplex information flow through an articulated conferencing model, respectively representing sound emitters and collectors.

FIGURE 13.4  Surroundsound: soundscape overload. (Copyright 2015, The New Yorker Collection from cartoonbank. All rights reserved.)

A human user might be represented by both a source and a sink in a groupware environment, allowing a user to monitor and inhabit many spaces (Cohen 1998), or perhaps by multiple instances of such delegates. For mute and solo (or select), and for deafen and attend, respectively, the semantics are analogous: an object is inclusively enabled by default unless (a) it is explicitly excluded with mute (for sources) or deafen (for sinks) or (b) peers are explicitly included with solo or select (for sources) or attend (for sinks) when the respective object is not.

FIGURE 13.5  Simplified formalization of narrowcasting and selection functions in predicate calculus notation.
A human user might be represented by both a source and a sink in a groupware environment. allowing a user to monitor and inhabit many spaces (Cohen 1998). For deafen and attend. Modulation of source exposure or sink attention (Benford et al. Being anywhere is better than being everywhere. respectively. suggests desirability of realtime communication via persistent channels for media streams. especially coupled with position tracking systems.5  Simplified formalization of narrowcasting and selection functions in ­predicate calculus notation. Similarly. reflecting a sensitivity to one’s availability. Sources can be explicitly turned off by muting or implicitly ignored by selecting some ­others. like the “online”– “offline” status switch of a conferencing service. In groupware environments. encourage multipresence. narrowcastingenabled audition (for sinks) or address (for sources) of multiple objects of regard. allowing a single human user to designate doppelgänger delegates in distributed domains. 13. and the semantics are analogous: an object is inclusively enabled by default unless (a) it is explicitly excluded with mute (for sources) or deafen (for sinks) or (b) peers are e­ xplicitly included with solo or select (for sources) or attend (for sinks) when the respective object is not. the ­inhabiting by sources and sinks of multiple spaces simultaneously. since it is selective. and collectors. where “¬” means not. Multitasking users want to have presence in several locations at once.3. multipresence is distilled ubiquity. “∃” means there exists. and “⇒” means implies. partially softened with muzzling and muffling (Cohen 1993). Audibility of a soundscape is controlled by embedded sinks. . Greenhalgh and Benford 1995) need not be all or nothing: nimbus and focus can be. the sink relation is active(sink x) = ¬ deafen(sink x) ∧ (∃ y attend(sink y) ⇒ attend(sink x)). “∧” means conjunction (logical “and”). 1995. 
The general expression of inclusive selection is

  active(x) = ¬exclude(x) ∧ (∃y include(y) ⇒ include(x)),

where "¬" means not, "∃" means there exists, "∧" means conjunction (logical "and"), and "⇒" means implies. So the source relation is

  active(source_x) = ¬mute(source_x) ∧ (∃y solo(source_y) ⇒ solo(source_x)),

mute explicitly turning off a source, and solo disabling the collocated complement of the selection (in the spirit of "anything not mandatory is forbidden"). The sink relation is

  active(sink_x) = ¬deafen(sink_x) ∧ (∃y attend(sink_y) ⇒ attend(sink_x)).

The duality between source and sink operations is strong. In groupware environments, both one's own and others' sources and sinks are adjusted for privacy. Sources can be explicitly turned off by muting or implicitly ignored by selecting some others, like the "online"–"offline" status switch of a conferencing service. Similarly, sinks can be explicitly deafened or implicitly desensitized if other sinks are attended; audibility of a soundscape is controlled by embedded sinks. Narrowcasting attributes can be crossed with spatialization and used for polite calling or awareware, reflecting a sensitivity to one's availability. Modulation of source exposure or sink attention (Benford et al. 1995; Greenhalgh and Benford 1995) need not be all or nothing: nimbus and focus can be partially softened with muzzling and muffling (Cohen 1993).

13.3.3  Multipresence

Increasingly fine-grained networked interpersonal communication, from journaling through microblogging to life-streaming and tweets, suggests desirability of realtime communication via persistent channels for media streams. Multitasking users want to have presence in several locations at once. Multipresence is an interface strategy for managing attention and exposure: the inhabiting by sources and sinks of multiple spaces simultaneously, allowing a single human user to designate doppelgänger delegates in distributed domains. Enriched user interfaces, especially coupled with position tracking systems, encourage multipresence: narrowcasting-enabled audition (for sinks) or address (for sources) of multiple objects of regard. Display technology can enable such augmented telepresence for spoken telecommunication (Martens and Yoshida 2000). Being anywhere is better than being everywhere: since it is selective, multipresence is distilled ubiquity.

318 Fundamentals of Wearable Computers and Augmented Reality

TABLE 13.1
Narrowcasting for Source Output and Sink Input: Sources and Sinks Are Symmetric Duals in Virtual Spaces, Respectively Representing Sound Emitters and Collectors

Function                   Source                           Sink
Direction                  Output (display)                 Input (control)
Level                      Radiation: amplification,        Reception: sensitivity
                           attenuation
Presence                   Nimbus (projection, exposure)    Focus (attention)
Instance                   Speaker                          Listener
Transducer                 Loudspeaker                      Microphone or dummy-head
Organ                      Mouth (megaphone)                Ear (ear trumpet)
Include (express)          Solo (select) (thumb up)         Attend (thumbs up)
Exclude (suppress):
  Own (reflexive)          Muzzle                           Muffle
  Other (transitive)       Mute (thumb down)                Deafen (thumbs down)
. particularly those featuring position-aware (and not just location-aware) spatial sound. the soundscape corresponds to what is sometimes imprecisely called a PoV (point-of-view) visual perspective. These various perspectives are characterized by the relationship of the position of individual users to their display. Visual interfaces are copotentiated by AAR interfaces. more abstract sounds signifying scene objects. centered inside a subject. Except for nonliteral distance effects. and distal classes. using angular displacement instead of absolute azimuth. the user monitors intermittent voice chat among his colleagues. or a tone from a milestone marker. 13. recorded everyday sounds used with such displays. This intimate perspective can be described as endocentric. a vehicle. medial. Populating this layer are auditory icons. Moving can twist (but deliberately not shift) multiple sinks. and multipresence. second-person perspectives. perhaps headphones or nearphones attached to eyewear. the traveler might hear a snippet of a recording of a relative’s voice from home. A use-case scenario of an errand-running pedestrian illustrates these ideas. as well as third-person perspectives (such as a map).4  Layered Soundscapes VR interfaces feature self-referenced displays including immersive. home or origin. modulated and composed with audio windowing. anyware. etc. Following a hierarchical taxonomy for visual virtual perspectives (Cohen 1998). whose respective channels are directionalized to suggest arrangement of their desks at the office.3. tightly bound to the position of the listener’s head. He forks his presence. professional soundscape: At the same time. For instance. The rendering is responsive to translation (location) as well as rotation (orientation). Such distributed presence can be coupled with a motion platform (like that shown in Figure 12. first-person perspectives and tethered. or enjoys some blended combination of personal and public display. 
13.3.4  Layered Soundscapes

VR interfaces feature self-referenced displays including immersive, first-person perspectives and tethered, second-person perspectives, as well as third-person perspectives (such as a map). These various perspectives are characterized by the relationship of the position of individual users to their display. Following a hierarchical taxonomy for visual virtual perspectives (Cohen 1998), these layers can be sorted into proximal, medial, and distal classes, corresponding to the same utility–professional–recreational classifications used to order applications in the previous section, described here in decreasing order of intimacy and self-identification.

A use-case scenario of an errand-running pedestrian illustrates these ideas. The subject dons an AAR display, perhaps headphones or nearphones attached to eyewear. He forks his presence, inhabits multiple scenes simultaneously (wearware, ubicomp everyware, anyware, awareware), modulated and composed with audio windowing, narrowcasting, and multipresence, or enjoys some blended combination of personal and public display, with automatic prioritization of the soundscape layers.

A first-person, utility soundscape: A proximal reality browser layer comprises way-showing, navigation cues: directionalized sounds signifying north, home or origin, destinations, intermediate checkpoints, etc. Populating this layer are auditory icons, recorded everyday sounds used with such displays, and earcons, more abstract sounds signifying scene objects. For instance, the traveler might hear a snippet of a recording of a relative's voice from home, or a tone from a milestone marker. Such systems can display direction and metaphorical distance. This intimate perspective can be described as endocentric, centered inside a subject, tightly bound to the position of the listener's head. The rendering is responsive to translation (location) as well as rotation (orientation). Except for nonliteral distance effects, the soundscape corresponds to what is sometimes imprecisely called a PoV (point-of-view) visual perspective.

A second-person, professional soundscape: At the same time, the user monitors intermittent voice chat among his colleagues, whose respective channels are directionalized to suggest arrangement of their desks at the office. Moving can twist (but deliberately not shift) multiple sinks, using angular displacement instead of absolute azimuth. An advantage of separating translation and rotation is that directionalizability can be preserved even across multiple frames of reference, and independence of location and orientation can be exploited to flatter multipresent auditory localization. Relaxedly shared position data can be filtered to adjust objects only relatively, maintaining consistent proprioceptive sensation. A technique for integration (resolving the apparent paradoxes) of such multipresence is explained in Cohen and Fernando (2009). Such distributed presence can be coupled with a motion platform (like that shown in Figure 12.13).
endocentric soundscape allows a displaced or telepresent experience. Within each soundscape. Like a first-person soundscape. can be described as ­exocentric. blood pressure. This perspective can be described as egocentric. A third-person. or other biological energy sources (Starner and . extension of battery capacity is far slower. but the sensation is more like an out-of-body experience.4 CHALLENGES Personal networked spatial audio technologies inherit the challenges of both wearable computing and artificial spatialization (Kyriakakis 1998). Narrowcasting sinks fuse multiple layers while preserving their individual activatability (via deafen and attend). such as a conventional stereo mix. An incoming call might automatically muzzle or attenuate sources in the music layer by deafening or muffling its associated sink. Longevity solutions are being pursued. These layers are combined. basically by adding them. as it is oblivious or indifferent to the position of the listener. Surrounding voices neither recede nor approach. a kind of metaphorical tethering. Such medial rendering is sensitive to orientation but not location: as a subject physically moves about. selectively managing and culling hypermedia. Respective sinks are still self-identifiable and personal. and music populating those layers. 13. sources are individually controllable. The anyware of such an articulated soundscape can be extended by sensitivity of each layer to activity in other layers. This distal perspective. recreational soundscape: Our subject also enjoys music ­rendered with respect to his head. awareware. it is idiocentric. Unlike first. analogous to that often used in games in which players may view their own avatar from over-the-shoulder or behind-the-back. and techniques are emerging in the realm of energy generators such as those collecting solar power. While miniaturization and accuracy of sensors and actuators improve rapidly. and nonindividualized. 
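The twist-but-not-shift behavior described above reduces to rendering each source at its bearing minus the listener's yaw, honoring rotation while ignoring translation. A one-line sketch (assumed sign convention: positive azimuth to the listener's right):

```python
def render_azimuth(source_bearing_deg: float, head_yaw_deg: float) -> float:
    """Rendered azimuth for a soundscape that twists but does not shift:
    only the angular displacement (bearing minus yaw) matters, wrapped
    into the range [-180, 180)."""
    return (source_bearing_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

print(render_azimuth(90.0, 0.0))    # 90.0: source to the right
print(render_azimuth(90.0, 90.0))   # 0.0: after turning right, the source is dead ahead
```

Because translation never enters the computation, the same function serves every sink of a multipresent user regardless of which frame of reference it inhabits.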
Relaxing the tight correspondence of a first-person, endocentric soundscape allows a displaced or telepresent experience, but the sensation is more like an out-of-body experience, a kind of metaphorical tethering, analogous to that often used in games in which players may view their own avatar from over-the-shoulder or behind-the-back. Such medial rendering is sensitive to orientation but not location: as a subject physically moves about, the soundscape twists but does not shift; surrounding voices neither recede nor approach, and there is no parallax. Respective sinks are still self-identifiable and personal. This perspective can be described as egocentric, centered on (but not within) each subject; centered upon oneself, it is idiocentric.

A third-person, recreational soundscape: Our subject also enjoys music rendered with respect to his head, such as a conventional stereo mix, as it is oblivious or indifferent to the position of the listener. An entire audience might share a sweet spot, egalitarian and nonindividualized. Unlike first- and second-person aural perspectives, this distal perspective can be described as exocentric, centered outside a subject; it is allocentric.

Within each soundscape, sources are individually controllable, either explicitly (via mute) or implicitly (by selecting some others); sources are sound effects, distributed voices, and music populating those layers. Sinks are designated monitors of respective composited soundscapes, overlaid on the actual environment of the user, perhaps shared with others. These layers are combined, basically by adding them, since their perceptual spaces are coextensive. Such a composite soundscape is overcrowded, but narrowcasting provides a user interface strategy for data reduction. Narrowcasting sinks fuse multiple layers while preserving their individual activatability (via deafen and attend), like a visible/invisible toggle switch in a graphical application or an "ignore" sender or thread function in e-mail and webforum browsers.

The anyware of such an articulated soundscape can be extended by sensitivity of each layer to activity in other layers. An incoming call might automatically muzzle or attenuate sources in the music layer by deafening or muffling its associated sink, and focusing on an office conference (by attending its sink) could stifle other soundscapes. Awareware exemplifies intelligent interfaces that maintain a model of user state, adjusting multimodal input and output (I/O) to accommodate context and circumstances besides position.

13.4  CHALLENGES

Personal networked spatial audio technologies inherit the challenges of both wearable computing and artificial spatialization (Kyriakakis 1998). While miniaturization and accuracy of sensors and actuators improve rapidly, extension of battery capacity is far slower. Longevity solutions are being pursued, and techniques are emerging in the realm of energy generators such as those collecting solar power, harvesting human body heat (Leonov and Vullers 2009), or transducing limb and finger movements, blood pressure, or other biological energy sources (Starner and Paradiso 2004).
4. raw measurements must be trimmed and equalized (e. 2001).g. Besides such low-level concerns. to scan each subject’s head and torso. Once recorded. bringing people to an anechoic chamber where each subject sits in a rotary chair controlled by computer that synchronizes emission of a reference signal and its recording from binaural microphones located at the ears. pinnae. constructing a 3D model to estimate each HRIR by boundary element methods (Gumerov et al. although more rapid solutions have been proposed (Zotkin et al. and torso characteristics (Rothbucher et al.2 Performance Audio spatialization brings its own challenges. 13. Geronazzo et al. Capture of individual HRIRs is tedious and expensive. respectively considered in the following sections. for example. One might also interpolate between multiple sets to find a good match for one’s head. 2010. 2002). directionalizing multiple sound sources simultaneously) and relatedly. but for the near-field this mechanism is ineffective (Brungart 2002. We addressed this issue by creating an object capable of realtime HRIR filtering with range control into the near-field. Alternatives intended to improve externalizability of virtual sound sources include acquisition of personalized HRIRs. Microsoft is experimenting with a procedure that uses its Xbox Kinect. such as polyphony (i. 13.. Synthesizing HRIRs can be achieved by sampling a person’s upper body.Applications of Audio Augmented Reality 321 Paradiso 2004). 2013). Other issues include sensitivity to noisy conditions. 2010). Spagnol and Avanzini 2009). Similar to provision in public offices of reading glasses with different diopter corrections.1 Capture and Synthesis Generic earprints usually produce auditory images located inside the head and deviations in elevation that depend upon listener pinnae size (Martens 2003). intelligibility of multiple speech sources. one could choose among several HRIR sets. The entire process can take several hours for each subject. 
adaptation of HRIRs entries from a database to match a given user’s anthropometrics.. between 20 and 160 cm (Villegas and Cohen 2013). so all synthetic sounds created with them are rendered on the surface of this sphere. for diffuse. a set-top console peripheral featuring near-infrared depth cameras and software to capture actors. and standardization of scene specification.e.4. Spagnol et al. This technique needs an accurate surface mesh model of a subject’s features to give acceptable results. 2011. selecting a best fitting dataset (Algazi et al. repeating the operation for many angles—typically at 5° intervals (about five times the minimum audible angle for low-frequency frontal sources). The process involves. and difficulty in . 2014). remaining issues include individualizability of earprints.or free-field reproduction). and HRIR synthesis (Rund and Saturka 2012). matching measured anthropometric parameters against those of other training subjects for whom HRTFs had been previously captured (Bilinski et al. dynamic soundscape display. Simulating distance changes can be done by modulating intensity for the farfield. Capture of HRIRs is usually at a fixed distance from the subject’s center of the head.
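The distance treatment mentioned above — HRIR filtering combined with intensity modulation for the far field — can be sketched as follows. This is only an illustrative reading of that idea, not the real-time object described in the text: the function names are invented, the impulse responses are placeholder taps rather than measured HRIRs, and the simple gain clamp stands in for true range-dependent (near-field) filtering.

```python
def fir_filter(signal, ir):
    """Naive FIR convolution: y[n] = sum_k ir[k] * signal[n - k]."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right, distance_m, ref_distance_m=1.0):
    """Spatialize a mono signal with an HRIR pair (illustrative placeholders).

    Far-field range is suggested by scaling amplitude ~ 1/r beyond the
    reference (measurement) distance; sources inside that radius are clamped,
    since plain gain scaling cannot reproduce genuine near-field cues.
    """
    gain = ref_distance_m / max(distance_m, ref_distance_m)
    left = [gain * v for v in fir_filter(mono, hrir_left)]
    right = [gain * v for v in fir_filter(mono, hrir_right)]
    return left, right
```

Under this inverse-distance rule, doubling the distance halves the amplitude (about −6 dB), which is the conventional far-field approximation; it says nothing about the near-field region where measured or range-controlled HRIRs are needed.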

The file format is agnostic regarding the . it uses a channel-based model. 2012). The minimum audible movement angle ranges from about 8° for slow sources to about 21° for sources moving at one revolution per second (Perrott and Musicant 1977). display configuration is predetermined. and therefore has a separate focus from MDA™. channel-. a collaborative research project being conducted at IRCAM focused on “assessment of the quality of the experience made possible by binaural listening. Augmented Reality Markup Language (ARML) (MacIntyre et  al. including periphonic. 2013). The Web3D Consortium is working on extending its XML-based X3D (Brutzman and Daly 2007) to support AR applications. scene-. Humans generally perform poorly at tasks involving localization of moving sources. 2013) is also an XML grammar that lets developers describe an augmented scene. hoping for future-proofing by extensibility. an object-oriented representation.3 Authoring Standards Authoring contents for such AR applications can be difficult. A method has been proposed to model such scenes with a two-stage process (Lemordant and Lasorsa 2010). and the definition of a format for sharing binaural data in anticipation of an international standard. but object-based models feature transient sounds tagged with position metadata. Distance and movement cues are difficult to express with HRIRs. MPEG-H Part 3 is a compression standard for 3D audio that can support many loudspeakers. Spatial Sound Description Interchange Format (SpatDIF) is a format that describes spatial sound information in a structured way to support real-time and non-real-time applications (Peters et al. 2011). and object-based arrangements. and channels are rigidly persistent. the problems of storing and sharing HRIRs and BRIRs (binaural room impulse responses) can hopefully be ameliorated (Majdak et al. 13. its objects. Intended for display of prerendered contents. Unifying these paradigms is Tech 3364. 
rendered at runtime for both flexible speaker arrangement and interactivity.4.322 Fundamentals of Wearable Computers and Augmented Reality distinguishing sounds coming from the back or front of the listener.” With initiatives such as SOFA (Spatially Oriented Format for Acoustics). The challenge of delivering satisfactory binaural experience is reflected in projects such as BiLi—Binaural Listening. and a broadly accepted standard has yet to emerge (Perey et al. but there is still no clear way to definitively solve this problem. In channel-based systems. the research and development of solutions for individualizing the listening while avoiding tedious measurements in an anechoic chamber. Multi-Dimensional Audio. the European Broadcast Union (EBU) Audio Data/ Definition Model (ADM). System issues include the need for low latency for richly populated polyphonic soundscapes and seamless integration of streamed network communication and signal processing. which integrates audio streams and metadata. and some behavior. binaural. analogous to rendering documents using HTML and CSS languages. This inability to follow moving sources (binaural sluggishness) is also evident in virtual scenes where binaural cues need to be approximated at sometimes insufficient rates and where other cues such as Doppler shift are not readily available (Pörschmann and Störig 2009). Compromises between computational complexity and realism of movement illusion have been reached.
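As a concrete note on the Doppler cue mentioned above (one of the movement cues that binaural rendering alone does not readily supply), the shift heard by a stationary listener can be computed from the source's radial velocity. The constant and function below are an illustrative sketch of the textbook formula, not code from any of the systems surveyed here.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def doppler_frequency(source_hz, radial_velocity_mps):
    """Frequency heard by a static listener for a moving source.

    Positive radial velocity means the source approaches the listener.
    Classic stationary-observer formula: f' = f * c / (c - v_r).
    """
    if radial_velocity_mps >= SPEED_OF_SOUND:
        raise ValueError("supersonic source: formula not applicable")
    return source_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_velocity_mps)
```

An approaching source is shifted upward and a receding one downward; a dynamic soundscape renderer would reevaluate this per update interval from each source's relative motion.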

to simulate and stimulate. Entities in cyberspace and augmented displays may be imbued with auditory qualities as flexibly as they are assigned visual attributes. and dynamic binaural audio. a gyroscope. Multipresent concert goers can actively focus on particular channels by sonically hovering over musicians in a virtual recital hall. Google Glass-like eyewear and VR-style HMDs such as the Sony PlayStation 4 Morpheus and Facebook Oculus Rift will also incorporate binaural AAR displays.) In the medium future. handheld/nomadic/portable interfaces. The Intelligent Headset™ features GPS. and gestural recognition can be used to nudge computer interfaces toward interaction styles described variously as ambient. 2014) like indoor GPS and the Apple iBeacon will allow modeling of real spaces and tracking within them. formerly flattering. hearing aid functions. and spatial filtering functions. both creative and re-creative (Streicher and Everest 2006) and unplug and play. one’s sources can be muted or selected or others’ sinks deafened or attended for selective privacy. Tracking sensors. accelerometer. using a hand to grab and steady one’s headwear. connoting a cumbersome leash. Presumably. will probably persist. Anticipations of specific commercial offerings quickly get stale in printed media such as this book. the VR salute. Literally sensational spatial sound is around and upon us. and wearable multimedia computing interfaces— offer opportunities for innovative design and advanced applications. since native tracking used for stabilizing visual scenes can be applied to soundscapes as well. scanning tools like Google Project Tango using machine vision as well as indoor localization tools (Ficco et al. speech enhancement. Users altering source and sink parameters can experience the sensation of wandering among conferees in a conference room. Navigation interfaces feature way-finding and -showing and increase situation awareness. 
and is self-declaratory so that intelligent decoding/replay will be able to perform optimal rendering of content. It is ironic that the participle wired. selected sources can be muted or selected to cull parts of a soundscape. whereware RTLS (real-time location. modulated for attention and exposure by anyware and awareware narrowcasting interfaces. Audio windowing can dynamically adjust composite soundscapes. To control attention (focus). . however. is now somewhat pejorative.[and position-] based services) can parameterize cyberspatial sound displayed by mobile ambient transmedial blending of personal wearware and public resources. everyware ubicomp interfaces. The emergence of wireless form factors— such as mobile networked computing. Applications of AAR span utility. global roaming. but as we go to press some emerging products deserve mention. work. breaking down the conventional fourth wall separating performers from audience. software-defined radio. and leisure. Sound presented in such dimensional fashion is as different from conventional mixes as sculpture is from painting. Binaural accessories will emerge featuring hearing protection. or one’s own sinks deafened or attended to ignore entire scenes. (Until HMDs become lighter weight. magnetometer.Applications of Audio Augmented Reality 323 audio assets payload. motion capture. to control exposure (nimbus). 13. noise reduction. emphasizing important sound signals such as warnings and alarms.5  CONCLUDING REMARKS Recapitulating the themes of this chapter and the preceding one.
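The narrowcasting idiom recapped above — mute/select for sources, deafen/attend for sinks — can be read as a small predicate: an explicit exclusion always silences a channel, and any explicit inclusion implicitly excludes the unselected peers. The following sketch is one interpretation of that idiom with invented names; it is not code from the systems described.

```python
def included(name, excluded, selected):
    """True if a source (or, dually, a sink) remains active.

    It must not be explicitly excluded (muted/deafened), and if any peer is
    explicitly included (selected/attended), it must be among them.
    """
    if name in excluded:
        return False
    if selected and name not in selected:
        return False
    return True


def audible(sources, muted=frozenset(), selected=frozenset()):
    """Filter a soundscape's sources by the narrowcasting predicate."""
    return [s for s in sources if included(s, muted, selected)]
```

For example, muting one source culls only it, while selecting one source culls everything else; the same function applied to sinks models deafening and attending.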

DOI: 10. 2007). Duda. Algazi.2012. Weiser and Brown 1997). 323–345.crcpress. or ultrasonic sensitivity.g. Ahson and M. 33–42. and paralyzed. telerobotics. and A. Genetic engineering. the disappearing or invisible c­ omputer (Streitz et al. We are still a long way from the Star Trek holodeck or telepathic interfaces. Effect on—Enriching impressions of motions and physically changing motions via synchronous sound effects. Jr. but also ­literally describe s­ ilicon.. Ilyas (eds. ubicomp or calm (Weiser 1991. V. The future never arrives but beckons us toward it. Thompson. Narrowcasting in SIP: Articulated privacy control. CI in the other). Duda.2001.. which are properly known as hearing aids (HA). J. Science fiction is a fertile asymptote for engineers (Shedroff and Noessel 2012). V. CRC Press/ Taylor & Francis. HA in one ear. which have an effect like internal headphones. they are similar in function and opportunities.. Motiontracked ­binaural sound for personal music players. Dry and wet not only metaphorically describe reverberation. and R. DOI: 10. and D. M.. A.969552.). and nanotechnology will enhance audition: perhaps humans can be crafted to have superpinnae (Durlach et al. and spatially immersive displays. Thompson (2005. or more than two ears (Schnupp and Carr 2009). S. V. reality-based. pp. and Security of Session Initiation Protocol. such as cochlear implants* implanted in the nervous system to help the deaf. In Proceedings of the IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics. January). In S. O. 1997). Cohen. tacit (Pedersen et al. . D. 2000). In AES: Audio Engineering Society Convention (119th Convention). multimedia information furniture. Dalton. 1993). K. R. blind.. Villegas. November).and carbon-based intelligence. Mind–machine interfaces will ­hasten the seamlessness and disintermediation of our connectivity. chapter 14. www. S. J. O. 99–102. 
Although there are important differences between them from the wearable computer perspective. but fantastic developments await. Alam. Makino. Natural interfaces without mouse or keyboard (Cooperstock et al. Algazi.6505218.324 Fundamentals of Wearable Computers and Augmented Reality embedded. IEEE Signal Processing Magazine 28(1).1109/ASPAA. integrating organic and biomechatronic systems. R. Let’s see. Y. Headphone-based spatial sound. SIP Handbook: Services. BTE—behindthe-ear devices). and let’s hear! REFERENCES Akiyama. are more convenient and intuitive than traditional devices and enable new computing environments. and organic (Rekimoto 2008). R.1109/SCIS-ISIS. Bionics is the use of devices (Sarpeshkar 2006). * Note that we have not distinguished between external assistive hearing devices (e. October). let alone the so-called singularity when machine intelligence overcomes that of humans.. pp. Duda (2011. Maeno (2012. including telerobotics. 856–860. Nor have we discussed bimodal and hybrid configurations (e. New York. Ashir (2009). postWIMP (van Dam 2005). The CIPIC HRTF database. M. As tools such as hearing aids and eyeglasses become prostheses. augmented humans increasingly become cyborgs (cybernetic organisms). and cochlear implants (CI). R. Avendano (2001). and C. Sato. In Proceedings of the Joint International Conference on Soft Computing and Intelligent Systems (SCIS) and International Symposium on Advanced Intelligent Systems (ISIS). pp. Hive-like networked sensors will make hearing more collective.g. and T. and such technology will increasingly be applied to normally abled as well. tangible (Ishii and Ullmer 1997). pervasive. Technologies. R..

Awareness Systems: Advances in Theory. Engdegård. N. A. 269–304.1109/MC. Florence. M. In Proceedings of the CHI: Conference on Computer–Human Interaction. . K. An investigation into the use of spatialized sound in locative games. (1990). and C. Human– Computer Interaction Series. pitching. Cho. 696–708. Methodology and Design. L. and O. J.. 2315–2320. Melamed. Thomas. M. Sound-specific vibration interface using digital signal processing. Hellmuth. number.1241000. May). cfm?elib=16374. 33–35. Lee. chapter 19. December). and pronouns. Rodde (1995). F. In Proceedings of the International Conference on Computer Science and Software Engineering. and P.). Hilpert. March). In T. R. vol. pp. Near-field virtual audio displays. Y. DOI: 10. Tokyo: Springer-Verlag. Fernando (2009). Chi. In P.. Ludwig (1991a. J. M. pp.innovationmagazine. Cohen. Hull.1162/105474600566637.). Tashev. A. and L.85. Auditory Scene Analysis: The Perceptual Organization of Sound. R. Computer 42(3). and G. Hoelzer. 364–386. Cohen. Brungart. Jun.1145/223355. (2002). Innovation: The Magazine of Research & Technology 8(3). Academic Press. http://www. Fahlén. Personal and portable.mitpressjournals. I. April).aes. B. IJMMS: Journal of Person–Computer Interaction 39(2). D. S. and J. Begault. In AES: Audio Engineering Society Convention (124th Convention). and N. (1995).org/e-lib/browse. and R. Breebaart. http://www. plus practically panoramic: Mobile and ambient display and control of virtual worlds. 4. In Proceedings of the CHI: Conference on Computer–Human Interaction. X3D: Extensible 3D Graphics for Web Authors. Spatial Audio Processing: MPEG Surround and Other Applications. J. C. D. Cohen. (1998). Audio augmented reality: A prototype automated tour guide. Ltd. and Signal Processing. Quantity of presence: Beyond person. August). Luciani (eds. IJMMS: Journal of Person–Computer Interaction 34(3). California. You. Speech. DOI: 10. M.icassp2014. Presence: Teleoperators and Virtual Environments 9(1)... S.) 
(2007). 3-D Sound for Virtual Reality and Multimedia. Daly (2007.Applications of Audio Augmented Reality 325 Baggi. D. Platt (2014. J.. Throwing. San Jose. Breebaart. J. M. Strumillo (2012). J. (1994). Networked virtual reality and cooperative work. Skulimowski. 93–106. M. West Sussex: John Wiley & Sons. 114–117. pp.doi. Exclude and include for audio sources and sinks: Analogs of mute and solo are deafen and attend. L. Spatial audio object coding (SAOC)—The upcoming MPEG standard on parametric object based audio coding. C. and catching sound: Audio windowing models and modes. Benford. Presence: Teleoperators and Virtual Environments 4(4). Journal of Audio Engineering Society 60(9). M.223526. Bederson. Kunii and A. J. Cohen. 84–96. Bowers. Haus (2009. P. May). and L.cfm?elib = 14507. 84–87. Falch. Bilinski. Springer. C. S. Hutchings (2007. Koppens et al. In ICASSP: Proceedings of the International Conference on pp. org/10.2009. Mariani. http://dx. and multipresence. O. privacy. Morgan Kaufmann. Cohen. Multidimensional audio window management. T. MIT Press. J. Oh. D.. M. March). Bregman. H. Awareware: Narrowcasting attributes for selective attention. Bujacz. P. www. Naviton—A prototype mobility aid for auditory presentation of three-dimensional scenes to the visually impaired. (2000. 289–308. Cohen. 319–336. Brutzman. chapter 11. D. Cater. Ahrens. B. April). (1993. J. http://www. D. IEEE 1599: Music encoding and interaction. (2008. HRTF magnitude synthesis via sparse representation of anthropometric features. Greenhalgh. February). Cyberworlds. and M. and W. 259–289. de Ruyter. www. B.1145/1240866. Győrbiró (2008). S. Presence: Teleoperators and Virtual Environments 11(1). Mackay (eds. Sung (2008. K. and T. Markopoulos. Faller (eds.

September).. Foner. C.178. L. 65–73. France.998. M. N. M. Engineering. US Patent 7. Hubbard. (2008. S. chapter 10. R. O’Donovan. and L. .2006. The Art of Mixing: A Visual Guide to Recording. DOI: 10. Gaye. October). DOI: 10. Biomimetic acoustic detection and localization system. Barger.. (2011). Listening: An Introduction to the Perception of Auditory Events. Y. 370–386. and A. Buxton. The Sonification Handbook.. F. and http://sonification. 156–157.141. Duplex narrowcasting operations for multipresent groupware avatars on mobile devices. R. Grosshauser. G. MIT Press. 89–103. Edwards. and D..1997. Mountain (2009... Geronazzo. 239–261. Dijon. Shinn-Cunningham. November). M. 280–287. www. C. March). Spagnol. vCocktail: Multiplexed-voice menu presentation method for wearable computers. K. Duraiswami.. B. and Production (2nd ed. February). Supernormal auditory localization: I. N. Held (1993).php?journalID=46&year=2009&vol=4&issue=2. and F. pp. Hybrid indoor and outdoor location services for new generation mobile terminals. Holmquist (2003). pp. Modern Painters. A. Deligeorges. ACM Transactions on Computer–Human Interaction 2(3). and T. Benford (1995.21. Computation of the head-related transfer function via the fast multipole accelerated boundary element method and its spherical harmonic representation. Hermann (2012). D. R.. Greenberg (ed. Ludwig (1991b). chapter 17.).php/chapters/chapter17/. C. (1997. Hermann. Spieth. 70Ж73Ð. Duckworth. F. A. Auditory display in assistive technology. I. pp. September). June). 271–285. Gilbert (2001. G. January).1109/SITIS.1109/VR. Gumerov. and J.). Acoustic counter-sniper system. Artificial synesthesia via sonification: A wearable augmented sensory ­system. and M. W. In Proceedings of the International Symposium on Wearable Computers.2011. A. and K. Palmieri.141. In T. B. Wearable sensor-based realtime sonification of motion and foot pressure in dance teaching and training. C. Artistpro. N. The International Contemporary Art Magazine. 
and D.3257598. J. A head-related transfer function model for real-time customized 3-d sound rendering. (1989). A. J. Bläsing. H. M. (2005). Ikei. Journal of Audio Engineering Society 60(7/8). London: Academic Press. U. In S. In Proceedings of the International Conference on Signal Image Technology & Internet-Based Systems. M. DOI: 10. O. Reactive environments: Throwing away your keyboard and mouse. Computer Supported Cooperative Work and Groupware.. Journal of the Acoustical Society of America 127(1). PUC: Personal and Ubiquitous Computing 18(2). Handel. In NIME: Proceedings of the Conference on New Interfaces for Musical Expression. and and D. pp. 183–190. S.326 Fundamentals of Wearable Computers and Augmented Reality Cohen. Yamazaki. Berlin: Logos Publishing House. February).. Zotkin (2010. Multidimensional audio window management. Sonic city: The urban environment as a musical interface. Castiglione (2014. Avanzini (2011. Presence: Teleoperators and Virtual Environments 2(2). D. E. T. S. Communications of the ACM 40(9). Cohen. 174–179. Hirota. Durlach. In Proceedings of the Virtual Reality Conference. Massive: A collaborative virtual environment for teleconferencing. January). Kawaguchi (2009).). General background. N. N. L. Harbisson.495. Painting by ear. Neuhoff (eds. 580–589. 193–210. S. 109–115. Cooperstock. S. Fernando. Dumindawardana. Mazé. Fels. Hunt. and L. and S. N. IJWMC: International Journal of Wireless and Mobile Computing 4(2). Greenhalgh..1121/1.ieeecomputersociety. Gibson.1109/ISWC. A. E. inderscience. US Patent 6. Smith (1997. Ficco. Hirose ( N. pp. http://doi.629932.

DOI: 10. Katz. Tangible bits: Towards seamless interfaces between people. B. C.. Nicol.664281. D. March). Martens.. M. Magas. MacIntyre. Majdak. Augmented reality audio editing. Journal of the 3D-Forum Society of Japan 14(4). Suzuki. Engelke.1077257. C. Nilsen (2005. Augmenting spoken telecommunication via spatial audio transformation. and B. SIGGRAPH Art Gallery. DOI: 10. Warren. and A. M.Applications of Audio Augmented Reality 327 Bainbridge. T. Noisternig (2013.). Cheok. (2003. Paperbuttons: Expanding a tangible user interface. chapter 2.. Nelson (2000). L.1667290. ACM. N. and R. May).220. In DIS: Proceedings of the third Conference on Designing Interactive Systems.1063/1. Proceedings of the IEEE 86(5). Conway (2011. Haahr. N. K. http:// ieeexplore. 513–525. IEEE Computer Graphics and Applications 33(3). May). Active localization of virtual sounds.. M. S. Current status of standards for augmented reality. Pedersen. B. Jouffrais (2012. R. 1757–1763. H. Jones. A. Jones.  Denis. Holmes (2008). Mandryk. A. Recent Trends of Mobile Collaborative Augmented Reality NAVIG: Augmented reality ­guidance system for the visually impaired. www. July). Perey.. May). Martens. 69–175. 4. 253–269. Wierstorf.. T. M. Carpentier. browse.cfm?elib=15176. New J.1109/5. Brilhault. S. K. PUC: Personal and Ubiquitous Computing 12(7). and C. Truillet. www. Rouzati. Magerkurth. In AES: Audio Engineering Society Convention (134th Convention).1007/s00779-007-0155-2. Kyriakakis. and J. C. Watanabe. and B. 220–232. Rome. G. T. F. M. Locationaware interactive game audio. Thorpe. Loomis. Ziegelwanger. Reed (2011).24. Parseihian. H. G. In L. DOI: 10. and M. London.. Lemordant. Auvray. D.1250/ast.aes. Ontrack: Dynamically adapting music playback to support navigation. Fields (2009. October). Preprint #8880. V. Cicinelli (1990.cfm?elib=15439. Lasorsa (2010. Virtual Reality 16(4). Sokoler. Naliuka. bits and atoms. and G. H. Leonov. 941–951. W. Carrigy. February). In Proc. pp. 
Perceptual evaluation of filters controlling source direction: Customized and generalized HRTFs for binaural synthesis. M. Huang (eds. December). Peltola. and M. and C. and Y. Walled gardens: Apps and data as barriers to augmented reality. Acoustical Science and Technology 24(5). Savioja (2009. decibel 151. J. Yoshida (2000. pp. C. http://dx. DOI: 10. G. and T.cfm?elib=15769. s2009/­galleries_experiences/information_­aesthetics/index. (1998. Alem and W. Vullers (2009). P. 77–81. In AES: Audio Engineering Society Convention (41st International Conference): Audio for Games. pp. O. Pervasive games: Bringing computer entertainment back to the real world. In AES: Audio Engineering Society Convention (35th International Conference): Audio for www. S. Roginska.jsp?arnumber=664281. 21–38.1145/1077246. Lechner (2013. and F. Stewart.. E. Computer Entertainment 3(3). W.. Gutierrez.1145/347642.347723.1007/ s10055-012-0213-6. R.siggraph. DOI: 10. Augmented reality audio for location-based games. May/June). R. November).php. http://scitation. August). and L.aes. Louisiana.aip. Wearable electronics self-powered by using human body heat: The state of the art and the perspective. King. Ullmer (1997. and L. February). October). S. The mist in 3-D Bradley. Y. www. M. Paterson. In Proceedings of the CHI: Conference on Computer–Human Interaction. (2007. Spatially oriented format for acoustics: A data exchange format representing head-related transfer functions. A.aes.3255465. R. Journal of the Acoustical Society of America 88(4).. Hebert. 216–223. P.doi. Springer-Verlag. May). DOI: 10. . T. Journal of Renewable and Sustainable Energy 1(6). 234–241. Fundamental and technological limitations of immersive audio ­systems. Lokki. New Orleans. M. In Audio Engineering Society Convention (128th Convention). L. H. G..

328 Fundamentals of Wearable Computers and Augmented Reality Perrott. Lossius. Journal of the Acoustical Society of America 62(6). Diepold (2010. and F. July). http://www. Scientific American (1880.265. Speech. On hearing with more than one ear: Lessons from evolution. 1463–1466. 102–106. 1–35. State-of-the-Art Survey. (2008). Streitz. (2003. Rosenfeld Media. In AES: Audio Engineering Society Convention (136th International Conference). Schacher (2012. M. http:// doi.wul. and I. July). In Proceedings of SMC: Sound and Music Computing Conference. DOI: 10. Tachi. DOI: 10. R. PhD thesis.2013. dspace. www. S. M. and B. Starner.). 696–706. September). M. Auditory localization in the near-field. H. specification. C. nature. and Language Processing 21(3). Pasadena. Information and Systems 47(4). Quackenbush. and A. S. Avanzini (2013. IEEE Spectrum 43(5).html. C.1109/MMUL. 18–23.381675. http://spectrum. Avanzini (2009.. and examples. Rothbucher. Sandler (2008.. W. A. Terven. April). and C. D. CRC Press. Rekimoto. Make It So: Interaction Design Lessons from Science Fiction. MPEG surround. Stewart. Cohen (2014. The New Stereo Soundbook (3rd and J. 24– dspace/handle/2065/436. In TSP: Proceedings of International Conference on Telecommunications and Signal Processing. S. and M. S. Noessel (2012). Telexistence. Sanuki. Levy. Everest (2006). J. Habigt. pp. May). Shedroff. R.2325. T. In Proceedings of DAFx: 11th International Conference on Digital Audio Effects. .. Raducanu (2014.76..1109/MC. Habigt. Pörschmann. 5. Singapore: World Scientific Publishing Company. and C. 508–519. December). Kameas. T. Spagnol. and F. Rund. J. N. Springer LNCS 4500.265. System Infrastructures and Applications for Smart Environments. Alternatives to HRTF measurement. http://doi. (2006. pp. and F. Berlin. Sarpeshkar. Schnupp. Streicher. R. Peters. California: Audio Engineering Associates..htm. Adapting 3D sounds for auditory user interface on interactive in-car information tools. SpatDIF: Principles. 
Musicant (1977). Spagnol. Investigations into the velocity and distance perception of moving sound sources. Acta Acustica United with Acustica 95(4). Waseda University. Störig (2009. T.1109/MC. May). The Disappearing Computer: Interaction Design. Copenhagen. D. Saturka (2012. Minimum auditory movement angle: Binaural localization of moving sound sources. IEEE Transactions on Audio.1038/nn. 38–44. July)..2005. A. October/December). Scientific American 43(1). N. H. Navigation in fogs. Communications of the ACM 51(6). Riedmaier. Villegas. pp. August). and C. J. Tokyo. Salas. Nature Neuroscience 12(6).ieeecomputersociety. IEEJ Transactions on Electronics. New opportunities for computer visionbased assistive technology systems for the visually impaired. and J. Herre (2005. 692–697. July).) (2007).. Espoo. Brain power: Borrowing from biology makes for low-power ­computing. N. In Low Power Electronics Design. J. IEEE Multimedia 12(4). and F. 648–652. (2009). J. J. and K. Human generated power for mobile April). F. R. Paradiso (2004). and M. Mavrommati (eds. Takao. R. A. Measuring anthropometric data for HRTF personalization. 3D interactive environment for music collection navigation. March).org/ brain-power/.2325. DOI: 10. Porto. Carr (2009. Organic interaction technologies: From stone to skin. Machi-beacon—An application of spatial sound on navigation systems. and J. E.1121/1.waseda. In Proceedings of Sound and Music Computing Conference.­ gutenberg. July).2013. On the relation between pinna reflection patterns and head-related transfer function features. In SITIS: Proceedings of the International Conference on Signal-Image Technology and Internet-Based Systems. 52–58.

September/October). October).aes. chapter 13. p. Weiser. (2005. New York. abstract representations of audio in a mobile augmented reality conferencing system. Springer. In AES 40th International Conference “Spatial Audio: Sense the Sound of Space”. The coming age of calm technology. and M. . (1991. October). http://www. Tokyo. Visualization research problems in next-generation educational D. S. Kyoto. Cohen (2010. Villegas. pp. July). 149– S. J. IEEE Computer Graphics and Applications 25(5). 94–104. Copernicus (Springer-Verlag). 88–92. J. and J. M. Brown (1997).2005. Feiner (2011). White.. September). pp. M. A. Customizable auditory displays. Metcalfe (eds.1109/MCG. www.aes. The computer for the 21st century. Weiser.Applications of Audio Augmented Reality 329 van Dam. EB3-2. Villegas. “GABRIEL”: Geo-Aware BRoadcasting for In-Vehicle Entertainment and Localizability. 75–85. In P. J.118. DOI: 10. Cohen (2013. Zotkin. and M. Duraiswami. Davis (2002. In Recent Trends of Mobile Collaborative Augmented Reality Systems. R. In Proceedings of ICAD: International Conference on Auditory Display. S. Real-time head-related impulse response filtering with distance control. and S. Beyond Calculation—The Next Fifty Years. Scientific American. Denning and R. N. chapter 6. Dynamic. and L. M.). In AES: Audio Engineering Society Convention (135th Convention).


14 Recent Advances in Augmented Reality for Architecture, Engineering, and Construction Applications

Amir H. Behzadan, Suyang Dong, and Vineet R. Kamat

CONTENTS
14.1 Introduction ... 332
14.1.1 Overview of Augmented Reality in Architecture, Engineering, and Construction ... 333
14.1.2 Challenges Associated with AR in AEC Applications ... 335
14.2 Recent Advances in AR for AEC Applications ... 337
14.2.1 Spatial Alignment of Real and Virtual Objects (Registration) ... 337
14.2.1.1 Registration Process ... 337
14.2.1.2 Experimental Results ... 341
14.2.2 Visual Illusion of Virtual and Real-World Coexistence (Occlusion) ... 344
14.2.2.1 Occlusion Handling Process ... 344
14.2.2.2 Two-Stage Rendering ... 346
14.2.2.3 Implementation Challenges ... 346
14.2.2.4 Experimental Results ... 349
14.3 Software and Hardware for AR in AEC Applications ... 350
14.3.1 Software Interfaces ... 350
14.3.1.1 ARVISCOPE ... 350
14.3.1.2 SMART ... 352
14.3.2 Hardware Platforms ... 353
14.3.2.1 UM-AR-GPS-ROVER ... 353
14.3.2.2 ARMOR ... 355

....... 390 14..................................................3 AR for Collaborative Information Delivery..1 Technical Approach for AR-Based Information Delivery.......... 362 14.................... another category of visualization techniques... 379 14..........................2 Multiple Views in ARVita. archiving.......4........ the cost and effort of constructing a faithful synthetic environment include tasks such as model engineering (the process of creating.... An important category of visualization is termed virtual reality (VR)..................... and graphics rendering and can thus be enormous (Brooks 1999)..................................5 IDR Calculation... 376 14........1 AR-Assisted Building Damage Reconnaissance........1. even though VR can provide a stable.....1.....6 Experimental Results.................. attempts to preserve the user’s awareness of the real environment by compositing the real-world and the virtual contents in a mixed 3D space..............2....4............ 365 14.......... 370 14............................. 358 14.......... 373 14...... called augmented reality (AR).361 14... however...... 381 14............... the user’s sensory receptors (eyes and ears) are isolated from the real physical world and completely immersed in the synthetic environment that replicates the physical world to some extent... scientific visualization..........4......1..2 Processing Geodata Vectors and Attributes for AR Visualization...... and maintaining 3D models).............................. scene management..5 Summary and Conclusions..2 Vertical Edge Detection.....4....................... 392 14..........................4............ computer games....... In particular..4.........................4........... 365 14.2......... which attempts to replace the user’s physical world with a completely synthetic environment...... 
and virtual training.............4......................................1 Technical Approach for AR-Based Damage Assessment....4.......................2 AR for Georeferenced Visualization and Emulated Proximity Monitoring of Buried Utilities... AR refers to the visualization technology that blends virtual objects with ..... animation........... 374 14... 392 References.........................3......364 14............1..................... In addition.......... In contrast to the VR paradigm................4.............................385 14.1.4..........1......... 367 14...... refining................3 Horizontal Edge Detection.......4............3 Real-Time Proximity Monitoring of Excavators to Buried Assets.......... and immersive experience...........................4.1 INTRODUCTION In several science and engineering applications.. 358 14...............4 Implemented AEC Applications.....1 Technical Approach for Visualization of Buried Utility Geodata in Operator-Perspective AR.................4 Monitoring Excavator–Utility Proximity Using a Geometric Interference Detection Approach..........2.......4 Corner Detection.....2. 383 14...2........... robust.............................. interactive.........3............ In VR........4.............332 Fundamentals of Wearable Computers and Augmented Reality 14..................... There are a wide array of applications now commonly associated with VR such as computer-aided design (CAD).........5 Experimental Results.... visual simulation.....4...........4.. visualization can enhance a user’s cognition or learning experience especially when the goal is to communicate information about a complex phenomenon or to demonstrate the applicability of an abstract concept to real-world circumstances..

Compared with VR, AR offers a promising alternative to the model engineering challenge inherent in VR by only including entities that capture the essence of the study (Behzadan and Kamat 2005). These essential entities usually exist in a complex and dynamic context that is necessary to the model, but costly to replicate in VR. However, reconstructing the context is rarely a problem in AR, where modelers can take full advantage of the real context (e.g., terrains and existing structures) and render them as backgrounds, thereby saving a considerable amount of effort and resources. In addition, the awareness of the real environment in AR and the information conveyed by the virtual objects help users perform real-world tasks, whereas VR applications are mainly restricted to designing, running simulations, and training (Azuma 1997). The blending effect reinforces the connections between people and objects, promotes people's appreciation of their context, and provides hints for the users to discover their surroundings. For this purpose, AR must not only maintain a correct and consistent spatial relation between the virtual and real objects, but also sustain the illusion that the two coexist in the augmented space.

14.1.1 Overview of Augmented Reality in Architecture, Engineering, and Construction

AR has significant potential in the architecture, engineering, and construction (AEC) industry. Shin and Dunston (2008) presented a comprehensive outline for identifying AR applications in construction. The paper reveals eight work tasks that may potentially benefit from AR (i.e., layout, excavation, positioning, inspection, coordination, supervision, commenting, and strategizing). Figure 14.1 depicts some example applications from these areas.

AR serves as a useful inspection assistance method in the sense that it supplements a user's normal experience with context-related or georeferenced virtual objects. The first attempt at visualizing underground utilities was made by Roberts et al. (2002a). They looked beneath the ground and inspected the subsurface utilities. Some further exploration can be found in Behzadan and Kamat (2009a) and Schall et al. (2008), where the work has been extended to improve visual perception for excavation safety and subsurface utilities. Webster et al. (1996) developed an AR system for improving the inspection and renovation of architectural structures: users can have x-ray vision and see columns behind a finished wall and rebars inside the columns. A discrepancy check tool has been developed by Georgel et al. (2007), which allows users to readily obtain an augmentation in order to find differences between an as-designed 3D model and an as-built facility.

Golparvar-Fard et al. (2009) demonstrated an example of applying AR for construction supervision. They implemented a system for visualizing performance metrics, which aims to represent progress deviations through the superimposition of 4D as-planned models over time-lapsed real jobsite photographs. Dai et al. (2011) presented another supervision example of overlaying as-built drawings onto an aboveground site photo for the purpose of continuous quality investigation of a bored pile construction.

These works share the characteristic of monitoring discrepancies in chronological order, which is different from the discrepancy check mentioned earlier.

FIGURE 14.1 Example applications of AR in the AEC industry. (a) Subsurface utilities. (From Schall, G. et al., Virtual redlining for civil engineering in real environments, Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Cambridge, U.K., 2008.) (b) Inspection. (From Georgel, P. et al., An industrial augmented reality solution for discrepancy check, Proceedings of the 2007 IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 111–115, 2007.) (c) Supervision. (From Golparvar-Fard, M. et al., J. Comput. Civil Eng., 23(6), 418, 2009. With permission from ASCE.) (d) Project feasibility analysis. (From Piekarski, W., IEEE Comp. Graphics Appl., 26(1), 2006.)

Some other construction tasks excluded by Shin and Dunston (2008) may also benefit from AR. For example, the quality of welding, as well as its quality control, depends on the welder's experience and skill. Aiteanu et al. (2003) improved the working conditions for welders by developing a welding helmet that augments visual information, such as paper drawings and online quality assistance, before and during the welding process. Some examples of coordinating and strategizing are the visualization of construction simulations and architectural designs, which feature high complexity. Piekarski (2006) visualized the design of an extension to a building using a mobile AR platform called TINMITH2. Behzadan and Kamat (2007) designed and implemented augmented reality visualization of simulated construction operations (ARVISCOPE), an AR framework for visualization of simulated outdoor construction operations to facilitate the verification and validation of the results generated by discrete event simulation (DES).

14.1.2 Recent Advances in AR for AEC Applications

The Laboratory for Interactive Visualization in Engineering (LIVE) at the University of Michigan has been engaged in AR research with applications related to construction, inspection, operations planning, safety, and education. These AEC applications include visual excavator collision avoidance systems, rapid reconnaissance systems for measuring earthquake-induced building damage, and visualization of operations-level construction processes in both outdoor AR and collaborative tabletop AR environments (Figure 14.2).

The developed visual collision avoidance system allows excavator operators to persistently see what utilities lie buried in the vicinity of a digging machine or a human spotter, thus helping prevent accidents caused by utility strikes (Talmaki et al. 2013). The rapid postdisaster reconnaissance system for building damage assessment superimposes previously stored building baselines onto the corresponding images of a real structure; onsite inspectors can then estimate the damage by evaluating discrepancies between the baselines and the real building edges (Dong 2012). Moreover, visualization of construction operations in outdoor AR facilitates the verification and validation of the results of simulated construction processes, with minimum effort spent on creating 3D models of the surrounding environment (Behzadan and Kamat 2009b). Lastly, the tabletop collaborative AR visualization helps to bridge the gap between paper-based static information and computer-based graphical models; it reflects the dynamic nature of a jobsite and preserves the convenience of face-to-face collaboration (Dong et al. 2013).

FIGURE 14.2 AR research for AEC applications in LIVE. (a) Visual collision avoidance. (From Talmaki, S.A. et al., Adv. Eng. Inform., 27(2), 283, 2013.) (Continued)

FIGURE 14.2 (Continued) AR research for AEC applications in LIVE. (b) Reconnaissance of damaged building. (From Dong, S., Scalable and extensible augmented reality with applications in civil infrastructure systems, PhD dissertation, Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, 2012.) (c) Visualization of construction processes. (From Behzadan, A.H. and Kamat, V.R., J. Comput. Civil Eng., 23(6), 405, 2009b. With permission from ASCE.) (Continued)

FIGURE 14.2 (Continued) AR research for AEC applications in LIVE. (d) Collaborative AR visualization. (From Dong, S. et al., Adv. Eng. Softw., 55, 45, 2013.)

14.2 CHALLENGES ASSOCIATED WITH AR IN AEC APPLICATIONS

14.2.1 Spatial Alignment of Real and Virtual Objects (Registration)

Spatial registration in AR attempts to guarantee that the real-world objects and superimposed virtual objects are properly aligned with respect to each other (Behzadan and Kamat 2007). In the absence of proper registration, the illusion that the two worlds coexist inside the user's view of the augmented space will be compromised.

14.2.1.1 Registration Process

As shown in Table 14.1, the registration process typically consists of four major steps (Shreiner et al. 2006):

1. Positioning the viewing volume of a user's eyes in the world coordinate system
2. Positioning virtual objects in the world coordinate system
3. Determining the shape of the viewing volume
4. Transforming virtual objects from the world coordinate system to the eye coordinate system

These steps must consider six degrees of freedom (three for position and three for head orientation) measured by tracking devices, as well as the lens parameters of the camera that captures the real-world views. The origin of the world coordinate system coincides with the origin of the user's eye coordinate system, which is the user's geographical location in each frame. As shown in Figure 14.3, the world coordinate system uses a right-handed system with the Y-axis pointing in the direction of true north, the X-axis pointing to the east, and the Z-axis pointing upward. The eye coordinate system complies with the OpenSceneGraph (OSG) (Martz 2007) default coordinate system, which also uses a right-handed system, with the Z-axis as the up vector and the Y-axis departing from the eye.

TABLE 14.1
Four Steps of the Registration Process in AR

1. Viewing: position the viewing volume of a user's eyes in the world (attitude of the camera, from the electronic compass; location of the world origin, from RTK-GPS).
2. Modeling: position the objects in the world.
3. Creating viewing frustum: decide the shape of the viewing volume (lens and aspect ratio of the camera).
4. Projection: project the objects onto the image plane (perspective projection matrix).

The yaw, pitch, and roll angles are used to describe the relative orientation between the world and eye coordinate systems, and the zxy rotation sequence is picked to construct the transformation matrix between the two coordinate systems. The procedure is explained as follows. Suppose the eye and world coordinate systems coincide at the beginning. The user's head first rotates around the Z-axis by yaw angle Ψ ∈ [−180, +180] to get the new axes X′ and Y′; since this rotation is clockwise under the right-handed system, the rotation matrix is Rz(−Ψ). Then, the head rotates around the X′-axis by pitch angle Θ ∈ [−90, +90] to get the new axes Y″ and Z″, with a counterclockwise rotation Rx′(Θ). Finally, the head rotates around the Y″-axis by roll angle Φ ∈ [−180, +180], with a counterclockwise rotation Ry″(Φ), to reach the final attitude.

Converting a virtual object from the world coordinate system to the eye coordinate system is the inverse process: rotating around the Y″-axis by −Φ degrees, then rotating around the X′-axis by −Θ degrees, and finally rotating around the Z-axis by Ψ degrees. The rotation matrix is therefore written as Rz(Ψ)Rx′(−Θ)Ry″(−Φ), as shown in Equations 14.1 and 14.2:

\[
\begin{bmatrix} X_e \\ Y_e \\ Z_e \end{bmatrix}
=
\begin{bmatrix} \cos\Psi & -\sin\Psi & 0 \\ \sin\Psi & \cos\Psi & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\Theta & \sin\Theta \\ 0 & -\sin\Theta & \cos\Theta \end{bmatrix}
\begin{bmatrix} \cos\Phi & 0 & -\sin\Phi \\ 0 & 1 & 0 \\ \sin\Phi & 0 & \cos\Phi \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix}
\tag{14.1}
\]

\[
P_e = R_z(\Psi)\,R_{x'}(-\Theta)\,R_{y''}(-\Phi)\,P_w
\tag{14.2}
\]

Since OSG provides quaternions, a simple and robust way to express rotation, the rotation matrices are further constructed as quaternions by specifying the rotation axes and angles. The three rotation axes are given in Equations 14.3 through 14.5; the first is

\[
Z\text{-axis} = \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix}
\tag{14.3}
\]

FIGURE 14.3 Definition of the world coordinate system.

The axes X′ and Y″ are obtained by applying the already-performed rotations to the corresponding unit axes:

\[
X'\text{-axis} =
\begin{bmatrix} \cos\Psi & \sin\Psi & 0 \\ -\sin\Psi & \cos\Psi & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
=
\begin{bmatrix} \cos\Psi \\ -\sin\Psi \\ 0 \end{bmatrix}
\tag{14.4}
\]

\[
Y''\text{-axis} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\Theta & -\sin\Theta \\ 0 & \sin\Theta & \cos\Theta \end{bmatrix}
\begin{bmatrix} \cos\Psi & \sin\Psi & 0 \\ -\sin\Psi & \cos\Psi & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
=
\begin{bmatrix} \sin\Psi \\ \cos\Theta\cos\Psi \\ \sin\Theta\cos\Psi \end{bmatrix}
\tag{14.5}
\]

Once the rotation sequence and transformation are completed, the next step is to model the virtual objects in their exact locations. The definition of the object coordinate system is determined by the drawing software; the origin is fixed to a pivot point on the object with a user-specified geographical location. The geographical location of the world coordinate origin is given by position tracking devices (e.g., a GPS sensor) carried by the user. The methods to calculate the distance between geographical coordinates were originally introduced by Vincenty (1975). Behzadan and Kamat (2007) used this approach to design an inverse method that uses a reference point to calculate the 3D vector between two geographical locations; with it, the 3D vector between the object and world coordinate origins can be calculated. Once a virtual object is modeled inside the user's viewing frustum, any further translation, rotation, and scaling operations are applied on the object.

Finally, the user's viewing frustum must be defined. The real world is perceived through perspective projection by the human eye and the video camera. Four parameters are needed to construct a perspective projection matrix: horizontal angle of view, horizontal and vertical aspect ratio, and near and far planes. As shown in Figure 14.4, these parameters together form a viewing frustum and decide the virtual content to be displayed in the augmented space. In order to increase computational efficiency, all virtual objects outside of the viewing frustum are either cropped or clipped.

FIGURE 14.4 The viewing frustum defines the virtual content that can be seen.
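The rotation sequence in Equations 14.1 through 14.5 can be verified numerically. The following sketch (an independent Python illustration, not the chapter's actual OSG/C++ implementation) builds the elementary rotation matrices and checks them against the axis expressions of Equations 14.4 and 14.5:

```python
import numpy as np

def Rz(a):  # counterclockwise rotation about +Z by angle a (radians)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def Rx(a):  # counterclockwise rotation about +X
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def Ry(a):  # counterclockwise rotation about +Y
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def world_to_eye(yaw, pitch, roll):
    """Equation 14.2: P_e = Rz(yaw) * Rx(-pitch) * Ry(-roll) * P_w."""
    return Rz(yaw) @ Rx(-pitch) @ Ry(-roll)

yaw, pitch, roll = np.radians([30.0, 20.0, 10.0])

# Equation 14.4: the X' axis produced by the yaw rotation Rz(-yaw)
x_prime = Rz(-yaw) @ np.array([1.0, 0.0, 0.0])
assert np.allclose(x_prime, [np.cos(yaw), -np.sin(yaw), 0.0])

# Equation 14.5: the Y'' axis produced by yaw followed by pitch
y_dprime = Rx(pitch) @ Rz(-yaw) @ np.array([0.0, 1.0, 0.0])
assert np.allclose(
    y_dprime,
    [np.sin(yaw), np.cos(pitch) * np.cos(yaw), np.sin(pitch) * np.cos(yaw)])

# The composed world-to-eye transform is orthonormal, as any rotation must be
R = world_to_eye(yaw, pitch, roll)
assert np.allclose(R @ R.T, np.eye(3))
```

In OSG itself, each factor would instead be expressed as a quaternion built from the corresponding rotation axis (Equations 14.3 through 14.5) and angle.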

14.2.1.2 Experimental Results

Figure 14.5 shows the process followed to calibrate the mechanical attitude discrepancy and to validate the registration algorithm. A real box of size 12 cm × 7 cm × 2 cm (length × width × height) is placed at a known pose, and a semitransparent 3D model of the same size is created and projected onto the real scene, so that the level of alignment can be judged. The virtual box is first projected without adjustments to the attitude measurement, and discrepancy is thus present. The virtual box is then shifted to align with the real one by adding a compensation value to the attitude measurement. The experiment was further continued to validate the agreement between the real and virtual cameras: if the static registration algorithm works correctly, the virtual box should coincide with the real box when moved together with six degrees of freedom. Figure 14.5 shows that overall the virtual box matched the real one in all tested cases. A more detailed description of these methods can be found in Dong (2012).

FIGURE 14.5 Mechanical attitude calibration result and validation experiment of registration algorithm.

Despite achieving satisfactory results in static registration (i.e., when virtual objects are not moving inside the viewing frustum), correct static registration does not necessarily guarantee that the user can see the same correct and stable augmented image when in motion. In particular, the communication mechanism (PULL vs. PUSH) with the head orientation sensor may cause synchronization latency problems in dynamic registration, due to the latency induced by the head orientation sensor itself. Figure 14.6 lists the main steps involved in the PULL and PUSH mechanisms. In the authors' research, in order to determine the latency under PUSH mode, a series of experiments were performed on a Dell Inspiron machine with an Intel®

the exact instant that the module started swinging was identified from the recorded image frames and the TCM module angular data. PITCH. In order to minimize the transmission latency between the camera and the host system. In this way. As shown in Figure 14.

FIGURE 14.7 Comparison between the TCM-XB data log and the corresponding recorded image frames. (a) Static. (b) Begin to swing. (c) Second frame of swing. (d) Recorded data log. The shaded area highlights the exact instant that the module started swinging.

Six groups of experiments were carried out, and the delay in the PUSH mode relative to the web camera was found to be 5 ms on average. This implies that the communication delay in the PUSH mode is small enough to be neglected.

Another source of latency error in PUSH mode is the finite impulse response (FIR) filter of the compass module. The calibration of the magnetometer can compensate for a local static magnetic source within the vicinity of the compass module; however, dynamic magnetic distortion still impacts the module in motion. Among the three degrees of freedom, heading (i.e., yaw) is the most sensitive to the noise, and the noise magnification depends on the acceleration of the module: usually the noise increases with the acceleration. Except for the high-frequency vibration noise, other types of noise can be removed by an FIR Gaussian filter. The compass

module comes with five options for filtering: 32, 16, 8, 4, and 0 taps. A 0 tap filter implies no filtering, but significant jittering: random gross errors blink virtual objects on and off and turn out to be very distracting. The higher the tap number, the more stable the output, but the longer the expected latency. Consider the case of selecting a 32 tap filter: as shown in Figure 14.8, the module maintains a queue of the latest samples and applies a Gaussian filter to the queue. When it is time to send out estimated data at time instant A, the module adds a new sample A to the end of the queue, with the first one being dropped; the filtered result actually reflects the estimated value at time instant A−15. Since the module samples at approximately 30–32 Hz, this induces a 0.5 s delay for a 32 tap filter, a 0.25 s delay for a 16 tap filter, and so on. This is referred to as filter-induced latency, and it applies to both PULL and PUSH modes. Dong (2012) provided a detailed account of how filter-induced latency can be avoided by moving the Gaussian FIR filter from the hardware to the software.

FIGURE 14.8 The filter-induced latency when a 32 tap Gaussian filter is used.
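The filter-induced latency is easy to reproduce: a causal, symmetric 32-tap FIR filter has a group delay of (32 − 1)/2 = 15.5 samples, i.e., roughly 0.5 s at ~30 Hz. A small sketch (the tap count and sample rate follow the text; the Gaussian width is an assumed parameter, not the TCM vendor's value):

```python
import math

def gaussian_taps(n, sigma):
    """Symmetric n-tap Gaussian FIR kernel, normalized to unit sum."""
    mid = (n - 1) / 2.0
    w = [math.exp(-((i - mid) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]
    total = sum(w)
    return [x / total for x in w]

def fir_filter_last(signal, taps):
    """Causal FIR output at the last step: uses the newest len(taps) samples."""
    window = signal[-len(taps):]
    return sum(w * x for w, x in zip(taps, window))

rate = 30.0                               # samples per second
taps = gaussian_taps(32, sigma=5.0)       # assumed Gaussian width
ramp = [i / rate for i in range(200)]     # heading ramping at 1 deg/s

estimate = fir_filter_last(ramp, taps)
delay_samples = (ramp[-1] - estimate) * rate
print(round(delay_samples, 1))            # → 15.5  (the "A−15" effect)
print(round(delay_samples / rate, 2))     # → 0.52  (≈0.5 s at 30 Hz)
```

The delay is a property of the symmetric kernel, not of the Gaussian width: any symmetric 32-tap filter reports an estimate centered 15.5 samples in the past.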
14.2.2 Visual Illusion of Virtual and Real-World Coexistence (Occlusion)

In an ideal AR visualization scenario, real and virtual objects must be seamlessly blended in all three dimensions, instead of virtual objects being simply superimposed on top of a real-world background, as is the case in most current AR approaches. The result of composing an AR scene without considering the relative depth of the real and virtual objects is that the graphical entities in the scene appear to float over the real background, rather than blend or coexist with the real objects in that scene. This phenomenon is commonly referred to as incorrect occlusion. Figure 14.9 is a schematic AR scene where a real object (a structural column) is closer than the virtual object (a forklift) to the viewpoint (Behzadan and Kamat 2008). The left-side image shows the scene in the absence of occlusion handling, producing an incorrect illusion that the forklift is in front of the column; the right-side image shows visually correct occlusion, where the forklift is partially blocked by the structural column. The occlusion problem is more complicated in outdoor AR, where the user expects to navigate the space freely, and where the relative depth between the involved virtual and real content changes dynamically with time.

14.2.2.1 Occlusion Handling Process

Several researchers have explored the AR occlusion problem from different perspectives. Wloka and Anderson (1995) implemented a high-speed stereo matching algorithm that infers depth maps from a stereo pair of intensity bitmaps. Berger (1997) proposed a contour-based approach, but with the major limitation that the contours need to be seen from frame to frame.

Lepetit and Berger (2000) refined the previous method with a semiautomated approach that requires the user to outline the occluding objects in the key views; the system then automatically detects these occluding objects and handles uncertainties on the computed motion between two key frames. Despite the visual improvements, the semiautomated method is only appropriate for postprocessing. Fortin and Hebert (2006) studied both a model-based approach, using a bounding box, and a depth-based approach, using a stereo camera: the former works only with a static viewpoint, and the latter is subject to low-textured areas. Tian et al. (2010) designed an interactive segmentation and object tracking method for real-time occlusion, but their algorithm fails in the situation where virtual objects are in front of the real objects. Ryu et al. (2010) and Louis and Martinez (2012) tried to increase the accuracy of the depth map by a region-of-interest extraction method using background subtraction and stereo depth algorithms; however, only simple background examples were demonstrated. Koch et al. (2009) described a parallel research effort that adopted a similar approach for TV production in indoor environments, with a 3D model constructed beforehand, with the goal of segmenting a moving actor from the background.

FIGURE 14.9 Example of incorrect and correct occlusion in AR visualization. (a) Incorrect occlusion. (b) Correct occlusion.

A fundamental step toward correct occlusion handling is obtaining an accurate measurement of the distance from the virtual and real objects to the user's eye. In the authors' research, a robust AR occlusion algorithm was designed and implemented that uses a real-time time-of-flight (TOF) camera, an RGB video camera, the OpenGL Shading Language (GLSL), and render-to-texture (RTT) techniques to correctly resolve the depth of real and virtual objects in real-time AR visualizations. Compared with previous approaches, this approach enables improvements in three ways:

1. Ubiquity: The TOF camera is capable of suppressing the background illumination (SBI), which enables the designed algorithm to work in both indoor and outdoor environments. It puts the least limitation on context and conditions compared with any previous approach.
2. Speed: The processing and sampling of the depth map are parallelized by taking advantage of the GLSL fragment shader and the RTT technique.
3. Robustness: Using the OpenGL depth-buffering method, the algorithm can work regardless of the spatial relationship among the involved virtual and real objects.
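In software terms, the depth-buffering idea in point 3 reduces to a per-pixel comparison between the TOF depth map and the virtual objects' depths. A minimal NumPy sketch of that comparison (an illustration only; the actual implementation runs in GLSL on the GPU):

```python
import numpy as np

def composite(background, tof_depth, virtual_color, virtual_depth):
    """Keep a virtual fragment only where it is closer than the real scene."""
    color = background.copy()           # color buffer: real scene drawn first
    depth = tof_depth.copy()            # depth buffer seeded with TOF depths
    visible = virtual_depth < depth     # OpenGL-style "less" depth test
    color[visible] = virtual_color[visible]
    depth[visible] = virtual_depth[visible]
    return color, depth

# 2 x 2 toy frame: the real scene is 3 m away (top row) and 1 m away
# (bottom row); a virtual object sits uniformly at 2 m.
background    = np.full((2, 2), 10, dtype=np.uint8)
tof_depth     = np.array([[3.0, 3.0], [1.0, 1.0]])
virtual_color = np.full((2, 2), 200, dtype=np.uint8)
virtual_depth = np.full((2, 2), 2.0)

color, depth = composite(background, tof_depth, virtual_color, virtual_depth)
print(color)   # the virtual object survives only where the real scene is farther
```

In the top row the virtual object (2 m) occludes the real wall (3 m); in the bottom row the nearer real object (1 m) correctly occludes it.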

14.2.2.2 Two-Stage Rendering

Depth buffering, also known as z-buffering, is the solution for hidden-surface elimination in OpenGL, and it is usually done efficiently in the graphics processing unit (GPU). A depth buffer is a 2D array that shares the same resolution with the color buffer and the viewport. If enabled in the OpenGL drawing stage, the depth buffer keeps a record of the closest depth value to the observer for each pixel: an incoming fragment at a certain pixel will not be drawn unless its corresponding depth value is smaller than the previous one, and if it is drawn, the corresponding depth value in the depth buffer is replaced by the smaller one. Consequently, after the entire scene has been drawn, only those fragments that were not obscured by any others remain visible. Depth buffering thus provides a promising approach for solving the AR occlusion problem.

Correct occlusion handling requires the distances from both the virtual and the real objects to the viewpoint. The locations of the virtual objects are predefined by the program; in a simulated construction operation, for example, the geographical locations of virtual building components and equipment are extracted from the engineering drawings. The location of the viewpoint, on the other hand, is tracked by a position sensor (e.g., GPS) carried by the user. The distance from the virtual object to the viewpoint can then be calculated using the Vincenty algorithm (Vincenty 1975), which interprets the metric distance based on the geographical locations of the virtual object and the user.

A TOF camera estimates the distance from the real object to the eye with the help of the TOF principle, which measures the time that a signal travels, with well-defined speed, from the transmitter to the receiver (Beder et al. 2007). Specifically, the TOF camera measures radio frequency (RF)-modulated light sources with phase detectors: the modulated outgoing beam is sent out with an RF carrier, and the phase shift of that carrier is measured on the receiver side to compute the distance (Gokturk et al. 2004). The TOF camera is capable of capturing a complete scene with one shot, at speeds of up to 40 frames per second (fps); compared with traditional light detection and ranging (LIDAR) scanners and stereo vision, it features real-time feedback with high accuracy. However, common TOF cameras are vulnerable to background light (e.g., artificial lighting and the sun), which generates electrons and confuses the receiver. In the authors' research, the SBI method is used to allow the TOF camera to work flexibly in both indoor and outdoor environments (PMD 2010).

Figure 14.10 shows the two-stage rendering method. In the first stage, the background of the real scene is drawn as usual, but with the depth map retrieved from the TOF camera written into the depth buffer at the same time. In the second stage, the virtual objects are drawn with depth buffer testing enabled. In this way, the invisible part of a virtual object, either hidden by a real object or by another virtual object, will be correctly occluded.

14.2.2.3 Implementation Challenges

Despite the straightforward implementation of depth buffering, there are several challenges when integrating the depth buffer with the depth map from the TOF camera:

1. After being processed through the OpenGL graphics pipeline and written into the depth buffer, the distance

.) TABLE 14. +∞) Zc =  Z e * ( f + n) 2 * f * n * We − f −n f −n [−n.Recent Advances in Augmented Reality for AEC Applications De buf pth fer B RG age im First rendering stage Co buf lor fer Hidden surface removal h pt e De ag im 347 AR registration Second rendering stage FIGURE 14.10  Two-stage rendering for occlusion handling. before it is written into the depth buffer for comparison. and its result is written into the color buffer. with its result being written into both the color and depth buffers. 607. Civ. and an OpenGL camera. 2. 1] for each pixel from the real object to the viewpoint recorded by the TOF camera has to be processed by the same transformation model. The TOF camera acquires the depth map of the real scene.. 27(6). . Eng. In order to ensure correct alignment and occlusion. 2013. (From Dong. S. The OpenGL camera projects virtual objects on top of real scenes. The video camera captures RGB or intensity values of the real scene as the background. Comput. J. 1] [0. ideally all cameras should share the same projection parameters (the principle points and focal lengths).5 2*( f − n) Ze*( f − n) [−1.2 Transformation Steps Applied to the Raw TOF Depth Image Name Meaning Operation Expression Ze Distance to the viewpoint Acquired by TOF camera Zc Clip coordinate after projection transformation Mortho * Mperspective * [Xe Ye Ze We]T Zcvv Canonical view volume Zc/Wc (Wc = Ze and is the homogenous component in clip coordinate) Z cvv =   Zd Value sent to depth buffer (Zndc + 1)/2 Zd = Range (0. et al. and its result is written into the depth buffer. There are three cameras for rendering an AR space: a video camera. f ] where n and f are the near and far planes and We is the homogenous component in eye coordinate and is usually equal to 1 f +n 2* f *n − f − n Z e * ( f − n) f +n f *n − + 0. a TOF camera.

Fundamentals of Wearable Computers and Augmented Reality

2. Even though the TOF camera provides an integrated intensity image that can be aligned with the depth map by itself, the monocular color channel compromises the visual credibility. On the other hand, if an external video camera is used, then the intrinsic and extrinsic parameters of the video camera and the TOF camera may not agree (i.e., different principal points, focal lengths, and distortions). Therefore, some image registration methods are required to find the correspondence between the depth map and the RGB image. Dong (2012) provided a detailed description of two methods, including a nonlinear homography estimation implementation adopted from Lourakis (2011) and stereo projection, that are used to register the depth map and the RGB image. The projection parameters of the OpenGL camera are adjustable and can accommodate either an RGB or a TOF camera.

3. The resolution of the TOF depth map is fixed at 200 × 200, while that of the depth buffer can be arbitrary, depending on the resolution of the viewport. This implies the necessity of interpolation between the TOF depth map and the depth buffer. Furthermore, image registration demands an expensive computation budget if a high-resolution viewport is defined. In the authors' research, the RTT technique is used to carry out the interpolation and registration computation in parallel.

4. Traditional OpenGL pixel-drawing commands can be extremely slow when writing a 2D array (i.e., the depth map) into the frame buffer. In the authors' research, an alternative and efficient approach using OpenGL texture and GLSL is used instead.

Figure 14.11 shows snapshots of the occlusion effect achieved in this research by using homography mapping between the TOF and RGB cameras.

FIGURE 14.11  Occlusion effect comparison using homography mapping between the TOF camera and the RGB camera: (a) occlusion disabled and (b) occlusion enabled. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)
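The homography mapping used for Figure 14.11 warps TOF pixel coordinates into the RGB image before depth comparison; the only subtle step is the projective division. A minimal sketch (the identity matrix below is a stand-in for a calibrated 3 × 3 homography, which would come from the estimation step cited above):

```python
def apply_homography(H, x, y):
    """Map a pixel (x, y) from the TOF depth image into the RGB image
    using a 3x3 homography H (row-major nested lists), applying the
    projective division by the homogeneous coordinate."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

# An identity homography leaves coordinates unchanged:
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(apply_homography(I3, 120, 80))  # (120.0, 80.0)
```

Because a homography is defined up to scale, multiplying H by any nonzero constant yields the same mapping after the division by w.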

FIGURE 14.12  Indoor simulated construction processes with occlusion (a) disabled and (b) enabled. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)

14.2.4  Experimental Results
Despite the outstanding performance of the TOF camera in speed and accuracy, the biggest technical challenge it faces is modular error. The maximum valid range is limited by the RF carrier wavelength, since the receiver decides the distance by measuring the phase offset of the carrier. For instance, if the standard measurement range is 7.5 m and an object happens to be 8 m away from the camera, its distance is represented as 0.5 m (8 mod 7.5) on the depth map, instead of 8 m. This can create incorrect occlusion in outdoor conditions, where ranges can easily go beyond 7.5 m. In the authors' research, object detection and segmentation were explored as possible options to mitigate this limitation, and the experiment range is intentionally restricted to within 7.5 m.

Two sets of validation experiments were conducted in both indoor and outdoor environments. In the indoor experiment, the TOF camera is positioned about 7.5 m away, facing the wall. A forklift picks up a virtual piece of cardboard in front of the virtual stack and maneuvers to put it on top of a physical piece of cardboard. In the meantime, a construction worker passes by with a buggy and then puts a physical bottle beside the virtual cardboard. All of the virtual models are courtesy of the Google 3D Warehouse community. Demonstration videos of both experiments are maintained and can be found at http://pathfinder.engin.umich.edu. Figure 14.12 shows snapshots of indoor experiments to validate the occlusion correctness.
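The modular error described above is literally a modulo operation: a phase-based TOF camera reports range modulo its unambiguous range. A one-line sketch reproduces the 8 m → 0.5 m example (function name is ours):

```python
def tof_apparent_range(true_range_m, unambiguous_range_m=7.5):
    """Phase-based TOF cameras only observe the carrier phase modulo one
    wavelength, so the reported range wraps at the unambiguous range."""
    return true_range_m % unambiguous_range_m

# An object 8 m away is reported at 0.5 m (8 mod 7.5), so it would
# wrongly occlude virtual objects that are actually closer:
print(tof_apparent_range(8.0))  # 0.5
```

Restricting the scene to within 7.5 m, as done in the experiments, keeps the modulo a no-op.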

In the outdoor experiment, a construction worker stands on a virtual scissor lift and paints the wall. The worker then jumps off the scissor lift, pushes the debris to the virtual pile of dirt with a real shovel, and operates the virtual minidozer. Figure 14.13 shows snapshots of outdoor experiments to validate the occlusion correctness. It is clear from the composite visualization that occlusion provides much better spatial cues and realism for outdoor AR visual simulation.

FIGURE 14.13  Outdoor simulated construction processes with occlusion (a) disabled and (b) enabled. (From Dong, S. et al., J. Comput. Civ. Eng., 27(6), 607, 2013.)

14.3  SOFTWARE AND HARDWARE FOR AR IN AEC APPLICATIONS

14.3.1  Software Interfaces

14.3.1.1  ARVISCOPE
ARVISCOPE was the first-generation expressive self-contained AR animation authoring language that was designed to allow an external software process (e.g., a running DES model) to author a dynamic visualization in AR (Behzadan et al. 2012). Sequential statements written in this language can describe a smooth and continuous operation of arbitrary length and complexity. The communicated statements (i.e., events) are interpreted by the visualization engine of the AR application. Appropriate data structures, algorithms, and routines are then invoked to manipulate CAD models and other 3D geometric primitives to present a smooth, accurate representation of the operations.

In order to create a viewing frustum with the user's eye at the center of the projection, the user's positional and head orientation data are continuously obtained and processed, using the procedure described in Section 14.2. Details of the tracking devices used to achieve this are also described in Section 14.3.2. The animation trace file is sequentially interpreted line by line as soon as the application starts; the individual statements are processed, and the graphical representation corresponding to the event in each line of the trace file is simultaneously created and depicted inside the user's augmented viewing frustum. During this process, the user can freely move in the animated augmented space.

Despite the fact that the ARVISCOPE authoring language is powerful enough to describe the complexities involved in a typical construction operation, the syntax of the language is not very complex. Based on their functionality, ARVISCOPE language statements are grouped into scene construction, dynamic, and control statements (Behzadan 2008). These statements can be sequentially recorded into and interpreted from a text file referred to as the animation trace file.

The animation trace file can be created either manually (for short animations) or automatically during a simulation run. Manual generation of an animation trace file is typically not practical except in the case of simple demonstrative examples of short animated duration. Automatic generation of a trace file is more recommended since it requires less time and produces more accurate results. Automatic generation of an animation trace file requires instrumentation of a simulation model (i.e., including additional code and statements in a simulation model). For example, Figure 14.14 shows how two new lines are created and added to the animation trace file describing a simple earthmoving operation as a result of a statement added to the simulation input file of the same operation. These two lines will be written to the trace file numerous times with different arguments (e.g., time tag, object name, duration, route name) depending on the specific instance of the activity taking place. The completed trace file will contain other lines of text that will be written out when other parts of the modeled operation take place. Thus, the time-ordered sequence of animation statements written out by all the activities in the model during a simulation run constitutes the trace file required to visualize the modeled operations in AR.

FIGURE 14.14  Sample instrumentation of a DES input file for automated generation of the corresponding ARVISCOPE animation trace file. [DES tool (STROBOSCOPE): a network of Soil, Load, Haul, Dump, and Return activities instrumented with ONSTART Return PRINT ATF ''SIMTIME %.2f\059 TRAVEL Hauler%.0f ReturnRoad %.2f\059\n'' SimTime Return.ResNum Return.Duration; AR language (ARVISCOPE): generated trace lines such as SIMTIME 12.00; and TRAVEL Hauler1 ReturnRoad 15.00;]
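The PRINT instrumentation pattern of Figure 14.14, in which a simulation activity emits time-stamped animation statements, can be mimicked with a few lines of scripting. The sketch below writes the same two statement shapes (SIMTIME and TRAVEL, semicolon-terminated as in the figure's \059 escapes); the function and file names are ours:

```python
def write_trace(events, path):
    """Write time-ordered ARVISCOPE-style animation statements.
    Each event is a tuple (sim_time, hauler_id, route, duration)."""
    with open(path, "w") as f:
        for sim_time, hauler, route, duration in sorted(events):
            f.write(f"SIMTIME {sim_time:.2f};\n")
            f.write(f"TRAVEL Hauler{hauler} {route} {duration:.2f};\n")

write_trace([(12.00, 1, "ReturnRoad", 15.00)], "ops.atf")
# ops.atf now contains:
#   SIMTIME 12.00;
#   TRAVEL Hauler1 ReturnRoad 15.00;
```

Sorting by simulation time reproduces the time-ordered property that the trace-file interpreter relies on.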

FIGURE 14.15  Animated structural steel erection operations in ARVISCOPE.

It should be noted that the simulation input file partially shown in Figure 14.14 can be created by any DES authoring language, such as STROBOSCOPE (State and Resource Based Simulation of Construction Processes), a programmable and extensible simulation system designed for modeling complex construction operations in detail and for the development of special-purpose simulation tools (Martinez 1996). Figure 14.15 shows animation snapshots of a structural steel erection operation that was modeled in STROBOSCOPE and visualized in full-scale outdoor AR using ARVISCOPE. The operation consisted of a virtual tower crane that picked up steel sections and installed them in their appropriate locations on the steel structure.

ARVISCOPE supports animation scalability, which in the context of the authors' research is defined as the ability of the visualization to construct complex scenes that potentially consist of a large number of CAD objects, and to maintain performance levels as the size of the operation increases. Scalability allows the creation of very complex scenes, such as the erection of an entire structural steel frame consisting of several beams and columns, by loading only a few CAD models of steel sections and placing them repeatedly at appropriate locations using multiple transformation nodes. The multistory steel structure shown in Figure 14.15 was completely modeled and animated in the augmented scene using CAD models of only a few steel sections.

14.3.1.2  SMART
Scalable and Modular Augmented Reality Template (SMART) is an extensible AR computing framework that is designed to deliver high-accuracy and convincing augmented graphics that correctly place virtual contents relative to a real scene and robustly resolve the occlusion relationships between them. SMART is built on top of the previously designed ARVISCOPE platform (Section 14.3.1.1) and is a loosely coupled interface that is independent of any specific engineering application or domain. Instead, it can be readily adapted to an array of engineering applications, such as visual collision avoidance of underground facilities, postdisaster reconnaissance of damaged buildings, and visualization of simulated construction processes.
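The scalability mechanism described above (loading a few CAD models and placing them repeatedly with transformation nodes) is the classic scene-graph instancing pattern. A toy sketch, with class and file names that are illustrative rather than ARVISCOPE's actual API:

```python
class Geometry:
    _loaded = {}                      # cache: CAD file -> single instance

    def __init__(self, path):
        self.path = path

    @classmethod
    def load(cls, path):
        # Load each CAD file once, no matter how many nodes reference it
        if path not in cls._loaded:
            cls._loaded[path] = cls(path)
        return cls._loaded[path]

class TransformNode:
    def __init__(self, geometry, position):
        self.geometry = geometry      # shared reference, not a copy
        self.position = position

# An entire frame of 200 beams reuses one loaded model:
beam = Geometry.load("beam_section.3ds")
frame = [TransformNode(Geometry.load("beam_section.3ds"), (0, 0, 3 * i))
         for i in range(200)]
print(len(Geometry._loaded))  # 1
```

Memory and load time stay proportional to the number of distinct CAD models, not to the number of placed objects, which is exactly the scalability property claimed above.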

The inbuilt registration algorithm of SMART guarantees high-accuracy static alignment between real and virtual objects. Some efforts have also been made to reduce dynamic misregistration. Given the fact that the user's head can be in continuous motion, an adaptive lag compensation algorithm is designed to eliminate the dynamic misregistration. The FIR filter applied to the jittering output of the electronic compass leads to filter-induced latency. In order to reduce synchronization latency, multiple threads are dynamically generated for reading and processing sensor measurements immediately upon the data arrival in the host system.

The SMART framework follows the classical model–view–controller (MVC) pattern. Scene–graph–controller is the implementation of the MVC pattern in SMART and is described in the following:

1. The model counterpart in SMART is the scene, which utilizes application-specific input/output (I/O) engines to load virtual objects and maintains their spatial and attribute status. The update of a virtual object's status is reflected when it is time to refresh the associated graphs.
2. The graph corresponds to the view and reflects the AR registration results for each frame update event. Because the user's head can be in continuous motion, the graph always invokes callbacks to rebuild the transformation matrix based on the latest position and attitude measurement, and refreshes the background image.
3. The controller manages all user interface (UI) elements and responds to a user's commands by invoking delegates' member functions, such as those of a scene or a graph. The controller keeps pointers to the graph and the scene.

The SMART framework, which is based on this scene–graph–controller setup, is shown in Figure 14.16 and is constructed in the following way: the main entry of the program is CARApp, which is in charge of CARSensorForeman and CARSiteForeman. The former initializes and manages all tracking devices, such as real-time kinematic (RTK) GPS receivers and electronic compasses, while the latter defines the relation among scene, graphs, and controller. Once a CARSiteForeman object is initialized, it orchestrates the creation of CARScene, CARController, and CARGraph and the connection of graphs to the appropriate scene. Applications derived from SMART are single document interface (SDI); therefore, there is only one open scene and one controller within a SmartSite.

14.3.2  Hardware Platforms

14.3.2.1  UM-AR-GPS-ROVER
The designed software interface must be accompanied by a robust and easy-to-deploy hardware platform that enables users to perform operations in both indoor and outdoor settings. To address this need, a first-generation wearable hardware apparatus called UM-AR-GPS-ROVER was designed.

FIGURE 14.16  SMART framework architecture. [UML class diagram relating CARApp, CARSiteManager, CARSensorForeman, SmartSite, CARSceneA, CARControllerA, CARGraphA, CARTrackerCallback, CARMotionTracker, CARLocation, CAROrientation, CARStatementProcessor, CARAnimation, and SMARTVideo.]
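The scene–graph–controller roles of the architecture in Figure 14.16 map onto plain MVC; the toy sketch below shows the division of labor (class names and method signatures are ours, not the CAR* classes' actual APIs):

```python
class Scene:                          # model: owns virtual objects and status
    def __init__(self):
        self.objects = {}

    def update(self, name, pose):
        self.objects[name] = pose

class Graph:                          # view: rebuilds transforms each frame
    def __init__(self, scene):
        self.scene = scene
        self.frame = None

    def refresh(self, head_pose):
        # Rebuild the transformation from the latest tracker measurement
        self.frame = (head_pose, dict(self.scene.objects))

class Controller:                     # controller: routes user commands
    def __init__(self, scene, graph):
        self.scene, self.graph = scene, graph

    def on_command(self, name, pose, head_pose):
        self.scene.update(name, pose)
        self.graph.refresh(head_pose)

scene = Scene()
graph = Graph(scene)
ctrl = Controller(scene, graph)
ctrl.on_command("crane", (10.0, 0.0, 5.0), head_pose=(0.0, 1.7, 0.0))
print(graph.frame[1])  # {'crane': (10.0, 0.0, 5.0)}
```

The controller holds references to both scene and graph, mirroring the "pointers to the graph and the scene" kept by the SMART controller.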

As a prototype design, UM-AR-GPS-ROVER used GPS and three-DOF head orientation sensors to capture a user's position and direction of look (Behzadan et al. 2008). Figure 14.17 shows the configuration of the backpack and the allocation of hardware.

FIGURE 14.17  Overview of the UM-AR-GPS-ROVER hardware framework. [Labeled components: GPS sensor, tracker (hidden), video camera, head-mounted display, laptop, and touch pad.]

UM-AR-GPS-ROVER was equipped with the following:

1. Computing devices capable of rapid position calculation and image rendering, including an interface for external input (i.e., both user commands and a video capturing the user's environment)
2. An interface to display the final augmented view to the user
3. External power source for the hardware components to ensure continuous operation without restricting user mobility

The design also had to take into account ergonomic factors to avoid user discomfort after long periods of operation. Figure 14.18 shows the main hardware components of UM-AR-GPS-ROVER, which include a head-mounted display (HMD), user registration and tracking peripherals, and a mobile laptop computer to control and facilitate system operation and user I/O devices. UM-AR-GPS-ROVER succeeded in reusability and modularity, and produced sufficient results in proof-of-concept simulation animation. However, there are two primary design defects that are inadequately addressed: accuracy and ergonomics. First, the insecure placement of tracking devices disqualifies the UM-AR-GPS-ROVER from the centimeter-accuracy-level goal. Second, packaging all devices, power panels, and wires into a single backpack makes it impossible to accommodate more equipment, such as an RTK rover radio. The backpack was also too heavy for even distribution of weight around the body.

14.3.2.2  ARMOR
The Augmented Reality Mobile OpeRation (ARMOR) platform evolves from the UM-AR-GPS-ROVER platform. ARMOR introduces high-accuracy and lightweight devices, rigidly places all tracking instruments with full calibration, and renovates the carrying harness to make it more wearable.

FIGURE 14.18  Hardware components of UM-AR-GPS-ROVER: (a) Kensington Contour Laptop Backpack, (b) Sony Vaio Laptop, (c) Trimble AgGPS 332 Receiver, (d) Powerbase NiMh External Portable Power Pack, (e) Trimble GPS Antenna (mounted on the backpack), (f) TCM5 3-Axis Orientation Tracker (hidden inside the helmet), (g) Helmet, (h) i-Glasses SVGA Pro Head-Mounted Display, (i) Fire-i Digital Firewire Camera, (j) Cirque Smart Cat Touch Pad, and (k) WristPC Wearable Keyboard.

The improvements featured in ARMOR can be broken into four categories:

1. Highly accurate tracking devices with rigid placement and full calibration
2. Lightweight selection of I/O and computing devices and external power source
3. Intuitive user command input
4. Load-bearing vest to accommodate devices and distribute weight evenly around the body

An overview comparison between UM-AR-GPS-ROVER and ARMOR is listed in Table 14.3.

ARMOR can work in both indoor and outdoor modes. The indoor mode does not necessarily imply that the GPS signal is unavailable, but rather that a qualified GPS signal is absent. The GPS signal quality can be extracted from the $GGA section of the GPS data string, which follows the National Marine Electronics Association (NMEA) format. The fix quality ranges from 0 to 8; for example, 2 means differential GPS (DGPS) fix, 4 means RTK fix, and 5 means float RTK. The user can define the standard (i.e., which fix quality is deemed as qualified) in the hardware configuration file.
g. and this pseudo-location can be controlled by a keyboard.5 cm horizontal accuracy and 3. The configuration of the vest has several advantages over the Kensington Contour laptop backpack used by ARVISCOPE. Third. Extensible and easy to access equipment. Wii Remote is lightweight and intuitive to use. configuration file.3 Comparison between UM-AR-GPS-ROVER and ARMOR Platforms Component UM-AR-GPS-ROVER ARMOR Comparison Location tracking Trimble AgGPS 332 using OmniStar XP correction for differential GPS method Orientation tracking PNI TCM 5 Trimble AgGPS 332 using CMR correction broadcast by a Trimble AgGPS RTK Base 450/900 PNI TCM XB Video camera Fire-I digital FireWire camera Microsoft LifeCam VX-5000 Head-mounted display Laptop i-Glasses SVGA Pro video see-through HMD Dell Precision M60 notebook eMagin Z800 3DVisor User command input Nintendo Wii Remote Power source WristPC wearable keyboard and Cirque Smart Cat touchpad Fedco POWERBASE OmniStar XP provides 10–20 cm accuracy.19 shows the configuration of the ARMOR backpack and the allocation of hardware. the design of the pouches allows for an even distribution of weight around the body.. Same accuracy. different parts of the loading vest are loosely joined . Second. has small volume. The wire lengths are customized to the vest. Otherwise. First. and has small volume and less wire. AAA batteries) are distributed in the auxiliary pouches.7 cm vertical accuracy. and the leftside pouch holds the HMD connect interface box to a PC and the MP3750 battery. RTK provides 2. weight. USB to serial port hubs. a preset pseudo-­location is used.357 Recent Advances in Augmented Reality for AEC Applications TABLE 14. Asus N10J is lightweight. There are three primary pouches: The back pouch accommodates the AgGPS 332 Receiver. the SiteNet 900 is stored in the right-side pouch. and rigidity allows that all components be compacted and secured into one load-bearing vest. LifeCam VX-5000 is lightweight. 
and is equipped with NVIDIA GPU. the separation of devices allows the user to conveniently access and checks the condition of certain hardware. Figure 14. The optimization of all devices in aspects such as volume. the geographical location is extracted from the $GPGGA section of the GPS data string. but ARMOR places TCM XB rigidly close to camera. which minimizes outside exposure. An Asus N10J netbook is securely tied to the inner part of the back pouch. Z800 3DVisor is lightweight with stereovision. Backpack apparatus Kensington contour laptop backpack Load-bearing vest Asus N10J netbook Tekkeon myPower ALL MP3750 MP3750 is lightweight and has multiple voltage output charging both GPS receiver and HMD. When a qualified GPS signal is available. All other miscellaneous accessories (e.

14.4  IMPLEMENTED AEC APPLICATIONS

The assessment procedure can take from minutes to days depending on the purpose of the evaluation (Vidal et al.). The ATC-20 convention is the de facto national standard, providing procedures and guidelines for making on-site evaluations (Rojah 2005) and for tagging a building as, for example, yellow (limited entry) or red (unsafe) for immediate occupancy (Chock 2006). However, it has been argued (e.g., Tubbesing 1989) that this approach is subjective and thus may sometimes suffer from misinterpretation, especially given that building inspectors do not have enough opportunities to conduct building safety assessments and verify their judgments, as earthquakes are infrequent.

Most of these approaches build on the premise that significant local structural damage manifests itself as translational displacement between consecutive floors. The interstory drift ratio (IDR) is a critical structural performance indicator that correlates the exterior deformation with the internal structural damage. A large IDR indicates a higher likelihood of damage; values exceeding 0.06 translate to severe damage (Krishnan 2006), and b