Santa Barbara
A Dissertation submitted in partial satisfaction
of the requirements for the degree of
Doctor of Philosophy
in
Computer Science
by
Committee in Charge:
Professor Timothy Sherwood, Chair
Professor Tobias Höllerer
Professor Chandra Krintz
September 2013
UMI Number: 3602010
Published by ProQuest LLC (2013). Copyright in the Dissertation held by the Author.
Microform Edition © ProQuest LLC.
All rights reserved. This work is protected against
unauthorized copying under Title 17, United States Code
ProQuest LLC.
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106 - 1346
The Dissertation of
Jeffrey Casper Browne is approved:
Professor Tobias Höllerer
Professor Chandra Krintz
July 2013
Sketch Practically Anywhere:
Capturing, Recognizing, and Interacting with Physical Ink Using Commodity
Hardware
Copyright
© 2013
by
To Zoey, this was only possible with you by my side.
Acknowledgements
oration with, and in the presence of, many fantastic scientists and engineers. I
would like to thank the members of the UCSB Archlab for their input, for listen-
ing to my many practice talks, and for generally keeping a friendly, collaborative
from Jonny Valamehr, Mohit Tiwari, Hassan Wassel, Xun Li, Bryce Boe, and of
course my predecessor in sketch recognition research, Ryan Dixon.
Carpendale, and Yann Riche. I would also like to thank my mentors at Citrix
Online, Florian Winterstein and Albert Alexandrov, for their research and engi-
neering guidance.
Finally, I would like to thank my PhD committee for their insights and com-
Höllerer, and Tevfik Bultan helped guide my dissertation research to its final result
through their expert knowledge in system building, interface design, and software
engineering theory. Most especially, I would like to thank my advisor, Professor
Tim Sherwood. His passion and knowledge for all aspects of Computer Science
mediately apparent, directions. Further, his guidance for writing and presenting
it accessible to others.
Curriculum Vitæ
Jeffrey Casper Browne
Education
Santa Barbara.
2008 Bachelor of Science in Computer Science with minors in Mathe-
Selected Publications
June 2012.
Jeffrey Browne, Bongshin Lee, Sheelagh Carpendale, Nathalie
mantics of Erasure in Sketch Applications,” In IUI 2011 Sketch
Abstract
Sketch Practically Anywhere:
Capturing, Recognizing, and Interacting with Physical Ink
Using Commodity Hardware
Jeffrey Casper Browne
When faced with complex design, analysis, or engineering tasks, novices and
professionals alike attempt to better understand problems through diagrams, and
a natural first step in this process is working on a whiteboard. Through their
drawings, people can gain valuable insights into subtleties of design and analysis
tasks, but once a diagram gains sufficient complexity, further progress becomes
Sketch recognition interfaces over the last few decades have sought to ease this
the structures they want to analyze, leveraging their previous experience with
drawing diagrams. From circuit design to chemical analysis and even 3D modeling,
these applications have allowed people to more effectively utilize the power of
CAD firms or art studios, with the whiteboard-scale equivalents, necessary for col-
laborative design tasks, being even more exotic. The goal of this work is to utilize
able) to enable sketch recognition where people are already drawing: whiteboards,
In service of this goal, we have created SPARK, the Sketch Practically Any-
where Recognition Kit. Our system enables a person to interact with real world
drawings by recognizing meaning from images of hand drawn diagrams that are
captured via a smartphone or a webcam, and by providing an interface through
augmenting projectors or the phone’s own display. The system is constructed in
for extracting stroke data from static images, and finally a component to extract
key frames from a video stream of an active whiteboard for interactive recognition.
that exercise each module: SketchVis applies traditional, virtual stroke sketch
scale interface. Our Turing machine app enables simulation of Turing machine
diagrams drawn with physical ink through a mobile, explicit capture interface.
the continuous sketch recognition of—and interaction with—physical ink captured
with a webcam.
Contents
Acknowledgments v
Abstract ix
List of Figures xv
1 Introduction 1
1.1 Overview of Contributions . . . . . . . . . . . . . . . . . . . . . . 5
2 Related Work 11
2.1 Sketch Recognition as an Interaction Method . . . . . . . . . . . . 12
2.2 Sketch Recognition Development Frameworks . . . . . . . . . . . 15
2.3 Physical Ink: Recognition and Interaction . . . . . . . . . . . . . 18
2.3.1 Synchronized Physical and Virtual Ink . . . . . . . . . . . 18
2.3.2 Vision-Based Interaction with Physical Ink . . . . . . . . . 20
3 Recognition 25
3.1 Recognition Framework Architecture . . . . . . . . . . . . . . . . 28
3.2 Modular Sketch Application Design Patterns . . . . . . . . . . . . 32
3.2.1 Visualizers and Markers . . . . . . . . . . . . . . . . . . . 32
3.2.2 Factory Observers . . . . . . . . . . . . . . . . . . . . . . . 34
3.2.3 Collectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.4 Hierarchical Structuring of Observers . . . . . . . . . . . . 37
3.2.5 Debug Observer . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 State-Based Semantics of Sketch Recognition . . . . . . . . . . . . 40
3.3.1 Previous Work in Semantics and Erasure . . . . . . . . . . 45
3.3.2 The Semantics of Recognition . . . . . . . . . . . . . . . . 48
3.3.3 Intermediate Semantics . . . . . . . . . . . . . . . . . . . . 55
3.3.4 Emulating Common Features . . . . . . . . . . . . . . . . 63
3.3.5 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 Example Sketch Application: SketchVis . . . . . . . . . . . . . . . 66
3.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4.2 Related Work for Sketch Visualization . . . . . . . . . . . 69
3.4.3 Iterative Design . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4.4 Design Challenges . . . . . . . . . . . . . . . . . . . . . . . 72
3.4.5 System Description . . . . . . . . . . . . . . . . . . . . . . 74
3.4.6 System Architecture . . . . . . . . . . . . . . . . . . . . . 79
3.4.7 Insights into Charting through Sketch Recognition . . . . . 84
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.1.7 Visual Echo Cancellation . . . . . . . . . . . . . . . . . . . 141
5.2 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . 144
5.2.1 Calibration Phase . . . . . . . . . . . . . . . . . . . . . . . 145
5.2.2 Webcam Capture . . . . . . . . . . . . . . . . . . . . . . . 146
5.2.3 Board Change Watcher . . . . . . . . . . . . . . . . . . . . 146
5.2.4 Stroke Extraction . . . . . . . . . . . . . . . . . . . . . . . 147
5.2.5 Sketch Recognition and Display . . . . . . . . . . . . . . . 148
6.2.2 Error Rectification . . . . . . . . . . . . . . . . . . . . . . 153
6.2.3 Beyond Draw-Erase Interaction . . . . . . . . . . . . . . . 154
6.2.4 Augment the Environment Directly . . . . . . . . . . . . . 155
6.3 Contributions Beyond Sketch Recognition . . . . . . . . . . . . . 156
6.3.1 Foreground Filtering for Whiteboard Sharing and Archiving 156
6.3.2 Stroke Extraction as Vectorization . . . . . . . . . . . . . 157
6.3.3 Minimalistic Computer-Augmented Environments . . . . . 158
Bibliography 161
List of Figures
1.1 Continuously capturing, recognizing, and interacting with physical
ink involves three major components: a) the stroke recognition frame-
work, b) the single-image stroke extraction module, and c) the video
stream-based board change watcher. . . . . . . . . . . . . . . . . . . . 6
3.9 SketchVis helps people explore a crime rates data set through sim-
ple, interactive sketches. . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.10 SketchVis is a stand-alone sketch recognition application created
in our development framework and is composed of hierarchical markers
for recognition, as well as visualizers for UI rendering. . . . . . . . . . 80
cessed by the remote server, which isolates ink in the original image (2),
extracts stroke information from the ink (3), and performs recognition
on the generated strokes (4). The final semantic meaning of the diagram
is sent back to the phone, where a user can simulate the Turing machine
(5). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.3 The overall process of recognizing sketched ink from an image of a
whiteboard. The raw image (a) is processed to remove the background
shadows and reflections (b), and is contrast boosted (c) before binariza-
tion (d). Strokes are thinned and traced (e) and then submitted for
basic glyph recognition (f), and finally assembled into a Turing machine
and displayed on the phone (g). . . . . . . . . . . . . . . . . . . . . . . 98
4.4 The long tails in the value histogram associated with the light
(4.4a) versus dark (4.4b) ink better distinguish surfaces than the median
value. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.5 An illustration of the spurious edge artifacts from basic thinning.
The true stroke can be iteratively merged by assuming intersections are
in the stroke and iteratively connecting points not already covered by
the thickness of the current true stroke. . . . . . . . . . . . . . . . . . 103
4.6 An illustration of the split intersections due to thinning. If the re-
gions defined by stroke thickness surrounding two intersections overlap,
they are collapsed into one. . . . . . . . . . . . . . . . . . . . . . . . . 105
4.7 Accuracy of the classifier when training and classifying on traced
strokes as a factor of accuracy when using actual stroke data as captured,
with all features enabled. . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.2 A screenshot of the sketch-based equation graphing application.
Recognized mathematical expressions are plotted within the nearest
“chart area” while the text itself is underlined in the same color. Note
that the image values have been inverted, and virtual strokes are shown
for clarity; when in use, the background is projected black, while strokes
are not projected at all, such that only augmenting light is projected. . 124
5.3 Extending the single image capture methods to support continuous
capture involves three major steps: (a) separating out ink differences
within each live frame, (b) grouping and filtering those differences to
produce snapshots of whole-stroke update events, and (c) using stroke
extraction methods to generate new strokes and erasure events. . . . . 128
5.4 (a) The whiteboard is often occluded during the drawing process.
(b) In order to use the single-image stroke extraction framework, we
must filter foreground objects from the scene. . . . . . . . . . . . . . . 129
5.5 We leverage the thinness of whiteboard ink to isolate pixels that
have an ink update. Here, the user has drawn √x, and brighter areas
mean greater difference. The raw difference image (a) is eroded
to remove narrow components (b). The removed components are then
isolated and binarized to find “ink-like” update areas (c). . . . . . . . . 131
5.6 Most non-ink differences can be ignored by creating a mask (5.6b)
of large “blob” differences that remain after smoothing the image (5.6a).
The unmasked differences that remain (5.6c) are then considered “ink-like”
enough for further processing, in this case the newly drawn √x. . . 133
5.10 (Continued) The pixels that are lighter than their surroundings in
the previous model are masked out of the new ink image (5.10a) while
the lighter pixels in the live image are filtered from the erased ink model
image (5.10b). In this case, there are no updates to the ink, so no strokes
are sent for recognition or erasure. . . . . . . . . . . . . . . . . . . . . 140
5.11 The final architecture that makes up the SPARK system. A contin-
uous stream of images is captured and displayed by a stand-alone process
(a.), which feeds those images to a board change watcher process (b.)
that filters foreground objects. Images of board difference updates are
passed to the stroke extraction process (c.) for stroke separation and
tracing, before the strokes themselves are fed to the sketch recognition
framework (d.), which performs both recognition as well as graphical
annotation (e.) through the projector. Data is passed using Python
synchronized Queues both between processes and between threads. . . 143
List of Tables
4.1 Classification features determined to be timing-dependent. These
features were disabled when generating results for R− and Rt− . . . . . 109
Chapter 1
Introduction
Ever since people could draw with a stick in the dirt, maps, symbols, and other diagrams
have aided people’s thinking by leveraging our innate perceptual abilities for
problem solving. Today, diagrams permeate many aspects of our lives, with visual
drawing diagrams still serves its original function of making abstract concepts
more concrete, and novices and professionals alike still make heavy use of avail-
able drawing space: paper, whiteboards, and chalkboards. However, even though
point even graphical analysis can turn into an exercise of tedious bookkeeping,
Though computer-based tools can aid users in scaling their analyses well be-
yond a person’s manual abilities, the user interfaces often stand in the way, obfus-
cating common tasks like design specification through their reliance on toolbars,
Draw1 requires a great deal of training and experience with just the interface before
a user can effectively use the system [47]. This problem of opaque interfaces arises
in many disciplines, and the time commitment is required even for experts in a
field, as offline experience does not necessarily translate to familiarity with
an application’s interaction methods.
Sketch recognition interfaces have sought to address these common UI short-
traditional low-tech diagramming methods with which people are already familiar.
Systems in many domains, from circuit design to chemical analysis,
have eased the input process for state-of-the-art analysis tools by leveraging
hand-drawn diagrams, providing direct, natural human input to the tool.
the capabilities of modern pen capture hardware to digitize the user’s drawing
1 PerkinElmer Inc., http://www.cambridgesoft.com/Ensemble for Chemistry/ChemDraw/
actions. Desktop capture surfaces are rarely seen outside of art studios, design
deployable solution. Though the prices of tablets and pen digitizing surfaces
decrease with time like any technology, the devices needed to capture strokes
drawn by users are still bulky, relatively expensive, and ultimately relegated to a
niche market.
The end result of reliance on specialized hardware is that, though the sketch
recognition applications can best serve students and novice designers or otherwise
untrained users, these people are the least likely to have the technology needed to
use sketch recognition software. Instead, the vast majority of users are doing their
designs and analysis initially on simple, yet unaugmented, surfaces like traditional
more polished version to a computer. If these people are to benefit from sketch-
employing computer vision with these common consumer devices, we enable
people to directly interact with their diagrams through drawing and erasing
physical ink combined with virtual augmentations, either projected directly onto