
UNIVERSITY OF CALIFORNIA

Santa Barbara

Sketch Practically Anywhere:


Capturing, Recognizing, and Interacting with
Physical Ink Using Commodity Hardware

A Dissertation submitted in partial satisfaction

of the requirements for the degree of

Doctor of Philosophy

in

Computer Science

by

Jeffrey Casper Browne

Committee in Charge:
Professor Timothy Sherwood, Chair
Professor Tobias Höllerer
Professor Chandra Krintz

September 2013
UMI Number: 3602010

All rights reserved

INFORMATION TO ALL USERS


The quality of this reproduction is dependent upon the quality of the copy submitted.

In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

UMI 3602010
Published by ProQuest LLC (2013). Copyright in the Dissertation held by the Author.
Microform Edition © ProQuest LLC.
All rights reserved. This work is protected against
unauthorized copying under Title 17, United States Code

ProQuest LLC.
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106 - 1346
The Dissertation of
Jeffrey Casper Browne is approved:

Professor Tobias Höllerer

Professor Chandra Krintz

Professor Timothy Sherwood, Committee Chairperson

July 2013
Sketch Practically Anywhere:

Capturing, Recognizing, and Interacting with Physical Ink Using Commodity
Hardware

Copyright © 2013

by

Jeffrey Casper Browne
To Zoey, this was only possible with you by my side.
Acknowledgements

This dissertation is a product of several years' research, conducted in collaboration with, and in the presence of, many fantastic scientists and engineers. I would like to thank the members of the UCSB Archlab for their input, for listening to my many practice talks, and for generally keeping a friendly, collaborative atmosphere in our ocean-front lab. Specifically, I would like to acknowledge the support from Jonny Valamehr, Mohit Tiwari, Hassan Wassel, Xun Li, Bryce Boe, and of course my predecessor in sketch recognition research, Ryan Dixon.

Much of the work presented in this dissertation is a direct (as in SketchVis) or indirect product of my experience in industry internships. Working with professionals beyond the university environment provided valuable experience in research methods outside of academia, and I would like to acknowledge my fantastic collaborators at Microsoft Research: Bongshin Lee, Nathalie Riche, Sheelagh Carpendale, and Yann Riche. I would also like to thank my mentors at Citrix Online, Florian Winterstein and Albert Alexandrov, for their research and engineering guidance.

Finally, I would like to thank my PhD committee for their insights and comments throughout my graduate school career. Professors Chandra Krintz, Tobias Höllerer, and Tevfik Bultan helped guide my dissertation research to its final result through their expert knowledge in system building, interface design, and software engineering theory. Most especially, I would like to thank my advisor, Professor Tim Sherwood. His passion and knowledge for all aspects of Computer Science have served as an inspiration for me to take my research in interesting, not immediately apparent, directions. Further, his guidance for writing and presenting technical material has helped me better understand my own research by making it accessible to others.
Curriculum Vitæ
Jeffrey Casper Browne

Education

2013  Master of Science in Computer Science, University of California, Santa Barbara.

2008  Bachelor of Science in Computer Science with minors in Mathematics and German, Gonzaga University.

Experience

2011 – 2013  Graduate Research Assistant, University of California, Santa Barbara.

2012  Research Intern, Citrix Online.

2010  Research Intern, Microsoft Research.

2007 – 2008  Student Researcher, Gonzaga University.

Selected Publications

Jeffrey Browne, Timothy Sherwood: "Mobile Vision-Based Sketch Recognition with SPARK," In Proceedings of the International Symposium on Sketch-Based Interfaces and Modeling (SBIM 2012), June 2012.

Jeffrey Browne, Bongshin Lee, Sheelagh Carpendale, Nathalie Riche, Timothy Sherwood: "Data Analysis on Interactive Whiteboards through Sketch-based Interaction," In Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces (ITS 2011), October 2011.

Jeffrey Browne, Andre Sayre, Timothy Sherwood: "State Semantics of Erasure in Sketch Applications," In IUI 2011 Sketch Recognition Workshop, February 2011.
Abstract
Sketch Practically Anywhere:
Capturing, Recognizing, and Interacting with Physical Ink
Using Commodity Hardware
Jeffrey Casper Browne

When faced with complex design, analysis, or engineering tasks, novices and professionals alike attempt to better understand problems through diagrams, and a natural first step in this process is working on a whiteboard. Through their drawings, people can gain valuable insights into subtleties of design and analysis tasks, but once a diagram gains sufficient complexity, further progress becomes tedious (or even intractable) without the aid of a computer.

Sketch recognition interfaces over the last few decades have sought to ease this barrier to entry through pen-based interaction, enabling users to directly sketch the structures they want to analyze, leveraging their previous experience with drawing diagrams. From circuit design and chemical analysis to even 3D modeling, these applications have allowed people to more effectively utilize the power of computation in their everyday work.

Unfortunately, despite providing a more familiar interaction style, interface hardware requirements mean that sketch recognition interfaces still go largely unused; desktop-scale pen capture displays presently remain largely relegated to CAD firms or art studios, with the whiteboard-scale equivalents, necessary for collaborative design tasks, being even more exotic. The goal of this work is to utilize common consumer hardware (webcams, smartphones, and projectors when available) to enable sketch recognition where people are already drawing: on whiteboards, chalkboards, and even loose paper.

In service of this goal, we have created SPARK, the Sketch Practically Anywhere Recognition Kit. Our system enables a person to interact with real-world drawings by recognizing meaning from images of hand-drawn diagrams captured via a smartphone or a webcam, and by providing an interface through augmenting projectors or the phone's own display. The system is constructed in three parts: a stand-alone stroke-based sketch recognition framework, a module for extracting stroke data from static images, and finally a component to extract key frames from a video stream of an active whiteboard for interactive recognition.

As evidence of our methods, we have created a series of prototype applications that exercise each module: SketchVis applies traditional, virtual-stroke sketch recognition techniques to data exploration through charting on a whiteboard-scale interface. Our Turing machine app enables simulation of Turing machine diagrams drawn with physical ink through a mobile, explicit capture interface. Finally, the equation graphing application serves as a proof-of-concept exercise of the continuous sketch recognition of—and interaction with—physical ink captured with a webcam.

Professor Timothy Sherwood

Dissertation Committee Chair
Contents

Acknowledgments  v
Curriculum Vitæ  vii
Abstract  ix
List of Figures  xv
List of Tables  xix

1 Introduction  1
  1.1 Overview of Contributions  5
  1.2 Dissertation Road Map  10

2 Related Work  11
  2.1 Sketch Recognition as an Interaction Method  12
  2.2 Sketch Recognition Development Frameworks  15
  2.3 Physical Ink: Recognition and Interaction  18
    2.3.1 Synchronized Physical and Virtual Ink  18
    2.3.2 Vision-Based Interaction with Physical Ink  20

3 Recognition  25
  3.1 Recognition Framework Architecture  28
  3.2 Modular Sketch Application Design Patterns  32
    3.2.1 Visualizers and Markers  32
    3.2.2 Factory Observers  34
    3.2.3 Collectors  35
    3.2.4 Hierarchical Structuring of Observers  37
    3.2.5 Debug Observer  39
  3.3 State-Based Semantics of Sketch Recognition  40
    3.3.1 Previous Work in Semantics and Erasure  45
    3.3.2 The Semantics of Recognition  48
    3.3.3 Intermediate Semantics  55
    3.3.4 Emulating Common Features  63
    3.3.5 Future Work  66
  3.4 Example Sketch Application: SketchVis  66
    3.4.1 Motivation  67
    3.4.2 Related Work for Sketch Visualization  69
    3.4.3 Iterative Design  71
    3.4.4 Design Challenges  72
    3.4.5 System Description  74
    3.4.6 System Architecture  79
    3.4.7 Insights into Charting through Sketch Recognition  84
  3.5 Conclusion  86

4 Recognizing Strokes from a Single Image  89
  4.1 System Description  93
    4.1.1 Architecture Overview  96
  4.2 Extracting Strokes from an Image  99
    4.2.1 Image Normalization  101
    4.2.2 Bitmap Thinning  102
    4.2.3 Stroke Tracing  106
  4.3 Recognition over Extracted Strokes  107
    4.3.1 Impact of Stroke Timing on Recognition  107
    4.3.2 Low-level Recognition  112
    4.3.3 Impact of Extraction Noise on Recognition  112
    4.3.4 Turing Machine Recognition  113
  4.4 Discussion  114
  4.5 Conclusion  117

5 Sketch Recognition from a Continuous Image Stream  119
  5.1 System Design  121
    5.1.1 Usage Overview  123
    5.1.2 Example Graphing Application  125
    5.1.3 Overall Architecture  127
    5.1.4 The Board Image Model and its Uses  130
    5.1.5 Incremental Differences  137
    5.1.6 Providing Interactive Ink  138
    5.1.7 Visual Echo Cancellation  141
  5.2 Implementation Details  144
    5.2.1 Calibration Phase  145
    5.2.2 Webcam Capture  146
    5.2.3 Board Change Watcher  146
    5.2.4 Stroke Extraction  147
    5.2.5 Sketch Recognition and Display  148

6 Future Directions and Conclusions  149
  6.1 Conclusions  149
  6.2 Future Directions  152
    6.2.1 Automated Semantics Guarantees  152
    6.2.2 Error Rectification  153
    6.2.3 Beyond Draw-Erase Interaction  154
    6.2.4 Augment the Environment Directly  155
  6.3 Contributions Beyond Sketch Recognition  156
    6.3.1 Foreground Filtering for Whiteboard Sharing and Archiving  156
    6.3.2 Stroke Extraction as Vectorization  157
    6.3.3 Minimalistic Computer-Augmented Environments  158

Bibliography  161
List of Figures

1.1 Continuously capturing, recognizing, and interacting with physical ink involves three major components: a) the stroke recognition framework, b) the single-image stroke extraction module, and c) the video stream-based board change watcher.  6

3.1 Stroke-based diagram recognition occurs in isolated board observers, which register to listen for changes to the strokes on a board, or for changes to semantic annotations tagged on the same strokes.  29

3.2 Erasure semantics can be tedious to enumerate even for simple sketch applications. When the user erases an X, the board could declare the game invalid, or roll back the moves. If she erases part of the board, the possibilities are even more abstract.  43

3.3 Semantics classes in terms of sequences and sets of strokes and deletions. Each class expresses some subset of the semantics of more specific classes.  49

3.4 The transition system representing the semantics of class 1. Application state is wholly dependent on the unordered set of strokes.  51

3.5 The transition system representing the semantics of class 7. A state's meaning is completely dependent on the ordered sequence of stroke and delete operations.  54

3.6 The transition system representing the semantics of class 6. Application state depends on both the ordering of the strokes and the ordering of deletes, but not on the ordering between strokes and deletes.  56

3.7 SketchVis integrates real data with hand-written selections (a: axis labels) and sketch-based controls (b: axis arrows, bar stroke; c: tic marks; d: legend area; e: transformation menu)  68

3.8 SketchVis helps people explore a crime rates data set through simple, interactive sketches.  75

3.9 SketchVis helps people explore a crime rates data set through simple, interactive sketches.  78

3.10 SketchVis is a stand-alone sketch recognition application created in our development framework and is composed of hierarchical markers for recognition, as well as visualizers for UI rendering.  80

4.1 Apps within SPARK use sketch-recognition techniques on strokes extracted from photographs, enabling users to simulate Turing machine diagrams drawn on surfaces such as whiteboards, paper, and even chalkboards.  94

4.2 The system architecture for our Turing machine app built within the SPARK framework. Photos taken on the mobile phone (1) are processed by the remote server, which isolates ink in the original image (2), extracts stroke information from the ink (3), and performs recognition on the generated strokes (4). The final semantic meaning of the diagram is sent back to the phone, where a user can simulate the Turing machine (5).  97

4.3 The overall process of recognizing sketched ink from an image of a whiteboard. The raw image (a) is processed to remove the background shadows and reflections (b), and is contrast boosted (c) before binarization (d). Strokes are thinned and traced (e) and then submitted for basic glyph recognition (f), and finally assembled into a Turing machine and displayed on the phone (g).  98

4.4 The long tails in the value histogram associated with the light (4.4a) versus dark (4.4b) ink better distinguish surfaces than the median value.  100

4.5 An illustration of the spurious edge artifacts from basic thinning. The true stroke can be iteratively merged by assuming intersections are in the stroke and iteratively connecting points not already covered by the thickness of the current true stroke.  103

4.6 An illustration of the split intersections due to thinning. If the regions defined by stroke thickness surrounding two intersections overlap, they are collapsed into one.  105

4.7 Accuracy of the classifier when training and classifying on traced strokes as a factor of accuracy when using actual stroke data as captured, with all features enabled.  110

5.1 Continuously capturing and recognizing from a video stream of the whiteboard allows users to interact with their diagrams directly on the whiteboard through drawing and erasing.  122

5.2 A screenshot of the sketch-based equation graphing application. Recognized mathematical expressions are plotted within the nearest "chart area" while the text itself is underlined in the same color. Note that the image values have been inverted, and virtual strokes are shown for clarity; when in use, the background is projected black, while strokes are not projected at all, such that only augmenting light is projected.  124

5.3 Extending the single image capture methods to support continuous capture involves three major steps: (a) separating out ink differences within each live frame, (b) grouping and filtering those differences to produce snapshots of whole-stroke update events, and (c) using stroke extraction methods to generate new strokes and erasure events.  128

5.4 (a) The whiteboard is often occluded during the drawing process. (b) In order to use the single-image stroke extraction framework, we must filter foreground objects from the scene.  129

5.5 We leverage the thinness of whiteboard ink to isolate pixels that have an ink update. Here, the user has drawn √x, and brighter areas mean greater difference. The raw difference image (a) is eroded to remove narrow components (b). The removed components are then isolated and binarized to find "ink-like" update areas (c).  131

5.6 Most non-ink differences can be ignored by creating a mask (5.6b) of large "blob" differences that remain after smoothing the image (5.6a). The unmasked differences that remain (5.6c) are then considered "ink-like" enough for further processing, in this case the newly drawn √x.  133

5.7 Image processing must occur at the granularity of complete strokes, so any ink differences even connected to a large blob occlusion are filtered from the board model update.  135

5.8 (Continued) Image processing must occur at the granularity of complete strokes, so any ink differences even connected to a large blob occlusion are filtered from the board model update.  136

5.9 Light from the projector is filtered from the board update images by ignoring pixels that are lighter than their surrounding neighborhoods. The live image (5.9a) contains raw differences from the board model (5.9b) that correspond to both newly projected light (e.g. x(cos x)), as well as areas where light used to be projected (e.g. the previous plot of √x).  139

5.10 (Continued) The pixels that are lighter than their surroundings in the previous model are masked out of the new ink image (5.10a) while the lighter pixels in the live image are filtered from the erased ink model image (5.10b). In this case, there are no updates to the ink, so no strokes are sent for recognition or erasure.  140

5.11 The final architecture that makes up the SPARK system. A continuous stream of images is captured and displayed by a stand-alone process (a), which feeds those images to a board change watcher process (b) that filters foreground objects. Images of board difference updates are passed to the stroke extraction process (c) for stroke separation and tracing, before the strokes themselves are fed to the sketch recognition framework (d), which performs both recognition as well as graphical annotation (e) through the projector. Data is passed using Python synchronized Queues both between processes and between threads.  143
List of Tables

4.1 Classification features determined to be timing-dependent. These features were disabled when generating results for R− and Rt−.  109
Chapter 1

Introduction

Since people could draw with a stick in the dirt, maps, symbols, and other diagrams have aided people's thinking by leveraging our innate perceptual abilities for problem solving. Today, diagrams permeate many aspects of our lives, with visual languages actively used for everything from biological systems' inhibitor/promoter analysis to seating arrangements at a wedding reception. No matter the discipline, drawing diagrams still serves its original function of making abstract concepts more concrete, and novices and professionals alike still make heavy use of available drawing space: paper, whiteboards, and chalkboards. However, even though hand-drawn diagrams help people to reason about complex tasks, at a certain point even graphical analysis can turn into an exercise of tedious bookkeeping, and one is better served by transitioning the work to a computer-based tool.

Though computer-based tools can aid users in scaling their analyses well beyond a person's manual abilities, the user interfaces often stand in the way, obfuscating common tasks like design specification through their reliance on toolbars, menus, buttons, etc. One example, the window-menu-pointer interface of ChemDraw¹, requires a great deal of training and experience with just the interface before a user can effectively use the system [47]. This problem of opaque interfaces arises in many disciplines, and the time commitment is required even for experts in a field, as offline experience often does not translate to familiarity with an application's interaction methods.

Sketch recognition interfaces have sought to address these common UI shortcomings by providing a freehand, sketching-based input metaphor that mimics the traditional low-tech diagramming methods with which people are already familiar. Systems in many domains, from circuit design to chemical analysis, have eased the input process for state-of-the-art analysis tools by leveraging the visual diagramming experience that comes from manually working in particular domains. In these systems, computation occurs on the structures recognized from hand-drawn diagrams, providing for direct, natural human input to the tool.

Despite the impressiveness of sketch-based systems, they all currently rely on the capabilities of modern pen capture hardware to digitize the user's drawing actions. Desktop capture surfaces are rarely seen outside of art studios, design houses, or sketch recognition labs, and capturing digital strokes at whiteboard scale requires significant investment in large, wall-mounted combination display and capture surface hardware, or else requires carrying and calibrating a field-deployable solution. Though the prices of tablets and pen digitizing surfaces decrease with time like any technology, the devices needed to capture strokes drawn by users are still bulky, relatively expensive, and ultimately relegated to a niche market.

The end result of reliance on specialized hardware is that, though sketch recognition applications can best serve students, novice designers, and otherwise untrained users, these people are the least likely to have the technology needed to use sketch recognition software. Instead, the vast majority of users are doing their designs and analysis initially on simple, unaugmented surfaces like traditional whiteboards, chalkboards, and paper, before investing the time to transition a more polished version to a computer. If these people are to benefit from sketch-based systems, advances in recognition must be implementable on commodity hardware that people largely already possess.

¹PerkinElmer Inc., http://www.cambridgesoft.com/Ensemble for Chemistry/ChemDraw/

This dissertation discusses work we have completed in utilizing smartphones, webcams, and common projectors to bring sketch recognition interfaces to chalkboards, whiteboards, and paper—surfaces where people are already drawing. By employing computer vision with these common consumer devices, we enable people to directly interact with their diagrams through drawing and erasing physical ink combined with virtual augmentations, either projected directly onto the drawing surface or shown on a phone's touch-sensitive screen.