
Projected Interactive Display

for Public Spaces

--------------------

A Thesis Proposal
Presented to the Faculty of the
Department of Electronics and Communications Engineering
College of Engineering, De La Salle University

--------------------

In Partial Fulfillment of
The Requirements for the Degree of
Bachelor of Science in Electronics and Communications Engineering

--------------------

by

Arcellana, Anthony A.
Ching, Warren S.
Guevara, Ram Christopher M.
Santos, Marvin S.
So, Jonathan N.

June 2006
1. Introduction

1.1.Background of the Study

Human-computer interaction (HCI) is the study of the interaction between users and computers. Its basic goal is to improve that interaction by making computers more user-friendly and accessible. HCI is a broad, interdisciplinary area, emerging as a specialty concern within several disciplines, each with a different emphasis: computer science, psychology, sociology, and industrial design (Hewett et al., 1996). The ultimate goal of HCI is to design systems that minimize the barrier between the human's cognitive model of what they want to accomplish and the computer's understanding of the user's task.

The thesis applies a new way to interact with sources of information: an interactive projected display. For a long time the ubiquitous mouse and keyboard have been used to control a graphical display. With the advent of increased processing power and technology, there has been great interest from the academic and commercial sectors over the past decades in developing new and innovative human-computer interfaces (Myers et al., 1996). Recent advances and research in human-computer interaction (HCI) have paved the way for techniques such as vision, sound, speech recognition, and context-aware devices that allow for a much richer, multimodal interaction between man and machine (Turk, 1998; Porta, 2002). This line of research moves away from traditional input devices, which are essentially blind, toward the so-called Perceptual User Interfaces (PUI). PUIs are interfaces that emulate the natural human capabilities to sense, perceive, and reason, modeling human-computer interaction after human-human interaction. Some of the advantages of PUIs are as follows: (1) they reduce the dependence on physical proximity that keyboard and mouse systems require, (2) they make use of communication techniques that come naturally to humans, making the interface easy to use, (3) they allow interfaces to be built for a wider range of users and tasks, (4) they create interfaces that are user-centered rather than device-centered, and (5) their design emphasizes being transparent and unobtrusive (Turk, 1998).

What is interesting in this line of research is the development of natural and intuitive interface methods that make use of body language. A subset of PUI is Vision-Based Interfaces (VBI), which focus on computers' visual awareness of the people using them. Here computer vision algorithms are used to locate and identify individuals, track human body motions, model the head and face, track facial features, and interpret human motion and actions (Porta, 2002). A certain class of this research falls under bare-hand human-computer interaction, which is what this study is about. Bare-hand interaction uses as its basis of input the actions and gestures of the human hands alone, without any attached devices.

1.2.Statement of the Problem

Information-rich interactive viewing modules are usually implemented as computer-based kiosks. However, placing computer peripherals such as touch screens or mouse- and keyboard-controlled computers in a public area requires significant space and raises maintenance concerns, since the physical hardware is handled by the general public. Using a projected display and a camera-based input device eliminates the hardware problems associated with space and maintenance. It also attracts people, since projected displays are new and novel.

1.3.Objectives

1.3.1.General Objectives

The general objective of the thesis is to create an interactive projected display system using a projector and a camera. The projector displays the interactive content, and the user selects objects in the projected display with his hand. Computer vision is used to detect and track the hand and generate the proper response.

1.3.2.Specific Objectives

1.3.2.1. To use a DLP or LCD projector for the display

1.3.2.2. To use a PC camera as the basis of user input

1.3.2.3. To use a PC to implement algorithms to detect hand action as seen

from the camera

1.3.2.4. To use a PC to host the information-rich content

1.3.2.5. To create an interactive DLSU campus map as a demo application

1.4.Scope and Delimitation

1.4.1.Scope of the Study

1.4.1.1. The proponents will create a real time interactive projected display

using a projector and camera.

1.4.1.2. The proponents will use development tools for image/video

processing and computer vision to program the PC. Algorithms for

hand detection and tracking will be implemented using these tools.


1.4.1.3. A demo application of the system will be implemented as an

interactive campus map of the school.

1.4.1.4. Only the posture of a pointing hand will be recognized as an input. Other visual cues to the camera will not be recognized.

1.4.2.Delimitation of the Study

1.4.2.1. The display will be projected on a clean white wall.

1.4.2.2. The projector and the camera set-up will be fixed in such a way that

blocking the projector is not a problem.

1.4.2.3. Trapezoidal (keystone) distortion, which results from projecting at an angle, will be manually compensated for if present.

1.4.2.4. Lighting conditions will be controlled to not overpower the projector.

1.4.2.5. The system will be designed to handle only a single user. In the

presence of multiple users, the system would respond to the first user

triggering an event.

1.5.Significance of the Study

The study applies a new way of presenting information using projected displays and allows the user to interact with them. A projected display conserves space, as the system is ceiling-mounted and there is no hardware that the user handles directly. Using only the user's hands as input, the system is intuitive and natural, which are key criteria for effective interfaces. It presents an alternative to computer-based modules where space can be a problem.


Currently, the cost of acquiring and maintaining a projector is high, but the system is still viable where maintaining an information center is deemed important. It is comparable to the large-screen displays used in malls and similar venues. Since the system is also a novel way of presenting information, it can be used to make interactive advertisements that are very attractive to consumers: the display can transform from an inviting advertisement into detailed product information. With this said, the cost of operating the projector can possibly be justified by the revenue generated from effective advertising.

The study is an endeavor towards the development of natural interfaces. The use of a projector and camera provides a means of producing an augmented reality that is natural, requiring no special goggles or gloves that the user has to wear. In public spaces where information is very valuable, a system that can provide an added dimension to reality is very advantageous, and using nothing but the hands means the user can instantly tap the content of the projected interface. Computer vision provides the implementation of a perceptual user interface, and the projection provides the means of creating an augmented reality. Further developments in these areas mean that computers can be present in everyday life without being perceived as such. With PUI there is no need for physical interface hardware; only the natural interaction skills present in every human are needed.

1.6.Description of the Project

The system is composed of three main components: (1) the PC, which houses the information and control content, (2) the projector, which displays the information, and (3) the PC camera, which is the input of the system. Development of the study will be invested heavily in the software programming of the PC. The functions of the PC are the following: detecting the position and action of the user's hands relative to the screen, generating a response from a specific action, and hosting the information-rich content. Techniques from image/video processing and machine vision will be used to facilitate the first two functions.
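To make this division of labor concrete, the following Python sketch outlines how the PC-side processing loop could be organized, assuming the present-day OpenCV Python bindings. The names detect_fingertip and CampusMap are placeholders introduced here for illustration and are not part of the actual software to be developed.

# Minimal sketch of the PC-side loop: capture a frame, detect the pointing
# hand, map its tip into display coordinates, and hand the event to the
# content layer. detect_fingertip() and CampusMap are hypothetical
# placeholders, not components of the thesis software itself.
import cv2

def detect_fingertip(frame):
    """Return the (x, y) image position of a pointing fingertip, or None.

    Stub for the hand detection and tracking algorithms described above.
    """
    return None

class CampusMap:
    """Stands in for the information-rich content hosted on the PC."""
    def handle_point(self, x, y):
        print(f"pointer at ({x}, {y})")

def run(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    content = CampusMap()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        tip = detect_fingertip(frame)
        if tip is not None:
            # Camera coordinates would still need to be mapped to projector
            # coordinates (e.g. through a calibration step).
            content.handle_point(*tip)
        cv2.imshow("camera", frame)
        if cv2.waitKey(1) & 0xFF == 27:   # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()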

As a demo application, an interactive map of the school is used. The projector will project the campus directory of De La Salle University Manila. The camera will capture the images needed and upload them to the computer. The user will then pick which building he/she would like to explore, using his/her hand as the pointing tool. Once the user has chosen a building, a menu will appear giving information about the building, including a brief history, floor plans, facilities, faculty, etc. Once the user is finished exploring the building, he/she can touch the back button to select another building in the campus. This cycle continues until the user is satisfied.

1.7.Methodology

Development of the study will be invested heavily in the software programming of the PC. The proponents must spend time acquiring the programming skills needed to implement the research. Research on video capture and processing is greatly needed for the operation of the system, and quick familiarization and efficiency with the libraries and tools for computer vision are necessary for timely progress in the study.


The proponents must first obtain the hardware needed for the study: the camera that will capture the input, the projector that will produce the projected display, and the PC on which the system will be based. The specifications of the camera and the projector will be examined carefully to meet the precise requirements. Once the system has a working prototype, it will be tested, and the necessary adjustments will be made as problems in the system are detected and fixed.

Seeking advice from different people may be necessary for speedy progress of the study. Advice on programming will be very helpful, since the implementation is PC-based. Additionally, advice from the panel, the adviser, and other people about the interface will help remove biases the proponents may have about the system.

1.8.Gantt Chart

1.9.Estimated Cost

Projector                             P 50,000
PC Camera                             P 1,500 - 2,000
Open Source SDK                       Free
Development / Prototype Computer      Available
Miscellaneous                         P 5,000

Estimated budget                      P 57,000


2. Review of Related Literature

2.1.PC Camera

A PC camera, popularly known as a web camera or webcam, is a real-time camera widely used for video conferencing via the Internet. Images acquired from this device are typically uploaded to a web server, making them accessible through the World Wide Web, instant messaging, or a PC video-calling application. Over the years, several other applications have been developed, including astrophotography, traffic monitoring, and weather monitoring. Web cameras typically include a lens, an image sensor, and some support electronics. The image sensor can be CMOS or CCD, the former being dominant in low-cost cameras. Consumer webcams typically offer resolution in the VGA region at a rate of around 25 frames per second. Various lenses are available, the most common being a plastic lens that can be screwed in and out to manually adjust the camera focus. Support electronics read the image from the sensor and transmit it to the host computer (Wikipedia).

2.2.Projectors

Projectors are classified into two technologies, DLP (Digital Light Processing)

and LCD (Liquid Crystal Display). This refers to the internal mechanisms that the

projector uses to compose the image (Projectorpoint).

2.2.1.DLP

DLP technology uses an optical semiconductor known as the Digital Micromirror Device (DMD) chip to recreate the source material (Projectorpoint).


2.2.1.1. Advantages of DLP projectors

DLP projectors have several advantages over LCD projectors. First, there is less 'chicken wire' or 'screen door' effect, because the pixels in DLP are much closer together. DLP also offers higher contrast than LCD. DLP projectors are more portable because they require fewer components, and finally, it is claimed that DLP projectors last longer than LCD projectors (Projectorpoint).

2.2.1.2. Disadvantages of DLP projectors

DLP projectors also have disadvantages to consider. They have less color saturation. A 'rainbow effect' can appear when looking from one side of the screen to the other, or when looking away from the projected image to an off-screen object, and a 'halo effect' sometimes appears (Projectorpoint).

2.2.2.LCD

LCD projectors contain three separate LCD glass panels, one each for the red, green, and blue components of the image signal sent to the projector. As light passes through the LCD panels, individual pixels can be opened to allow light to pass or closed to block it. This activity modulates the light and produces the image that is projected onto the screen (Projectorpoint).

2.2.2.1. Advantages of LCD projectors

Advantages of LCD projectors over DLP projectors include the following: LCD is more 'light efficient' than DLP, it produces more saturated colors, making it seem brighter than a DLP projector, and it produces a sharper image (Projectorpoint).
2.2.2.2. Disadvantages of LCD projectors

Disadvantages of LCD projectors compared to DLP projectors are the following: LCD produces a 'chicken wire' effect, causing the image to look more pixelated; LCD projectors are bulkier because they have more internal components; dead pixels, which are pixels that are permanently on or permanently off, can appear and are irritating to see; and LCD panels can fail and are very expensive to replace (Projectorpoint).

2.3.Similar Studies

2.3.1.Bare-Hand Human-Computer Interface

Human-computer interaction describes the interaction between the user and the machine. Devices such as the keyboard, mouse, joystick, electronic pen, and remote control are commonly used as the means for human-computer interaction. Real-time bare-handed interaction is the control of a computer system without any device or wires attached to the user; the positions of the fingers and the hand are used to control the applications (Hardenberg & Bérard, 2001).

2.3.1.1. Applications

Bare-hand computer interaction can be more practical than traditional input devices. A good example is during a presentation: the presenter may use hand gestures to select slides, minimizing the delays or pauses caused by moving back and forth to the computer to click for the next slide. A perceptual interface allows systems to be integrated into small areas and allows users to operate them at a distance. Direct manipulation of virtual objects using the fingers is made possible with such a system. Also, a nearly indestructible interface can be built by mounting the projector and camera high enough that the user cannot reach or touch them, making the system less prone to damage caused by users (Hardenberg & Bérard, 2001).

2.3.1.2. Functional Requirements

The functional requirements comprise the services needed for a vision-based computer interaction system. The three essential services in the implementation of such a system are detection, identification, and tracking. Detection determines the presence and position of the objects acquired; its output can be used for controlling applications. The identification service recognizes whether an object present in the scene belongs to a given class of objects; example identification tasks are recognizing a certain hand posture or the number of visible fingers. The tracking service is required to tell which object moved between two frames, since the identified objects will not rest in the same position over time (Hardenberg & Bérard, 2001).
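To clarify how the three services relate, the following Python sketch expresses them as a single interface. The class and method names are placeholders chosen for this proposal's discussion; they are not the API of the cited system.

# Illustrative sketch of the detection, identification, and tracking services
# described above. The names are this proposal's own placeholders.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DetectedObject:
    object_id: int        # stable identity assigned by the tracking service
    label: str            # e.g. "pointing_hand", assigned by identification
    x: float
    y: float

class VisionServices:
    def detect(self, frame) -> List[DetectedObject]:
        """Detection: report the presence and position of candidate objects."""
        ...

    def identify(self, frame, obj: DetectedObject) -> Optional[str]:
        """Identification: decide whether the object belongs to a known class,
        for example a specific hand posture or a given number of fingers."""
        ...

    def track(self, previous: List[DetectedObject],
              current: List[DetectedObject]) -> List[DetectedObject]:
        """Tracking: associate objects across two frames so an identified
        object keeps the same object_id while it moves."""
        ...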

2.3.1.3. Non-Functional Requirements

Non-functional requirements describe the minimum quality expected from a service. The qualities to be monitored and maintained are latency, resolution, and stability. Latency is defined as the lag between the user's action and the response of the system. Since no system is free of latency, the acceptable latency of the system is of particular importance, because the application requires real-time interaction. A minimum input resolution is important for the detection and identification processes; it is difficult to identify fingers whose width in the image is below six pixels. The tracking service is said to be stable if, as long as the tracked object does not move, the measured position does not change (Hardenberg & Bérard, 2001).
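Since latency is one of the measurable qualities above, a simple way to estimate it is to time the processing of each frame. The short sketch below assumes a process_frame function standing in for the whole detection, identification, and tracking pipeline.

# Rough per-frame latency measurement; process_frame is a placeholder for the
# actual vision pipeline, and the target figure is only an example.
import time

def measure_latency(frames, process_frame):
    worst = 0.0
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)
        worst = max(worst, time.perf_counter() - start)
    return worst   # seconds; e.g. staying well under ~0.1 s keeps interaction fluid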

2.3.2.Dynamically Reconfigurable Vision-Based User Interfaces

Vision-based user interfaces (VB-UI) are an emerging area of user interface

technology where a user’s intentional gestures are detected via camera, interpreted

and used to control an application. The paper describes a system where the

application sends the vision system a description of the user interface as a

configuration of widgets. Based on this, the vision system assembles a set of

image processing components that implement the interface, sharing computational

resources when possible. The parameters of the surfaces where the interface can

be realized are defined and stored independently of any particular interface. These

include the size, location and perspective distortion within the image and

characteristics of the physical environment around that surface, such as the user’s

likely position while interacting with it.

The framework presented in the paper should be seen as a way for vision-based applications to adapt easily to different environments. Moreover, the proposed vision-system architecture is very appropriate for the increasingly common situations where the interface surface is not static (Kjeldsen, 2003).

2.3.2.1. Basic Elements

A VB-UI is composed of configurations, widgets, and surfaces. A configuration is a set of individual interaction dialogs; it specifies a boundary area that defines the configuration coordinate system, and this boundary is used when mapping a configuration onto a particular surface. Each configuration is a collection of interactive widgets. A widget provides an elemental user interaction, such as detecting a touch or tracking a fingertip, and generates events back to the controlling application, where they are mapped to control actions such as triggering an event or establishing the value of a parameter. A surface is essentially the camera's view of a plane in 3D space. It defines the spatial layout of widgets with respect to each other and the world, but it should not be concerned with the details of the recognition process (Kjeldsen, 2003).

2.3.2.2. Architecture

In this system, each widget is represented internally as a tree of

components. Each component performs one step in the widget’s operation.

There are components for finding the moving pixels in an image (Motion

Detection), finding and tracking fingertips in the motion data (Fingertip

Tracking), looking for touch-like motions in the fingertip paths (Touch Motion

Detection), generating the touch event for the application (Event Generation),

storing the region of application space where this widget resides (Image

Region Definition), and managing the transformation between application

space and the image (Surface Transformation) (Kjeldsen, 2003).

(Figure: component trees of a "touch button" and a "tracking area" widget.)
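The component-tree idea can be sketched in Python as follows. The class names mirror the components listed above, but the composition and nesting order shown here are illustrative only and are not the paper's actual implementation.

# Sketch of "a widget as a tree of components" (after Kjeldsen et al.); each
# subclass would override process() with its own step, which is omitted here.
class Component:
    def __init__(self, *children):
        self.children = children

    def process(self, data):
        # Perform this component's step (omitted), then pass data down the tree.
        for child in self.children:
            child.process(data)

class MotionDetection(Component): pass        # find moving pixels in the image
class FingertipTracking(Component): pass      # find and track fingertips in motion data
class TouchMotionDetection(Component): pass   # look for touch-like motions in fingertip paths
class EventGeneration(Component): pass        # generate touch events for the application
class ImageRegionDefinition(Component): pass  # region of application space for this widget
class SurfaceTransformation(Component): pass  # application space <-> image transformation

# A "touch button" widget assembled from the components named in the paper;
# the nesting order here is only for illustration.
touch_button = SurfaceTransformation(
    ImageRegionDefinition(
        MotionDetection(
            FingertipTracking(
                TouchMotionDetection(
                    EventGeneration())))))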
2.3.2.3. Example Applications

One experimental application that used the dynamically reconfigurable vision system is the Everywhere Display projector (ED), which provides information access in retail spaces. The Product Finder application is another example; its goal is to allow a customer to look up products in a store directory and then guide him/her to where the product is (Kjeldsen, 2003).

2.3.3.Computer Vision-Based Gesture Recognition for an Augmented Reality

Interface

Current researchers are working towards taking computers off the desktop, with ubiquitous computation as one of their objectives. One of these ideas is the wearable computer, which enhances human vision by augmenting the visual input with computer-generated information. A main component of this research is gesture recognition as the input, such as pointing and clicking with a finger. Gesture recognition has two steps: (1) capturing the motion of the user's input and (2) classifying the gesture into one of the predefined gesture classes. Capturing is performed by either a glove-based or an optical-based system. Optical-based gesture recognition comprises model-based and appearance-based categories. In a model-based system, a geometric model of the hand is created and matched to the image data to determine the state of the hand, while in an appearance-based system the recognition is based on a pixel representation learned from training images. Because both approaches entail a computational complexity that is undesirable for Augmented Reality (AR) systems, they usually require enhancements such as markers and infrared lighting. The paper introduces a gesture-recognition approach intended to provide a useful interface while keeping computational complexity low, and outlines how the research is implemented (Moeslund et al., 2004).

2.3.3.1. Defining the Gestures

Two primary gestures are introduced: a pointing gesture and a clicking gesture of the hand. The minimum requirements to control the application are considered, and other easy-to-remember gestures are included as shortcut commands so that numerous pop-up menus can be avoided (Moeslund et al., 2004).


2.3.3.2. Segmentation

The task of segmentation is to detect and recognize the placeholder objects and pointers onto which the visual output of the system will be projected, as well as the hands, in the captured 2D image. To achieve invariance to the changing size and form of the objects to be detected, the research used a colour pixel-based approach that segments spots of similar colour in the image. Problems such as lighting settings, changing illumination, and skin-colour detection are discussed and given solutions (Moeslund et al., 2004).
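As an illustration of the colour pixel-based segmentation idea (not the authors' actual algorithm), the following OpenCV sketch classifies pixels by colour and keeps only coherent skin-coloured regions. The HSV thresholds are example values that would have to be tuned to the actual lighting and skin tones.

# Minimal pixel-based colour segmentation sketch: threshold in HSV space and
# clean up the mask so only coherent skin-coloured blobs remain.
import cv2
import numpy as np

def segment_skin(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)     # hue, saturation, value
    upper = np.array([25, 255, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # Remove small speckles caused by noise and illumination changes.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask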

2.3.3.3. Gesture Recognition

A basic approach is used to solve this problem: counting the number of fingers. The hand and fingers can be approximated by a circle and a number of rectangles, where the number of rectangles corresponds to the number of extended fingers. A polar transformation is performed around the centre of the hand, and the number of fingers (rectangles) present at each radius is counted. The algorithm does not use any information about the relative distances between two fingers, first because this makes the system more general, and second because different users tend to differ in the shape and size of their hands (Moeslund et al., 2004).
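A simplified reading of this finger-counting scheme can be sketched as follows: sample a circle of a given radius around the hand centre in a binary hand mask and count how many separate finger cross-sections the circle passes through. This is only an approximation of the polar-transform approach described in the paper.

# Count finger crossings along one circle of the polar sampling; the mask is
# a binary hand segmentation and the centre/radius are assumed to be known.
import numpy as np

def count_fingers(mask, centre, radius, samples=360):
    cx, cy = centre
    angles = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    xs = (cx + radius * np.cos(angles)).astype(int)
    ys = (cy + radius * np.sin(angles)).astype(int)
    h, w = mask.shape
    inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    ring = np.zeros(samples, dtype=bool)
    ring[inside] = mask[ys[inside], xs[inside]] > 0
    # Each rising edge (background -> hand) along the circle corresponds to
    # one finger crossing this radius.
    rising = np.sum(ring & ~np.roll(ring, 1))
    return int(rising)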


2.3.3.4. System Performance

Gesture recognition has been implemented as part of the computer vision system of an AR multi-user application. The low-level segmentation can robustly segment seven different colours from the background (skin colour and six colours for the placeholder objects and pointers), provided there are no big changes in the illumination colour (Moeslund et al., 2004).

(Figure: segmentation result.)

2.3.4.A Design Tool for Camera-Based Interaction

Constructing a camera-based interface can be difficult for most programmers, as it requires a good understanding of the machine vision and learning algorithms involved. In a camera-based interface, a camera serves as the sensor, or eyes, of the system with respect to the user's input. The goal is to make the system interactive without the user wearing any special devices and without traditional input devices such as the keyboard, so that computing is set in the environment rather than on the desktop. The problem lies in designing a camera-based system: the programming and mathematics involved are complicated enough that ordinary programmers do not have the skills for it, especially when bare-hand input is considered. The main item to consider in camera-based interaction is a classifier that takes an image and identifies the pixels of interest, so the ability to build a classifier is essential to pursuing the idea (Fails & Olsen, 2003).

Crayons is one of the tools for making a classifier, which can be exported in a form readable by Java. Crayons helps user interface (UI) designers build camera-based interfaces even without detailed knowledge of image processing. Its features cannot distinguish shapes or object orientation, but they work well for object detection and for hand and object tracking (Fails & Olsen, 2003).

(Figure: classifier design process.)

The function of Crayons is to create a classifier with ease: Crayons receives images, and after the user provides input, a classifier is created and feedback is displayed (Fails & Olsen, 2003).

2.3.4.1. User Interfaces

There are four pieces of information that a designer must consider and manipulate when designing a classifier interface: (1) the set of classes to be recognized, (2) the set of training images to be used, (3) the classification of pixels as defined by the programmer, and (4) the classifier's current classification of the pixels (Fails & Olsen, 2003).

2.3.4.2. Crayons Classifier


Automating classifier creation is the main function of the Crayons tool. It must extract features and generate classifiers as quickly as possible; the current Crayons prototype uses about 175 features per pixel (Fails & Olsen, 2003).

Finally, to accomplish this, a machine learning algorithm that can handle a large number of examples with a large number of features is required (Fails & Olsen, 2003).
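The workflow that a Crayons-like tool automates can be sketched as follows, with a decision tree used here as a stand-in learner and only a handful of colour features computed per pixel instead of the roughly 175 features of the actual prototype.

# Sketch of the interactive per-pixel classification loop: the designer paints
# example pixels, a fast learner is trained on per-pixel features, and the
# resulting classification is shown back as feedback.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def pixel_features(image_rgb):
    """Per-pixel features: raw RGB plus simple channel differences."""
    img = image_rgb.astype(np.float32)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    feats = np.stack([r, g, b, r - g, r - b, g - b], axis=-1)
    return feats.reshape(-1, feats.shape[-1])

def train_classifier(image_rgb, painted_labels):
    """painted_labels: per-pixel labels from the designer's 'crayon' strokes,
    e.g. 0 = background, 1 = hand, -1 = unlabelled."""
    X = pixel_features(image_rgb)
    y = painted_labels.reshape(-1)
    known = y >= 0
    clf = DecisionTreeClassifier(max_depth=8)
    clf.fit(X[known], y[known])
    return clf

def classify(clf, image_rgb):
    """Return a label image to display back to the designer as feedback."""
    return clf.predict(pixel_features(image_rgb)).reshape(image_rgb.shape[:2])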

2.3.5.Using Marking Menus to Develop Command Sets for Computer Vision

Based Hand Gesture Interfaces

This work explores the use of hand gestures for interaction in an approach based on computer vision. The purpose is to study whether marking menus, with practice, could support the development of autonomous command sets for gestural interaction. Some early problems are reported, mainly concerning user fatigue and the precision of gestures (Lenman et al., 2002).

Remote control of electronic appliances in a home environment, such as TV sets and DVD players, was chosen as a starting point. Normally this requires a number of devices, and there are clear benefits to an appliance-free approach. Only a first prototype for exploring pie and marking menus for gesture-based interaction has been implemented (Lenman et al., 2002).

2.3.5.1. Perceptive and Multimodal User Interfaces

Perceptive User Interfaces (PUI) strive for automatic recognition of natural human gestures integrated with other human expressions, such as body movements, gaze, facial expression, and speech. The second approach to gestural interfaces is Multimodal User Interfaces (MUI), where hand poses and specific gestures are used as commands in a command language. In this approach, gestures are a replacement for other interaction tools, such as remote controls, mice, or other interaction devices. The gestures need not be natural gestures but could be developed for the situation, or based on a standard sign language.

There is a growing interest in designing multimodal interfaces that incorporate vision-based technologies. The work contrasts the passive mode of PUI with the active input mode addressed here, and claims that although passive modes may be less obtrusive, active modes are generally more reliable indicators of user intent and not as prone to error.

The design space for such commands can be characterized along three dimensions: cognitive aspects, articulatory aspects, and technological aspects.

Cognitive aspects refer to how easy commands are to learn and to remember.

It is often claimed that gestural command sets should be natural and intuitive,

meaning that they should inherently make sense to the user.

Articulatory aspects refer to how easy gestures are to perform, and how tiring

they are for the user. Gestures involving complicated hand or finger poses

should be avoided, because they are difficult to articulate.

Technological aspects refer to the fact that, in order to be appropriate for practical use and not only in visionary scenarios and controlled laboratory situations, a command set for gestural interaction based on computer vision must take into account the state of the art of the technology (Lenman et al., 2002).

2.3.5.2. Current Work

The point of departure for the current work is cognitive, leaving articulatory aspects aside for the moment. A command language based on a menu structure has the cognitive advantage that the commands can be recognized rather than recalled. Traditional menu-based interaction, however, is not attractive in a gesture-based scenario. Pie and marking menus might provide a foundation for developing direct, autonomous gestural command sets (Lenman et al., 2002).

Pie menus are pop-up menus with the alternatives arranged radially.

Because the gesture to select an item is directional, users can learn to make

selections without looking at the menu. The direction of the gesture is

sufficient to recognize the selection. If the user hesitates at some point in the

interaction, the underlying menus can be popped up, always giving the

opportunity to get feedback about the current selection.

Hierarchic marking menus are a development of pie menus that allow

more complex choices by the use of sub-menus. The shape of the gesture

(mark) with its movements and turns can be recognized as a selection, instead

of the sequence of distinct choices between alternatives.

The gestures in the command set would consist of a start pose, a trajectory defined by the menu organization for each possible selection, and, lastly, a selection pose. Gestures ending in any other way than with the selection pose would be discarded (Lenman et al., 2002).
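The way a marking-menu selection could be recognized from the stroke alone can be sketched as follows: the direction from the start of the stroke to its end is quantised into one of the radial menu sectors, so the selection can be decided without the menu even being displayed. The menu item names below are placeholders, not the commands of the cited prototype.

# Map a gesture stroke to a pie-menu item by quantising its direction into
# one of N radial sectors. Item names and the distance threshold are examples.
import math

MENU_ITEMS = ["TV", "CD player", "Lamp", "Volume up",
              "Volume down", "Next", "Previous", "Power"]

def select_item(start, end, items=MENU_ITEMS):
    dx, dy = end[0] - start[0], end[1] - start[1]
    if math.hypot(dx, dy) < 20:          # too short to be a deliberate stroke
        return None
    angle = math.degrees(math.atan2(-dy, dx)) % 360   # 0 degrees = to the right
    sector = 360.0 / len(items)
    index = int(((angle + sector / 2) % 360) // sector)
    return items[index]

# Example: a stroke moving up and to the right selects the item in that sector.
print(select_item((100, 100), (160, 40)))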

2.3.5.3. A Prototype for Hand Gesture Interaction

Remote control of appliances in a domestic environment was chosen as the first application. So far, the only hierarchic menu system designed is for controlling some functions of a TV, a CD player, and a lamp (Lenman et al., 2002).

A view-based representation of the hand was chosen, which includes both color and shape cues. The system tracks and recognizes the hand poses based on a combination of multi-scale color feature detection, view-based hierarchical hand models, and particle filtering. The hand poses are represented in terms of hierarchies of color image features at different scales, with qualitative interrelations in terms of scale, position, and orientation. These hierarchical models capture the coarse shape of the hand poses. In each image, detection of multi-scale color features is performed. Particle filtering allows the evaluation of multiple hypotheses about the hand position, state, orientation, and scale, and a possibility measure determines which hypothesis to choose. To improve the performance of the system, a prior on skin color is included in the particle filtering step. In the figure below, yellow (white) ellipses show detected multi-scale features in a complex scene, and the correctly detected and recognized hand pose is superimposed in red (gray).
(Figure: detected multi-scale features and the recognized hand pose superimposed on an image of a complex scene.)

There is a large number of works on real-time hand pose recognition in the computer vision literature. One of the most closely related approaches uses normalized correlation of template images of hands for hand pose recognition; though efficient, this technique can be expected to be more sensitive to different users, deformations of the pose, and changes in view, scale, and background. The closest approach represents the poses as elastic graphs with local jets of Gabor filters computed at each vertex, but its performance was far from real-time. In order to maximize speed and accuracy in the prototype, gesture recognition is currently tuned to work against a uniform background within a limited area, approximately 0.5 by 0.65 m in size, at a distance of approximately 3 m from the camera, and under relatively fixed lighting conditions (Lenman et al., 2002).
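For reference, normalized correlation against hand templates, the related technique mentioned above, can be expressed with OpenCV's template matching as in the sketch below; the acceptance threshold is an illustrative value only.

# Hand-pose recognition by normalised correlation with a template image.
import cv2

def find_hand_pose(frame_gray, template_gray, threshold=0.7):
    result = cv2.matchTemplate(frame_gray, template_gray,
                               cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        return None                     # template does not match well enough
    return max_loc, max_val             # top-left corner of best match, score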


(Figure: the demo space at CID.)

2.4.Similar Product

An Interactive Whiteboard (IW) is a projector screen, except that the screen is either touch-sensitive or can respond to a special 'pen.' This means the projector screen can be used to interact with the projected computer image, providing a more intuitive way to interact than using input devices such as the mouse and keyboard to navigate the projected screen. There are two basic functions of an IW: writing on the board and acting as a mouse. All common IWs have character recognition and can convert scrawls into text boxes.

There are two market leaders in IWs: the Promethean ActivBoard and the SmartBoard. Promethean has its own presentation system, web browser, and file system, while SmartBoard uses the computer's native browser. Promethean uses a stylus pen to interact with the board, whereas the SmartBoard is operated by touch. The reason to prefer one over the other depends on the application.

There are some issues regarding IWs. One is that they require a computer with IW software installed; the need for dedicated software makes it awkward to use an IW with individual laptops. Another issue is that all the IWs examined were 'front-lit,' meaning that the user's shadow is thrown across the screen; backlit IWs are currently very expensive. Lastly, although IWs have both character recognition and an on-screen keyboard, they are not a good technology for typing; the user can easily go back to the computer keyboard when he/she needs to do a lot of typing (Stowell, 2003).

2.5.Computer Vision and Image Processing Development Tools

2.5.1.OpenCV

OpenCV, which stands for Open Source Computer Vision, is an open-source library developed by Intel. The library is cross-platform, running on both Windows and Linux, and mainly focuses on real-time image processing. It is intended for use, incorporation, and modification by researchers, commercial software developers, governments, and camera vendors, as reflected in its license (Intel, n.d.).
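As a small illustration of the kind of real-time processing the library supports, the following example grabs frames from a PC camera and applies a basic operation to each frame. It is written against OpenCV's present-day Python bindings for brevity; the library available at the time of this proposal was used from C/C++.

# Grab frames from the first attached PC camera and show a simple real-time
# result (edge detection) until Esc is pressed.
import cv2

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 80, 160)     # basic real-time image processing
    cv2.imshow("edges", edges)
    if cv2.waitKey(1) & 0xFF == 27:      # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()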

2.5.2.Microsoft Vision SDK

The Microsoft Vision SDK is a library for writing programs that perform image manipulation and analysis on computers running Microsoft Windows operating systems. The library was developed to support researchers and developers of advanced applications, including real-time image processing applications. It is a C++ library of object definitions, related software, and documentation for use with Microsoft Visual C++. It is a low-level library intended to provide a strong programming foundation for research and application development; it includes classes and functions for working with images, but it does not include predefined image processing functions (The Microsoft Vision SDK, 2000).


References:

DLP and LCD Projector Technology Explained. (n.d.). Retrieved June 2, 2006, from
http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm.

Fails, J.A., & Olsen, D. (2003). A Design Tool for Camera-Based Interaction. Brigham Young University, Utah. Retrieved from http://icie.cs.byu.edu/Papers/CameraBaseInteraction.pdf

Hardenberg, C., & Bérard, F. (2001). Bare-hand human-computer interaction. Orlando, FL, USA.

Hewett, T., et al. (1996). Chapter 2: Human-Computer Interaction. ACM SIGCHI Curricula for Human-Computer Interaction. Retrieved June 2, 2006, from http://sigchi.org/cdg/cdg2.html#2_3

Intel, (n.d.). Open source computer vision library. Retrieved June 4, 2006 from
http://www.intel.com/technology/computing/opencv/index.htm.

Kjeldsen, R., Levas, A., & Pinhanez, C. (2003). Dynamically Reconfigurable Vision-Based User Interfaces. Retrieved from http://www.research.ibm.com/ed/publications/icvs03.pdf

Lenman, S., Bretzner, L., Thuresson B., (2002, October). Using Marking Menus to
Develop Command Sets for Computer Vision Based Hand Gesture Interfaces.
Retrieved from http://delivery.acm.org/10.1145/580000/572055/p239-
lenman.pdf?key1=572055&key2=1405429411&coll=GUIDE&dl=ACM&CFID=
77345099&CFTOKEN=54215790

Moeslund, T., Liu, Y., & Storring, M. (2004, September). Computer Vision-Based Gesture Recognition for an Augmented Reality Interface. Marbella, Spain. Retrieved from http://www.cs.sfu.ca/~mori/courses/cmpt882/papers/augreality.pdf

Myers, B., et al. (1996). Strategic Directions in Human-Computer Interaction. ACM Computing Surveys, Vol. 28, No. 4.

Porta, M. (2002). Vision-based user interfaces: methods and applications. International Journal of Human-Computer Studies. Elsevier Science.

Stowell, D. (2003, May). Interactive Whiteboard. Retrieved June 1, 2006, from http://www.ucl.ac.uk/is/fiso/lifesciences/whiteboard

The Microsoft Vision SDK. (2000, May). Retrieved June 4, 2006 from
http://robotics.dem.uc.pt/norberto/nicola/visSdk.pdf
Turk, M. (1998). Moving from GUIs to PUIs. Symposium on Intelligent Information
Media. Microsoft Research Technical Report MSR-TR-98-69

Webcam. (n.d.). Wikipedia. Retrieved June 03, 2006, from Answers.com Web site:
http://www.answers.com/topic/web-cam.
