Functionality Requirements for the Successful Use of 3D Environments

Key Learnings from Pilot Projects Utilizing Immersive 3D Platforms and Technology for Applications Involving Synchronous Training and Collaboration

May 20, 2010

Table of Contents
Abstract ..... 3
Introduction ..... 4
Sound, Volume and VoIP Control ..... 4
    Individual Microphone Mute and Volume Control ..... 4
    Open Microphones and Echo Cancellation ..... 5
    Positional Sound ..... 6
    Multi-Channel VoIP and Proximity Zones ..... 6
    Dial-in Capabilities ..... 7
Individual Granular Control of Attendees ..... 8
    Content Control ..... 8
    Mobility Control ..... 8
    Speaking Control ..... 9
    Appearance and Animation Control ..... 9
    Presence Control ..... 9
    Moderator Control ..... 9
Access: Firewalls and Proxy Servers ..... 9
    Behind the Firewall Installations ..... 10
    Flash and Java-based 3D Applications ..... 10
    Independent Firewall Friendly Solutions ..... 10
Content Integration ..... 11
    Screen Sharing ..... 11
    Host Controlled Distributed Content ..... 11
    Browser Content ..... 13
Fidelity and Realism ..... 13
    Avatar Face Fidelity ..... 13
    Realism ..... 15
    Focus and Eye Contact ..... 15
    Self Expression and Non-Verbal Communication ..... 16
Ease of Use ..... 16
    Interacting with Objects ..... 16
    Navigation ..... 17
    Gesturing ..... 18
    Set and Forget ..... 18
    Viewing Content ..... 19
Conclusion ..... 20
About VenueGen ..... 20
Appendix: Capability Checklist ..... 21


Abstract
The purpose of this paper is to assist those who are using or considering the use of 3D virtual environments. It is a guide to identifying priorities and establishing a process for selecting and evaluating 3D technology, functionality and virtual platforms. It is written to assist both technical and non-technical individuals who may be tasked with teaching, informing, collaborating, supporting or persuading a geographically dispersed audience and who are considering various technologies and platforms for creating immersive 3D virtual environments. It is a practical guide based on the firsthand experiences of those who piloted various virtual 3D technologies for these applications. These findings are relevant to almost all 3D applications for business and learning, with the exception of custom-built simulators. It is formatted as a summary of technical requirements, best practices, pitfalls and techniques that can be used to create virtual experiences.

Questions this paper answers:
1. What are the most common technical problems and obstacles reported by those utilizing 3D virtual environment technology today?
2. How are 3D platform vendors innovating today to address the issues reported by those who pioneered the use of their technologies?
3. What technical capabilities are considered most essential for conducting consistently successful virtual meetings?
4. Why is granular individual VoIP control a requirement for all immersive meetings?
5. How important are graphical fidelity and realism to creating immersive 3D environments?
6. What are the benefits and drawbacks of the various technical approaches vendors use to integrate content into 3D virtual environments?
7. What are the most common constraints that firewalls and proxy servers place on the use of 3D virtual technology, and what are the various approaches platform vendors use to address these challenges?
8. What are the most common interface design mistakes that can cause 3D environments to be confusing and difficult to learn and use?
9. Which technological features are most important for driving adoption of 3D technology within most organizations?


Introduction
3D environments can be incredibly engaging, interactive and cost effective when supported by the right technology and functionality. For several years now, trainers, business professionals and educators have been experimenting with 3D technologies and techniques. Their pilot projects generally were targeted at one or more of the following goals:

o Achieve better distance learning outcomes
o Create more engaging, personal and cost-effective virtual meetings
o Increase productivity and creativity through online collaboration
o Extend the reach and accessibility of real-world meetings and events

Over the last three years, these initiatives reported mixed results. Early adopters of the 3D modality endured many obstacles, including immature software and limited technical functionality, to gain early firsthand experience with what many believed could be a major advancement in how we meet and learn. These pioneers were excited about the potential of a more engaging and personal online experience. They were emboldened by numerous studies and surveys indicating that almost all of the important metrics, from participation to retention, seemed to improve dramatically when virtual attendees were immersed in an environment that felt and functioned more like a real-world experience. Business professionals piloted 3D initiatives in the hopes of moving their distance presentations beyond passive screen sharing to a more natural and personal interaction with staff, partners and customers.

Interviewing those who piloted 3D virtual initiatives has been encouraging in that they, for the most part, still believe in the promise of immersive distance meetings and training. Today's generation of 3D technologies is delivering successful solutions, beginning to address the issues raised by these early adopters and incorporating the specific functionality they found to be critically important. Following is a listing of some of the technical factors and capabilities cited as most important to any successful 3D immersive experience.

Sound, Volume and VoIP Control
First, and many would argue foremost, sound has to be right in order to conduct successful online meetings. You only need to attend one or two virtual events to realize how critically important granular VoIP control is to a meeting's success. The CEO of Vivox Corporation put it best: "If you don't have spot-on VoIP, then you don't have anything when it comes to virtual gatherings." There are several technical requirements to making VoIP work well in virtual environments.

Individual Microphone Mute and Volume Control
Most 3D platforms today still tend to put all virtual attendees into a single VoIP channel with one master volume for the group as a whole. This approach is the reason most virtual meetings start late, as each participant is asked to adjust his volume manually, one at a time, based on feedback from the group. Often attendees are unable to get their microphone volume to an acceptable level because their headset is managed by volume controls found in the operating system control panel, proprietary device driver software and a physical control on the headset itself, any of which could be the problem. Add to this confusion the fact that there are often multiple microphone control slider bars in Windows®, line-ins and "microphone boost" options hidden in various windows depending on which version of the operating system you are using. It seems simple enough to ask someone to turn their microphone volume down, but in fact they may not be able to figure it out, and unfortunately one bad volume can ruin the entire group's experience.

The answer is simple: each attendee must be able to control every other attendee's volume individually for themselves. In other words, if Bill's volume is too low, I should be able to click on Bill's avatar and increase his volume, but only for me (i.e. this will not affect how others hear Bill). With this ability each attendee simply makes a few adjustments on the fly as others are speaking and the meeting just hums along. Without individual volume and mute control you are asking for a frustrating experience.

Some vendors now offer auto-volume leveling technology, which attempts to automatically adjust each participant's volume to a similar level. This technology is based on quick initial samplings and can help, but it is not a replacement for individual volume control. Most experienced VoIP users will agree that it is rare for any auto-leveling technology to adjust every attendee's volume perfectly. Individual and granular control of the volume at which you hear others is an absolute must for painless virtual meetings.
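As a concrete illustration, here is a minimal sketch (in TypeScript, using a hypothetical mixer class rather than any specific vendor's API) of how per-listener volume and mute control can be modeled on each attendee's client:

```typescript
// Sketch: per-listener volume and mute control (illustrative only).
// Each client keeps its own gain table; changing Bill's gain here
// affects only the local mix, never what other attendees hear.
class LocalAudioMixer {
  private gains = new Map<string, number>();   // attendeeId -> 0.0 .. 2.0
  private muted = new Set<string>();

  setGain(attendeeId: string, gain: number): void {
    this.gains.set(attendeeId, Math.min(Math.max(gain, 0), 2));
  }

  toggleMute(attendeeId: string): void {
    if (this.muted.has(attendeeId)) this.muted.delete(attendeeId);
    else this.muted.add(attendeeId);
  }

  // Called per incoming audio frame before it reaches the speakers.
  applyToFrame(attendeeId: string, samples: Float32Array): Float32Array {
    if (this.muted.has(attendeeId)) return samples.fill(0);
    const gain = this.gains.get(attendeeId) ?? 1.0;
    for (let i = 0; i < samples.length; i++) samples[i] *= gain;
    return samples;
  }
}
```

In the scenario above, clicking on Bill's avatar and raising his slider would simply call something like mixer.setGain("bill", 1.5) on the local client, leaving what everyone else hears untouched.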

Open Microphones and Echo Cancellation
A single attendee or student with an open microphone can ruin your virtual training class or meeting. If a participant does not have a headset, the sound of others speaking comes through his speakers and into his open microphone, which then rebroadcasts the sound back to those who are speaking. This is extremely frustrating, as speakers hear everything they say echoed back to them, making it almost impossible to focus on what they are saying. It is often very difficult to figure out which attendee in a larger group has the open microphone, or even to explain to them what the problem is when communications are so hampered.

Some vendors have attempted to address this issue by adding a "push to talk" feature that forces virtual attendees to hold down a particular key while speaking, like using a CB radio. One's microphone is muted automatically except when the talk key is being held down. This does stop open-microphone echoes, but it also adds a level of complication to the virtual meeting as communication becomes less free flowing and natural. Attendees often forget to hold down the required key when they talk, leading to long silences before apologies and backtracking. Not everyone has an available headset, and even if they do, they still may not be able to turn off a built-in open microphone in their laptop. The best solution to this problem is a feature that gives the host control to force only the attendees with open microphones into "push to talk" mode and leaves everyone else as is.

Echo cancellation is a sophisticated technology that is also targeted at this problem. It attempts to automatically remove echoes from open microphones via sound pattern recognition. Echo cancellation sometimes works well, but other times it does not work at all. Echo cancellation and granular host control of "push to talk" are both required capabilities for any online meeting of any size, especially if there will be attendees who are new to virtual meetings.
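A sketch of what that granular host control might look like; the attendee model, the echoSuspected flag and the field names below are illustrative assumptions, not a particular product's API:

```typescript
// Sketch: host forces push-to-talk only for attendees detected (or reported)
// as having an open, echoing microphone; all other attendees keep open mics.
type MicMode = "open" | "pushToTalk";

interface Attendee {
  id: string;
  micMode: MicMode;
  echoSuspected: boolean;   // e.g. flagged by an echo-detection heuristic or the host
}

function enforcePushToTalk(attendees: Attendee[]): void {
  for (const a of attendees) {
    if (a.echoSuspected && a.micMode === "open") {
      a.micMode = "pushToTalk";   // mic now transmits only while the talk key is held
      console.log(`Forcing push-to-talk for attendee ${a.id}`);
    }
  }
}
```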

Positional Sound
Positional sound significantly enhances the virtual meeting experience and dramatically reduces audio fatigue for attendees. Positional sound attempts to simulate how we hear sound in the real world. It allows virtual attendees to hear others speaking predominantly through their left or right speaker based on the relative location of others to them in the virtual space. It also reduces the volume of others in proportion to how far away they are from the listener. Positional sound gives acoustical clues as to where others are virtually standing or sitting, and as such significantly increases the sense of presence and immersion in virtual environments.

This effect is hard to describe without experiencing it firsthand. For example, when someone at the end of the table begins speaking, without even seeing that person, attendees automatically know that the speaker is sitting to their left several seats away. When compared to everyone being in a non-positional VoIP channel (i.e. a single volume without stereo), the difference is dramatic. As will be discussed later, fidelity is an important part of what makes a virtual environment work, and positional sound is critically important to creating that sense of presence. In the real world, we tend to look at other people when they speak to us. Without directional sound clues, no one knows where to look, so they either begin scanning the virtual room trying to see whose mouth is moving or they simply stare off into space, not even attempting to look at the speaker, which creates an equally odd effect.

Audio fatigue is a known and well-documented problem in the telecom and virtual meeting industries. We are conditioned from birth to associate sound with a source. Presenters at the 2009 Basex Conference on Teleconferencing argued that when attendees cannot see who is speaking, as much as 30% of their mental energy is spent trying to "fill in" the missing data, asking "who said that" and "who was that said to". That is potentially 30% fewer mental cycles available to focus on what is actually being presented. This effect is the reason why we tend to feel more tired and mentally drained after a long telephone conference call than we do after a real-world meeting. Positional sound is not just a VoIP gimmick; it frees up mental energy and focus, which enables attendees to learn and retain more while making better and faster decisions. It is an important capability that will help your virtual attendees reduce audio fatigue. It will also enable them to quickly look at others who are speaking to communicate their attention, which dramatically increases the fidelity and realism of the virtual experience.
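A minimal sketch of the underlying idea, assuming a simple 2D seating layout and an illustrative 20-metre audible range; a real platform would use its own spatial audio engine and conventions:

```typescript
// Sketch: positional audio for a speaking avatar (illustrative).
// Pan comes from the speaker's angle relative to where the listener is
// facing; volume falls off with distance up to a maximum audible range.
interface Vec2 { x: number; y: number; }

function positionalGain(
  listenerPos: Vec2, listenerFacing: number,   // facing angle in radians
  speakerPos: Vec2, maxRange = 20              // metres, assumed default
): { left: number; right: number } {
  const dx = speakerPos.x - listenerPos.x;
  const dy = speakerPos.y - listenerPos.y;
  const distance = Math.hypot(dx, dy);
  const attenuation = Math.max(0, 1 - distance / maxRange);

  const angle = Math.atan2(dy, dx) - listenerFacing;
  const pan = Math.sin(angle);   // -1 = hard left, +1 = hard right (sketch convention)

  return {
    left: attenuation * (1 - pan) / 2,
    right: attenuation * (1 + pan) / 2,
  };
}
```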

Multi-Channel VoIP and Proximity Zones
A single VoIP channel works OK for smaller virtual gatherings but for larger classes, meetings and events with dozens of attendees a multi-channel VoIP solution is required. Imagine you are seated with a few of your colleagues at a virtual lecture or conference with fifty to a hundred other attendees. You need to hear the presenter or panelists but you also would like to be able to make comments to your colleagues sitting near you. If everyone in the room is in a single VoIP channel then you will hear everyone in the room, thus making it very difficult to hear the presenters. This is especially true in a positional VoIP channel where you hear those closest to you the loudest. Single channel VoIP solutions try to avoid this problem by muting those who are not presenters but this removes your ability to whisper to colleagues and it creates an unnatural silence in the venue that dramatically reduces the fidelity of the experience.


A two-channel VoIP technology addresses these issues by allowing each attendee to be in both a small, directional, proximity-based VoIP channel and an all-inclusive, non-directional VoIP channel simultaneously. The large inclusive channel is used by presenters like a house microphone. Everyone can hear the house channel, but only presenters and those granted permission can speak into it. This has the effect of a real-world house microphone or PA system: anyone using the house microphone VoIP channel can be heard clearly by everyone regardless of distance.

At the same time, attendees are also in a small proximity-based VoIP channel into which they can speak and hear. Small proximity-based VoIP zones can encompass each attendee and a few seats to either side. They allow colleagues to whisper to each other and share insights without disturbing the entire room. If the proximity zones are sized properly, they will also provide just the right amount of ambient noise to create the sense of being in the audience at an event of similar size and crowd. If an attendee is seated close by (within the proximity range of) another chatty attendee, he can individually mute that person if the VoIP channel also supports this feature.

Getting directional, non-directional and proximity-based VoIP ranges right is as much an art as it is a science. The size, shape and configuration of the virtual venue must be taken into consideration. For example, in a large lecture hall one would want small individual proximity zones, but in a smaller, more collaborative venue such as a board room with facing chairs one would want a single proximity zone that includes all attendees. In effect, the VoIP zones and proximity sizes must be configured per venue and for the meeting experience that venue is designed to facilitate.

Larger training rooms are the most difficult of all venues to configure correctly from a VoIP perspective because of their dual functions. For example, at times a trainer may want to use the classroom like a lecture hall with a very small proximity VoIP zone for each student. At other times the teacher may want to play the role of a facilitator for an open class discussion, requiring everyone to be in the same proximity-based VoIP zone. Other configurations might include larger proximity zones for teams meeting in the corners of the room. The best VoIP system of all is one enabling the event host to dynamically adjust the size and range of the attendees' proximity-based VoIP zones on the fly. Preconfigured settings and use cases make for the simplest user interface here, such as a menu option that allows hosts to configure their virtual training room for "lecture", "open discussion", etc.
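A sketch of what such per-venue presets might look like as configuration; the preset names, radii and flags are illustrative assumptions, not any vendor's actual settings:

```typescript
// Sketch: per-venue VoIP configuration with host-selectable presets.
interface VoipConfig {
  houseChannel: boolean;     // non-directional channel reserved for presenters
  proximityRadius: number;   // metres each attendee can hear around them
  positional: boolean;       // directional audio in the proximity channel
}

const presets = {
  lecture:        { houseChannel: true,  proximityRadius: 1.5, positional: true },
  openDiscussion: { houseChannel: false, proximityRadius: 30,  positional: true },
  teamBreakouts:  { houseChannel: true,  proximityRadius: 5,   positional: true },
};

// The host switches modes on the fly; every attendee's proximity zone is
// resized without anyone leaving the room.
function applyPreset(venueId: string, mode: keyof typeof presets): VoipConfig {
  const config = presets[mode];
  console.log(`Venue ${venueId} switched to "${mode}" VoIP configuration`);
  return config;
}
```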

Dial-in Capabilities
Dial-in refers to the ability to dial a telephone number that connects the caller to a VoIP channel within a virtual environment. Most 3D platforms, if they support dial-in at all, think of it in terms of an attendee who does not have access to a computer at the meeting time and is therefore limited to audio-only participation. This is a common use case, but a far more common use case involves an attendee who does not have a headset or whose firewall settings do not allow VoIP traffic. These participants can be, and should be, virtually present. They have an avatar, see content and can text chat with others. They just need a sound solution that incorporates their virtual presence.

The best dial-in solution is one that turns the attendee's telephone receiver into a VoIP headset for all practical purposes. Although directional sound is not possible with a single-ear telephone receiver, the participant's avatar's lips should move as the user speaks into the phone. This capability alone dramatically increases the number of attendees who can participate in a virtual training event because it makes the event available to those who do not have a headset or cannot connect to a VoIP channel.

Individual Granular Control of Attendees
In general, individual granular control refers to an instructor or meeting host’s ability to control each attendee’s experience and permissions individually and not just as a group. This capability is very important to achieving both the level of control and audience participation that makes for the best interactive virtual experiences. Individual granular control falls into several categories.

Content Control
Meeting hosts and instructors must be able to pass content control individually to an attendee. The goal of 3D immersion is increased engagement. Allowing others to take control of content allows them to be more than passive observers; it enables them to demonstrate their learnings and participate interactively in the process. The ability for an instructor or facilitator to revoke this permission is equally important to maintaining control.

Mobility Control
Meeting hosts and instructors must be able to individually grant and revoke the ability to stand and move about. This is especially important in larger virtual venues and public events, where one disruptive attendee can compromise everyone's experience. The ability to come to the front of the class or lecture hall, or to move to a breakout session, is critically important to maintaining the feeling of active participation and engagement.


Speaking Control
The meeting host's or instructor's ability to individually grant and revoke an attendee's right to speak publicly is equally important, both for creating an interactive experience and for maintaining control. Attendees should be able to mute other attendees if they become a distraction. Meeting hosts should have the ability to grant speaking privileges by VoIP channel to any attendee.

Appearance and Animation Control
The host/instructor must have the ability to control any factor that might compromise the learning process or event. For example, meeting attendees or students should not be allowed to dress or undress in a way that would be offensive or a distraction to others. Likewise, students should not have unchecked abilities to perform gestures or animations that would also prove counterproductive or disruptive. The best balance is to allow individual freedom of expression while empowering the instructor or presenter with the ability to restrict any attendee who crosses the line.

Presence Control
Meeting hosts and instructors must also have the ability to limit access to, and even expel, a disruptive participant. Corporate trainers often control access to sensitive and proprietary information that must be kept secure. Public events particularly require the presenter to have the ability to control access and to expel invalid or distracting attendees. This is one lesson from public virtual worlds that cannot be ignored: the event host and those granted moderator rights must have total control of their virtual gathering.

Moderator Control
Finally, instructors and meeting hosts must be able to designate another virtual attendee or instructor as a moderator. Moderator control is the ability to grant and revoke, for another attendee, all of the permissions discussed thus far.
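Taken together, these controls amount to a per-attendee permission record that the host, or a designated moderator, can toggle individually. A minimal sketch, with purely illustrative field names:

```typescript
// Sketch: a per-attendee permission record covering the granular controls
// discussed above. Field names are illustrative, not a specific vendor's API.
interface AttendeePermissions {
  controlContent: boolean;     // may drive slides, video, shared documents
  mobility: boolean;           // may stand up and move around the venue
  speakProximity: boolean;     // may speak into the local proximity channel
  houseMicrophone: boolean;    // may speak into the all-venue house channel
  customAppearance: boolean;   // may change clothing / play free-form gestures
  admitted: boolean;           // may remain in the event at all
  moderator: boolean;          // may grant or revoke all of the above for others
}

// A host action is then just a targeted update to one attendee's record.
function grant(perms: AttendeePermissions, right: keyof AttendeePermissions): void {
  perms[right] = true;
}

function revoke(perms: AttendeePermissions, right: keyof AttendeePermissions): void {
  perms[right] = false;
}
```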

Access: Firewalls and Proxy Servers
Most surveys and reports being published today continue to identify "access" as the single biggest problem facing the expanded use of 3D virtual technology, especially for corporate training applications. 3D applications are particularly problematic for firewalls because they tend to require diverse types of internet traffic, including VoIP, positional data and all types of content. No matter how sophisticated the virtual offering, if your intended audience cannot participate then your initiative will fail. Corporate security administrators will open firewall ports for approved products that they purchase after a detailed evaluation, but they rarely will do this for the one-off person who wants to attend a virtual class or event. The bottom line is that your chosen platform must run seamlessly through the vast majority of corporate firewalls and proxy servers.


Behind the Firewall Installations
3D vendors tend to tackle the firewall problem in one of three ways. The first way is to require that their platform be installed behind their corporate customer’s firewall. Historically this has created a lot of difficulties for virtual learning pioneers. Besides adding upfront expense, risk, time and administrative complexity to the pilot project, this approach also only resolves the firewall issue for employees that are in fact behind that particular firewall. When they try to use the platform to include customers, business partners or anyone who does not work for their company, they discover that they have not solved their firewall problem but rather they have simply moved it.

Flash® and Java®-based 3D Applications
A second way that 3D vendors attempt to work around firewalls is to develop a 3D Flash- or Java-based application. Adobe® Flash is a popular browser plug-in that allows movies and simple interactive programs to run within a browser. Flash is very popular and as such it is approved for download and use by most corporations. If a vendor can make their 3D application look like Flash content, then they can avoid many firewall obstacles. The problem with this approach is that Flash content and browser-sandboxed runtimes such as Java applets are very limited graphically compared to standalone software programs. Java and 3D Flash applications do tend to run through corporate firewalls, but they have rather limited functionality and offer very poor fidelity. This presents a problem because, as will be discussed later, the fidelity of the virtual experience is important to how others use and view this technology within their organization.

In the tradition of Jeff Foxworthy's popular "You might be a redneck" monologue: if your avatar looks like a cartoon; if you are unable to turn your neck to look around; if your virtual venue resembles SpongeBob's living room; then... you might be a Java or 3D Flash application!

Independent Firewall Friendly Solutions
The third, and by far the most difficult, way 3D vendors work around firewalls is to engineer every aspect of their platform to run seamlessly through every type of firewall and proxy server configuration. Very few vendors attempt this approach because getting it right, and addressing the related security issues, can be as time consuming and difficult as building the 3D platform in the first place. This is by far the best approach because it provides a platform that is accessible from anywhere, can be used for any application or target audience, and delivers a full-featured, high fidelity experience. Such platforms can also be delivered as a service running as a browser plug-in. These applications require no firewall changes or complicated and expensive onsite installation of servers. As such, they tend to require no upfront cash outlay or hardware and can be purchased as a monthly subscription. Such pilots carry far less risk and tend to be approved faster.


Firewalls and proxy servers are no longer confined to large corporations. They are starting to appear in small companies and even as part of home cable and DSL provider networks. Firewall technology is also constantly evolving. Make sure that your chosen 3D platform not only works well through existing firewalls and proxy servers, but also that your vendor is committed to maintaining that functionality on an ongoing basis into the future.

Content Integration
A major component of most virtual training and meetings is the ability to share and collaborate around various types of content. There are two ways to achieve this in a 3D virtual environment.

Screen Sharing
The ability to broadcast one's screen buffer to others has been around for many years, but the ability to share that image on a viewer inside of a 3D environment creates a much more immersive experience. Screen sharing is a common way to allow others to view your content without actually distributing that content to their local computers. Screen sharing, often called web conferencing, is the only way to allow others to see any real-time edits you are making to a document or other content.

There are, however, some major drawbacks to real-time screen sharing if it is a platform's only method for sharing content. Broadcasting many frames per second from your screen buffer is very bandwidth intensive and as such generally displays poorly except on the most static of content. Screen sharing is woefully inadequate for sharing video, for example, and even PowerPoint documents with embedded slide transitions and animations can appear jerky and delayed. Worst of all, real-time 3D applications using VoIP are typically already maximizing available bandwidth. When real-time screen sharing is added to the mix, VoIP often becomes scratchy and overall 3D performance can become slow and out of sync.
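The bandwidth pressure is easy to see with a rough back-of-the-envelope calculation; the resolution, frame rate and compression ratio below are illustrative assumptions, not measurements of any particular product:

```typescript
// Rough, illustrative arithmetic: why streaming a screen buffer competes
// with VoIP and 3D traffic. All figures below are assumptions.
const width = 1280, height = 1024;      // assumed shared-screen resolution
const bytesPerPixel = 3;                // 24-bit colour
const framesPerSecond = 10;             // modest frame rate for a "live" feel

const rawBitsPerSecond = width * height * bytesPerPixel * 8 * framesPerSecond;
const rawMbps = rawBitsPerSecond / 1_000_000;              // ~315 Mbps uncompressed

const assumedCompressionRatio = 100;    // optimistic for mostly static screens
const compressedMbps = rawMbps / assumedCompressionRatio;  // still ~3 Mbps upstream

console.log(`Uncompressed: ~${rawMbps.toFixed(0)} Mbps; compressed: ~${compressedMbps.toFixed(1)} Mbps`);
```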

Host Controlled Distributed Content
Another, more complex, approach for sharing content within a 3D environment involves distributing the actual content to all participants so that it can run locally on each user's computer while being controlled by the instructor or meeting host. For example, a PowerPoint slide show or video can be distributed to attendees but controlled by the meeting host. When the host decides to play or pause the video or to advance to the next slide, a small command is broadcast to all attendees, who then see the content change as if they were looking at the instructor's copy of that content.

There are many advantages to the distributed document approach. Content, especially content with moving graphics, displays smoothly just as if it were running locally because, in fact, it is running locally. Another advantage of this approach is that it requires far less real-time bandwidth than screen sharing. More available bandwidth means better performance in general within the 3D environment, especially where VoIP is concerned.

There can be a major limitation to the distributed document approach if the 3D platform requires that each attendee have the actual software viewer installed locally, i.e. PowerPoint®, Word®, the QuickTime® player, etc. Sophisticated platforms avoid this requirement by converting all content to a common and ubiquitous format such as Flash prior to distribution. This removes the requirement to have each content player or application installed locally.

Another limitation of the distributed document approach is that the distributed content must be downloaded before it can be viewed. If large video files, for example, are added to a meeting or training session, they cannot be viewed by attendees until they have been completely downloaded. Well-architected platforms minimize this delay by allowing presenters to select content for a virtual class or meeting when that meeting is created. Anyone who registers to attend that event will automatically have the content downloaded prior to the event's start time. This approach keeps almost all bandwidth available for use once the meeting has begun.

A final concern regarding distributed content is security. Ask to see your vendor's security whitepaper. All content should be converted and encrypted before distribution, with temporary unlocking keys sent only when that content is being used by the presenter or instructor. The distributed and encrypted content should also be promptly removed from local hard drives as soon as the virtual event ends.

Ideally, your 3D platform should support both distributed content and real-time screen sharing. Distributed content provides better performance and a superior viewing experience, while screen sharing is the only way to edit content in real time once a meeting has begun. One last point to keep in mind is that the way content is reproduced within a 3D environment varies widely from vendor to vendor, with varying degrees of fidelity. For example, most platform vendors support PowerPoint content, but they do so by converting individual PowerPoint document slides into static JPEG images. This approach loses all PowerPoint slide transitions, animations, embedded video, etc. Platforms that have the ability to run Flash content natively within their 3D engines will have the most faithful content conversions, producing in-world documents that look and respond exactly as they do when launched from your desktop.
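A sketch of the kind of lightweight control message this approach implies; the message shapes and helper names here are hypothetical, intended only to show how little data needs to move once the content itself has been pre-distributed:

```typescript
// Sketch: host-controlled distributed content. The content is pre-downloaded
// to every attendee; during the event only tiny control messages travel.
type ContentCommand =
  | { kind: "gotoSlide"; contentId: string; slide: number }
  | { kind: "play";      contentId: string; positionSec: number }
  | { kind: "pause";     contentId: string }
  | { kind: "close";     contentId: string };

// Host side: a single small JSON message replaces streaming the pixels.
function hostAdvanceSlide(send: (msg: string) => void, contentId: string, slide: number): void {
  const cmd: ContentCommand = { kind: "gotoSlide", contentId, slide };
  send(JSON.stringify(cmd));   // a few dozen bytes, not megabits
}

// Attendee side: apply the command to the locally cached copy of the content.
function onCommand(raw: string, localViewer: { show: (id: string, slide: number) => void }): void {
  const cmd = JSON.parse(raw) as ContentCommand;
  if (cmd.kind === "gotoSlide") localViewer.show(cmd.contentId, cmd.slide);
  // ...handle play / pause / close similarly
}
```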


Browser Content
Browsing the Internet and viewing web-based applications is a requirement for most 3D virtual training and meeting platforms. To avoid screen sharing, almost all 3D platform vendors who support shared browsing use Linden Labs'® open source version of the Mozilla® browser. It allows an instructor or presenter to control each attendee's local Firefox browser in such a way as to appear that everyone is viewing the host's browser within the 3D environment. Unfortunately, there are many well-documented problems that have plagued the Linden Labs browser implementation on Windows-based PCs. Because the Mozilla browser is a third-party application, it is not tightly integrated with the Windows operating system the way that Internet Explorer® (IE) is. For example, it does not share the Windows caching system or information about which browser plug-ins are installed. This often creates unexpected results, causing some attendees to see one web page while others see something totally different.

Platforms that have taken the time to integrate the operating system's native browser into their 3D engine will have much better compatibility and provide a more consistent viewing experience for virtual attendees. An added benefit of integrating native browsers is that they ship with the operating system, so they do not have to be included in the 3D platform vendor's installation process, which reduces the average download size by dozens of megabytes.

Fidelity and Realism
There has been much debate about how important fidelity and realism are to the adoption and use of immersive 3D applications. Fidelity refers to how lifelike and natural the avatars and venues appear, i.e. how faithfully they represent the real-world experience they are simulating. Some argue, correctly, that one can experience content and directional sound equally well with a low-polygon, cartoonish avatar. Likewise, one does not need shadows and high quality textures to participate effectively in a class lecture. Some studies have even shown that children, for example, often relate better to less realistic, stylized virtual environments. Most 3D vendors, especially those offering low fidelity applications, will point to older concepts such as the "Uncanny Valley," which held that most people would prefer a cartoon-like representation of themselves over something that looks close to human but strange. In other words, between photorealism and a cartoon lies the uncanny valley, where 3D gets weird and creeps people out.

Avatar Face Fidelity
Our ability to create photo-realistic avatars and environments has come of age and the most practical argument for 3D fidelity involves avatar face creation. If a trainer, professor or executive can create his avatar with good fidelity from an uploaded photograph then everyone who knows him will instantly recognize him at virtual events. This is valuable because it empowers virtual staff meetings to get off to a quick start as attendees do not have to introduce their unrecognizable proxies. There are advantages when virtual attendees can tell immediately who is waving at them and professors, at a glance, can know who is asking the question without having to endure clouds of jumbled floating name banners obscuring attendees and content.


The demand for and use of 3D photorealism today is unprecedented. We routinely watch movies that flow seamlessly between real and computer-generated characters with most of us none the wiser. We are told that the Uncanny Valley lies between photorealism and 3D art. If a person's photograph can be converted to a texture and accurately mapped onto his avatar, then one could argue effectively that we have at last crossed over the valley to the side of avatar photorealism.

Business professionals generally want their avatar to be a fairly accurate representation of themselves so that they can be recognized immediately at virtual events. We talk about the importance of getting "face time," i.e. the feeling of a personal experience with others. One CEO was recently quoted as saying, "My face is my brand." The president of a large university said he could see tremendous applications for the immersive internet, from distance learning and faculty office hours to student recruiting and alumni meetings. However, he continued, "…but there's no way I'm going to address my staff or students looking like some cartoon!" The same sort of response has been echoed by many business leaders.

Many believe that 3D fidelity is important if for no other reason than that it is linked to the adoption of 3D technology in general. For years some management teams laughed off 3D pilot projects, even after they demonstrated excellent metrics and solid ROI, for no other reason than that they could not get past the fact that this technology's typical cartoonish characters reminded them of children's games. Fidelity is important because it positions 3D meetings as a serious tool that can provide faithful representations of the venues and people who teach, learn and work within them.

The Proteus Effect is a well-documented principle of psychology. It basically states that the way we act is influenced by how we feel about the way we look. In other words, humans have a tendency to "play the part" based on how we feel about our appearance each day. It has often been stated that people only buy into and support a virtual environment to the extent that they buy into and relate to their representation of themselves within that environment. It is interesting to watch the subtle changes in how focus group attendees act after they see their photo-created avatar for the first time. They often stop referring to the avatar as "it" and begin to refer to it as "me." These subtleties are the essence of the sense of presence and self that makes virtual reality immersive. If for no other reason, fidelity is important because it can have a very positive impact on the success of your pilot project and the future adoption of your 3D workplace applications.

Realism
Realism is an important part of fidelity that addresses how lifelike the 3D experience appears. For example, do avatars move in a natural and convincing way? Do they make eye contact when they look at others? Do their lips move appropriately while speaking? Can they communicate using expression? Realism is important for many of the same reasons that fidelity is important. Would you want to use virtual conferencing and meeting tools if the fidelity was poor? For example, would you use video conferencing if the video images on your screen were washed out and blurred? What if video attendees did not look like themselves and appeared to jerk around in unnatural-looking ways? Probably not. It is important that 3D fidelity be of high enough quality to basically "not be all that noticeable," or at least not be a distraction. The attendee's mind should have minimal obstacles in accepting the virtual venue and other attendees as reasonable likenesses of what is being simulated. If others appear as flat cartoons incapable of neck movement or of walking without "ice skating," then the business professional or student is constantly being reminded that the immersion is not real. Realism is important because it helps the virtual world to "get out of the way" so that attendees can focus on the goals of their gathering.

The key to an immersive movie-viewing experience is the suspension of disbelief. We have all watched low budget movies with poor and unconvincing graphics and sets. No matter how great the dialogue or story line, you probably struggled to "get into" the movie. Many 3D pioneers report a similar experience when using low fidelity platforms. It can take additional years of technical and artistic polishing, even after a 3D platform has been launched, to create consistently smooth and natural-looking avatar movement. Sophisticated 3D platform providers, in business for the long haul, know that this time consuming and expensive effort is critical to their product's success and the long term adoption of the 3D modality.

Focus and Eye Contact
A fundamental question that teachers and presenters ask themselves many times in both real and virtual meetings is "Do I have their attention?" A huge advantage that realistic, high-fidelity platforms afford is the ability for users to communicate focus and attention. Users can move their focus, causing their avatar to adjust head, neck, eyes and posture appropriately so others can see exactly where they are looking. When an avatar turns and looks directly at you, you know you have that person's attention. This fidelity dramatically increases the overall sense of presence and makes virtual events much more personal and engaging.

Self Expression and Non-Verbal Communication
Another important part of 3D fidelity and realism involves a virtual attendee's ability to express himself. If you watch participants in a real-world meeting or classroom you will immediately notice that they do not just sit in passive trances. On the contrary, they constantly communicate even when not speaking. They laugh, frown, fidget, slump, grimace and do a host of other things to communicate their engagement in and feelings about what is happening or being said. The ability for virtual attendees to use non-verbal communication is an important part of what keeps them engaged and actively participating. In lectures, conferences and larger gatherings we may not all be afforded an opportunity to state our opinions verbally, but we can all still communicate, and this ability keeps us from becoming passive, disengaged bystanders. Although some platform vendors dismiss non-verbal communication as simply being "cute" or a gimmick, 3D pioneers have discovered that it is these degrees of control that keep virtual attendees from slipping into the all-too-common disengaged state associated with passive 2D web conferencing.

Ease of Use
Over two consecutive years at one virtual learning conference, the audience was asked to vote on what it believed to be the single biggest obstacle to the adoption and use of immersive 3D environments within their organization. Both years the number one response was the same: 3D environments are too difficult to learn and use. The issues cited included installation, navigation, gesturing and content integration. No matter how full featured and sophisticated your 3D platform, if it is difficult to learn and use you will have a major uphill battle reaping the benefits it has to offer.

Rather than show specific 3D interface blunders (and they do abound), this section will focus on some broad guidelines and generally accepted best practices for interfacing with and controlling one's experience in a virtual environment. Most of these best practices were derived from surveys and from watching hundreds of hours of users who were new to 3D immersion attempting to use various interfaces without any instructions or previous training.

Interacting with Objects
A common mistake most platform designers make is employing 2D interface design techniques rather than taking advantage of the 3D environment itself. Finding an icon on a tool bar that opens a menu of options may be your only choice when using a 2D product, but 3D environments can provide for much more intuitive interactions. The best 3D interfaces are those that allow users to click directly on the object, person or content with which they want to interact. This can trigger a default action or a small menu of options. For example, clicking on a chair might pop up a menu with the option to "sit here." Clicking on an in-venue view screen might offer options to add, advance, enlarge or close content. Rather than hunting through icons and menus, this interface approach limits what new users need to remember to one thing: "To interact with or control something, just click on it."

Interface consistency is critical for ease of use, so to interact with someone in the 3D environment users should likewise simply be able to click directly on another avatar. This should produce a list of actions that can be performed directly on the other person, such as point at, change volume level, view profile, or grant the ability to control content. Interfaces that take advantage of the 3D environment itself are the most intuitive, natural and easy to learn.

Actions that are the most common should always be the most accessible in the interface. For example, the ability to point at something or someone is a very common and useful communication tool. After watching and documenting the pointing references from hundreds of hours of real-world video of teachers, presenters and collaborators, it was discovered that nearly 25% of all non-idle movement involves reference pointing. Presenters predominantly point at their audience, themselves, and their content. By making the ability to point a function of clicking on the object being pointed at, this high-runner interaction becomes intuitive, quick to activate and easy for users to remember.
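A minimal sketch of this "click on it" dispatch pattern; the clickable types and menu labels are illustrative, and the picking layer that determines what was clicked is assumed to exist elsewhere:

```typescript
// Sketch: a single click handler inspects what was hit and offers
// context-appropriate actions instead of toolbar hunting.
type Clickable =
  | { type: "chair";  id: string }
  | { type: "screen"; id: string }
  | { type: "avatar"; id: string; name: string }
  | { type: "floor";  point: { x: number; y: number } };

function onClick(target: Clickable): string[] {
  switch (target.type) {
    case "chair":  return ["Sit here"];
    case "screen": return ["Add content", "Advance", "Enlarge", "Close"];
    case "avatar": return [`Point at ${target.name}`, "Change volume",
                           "View profile", "Grant content control"];
    case "floor":  return ["Walk here"];   // default action; no menu really needed
    default:       return [];
  }
}
```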

Navigation
A first problem that new users in 3D environments tend to face is navigation: how do I get my avatar where I need to be, or sit down at that table? Non-gamers have a hard time walking around using arrow keys and tend to collide with things. They often lose orientation by looking straight up at the sky or ceiling. This initial experience can be very frustrating and embarrassing, and it can even sour some new users on immersive 3D environments altogether.

Many issues can be minimized by having attendees appear already in their seats. If ambulatory, users should not have to use arrow keys to walk around. They should simply be able to click on the ground where they want to go and have their avatar automatically navigate there, avoiding obstacles and other avatars, i.e. click on what you want to interact with, even the floor. Field of view should also be constrained to help prevent users from looking straight up and becoming disoriented.

Another common navigation problem can be easily avoided by limiting the size and complexity of the virtual environment itself. Military simulators and games may need acres of land and many buildings, but the typical training class or boardroom meeting can be easily accommodated in a single room. Modeling your entire campus or headquarters might seem very cool, but in reality you are probably just adding a lot of complexity and expense to your project and creating additional navigation problems for your virtual attendees. The vast majority of conference rooms, classrooms, etc. are fairly generic and can be covered using a relatively small and cost-effective group of pre-built venues. Likewise, flying, teleporting and using portals within a virtual world are also very cool, but not the kind of things people want to figure out when late for a meeting or class. The simplest and most practical way to get to a meeting or class is to have it listed on a 2D webpage or as a hyperlink in an email invitation. Once clicked, the virtual attendee should find himself seated in the event and ready to go, with no need for additional navigation or training.
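A sketch of click-to-move navigation under these guidelines; raycastFloor and findPath are stand-ins for whatever picking and pathfinding facilities a given platform actually provides:

```typescript
// Sketch: click-to-move navigation. Instead of arrow keys, the user clicks
// a point on the floor and the avatar walks there, routing around obstacles.
interface Point { x: number; y: number; }

declare function raycastFloor(screenX: number, screenY: number): Point | null;
declare function findPath(from: Point, to: Point): Point[];   // obstacle-avoiding

function onFloorClick(avatarPos: Point, screenX: number, screenY: number): Point[] {
  const target = raycastFloor(screenX, screenY);
  if (!target) return [];                 // click missed the walkable floor
  return findPath(avatarPos, target);     // waypoints the avatar will follow
}

// Constrain pitch so users cannot look straight up and lose orientation.
function clampCameraPitch(pitchDeg: number, min = -30, max = 45): number {
  return Math.min(Math.max(pitchDeg, min), max);
}
```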


Gesturing
As previously discussed, the ability to gesture and express oneself is an important part of creating an engaging virtual experience. All 3D platforms support basic gesturing such as hand raising, clapping and agree/disagree. These platforms, however, support a very limited number of other gesture animations. This is mainly because it is too difficult for presenters to look through a list of hundreds of potential gestures to find the right one on the fly while talking and controlling content at the same time. Unfortunately, limiting the number of available gestures is not a good solution either. A limited number of gestures that everyone in the room repeats over and over again can dramatically reduce realism and does not provide the repertoire presenters really need. This is an interface dilemma that has perplexed 3D platform providers for years.

A novel solution to this problem has recently emerged. The study of hundreds of cross-cultural gestures led to the concept of gesture archetypes (VenueGen White Paper: Gesture Archetypes). There are a limited number of gesture classes (types) that communicate the same core meaning. The palms-up gesture archetype, for example, always communicates a lack of or need for something. Any gesture from this class communicates the same meaning. Interfaces that use gesture archetypes are much simpler to use and significantly increase the number and variety of available gestures. When a user selects an archetype, his avatar automatically and randomly displays one of dozens of appropriate gestures from that meaning class. In other words, the presenter does not have to select a particular gesture from a long list; rather, he focuses on what he wants to communicate and the avatar gestures appropriately from that gesture group.

Interfaces that allow multiple related uses from a single button or icon can also significantly simplify an otherwise complex 3D interface. For example, a gesture icon can be single-clicked to play a normal gesture archetype, double-clicked to play a high profile or more intense version of that gesture, or clicked-and-held to continue doing the gesture until released. This approach not only feels very natural and intuitive, it also provides support for hundreds of gestures through relatively few onscreen icons.
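A sketch of how a gesture-archetype interface might map a meaning class onto a random concrete animation; the archetype names and animation identifiers below are purely illustrative:

```typescript
// Sketch: gesture archetypes. The presenter picks a meaning class; the
// platform picks a concrete animation from that class at random, so the
// same button yields natural variety instead of one repeated animation.
const gestureArchetypes = {
  palmsUp:   ["shrug_light", "palms_open_wide", "what_now_lean"],   // lack / need
  agreement: ["nod_slow", "nod_double", "thumbs_up"],
  pointing:  ["point_forward", "point_with_nod", "open_hand_gesture"],
};

type Intensity = "normal" | "emphatic";

function playArchetype(archetype: keyof typeof gestureArchetypes,
                       intensity: Intensity = "normal"): string {
  const options = gestureArchetypes[archetype];
  const pick = options[Math.floor(Math.random() * options.length)];
  // A double-click might map to the "emphatic" variant of the same animation.
  return intensity === "emphatic" ? `${pick}_emphatic` : pick;
}
```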

Set and Forget
Another best practice for all interface design, but one especially important in 3D virtual environments, is the concept of set-and-forget, or autopilot. In less than a minute, users should be able to create a profile of how their avatar speaks, gestures, sits, and acts so they don't have to click icons and buttons during meetings. Your avatar should look and act like you automatically. This interface technique allows users to select options or general behavior patterns for their avatar as a default so as not to have to manually drive changes.

An example of a set-and-forget behavior might include selecting how much one typically moves his hands while speaking. Once selected, one's avatar randomly moves its hands automatically (to the extent selected) whenever the user is speaking. Posture is another excellent candidate for a set-and-forget interface. One should be able to manually instruct his avatar to change its idle sitting animation from legs crossed to hands-in-lap, but does one really want to use mental energy to keep up with this? Selecting a default posture level between formal and casual is much easier. The avatar then randomly changes postures automatically from time to time but remains true to the formality class selected.

Set-and-forget interfaces can make a tremendous contribution to a 3D platform's realism and ease of use. It is amazing to see one's avatar automatically acting like its owner. The less new users have to do to appear natural and at ease in a virtual environment, the easier it will be to drive adoption of 3D technology in your enterprise.
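A sketch of what such a profile might look like and how a client could drive idle behavior from it; the field names, ranges and animation names are illustrative assumptions:

```typescript
// Sketch: a set-and-forget avatar profile. The user fills this in once;
// the client then drives idle behaviour from it automatically in meetings.
interface AvatarAutopilotProfile {
  handMovementWhileSpeaking: 0 | 1 | 2 | 3;   // 0 = still, 3 = very animated
  postureFormality: "formal" | "neutral" | "casual";
  idleVarietySeconds: number;                 // how often the idle posture changes
}

const idlePosturesByFormality = {
  formal:  ["sit_upright", "hands_folded"],
  neutral: ["sit_relaxed", "lean_back_slight"],
  casual:  ["legs_crossed", "arm_over_chair"],
};

// Called on a timer; picks a new idle animation consistent with the profile.
function nextIdlePosture(profile: AvatarAutopilotProfile): string {
  const options = idlePosturesByFormality[profile.postureFormality];
  return options[Math.floor(Math.random() * options.length)];
}
```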

Viewing Content
A final interface issue often cited involves how users view content. The ability to clearly read content from anywhere in a virtual environment is critical for most types of virtual meetings and training events. If a user is required to move his avatar close to content in order to read it, then there will be navigation and crowding issues. If the content pops up in another window, forcing the 3D window to minimize or shrink proportionately, then a large part of the sense of presence and immersion is lost.

The best interface for viewing content involves two capabilities. First, users should be able to zoom their focus in on content without actually having to move their avatar closer to it. This can be accomplished via a mouse wheel, track pad, or the plus and minus keys. This feels very natural, allowing users to control their perspective. They can zoom out to see speakers, zoom in to read fine content, or choose to split the difference. A second important capability for viewing very small content is an in-venue floating window. This technique enlarges the selected content and floats it in reading position within the 3D environment. This also feels natural, simulating how a user might hold up a piece of paper while reading it. When used, this function should also trigger an animation that shows the avatar looking at a paper or handheld device, to communicate to others that the user is concentrating on the content. These combined techniques empower simple and natural viewing of content while carefully preserving immersion within the 3D environment.
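A sketch of these two viewing capabilities; the field-of-view limits and the animation name are illustrative assumptions:

```typescript
// Sketch: zooming focus on content without moving the avatar. Mouse-wheel
// input narrows or widens the camera field of view toward the selected viewer.
function zoomCamera(currentFovDeg: number, wheelDelta: number,
                    minFov = 15, maxFov = 60): number {
  const next = currentFovDeg - wheelDelta * 2;   // scroll up = zoom in
  return Math.min(Math.max(next, minFov), maxFov);
}

// "Hold up to read": enlarge the selected content in a floating in-venue
// window and play a reading animation so others see where attention went.
function holdUpToRead(contentId: string,
                      playAnimation: (name: string) => void): void {
  playAnimation("look_at_handheld");   // hypothetical animation name
  console.log(`Floating reading view opened for ${contentId}`);
}
```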


Conclusion
Much has been learned in the last few years about what is required to successfully pilot the use of 3D virtual meeting and learning technology. Buyers should carefully review vendors to evaluate how well they have incorporated these learnings into their platforms. Most 3D platforms today still lack many of the features and capabilities necessary to host high fidelity, full featured, problem-free virtual events. Early adopters of virtual technology, however, remain optimistic about platform vendors' ability and commitment to meet their needs.

The 3D immersive Internet is being adopted today for meetings, training, and conferences. Virtual platforms can now be delivered as a service and have become accessible to smaller businesses, providing a powerful competitive advantage once only available to larger corporations with huge budgets. The age of 3D adoption is upon us. Virtual platforms are smarter, and users now have more video power and bandwidth than ever before. Innovative businesses can improve information delivery to a dispersed workforce, getting everyone "in the same room" instantly. Opportunity exists today for businesses to use maturing 3D technology to transform processes, increase productivity, extend reach, save time and reduce costs.

About VenueGen
VenueGen is a browser-based 3D immersive internet meeting platform that enables professionals to meet, train, collaborate, share and present information quickly and cost effectively via virtual venues such as boardrooms, classrooms and social halls. VenueGen customers simply select a meeting room, upload any type of content, and instantly enter a high fidelity virtual room with directional VoIP. VenueGen enables users to start realistic and immersive virtual meetings that are more personal and engaging than typical web conferencing and more practical and scalable than video-based solutions. With VenueGen, attendees communicate more fluently, make decisions and learn faster, and are more productive than with other online virtual meeting technologies. No more boring conference calls, complex and expensive video equipment or time consuming travel. VenueGen is “Business Ready”. Based in Research Triangle Park, NC, VenueGen offers a 30-day free trial. If you have three minutes and an internet browser, you have all you need to see the future of virtual meeting technology.

venuegen.com

919.228.4997

info@venuegen.com


Appendix: Capability Checklist
Following is a checklist of platform capabilities discussed in this paper and what many experienced 3D pioneers believe to be must-have functionality for any successful pilot. This checklist may be helpful in evaluating your current 3D platform or while in discussions with vendors as part of your platform selection process.

Platform Requirement/Functionality Check List

Availability

Sound, Volume and VoIP Control (pages 4-7)
1. Independent microphone mute and volume control for each attendee
2. Microphone volume auto-leveling
3. Echo control: ability to force push-to-talk (closed microphone)
4. Automatic echo cancellation
5. Directional/positional VoIP sound
6. Simultaneous multi-channel VoIP and configurable proximity zones
7. Dial-in (turning a phone receiver into a VoIP headset)

Individual Granular Control of Attendees (Pages 8-9)
1. Ability to individually grant and revoke content control
2. Ability to individually grant and revoke mobility
3. Ability to individually grant and revoke proximity zone speaking
4. Ability to individually give and retake the house microphone
5. Ability to control attendee appearance and animations
6. Ability to control event access and expel attendees
7. Ability to individually grant and revoke moderator rights

Access: Firewalls and Proxy Servers (pages 9-10)
1. Ability to run as a service without a behind-the-firewall installation
2. Ability to run as a browser plug-in
3. Independent firewall-friendly engine (not a Java/Flash-based application)
4. Proxy server support


Content Integration (pages 11-13)
1. Support for real time screen sharing
2. Ability for host to distribute and control content running locally
3. Ability to run Flash content and applications natively
4. Ability to convert content to Flash for 100% 3D compatibility
5. Integrate IE browser for full browser/OS compatibility

Fidelity and Realism (pages 13-16)
1. High fidelity photo-realistic (4000+ polygon) graphics
2. Ability to create avatar face from an uploaded photograph
3. Realism (suspension of disbelief, natural head and shoulder turning)
4. Ability to convey focus and attention (eye contact)
5. Facial expression
6. Self expression (posture, interest level, etc.)

Ease of Use (pages 16-19)

1. Interact directly by clicking on objects, viewers and others
2. Simplified and minimized navigation requirements
3. Integrated gesture archetype interface
4. Set and forget: automatic movements based on user profiles
5. Zoom focus capability to read fine content
6. Click to enlarge content to reading position within the 3D environment
