
Real-Time Motion Capture Technology on a Live Theatrical Performance with Computer Generated Scenery

Anthousis Andreadis, Alexander Hemery, Andronikos Antonakakis, Gabriel Gourdoglou, Pavlos Mauridis, Dimitrios Christopoulos, John N. Karigiannis
Foundation of the Hellenic World, Department of 3D Graphics & Virtual Reality, Poulopoulou 38, 118 51 Athens, Email: john@fhw.gr

Abstract: The employment of innovative forms of technology in the areas of art and entertainment is receiving significant attention from the research community in the context of evaluating new forms of expression. Recent developments in tangible interaction, pervasive sensing, wearable computing, and mobile communications bring about the potential to connect, in an unprecedented manner, persons to places (real and virtual) and media, as well as to other persons and objects. The work presented here brought together a multidisciplinary research team of scientists, engineers and 3D artists along with members of the art domain, such as a stage director, theatrical costumier, choreographer and actors, in order to synthesize a live theatrical performance that joined technology and art. Real time motion capture data was streamed to a multi-screen topology, while virtual scenery generated in real time created the virtual context of each act. Live actors interacted throughout the theatrical act with a digital avatar that was rigged to a motion capture suit. The entire setup was evaluated in a theatrical case study presented to an audience at the Theatron, a reconfigurable space at the cultural center of Hellenic Cosmos, providing significant feedback on the acceptance of such technology both from the perspective of the audience and from the experts involved in live theatrical performances.

I. INTRODUCTION

Theater has always used external tools and technology to enhance storytelling and provide visual feedback. The Ancient Greeks used various forms of stagecraft in their dramas, most notably the Deus ex Machina, or god from the machine. In Medieval Mystery plays gunpowder was used to represent the devil [1]. As technology advanced, flat scenery and static sets were used to depict various backgrounds [2]. With the widespread usage of computers, theater has adopted innovative technologies, initially as a method of handling stagecraft and now as part of the performance process. Recently, the concept of Digital Theater has been adopted, defining the coexistence of live performers and digital media in the same unbroken space with a co-present audience [3].

Several theater companies and groups have experimented with the notion of Digital Theater. The Gertrude Stein Repertory Theatre has been experimenting with what it calls digital puppetry [4]. The Builders Association uses video and computer graphics alongside live actors, having them communicate with virtual ones in order to bring out the complex relationship between humans and computers [5]. Blast Theory, a UK based art and performance group, uses a range of media such as video and computer graphics [6]. The Wooster Group is known for using video and digital media in its plays [7]. The theatrical duo of Kraft & Purver has experimented with video in different ways: at times they project video as a backdrop, while in other cases images are projected directly onto the actors to make a statement or set a mood [8]. These groups differ in their methodologies, but they all seek to use technology to enhance and enrich the theater experience. It is expected that these tools will take their place alongside stagecraft such as recorded sound and computer controlled lighting as methods of enhancing the theater experience and augmenting the powers of the actor.

This project aims to investigate the feasibility and interplay of virtual reality (VR) technology for theatrical presentation and live performances. The notion of VR Theater was conceived in the mid 90s, examining future uses of VR in theatrical entertainment [11]. After a decade of research and innovation in the field of VR, producing public shows for the immersive VR systems of the Athens based museum Hellenic Cosmos [9], [10], the 3D & VR Department of the Foundation of the Hellenic World undertook this experimentation in the framework of a new theatrical show. The theatrical performance was presented at the theater stage of the Hellenic Cosmos, the Theatron [12], employing all the technological infrastructure of the facility, such as multiple projecting surfaces, transparent screens, surround sound, high definition projection, real time playback and interaction, cluster rendering and real time motion capture. As a result, a unique experience was created, which certainly augmented traditional actor performances. In the following sections the proposed framework, the components that comprise it and the theatrical setup are presented, concluding with the presentation of the live performance that was based on the proposed approach.

Fig. 1. Software Architecture of the EVS Clustered Engine.

II. COMPUTER GENERATED SCENERY

The virtual reality software used for implementing the real time interaction and rendering of the digital scenery was a high level framework, the Enhanced Visualization System (EVS) [13], [14], [15], an in-house developed engine for creating and running immersive Virtual Reality applications on a variety of hardware configurations. EVS is script based, meaning that ASCII script files describe the virtual world and the user interactions. It uses various open source external libraries to control different aspects of the virtual environment, such as user input, synchronization, audio reproduction and the visualization of the three-dimensional graphics. The actual rendering framework was developed on top of the OpenSceneGraph library [16], which was mainly used for its high performance graphics characteristics. Fig. 1 shows the architecture of the whole environment, which was developed using the Linux OS. A custom protocol was designed and developed [17] for the synchronization of multiple cluster units consisting of a central unit (master) and multiple subunits (slaves), through which the master could synchronize the slaves in order to maintain a consistent representation of the virtual world from multiple viewpoints on multiple screens. The communication between the units took place over a 1 Gbps LAN, and the synchronization was achieved through special synchronization packets sent to all units by the central unit. In order to enable visualization on a multiple channel setup, allowing monoscopic, passive stereo, active stereo or left/right individual eye operation, we developed a library that could handle arbitrary display surfaces and viewing modes [17]. Therefore we were able to configure an application to run on a CAVE-like environment, a Reality Center, a domed theatre, a Power wall, or any other display configuration such as a theater stage, as well as on single screen desktop VR systems and HMDs. It was easily configurable through a simple yet effective XML script, which allowed multiple configurations to be present in a single file and to share common features if necessary. For handling the various input devices a generic open source network architecture was employed, the Virtual Reality Peripheral Network (VRPN) [18], which provides a device-independent and network-transparent interface to virtual reality peripherals. VRPN is a set of classes that enables interfacing between application programs and the physical devices (trackers, buttons, dials, force feedback devices etc.) usually employed in virtual reality (VR) systems. In the specific setup the system could manage all the external inputs needed for real time navigation during the show, as well as selective input from the mocap and tracking PC, in order to let the application react to various events.
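As an illustration of how such external inputs can reach the application through VRPN, the following minimal C++ client sketch subscribes to a tracker and a button device and prints their updates. The device names and host are hypothetical assumptions; the actual EVS input bindings were defined in its own configuration scripts.

#include <vrpn_Tracker.h>
#include <vrpn_Button.h>
#include <cstdio>

// Called whenever the tracker server reports a new pose for one of its sensors.
void VRPN_CALLBACK handleTracker(void*, const vrpn_TRACKERCB t)
{
    std::printf("sensor %ld pos (%.2f %.2f %.2f)\n",
                static_cast<long>(t.sensor), t.pos[0], t.pos[1], t.pos[2]);
}

// Called whenever a button on the input device changes state.
void VRPN_CALLBACK handleButton(void*, const vrpn_BUTTONCB b)
{
    std::printf("button %d is %s\n",
                static_cast<int>(b.button), b.state ? "pressed" : "released");
}

int main()
{
    // Device names of the form "device@host" are placeholders for this sketch.
    vrpn_Tracker_Remote tracker("Tracker0@tracking-pc.local");
    vrpn_Button_Remote joypad("Joypad0@tracking-pc.local");

    tracker.register_change_handler(nullptr, handleTracker);
    joypad.register_change_handler(nullptr, handleButton);

    // The application calls mainloop() once per frame to process pending messages.
    for (;;) {
        tracker.mainloop();
        joypad.mainloop();
    }
}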

Fig. 2. From top-left to bottom-right: screenshots of the behavior plug-ins in use: bird flocking, school of fish, pedestrian gathering action and pedestrian interest point visiting.

III. AI-BASED FRAMEWORK

In order to enhance the crowded city areas of our scenery, we had to design a modular Artificial Intelligence (AI) solution for the humans and the animals (Fig. 2). Our AI solution was built on the OpenSteer C++ library [19], which was heavily modified to our needs and incorporated into our engine. In the current project we needed to build behavioral models for humans, flocks of birds, schools of fish and simple wandering for animals. For the humans we designed a generic pedestrian behavior that performs avoidance of other AI entities, obstacle avoidance, terrain following and non-strict path following. Furthermore, to augment the realism of the human behaviour, such as looking at specific locations of interest like market shops or chatting with a friend when they meet, we added the ability to insert points of interest along the path and to enable the gathering of multiple humans at specific locations based on a probability function, as sketched below. The birds utilized a flocking behavior following a non-strict path, which produced very good results, while for the fish we developed a variant of the school-of-fish behavior [20] to achieve a realistic simulation model. Finally, for all the other animals we developed a generic behavior that performs wandering within a restricted area and the execution of simple actions such as eating.
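The sketch below is a deliberately simplified, self-contained illustration of the kind of pedestrian behavior described above: steering toward a non-strict path and pausing at points of interest according to a probability. It does not reproduce the modified OpenSteer code; all names, constants and the probability model are assumptions made for the illustration.

#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

// Minimal 2D vector type for the sketch.
struct Vec2 { float x = 0, y = 0; };
static Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
static Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
static Vec2 operator*(Vec2 a, float s) { return {a.x * s, a.y * s}; }
static float length(Vec2 a) { return std::sqrt(a.x * a.x + a.y * a.y); }
static Vec2 normalize(Vec2 a) { float l = length(a); return l > 0 ? a * (1.0f / l) : a; }

struct InterestPoint { Vec2 position; float visitProbability; float pauseSeconds; };

// A pedestrian that loosely follows a path of waypoints and occasionally
// stops at nearby points of interest, chosen by a simple probability test.
struct Pedestrian {
    Vec2 position, velocity;
    float maxSpeed = 1.4f;            // typical walking speed, m/s
    std::vector<Vec2> path;           // non-strict path to follow
    std::size_t nextWaypoint = 0;
    float pauseTimer = 0.0f;          // > 0 while "chatting"/looking around

    void update(float dt, const std::vector<InterestPoint>& pois) {
        if (pauseTimer > 0.0f) { pauseTimer -= dt; return; }   // idle at a POI
        if (nextWaypoint >= path.size()) return;               // path finished

        // Steer toward the next waypoint (non-strict: only need to get close).
        Vec2 toTarget = path[nextWaypoint] - position;
        if (length(toTarget) < 0.5f) { ++nextWaypoint; return; }
        Vec2 desired = normalize(toTarget) * maxSpeed;
        velocity = velocity + (desired - velocity) * std::min(1.0f, 4.0f * dt);
        position = position + velocity * dt;

        // Probabilistic gathering at nearby points of interest.
        for (const InterestPoint& poi : pois) {
            if (length(poi.position - position) < 2.0f &&
                static_cast<float>(std::rand()) / RAND_MAX < poi.visitProbability * dt) {
                pauseTimer = poi.pauseSeconds;
                break;
            }
        }
    }
};

In the actual framework the equivalent logic sits on top of the steering primitives of the modified OpenSteer library and is combined with entity avoidance, obstacle avoidance and terrain following.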

Fig. 3. Pipeline of Creating a 3D Animated Character Model.

IV. REAL TIME CHARACTERS

For creating animated models (Fig. 3) we used a twofold modelling approach to achieve the detail expected from next-gen graphics applications. A very high polygon instance of the model was sculpted in Zbrush and exported to 3D Studio Max, where a second, simplified low resolution copy was created. This simplified copy was then used to apply the diffuse and specular map textures. In order to retain a realistic lighting model for this low resolution copy, a normal map texture was baked from the higher resolution model and applied to it. This normal map retained all the lighting information of the higher resolution model and, with the help of pixel shaders, provides the desired lighting quality. Once our 3D model with its textures had been completed, the process of rigging began. This involved the creation and adjustment of a bone based system suitable for animation, and the binding of the 3D model to it, so that it could be deformed accordingly when the bones were animated. The binding of the mesh to the bones was achieved by assigning weights on the vertices to one or more bones. This system allowed the use of both forward and inverse kinematics at the same time, thus greatly facilitating the animation process.

Motion for our characters came from three different sources: through mocap, whereby a system of magnetic sensors worn by an actor, each sensor mapped to a bone of our character, transmitted motion data in 3D space that was recorded for application on our models; through video data, where motion was recorded as a 2D sequence of images and then used as reference for animation; and finally through hand animation, for entire motions or for correcting and altering mocap and video data. For our real time crowd simulations, we had to use sets of cyclic motions of varying lengths. Each human character had a walk, talk and listen cycle and at least two idle poses. Cheering motions were also added for the characters in the stadium of Priene. As far as animals were concerned, the animation cycles included at minimum an idle, walk and eat cycle. Once exported into our engine, these animations were blended live in our real time application by our AI system, as in the sketch below. Our pipeline involved 3D Studio Max and Zbrush for modeling and unwrapping, Photoshop, Bodypaint and xNormal for texturing, 3D Studio Max for rigging, the Animazoo suit and Animaview software for mocap, 3D Studio Max once more for animating and finally Cal3D [25] for exporting into our engine.
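Since the characters were exported through Cal3D, the following minimal sketch shows how such a character could be loaded and how its animation cycles could be cross-faded by the AI layer. The asset file names are hypothetical placeholders and the blending parameters are illustrative only, not the values used in the show.

#include <cal3d/cal3d.h>

int main()
{
    // Core model holds the shared skeleton, animations and meshes.
    CalCoreModel coreModel("pedestrian");
    if (!coreModel.loadCoreSkeleton("pedestrian.csf"))
        return 1;

    // Exported cycles: each call returns an animation id used for blending.
    const int walkId = coreModel.loadCoreAnimation("pedestrian_walk.caf");
    const int idleId = coreModel.loadCoreAnimation("pedestrian_idle.caf");
    const int meshId = coreModel.loadCoreMesh("pedestrian.cmf");
    if (walkId < 0 || idleId < 0 || meshId < 0)
        return 1;

    // Per-instance model that can be animated independently.
    CalModel model(&coreModel);
    model.attachMesh(meshId);

    // Start walking; the 0.3 s delay gives a smooth fade-in.
    model.getMixer()->blendCycle(walkId, 1.0f, 0.3f);

    // Per-frame update; when the AI decides the character has reached a point
    // of interest, it cross-fades from the walk cycle to the idle cycle.
    const float dt = 1.0f / 60.0f;
    for (int frame = 0; frame < 600; ++frame) {
        if (frame == 300) {
            model.getMixer()->clearCycle(walkId, 0.3f);
            model.getMixer()->blendCycle(idleId, 1.0f, 0.3f);
        }
        model.update(dt);   // advances the mixer and deforms the skinned mesh
    }
    return 0;
}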

Fig. 4. From top-left to bottom-right: screenshots of the ESL shading language used for rendering metallic surfaces, marble, trees and the sea.

V. SHADING LANGUAGE

All the shading and lighting calculations in EVS are performed through programmable shaders. Shaders are small computer programs, written in a shading language, that are executed on individual elements (vertices or pixels) and affect the properties of that element (pixel color, vertex position). The concept of shaders was introduced with the RenderMan Shading Language (RSL) [21] and has since been widely used in the creation of special effects for motion pictures, using offline rendering systems. The evolution of graphics hardware in recent years permitted the introduction of shaders in real time graphics applications, with the introduction of the OpenGL Shading Language. GLSL is a lower level language than RSL, and lighting calculations require explicit knowledge of the number of lights in the scene and their properties. EVS provides an enhanced version of the OpenGL Shading Language (GLSL) [22], called ESL (Enhanced Shading Language). ESL is a pure superset of GLSL, meaning that any GLSL shader will work on EVS, but it also exposes a set of convenience functions and uniform variables for the computation of lighting, shadowing and vertex skinning. These functions are compiled at runtime to the equivalent GLSL code, permitting fast and easy development of shaders and, more importantly, reusability of shaders between different scenes and projects. Shaders are essential in our project, since the appearance of the objects and actors in the virtual scenery should match the appearance of the real life props and actors on the theatrical stage. These virtual objects should also interact with light like their real life equivalents. So, in our project we used different shaders to model, among others, human skin, cloth, bronze and other metallic surfaces, the appearance of sea water, the sea shore and marble (Fig. 4).
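ESL itself is in-house, but since it compiles down to GLSL and EVS renders through OpenSceneGraph, the sketch below shows, under those assumptions, how a compiled GLSL program with a uniform could be attached to a scene graph node. The shader source and the uniform name are illustrative and far simpler than the skin, cloth or marble shaders used in the show.

#include <osg/Node>
#include <osg/Program>
#include <osg/Shader>
#include <osg/StateSet>
#include <osg/Uniform>
#include <osg/Vec3>
#include <osg/ref_ptr>

// Very small GLSL pair standing in for an ESL-generated shader: a single
// hard-coded light direction and a tint colour passed in as a uniform.
static const char* kVertexSrc =
    "varying vec3 normal;\n"
    "void main() {\n"
    "    normal = gl_NormalMatrix * gl_Normal;\n"
    "    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;\n"
    "}\n";

static const char* kFragmentSrc =
    "uniform vec3 tintColor;\n"
    "varying vec3 normal;\n"
    "void main() {\n"
    "    float diffuse = max(dot(normalize(normal), normalize(vec3(0.3, 0.8, 0.5))), 0.0);\n"
    "    gl_FragColor = vec4(tintColor * diffuse, 1.0);\n"
    "}\n";

// Attaches the program to a node so everything below it is drawn with it.
void applyTintedShader(osg::Node& node, const osg::Vec3& tint)
{
    osg::ref_ptr<osg::Program> program = new osg::Program;
    program->addShader(new osg::Shader(osg::Shader::VERTEX, kVertexSrc));
    program->addShader(new osg::Shader(osg::Shader::FRAGMENT, kFragmentSrc));

    osg::StateSet* stateSet = node.getOrCreateStateSet();
    stateSet->setAttributeAndModes(program.get(), osg::StateAttribute::ON);
    stateSet->addUniform(new osg::Uniform("tintColor", tint));   // e.g. a bronze tint
}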

VI. SPEEDTREE VEGETATION

Vegetation is always a very important part of an environment; its presence or absence and the look of the trees and flora enhance and specify the mood of any scenery. In this project the aesthetic aim of the vegetation was to enhance the liveliness of the ancient cities, creating interesting views that buildings alone could not achieve (Fig. 4). The SpeedTree framework [26] was used as a specialized tree modeling and compiling program to produce lightweight models optimized for real time rendering. The modeling procedure consisted of creating the tree species in the different forms and variations needed for this project. The next step was to compile the models using the SpeedTree Compiler to produce highly optimized models. During this process, texture atlases of 2048x2048 pixels were composed, containing the individual albedo maps and any other type of maps that the trees use. Additionally, a texture atlas of impostors was created, containing images from various viewing angles. The final trees had approximately 7000 polygons in their highest LOD and approximately 700 in their lowest; the impostor images were used dynamically by the rendering engine at far distances.

VII. MOTION CAPTURE FRAMEWORK

Mixed reality environments that combine digital and real assets allow elements from both worlds to coexist, augmenting the experience that is provided. To improve the interaction and communication mechanisms between the virtual and the real world we employed a real time motion capture system, in the topology depicted in Fig. 5. The Animazoo motion capture system [23] that we used was based on miniature inertial sensors, biomechanical models and sensor fusion algorithms. The actor whose action was to be transferred to the digital avatar had to wear a special suit with inertial sensors (an inertial guidance system), whose motion data was transmitted wirelessly to a computer (PC1) where the motion could initially be previewed on a skeleton rig. From there the animation data was transmitted over Ethernet to another machine (PC2) running the MotionBuilder [24] animation suite; a sketch of such a streaming receiver is given below. Inside MotionBuilder all animation data was linked to an actual 3D character model, which was then animated and rendered, providing the visual feed for the front projection screen. The system used gyroscopes to measure rotational rates, sampling at 30-120 FPS. These rotations were translated to a skeleton in the MotionBuilder software and mapped onto a 3D character (avatar). Much like optical markers, the more gyros used, the more naturally the data was replayed. No external cameras, emitters or markers are needed for relative motions. The inertial mocap system was able to capture the full six degrees of freedom body motion of a human in real time. By using an inertial based system we avoided the need for a large capture area next to the stage, which would have been necessary with an optical tracking system. The disadvantages, obviously, were the lower positional accuracy and the positional drift, which were compounded over time or by electromagnetic interference. To minimize electromagnetic interference and positional drift, the actor was placed away from metal objects and machinery and the frequency of sensor calibrations and resets was increased. Furthermore, a powerful 9 dB antenna was used for the receiver, along with a 5 m wire extension, to prevent data loss.
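The wire format between the mocap PC and MotionBuilder is vendor specific and not documented here. Purely to illustrate the streaming stage of the pipeline, the sketch below receives hypothetical per-bone orientation packets over UDP, the kind of data that would then drive the avatar's skeleton; the packet layout and port are assumptions, not the Animazoo or MotionBuilder protocol.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical per-bone sample: NOT the Animazoo/MotionBuilder wire format.
struct BoneSample {
    std::uint32_t boneId;     // index of the bone in the skeleton rig
    float quat[4];            // orientation as a quaternion (x, y, z, w)
};

int main()
{
    // Listen on an arbitrary UDP port for incoming mocap samples.
    const int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5005);                     // assumed port number
    if (bind(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    char buffer[1500];
    for (;;) {
        const ssize_t n = recv(sock, buffer, sizeof(buffer), 0);
        if (n < 0) break;

        // Each datagram carries a whole skeleton update: a sequence of samples.
        const std::size_t count = static_cast<std::size_t>(n) / sizeof(BoneSample);
        for (std::size_t i = 0; i < count; ++i) {
            BoneSample s;
            std::memcpy(&s, buffer + i * sizeof(BoneSample), sizeof(s));
            // Here the rotation would be applied to the corresponding bone of
            // the avatar before rendering the next frame.
            std::printf("bone %u quat (%.3f %.3f %.3f %.3f)\n",
                        s.boneId, s.quat[0], s.quat[1], s.quat[2], s.quat[3]);
        }
    }
    close(sock);
    return 0;
}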

Fig. 5. Real-Time Motion Capture Topology: the actor wearing the motion capture suit, PC1 and PC2, the wireless communication link, the Ethernet link and the signal to the front projectors.

Fig. 6. Overall setup of the actual theatrical stage, indicating the projection surfaces (semi-transparent front projected screen for the avatar projection, fixed opaque back projection screen), the front and back projectors, the rendering sources, the actors' placement and the motion capture system.

VIII. LIVE PERFORMANCE TOPOLOGY

The overall topology that we designed for the specific live performance is presented in Fig. 6. The performance employed a group of actors, acting in the space between the two projection screens. The real time generated scenery was presented on the back and front projection screens at a resolution of 1400x1050. The actor wearing the motion capture suit was placed back stage, and his body data was captured in real time via wireless communication. The mapping of his captured data was rendered and projected on the front projection screen, allowing the real time interaction of the avatar with the real actors in the middle of the stage. It is important to note that the front projection screen was semi transparent, allowing the actors and the back projection screen to be visible if nothing was projected on it, or if the projected image had black areas through which the background remained visible. The front projection screen was also motorized and was rolled up during the show when not needed. This setup provided an efficient way to represent the computer generated environment. The two layers of projection allowed the seamless change of scenery and produced a depth parallax perception, allowing the immersion of the actors into the projected digital scenery. The digital scenery was rendered using two video feeds: one feed was provided by a video playback PC used for the cut and transition scenes of the show, and the other feed was rendered by a real time cluster consisting of two PC nodes. The real time cluster provided the main scenery, which was interactive and changed during the show according to the actions of the live actors. The control of the real time scenery was handled with a wireless joypad by a navigator sitting in the last rows of the theater, since optical contact was required. This navigator determined the transitions and changes in the real time scenery, interacting with the live actors on stage and the avatar.

Fig. 7. Actual show photos of the motion capture controlled avatar interacting with the actors, and of the usage of front and back projection for immersing the live actors into the digital scenery.

The actual story of the show was a 45 minute play, describing a father and his little son's quest to understand more about Ancient Greece. During this quest they were able to travel back in time and visit the Ancient Greek cities of Asia Minor, Miletus and Priene, meet the people of these cities, participate in their everyday life and celebrations, and experience their myths and legends. For guidance through this journey, the avatar of the Ancient Greek storyteller Aesop, famous for his fables, was employed. This scenario was entirely realized using the technology described previously: the father and son were actual actors who were immersed into the projected real time digital scenery of accurate representations of the ancient cities of Miletus and Priene. The live actors incarnated the people met during the travels, while the avatar of Aesop was created by a back stage actor wearing the motion capture suit.

IX. CONCLUSION

The aim of this project was to investigate the feasibility and interplay of VR technology for theatrical presentations and live performances. Employing real time motion capture data along with computer generated scenery and live actors, we managed to design and implement an entire theatrical performance. Although the next step in this process is to perform a qualitative analysis of the framework, the initial feedback received suggests that motion capture data combined with computer generated scenery has significant potential in the domain of live theatrical performances and may become an essential tool for directors to realize or enrich their visions.

ACKNOWLEDGMENT

The authors would like to thank the stage director Mr. Y. Kakleas, the choreographer Mr. K. Kosmidis, the theatrical costumier Mrs. Tsakiris, the actors K. Konstadopoulos, F. Kokkinopoulou, M. Mettis, K. Fioretos, S. Kostoulas, G. Ramos, K. Papavasileiou, A. Rivero, E. Dimroci, S. Michail, D. Charalampous and N. Xenariou, the museum educators N. Tsiouni and P. Stellaki, and the audiovisual team: P. Stefas, P. Galanos, N. Sarafoglou, T. Perdikouris, A. Dimitrakopoulou, F. Papagiannis and G. Karabelias. Moreover, we thank the 3D graphics and VR team for their help in the process of realizing this experiment.

REFERENCES

[1] Bamber Gascoigne, World Theatre, Boston: Little, Brown and Company, 1968, p. 72.
[2] Frank Mohler, "The Development of Scenic Spectacle," April 2006, http://www1.appstate.edu/orgs/spectacle.
[3] Wikimedia Foundation, "Digital theatre," February 2006, http://en.wikipedia.org/wiki/Digital_theatre.
[4] The Gertrude Stein Repertory Theatre, February 2006, http://www.gertstein.org/mission.html.
[5] The Builders Association / dbox, February 2006, http://www.superv.org.
[6] Blast Theory, "Video," February 2006, http://www.blasttheory.co.uk/bt/type_video.html.
[7] The Wooster Group, "About the Group," February 2006, http://www.thewoostergroup.org/twg/about2.html.
[8] Ed Purver, February 2006, http://www.kraftpurver.com.
[9] A. Gaitatzes, G. Papaioannou, D. Christopoulos, "Media Productions for a Dome Display System," Proc. ACM Symposium on Virtual Reality Software and Technology (VRST), pp. 261-264, Limassol, Cyprus, November 2006.
[10] A. Gaitatzes, D. Christopoulos, M. Roussou, "Virtual Reality Interfaces for the Broad Public," Proc. Human Computer Interaction 2001, Panhellenic Conference with International Participation, Patras, Greece, 7-9 December 2001.
[11] G. Popovich, "Artaud Unleashed in Theater and Cyberspace," 1996.
[12] Theatron, the new ultramodern cultural era, April 2010, http://www.theatron254.gr/.
[13] D. Christopoulos, A. Gaitatzes, "Multimodal Interfaces for Educational Virtual Environments," Proc. 13th Panhellenic Conference on Informatics (PCI09), pp. 197-201, Corfu, Greece, 10-12 September 2009.
[14] D. Christopoulos, A. Gaitatzes, G. Papaioannou, G. Zyba, "Designing a Real-time Playback System for a Dome Theater," Proc. Eurographics 7th International Symposium on Virtual Reality, Archaeology and Intelligent Cultural Heritage (VAST), Cyprus, 2006.
[15] A. Gaitatzes, D. Christopoulos, G. Papaioannou, "Virtual Reality Systems and Applications: The Ancient Olympic Games," Proc. 10th Panhellenic Conference on Informatics (PCI05), Springer LNCS 3746, Volos, Greece, November 11-13, 2005.
[16] OpenSceneGraph, http://www.openscenegraph.org (last visited 25 March 2008).
[17] A. Gaitatzes, G. Papaioannou, D. Christopoulos, G. Zyba, "Media Productions for a Dome Display System," Proc. ACM Symposium on Virtual Reality Software and Technology (VRST 06), pp. 261-264, 2006.
[18] R. M. Taylor II, T. C. Hudson, A. Seeger, H. Weber, J. Juliano, A. T. Helser, "VRPN: A Device-Independent, Network-Transparent VR Peripheral System," Proc. ACM Symposium on Virtual Reality Software & Technology (VRST 2001), Banff Centre, Canada, November 15-17, 2001.
[19] C. W. Reynolds, "Steering behaviors for autonomous characters," Proc. Game Developers Conference, pp. 763-782, 1999.
[20] C. Reynolds, "Flocks, herds and schools: A distributed behavioral model," SIGGRAPH 87: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, ACM, pp. 25-34, 1987.
[21] P. Hanrahan, J. Lawson, "A language for shading and lighting calculations," ACM SIGGRAPH Computer Graphics, vol. 24, no. 4, pp. 289-298, Aug. 1990.
[22] R. J. Rost, OpenGL(R) Shading Language, Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, 2004.
[23] Animazoo Motion Capture Systems, http://www.animazoo.com/.
[24] Autodesk MotionBuilder, http://usa.autodesk.com/.
[25] Cal3D 3D character animation library, http://cal3d.sourceforge.net/.
[26] Interactive Data Visualization Inc., SpeedTree, http://www.speedtree.com/ (2010).