Video, Computer-Generated Environments and the Future of the Internet

By Ian Lamont
(For graduate credit)
HUMA E-105: Survey of Publishing, from Text to Hypertext
Harvard University Extension School
January 16, 2008

Almost since the first stuttering video clips appeared on the World Wide Web, observers have predicted that video will come to dominate the Internet. Mitchell Stephens, writing in the mid-1990s, foresaw the rise of sophisticated video production and narrative techniques derived in part from the “merger” of computers and video.1 He also believed the Web would play an important role for video, primarily as an on-demand distribution platform that would allow viewers to be finally freed from television schedules.2 Another commentator, writing more recently about the future of the Internet, proclaimed video as “king,” thanks in large part to the popularity of amateur videos and fan websites, and the rush of advertising dollars to online video content.3 Google, Microsoft, Apple, Cisco, Verizon, and many other technology companies apparently agree with these sentiments, spending billions of dollars on fiber-optic networks, massive data centers, and robust hardware and software platforms to deliver video over the Internet. While their technologies and business models are often in direct competition, there seems to be widespread consensus that the Internet will evolve into some sort of universal cable channel that showcases all kinds of video — from brief amateur video clips to Hollywood films — to potentially everyone with broadband Internet access, whenever and nearly wherever they choose. In such an environment, goes the reasoning, text, audio, still images, and everything else will play secondary roles.

1. Mitchell Stephens, The Rise of the Image, the Fall of the Word (New York: Oxford University Press, 1998), 164.
2. Stephens, 171.
3. Bambi Francisco, “Net Sense: The Future of the Internet,” MarketWatch. Available from

I would like to offer an alternative to this video-centric vision outlined by Stephens and others. While video is a compelling medium that may one day rival text-based websites in popularity, it will not dominate the Internet for long. I will argue that another type of content, one that shares video’s visual appeal yet currently falls into the “everything else” category, will eventually overshadow video. That content will consist of sophisticated computer-generated environments, delivered in a variety of formats and serving many different types of customer needs, including entertainment, news, and community. These formats will use advanced computer graphics to deliver photorealistic, three-dimensional representations of real and imagined spaces to a vast online audience, and allow audience members to interact with these environments and each other in ways that are not possible with video.

Video, which I define as television, film, home movies, and any other moving images derived from the movements of lit subjects and scenery in front of a camera lens, was the dominant visual mass medium of the 20th century. It has had a profound impact on society and world history, as evidenced by the power of moving images to educate, propagate, agitate, inform and entertain. Stephens called video humankind’s “third major revolution,” after writing and print.4 Indeed, many of the major events and societal trends of the last century were shaped by this mass medium.

Charlie Chaplin, Al Jolson, and Lillian Gish can be considered among the first international superstars, beloved by tens of millions across all social classes and in many countries, thanks to their leading roles in Hollywood films in the 1910s and 1920s. Stardom was not unknown before film, but pre-mass media musicians, actors, orators and authors were restricted to live performances and personal appearances, which limited their popularity. Film made it possible for actors to simultaneously reach millions of people in cities and towns across America, and for performances to be watched over and over again. The impact on the public was tremendous.

Politicians similarly expanded their audiences and platforms using the power of moving pictures. The rise of Adolf Hitler and the Nazi party in Europe in the 1930s was partially due to the influence of Leni Riefenstahl’s Triumph of the Will and other propaganda films that promoted core Nazi beliefs while casting Jews and other groups in harshly negative terms.5 John F. Kennedy’s political rise has been linked to an uneven televised presidential debate with Richard Nixon in 1960,6 and his death in a Dallas motorcade, captured on an 8 mm film camera by a bystander named Abraham Zapruder, sparked a national sense of mourning. Nearly three decades later, another amateur video showing a group of police officers beating a black taxi driver named Rodney King on a Los Angeles street eventually led to several days of deadly urban riots across the U.S.

Besides changing the course of history, video has come to govern our daily lives, and serves as an important means of understanding our world. While film was just a fringe entertainment in 1900, it became a regular part of public life within a few decades.

4. Stephens, 11.

5. Elliot Aronson and Anthony Pratkanis, Age of Propaganda: The Everyday Use and Abuse of Persuasion (New York: Henry Holt, 2001), 323.
6. “1960: Kennedy-Nixon Debates.” Electronic Government Project, Eagleton Digital Archive of American Politics, Rutgers University. Available from


By 1921, one source estimated that annual U.S. box office receipts totaled $850 million, and the film industry was supporting hundreds of thousands of jobs.7 Television also made rapid inroads, expanding from just seven thousand sets nationally by the end of World War II to ten million receivers in 1950.8

Comedies, dramas, rebroadcast films and other entertainment formats were not the only popular types of television programming. Generations of children have been raised on a regular diet of educational programs and cartoons, and television news became one of the primary sources of news, rivaling the popularity of newspapers and magazines. As recently as December 2005, a survey of American consumers found that 59% got news the previous day from local television and 47% from national television, compared to 44% from radio, 38% from a local newspaper, 23% from the Internet, and 12% from a national newspaper.9

Clearly, video continues to have a strong hold over audiences. Its ability to show events, tell stories, and faithfully reproduce the words and actions of living beings gives it an advantage over text-based formats such as printed periodicals, books and blogs. Stephens also noted video’s ability to take viewers “elsewhere,” thanks to the way moving images dominate the input to our eyes and ears:

7. “Revolutionary Talking Movies: Widespread Changes That Are Predicted If New Invention Is a Success — Elimination of Numerous Stars.” The New York Times, September 10, 1922. Available from res=9F07E5DE1F3AE433A25753C1A96F9C946395D6CF
8. Stephens, 46.
9. John B. Horrigan, “Online News: For many home broadband users, the Internet is a primary news source.” Pew Internet and American Life Project, March 22, 2006.

We misunderstand moving images when we think of them merely as a form of communication, a type of entertainment, a means of information or an art form. Perhaps books, newspapers or radio can squeeze under such headings. Moving images with sound, because they occupy both of our major senses, cannot. They are more than that. They are a place we go.10

Stephens outlined a bright future for video in his 1998 book, The Rise of the Image, the Fall of the Word. According to his thesis, television and film throughout the 20th century were generally unoriginal.11 He said that video needed to be reinvented in a way that would enhance its strengths and eventually make it the pre-eminent medium for telling stories and conveying information, even complex information that has traditionally been the realm of print discourse.12 “... Once we move beyond simply aiming cameras at stage plays, conversations, or sporting events and perfect original uses of moving images, video can help us gain new slants on the world, new ways of seeing,” he said. “It can capture more of the tumult and confusions of contemporary life than tend to fit in lines of type.”13

The “new video” outlined in The Rise of the Image, the Fall of the Word incorporated some of the techniques developed by avant-garde filmmakers and directors working with music videos and television commercials, as well as new conventions and technologies envisioned by Stephens. Juxtapositions, fast cutting, densely packed imagery, new symmetries, an “excess of perspectives,” musical structure, new symbols and forms of

10. Stephens, 124-125.
11. Stephens, 91.
12. Stephens, 179.
13. Stephens, 18.

representation, surrealism, and computer graphics would characterize new video.14 The tastes and preferences of audiences, he continued, would evolve to embrace new video, while spoken languages and the printed word would “increasingly be a less precise, less subtle language — one designed for use with images.”15

In his new video paradigm, Stephens described the importance of computing technologies. Graphics would play central design and production roles. Computer-generated imagery would be used for transitions, charts, and creative expression that would allow directors to express their artistic visions and sense of fantasy, while emphasizing the juxtapositions that he felt were so crucial to new video.16 After production was completed, computers would serve as a more effective distribution medium than movie theaters, terrestrial television and cable. The “network computer,” in the form of larger computers situated in people’s living spaces or portable wireless devices, would serve as the primary conduit for anywhere, anytime video:

… As more and more video is produced, with the mass marketing of digital video editing, and as more and more video is stored in databases and accessed on Web-like networks, it seems inevitable that the screens of those network computers will be filled much of the time with moving images. … A whole range of full-screen video — tracked down in archives, discovered through hypertext (hypervideo?) forwarded by friends, crafted by artists, assigned by professors.17

14. Stephens, 182-199.
15. Stephens, 209.
16. Stephens, 196-197.
17. Stephens, 172.

Stephens accurately predicted current technologies such as viral video, YouTube, and the video iPod. The character of video has been slower to evolve in the direction he predicted, but it may some day come close to matching his vision. However, Stephens and many of the other boosters who have predicted the eventual dominance of video on the Web have failed to adequately address the inherent conflict between the two mediums. In the video world, the director and others behind the camera lens tell linear stories. The audience watches the screen passively. Stephens readily accepted this drawback, noting “we have absolutely no influence whatsoever, free or otherwise, on anything that transpires. Movies and television shows proceed entirely without us.”18 The Internet, in contrast, is optimized for interactivity. It is a massive, distributed computer network that was originally envisioned in the late 1960s as a robust communications and file-transfer tool linking geographically dispersed computers and networks.19 In the 1970s and 1980s, Internet traffic consisted of data transfers, text messages, and relatively simple games. The audience was mostly limited to a small population of computer-savvy users who had a connection through work, school, or one of the early commercial service providers. By early 1993, there were just three to four million Internet users worldwide, and only several tens of thousands of network nodes.20

18. Stephens, 126-127.
19. Gina Smith, “Unsung innovators: Robert Kahn, the ‘stepfather’ of the Internet.” Computerworld, December 3, 2007. Available from action/


That would shortly change. In 1994, the World Wide Web — a set of protocols and technologies that sit on top of the Internet — leapt out of research labs and college campuses, sparking a communications and media revolution. Instead of typing in text commands to access information or send messages, people were presented with a screen full of information and options. A Web page might consist of text, photographs, and music. Almost all pages were linked with at least one other page. Pages could also access software applications, databases, and other computing resources. Users could navigate these interconnected pages with a Web browser and a mouse. This greatly simplified access to the Internet, and opened up Internet-based content to mainstream audiences. Content quickly moved beyond static pages containing text, links and photographs. On some sites, pages contained forms that allowed people to input text and software commands. This enabled developers to create front-end interfaces to back-end databases and other networked resources, which in turn gave users access to online discussion forums, search engines, online shopping, and a variety of registration-based services. In other words, the audience was not just limited to looking at content. They could react to it, respond to it, alter it, and discuss it or, for that matter, anything else on their minds. The impact of the Web on the Internet, mass media, business, and society has been enormous. As of late 2007, nearly half of all Americans had high-speed Internet connections at home. Many were using the Internet for social interactions, publishing blogs or personal web pages, or looking for information in order to make important life

20. Bruce Sterling, “Short History of the Internet.” The Magazine of Fantasy and Science Fiction, February 1993. Available from nethistory.html.


decisions, such as making a major investment, getting career training, choosing a school, or helping someone with a health-related decision.21

Among young people, the Internet has emerged as a central tool for socialization and interaction with friends. According to survey data released by the Pew Internet and American Life Project, 55% of Americans aged 12-17 have created a profile on Facebook, MySpace, or another social networking website. Almost all of the teens who use these sites say they do so to keep in touch with friends (including those whom they often see in person) and to make plans with them. The conversations extend to other Web services. Twenty-eight percent of teens say they blog, and many of them, especially those who already have social networking profiles, like to leave comments on others’ blogs. Even posting a video or digital photograph “often starts a virtual conversation” through the commenting features of such services.22

This trend points to the fact that the Web is more than just an on-demand distribution channel for video. The Internet lets audiences and organizations link, categorize, comment on, rate, map, tag, buy, sell, market, and edit video in ways that very few people imagined just five years ago.

Moreover, the Internet has eroded the control of the television and film industries and the traditional “gatekeepers” who work for them: writers, reporters, editors, publicists, publishers, etc. The decline of industry power and control goes beyond the ability of viewers to visit online forums to criticize a television news network’s supposed political bias, read film blogs written by unpaid amateur film critics, or apply descriptive

21. John B. Horrigan, “Broadband: What’s All the Fuss About?” Pew Internet & American Life Project, October 18, 2007.
22. Amanda Lenhart, Mary Madden, Alexandra Rankin Macgill, Aaron Smith, “Teens and Social Media.” Pew Internet & American Life Project, December 19, 2007.

tags to the official websites of Hollywood stars. Members of the public have been transformed into an army of self-propelled video producers with an international audience, thanks to easy-to-use software tools, cheap consumer electronics, and the widespread availability of broadband Internet connections.

The video they produce tends to consist of home movies, simple dramas and humor, and amateur recordings of live events. However, some of the content is entertaining enough to attract large numbers of viewers. On a recent evening, the dozen featured videos on the front page of YouTube had between 135,530 and 2,237,096 views apiece.23 Other content, while amateurish, is compelling to small numbers of people. One example is the home-made music videos and live-action plot recreations based on the movie Cars, which are readily available on YouTube. These clips do not meet Hollywood production values, clearly violate copyright law, and conflict with the marketing and public relations campaigns maintained by Disney/Pixar, yet they are a minor hit with a small group of fans who are starved for Cars-related video content.

The audience for an average amateur clip on YouTube might consist of just a few dozen people, a manifestation of the so-called “Long Tail” niche consumption pattern that characterizes Internet content.24 The strength of the Long Tail becomes apparent when one considers the millions of clips that are available on YouTube or other video-hosting sites. Most clips have a few dozen or a few hundred views, but the aggregate audience is actually quite large, numbering in the millions or tens of millions

23. 135,530 views for “Drunk History vol. 1 - Featuring Michael Cera” and 2,237,096 views for “The Original Human TETRIS Performance by Guillaume Reymond.” Data gathered at 10:20 pm on January 7, 2008.
24. Chris Anderson, “Long Tail 101.” The Long Tail: A Public Diary of Themes Around a Book. Available from


of people. Video created by the masses is now competing with the professional video produced by the entertainment industry.

In the news industry, amateur video footage provides a different sort of competition. Members of the public not only happen to witness news, they often gain access to people and places that broadcast news professionals cannot or will not see. They are able to capture vivid, on-scene accounts of major and minor events. The Zapruder and Rodney King home movies were early examples of this movement. Then, the devices were relatively expensive and there was no way to distribute the video to a wide audience, except through traditional media outlets such as television news. Now, cheap webcams, video cameras and mobile phones with built-in cameras make it possible for practically anyone to record news events. The Internet lets them distribute the footage to a huge audience, and lets them bypass traditional gatekeepers, their professional editing requirements, and ethical codes. The footage they shoot is raw and real. It can be brutally honest and compelling, but also provocative and biased.

The December 2004 Indian Ocean tsunami was a watershed moment in this respect. For the first time, global awareness of a major news event was shaped in large part by footage shot by amateurs and distributed via the Internet. The footage was disturbing, but captured the scope of the destruction far more effectively than broadcast news outlets, which had no reporters on scene when the waves first struck the beaches.

Despite the rise of amateur video and the new modes of distribution and discussion, the Internet and computer technologies have not been able to change the fundamental character of video. Whether someone watches video on a television screen, or plays it on YouTube, video is a linear, passive experience, designed to be watched


from beginning to end without alterations or input from the audience. For Web video, interactivity is limited to tangential content: the text links in the navigation column, the comment field below the Flash video player, the icon-based ratings systems, and the offsite commentary on blogs and discussion boards. The video itself has none of these features. Objects on the video screen are not linked. An audience member cannot easily reshoot it to make it more to his or her liking. What the viewer sees depends upon whatever lit subject or scenery passed in front of the lens, and whatever creative choices the people controlling the camera and editing the footage decided to apply. This has always been the fundamental character of video. In this sense, a two-minute clip of an Independence Day parade on YouTube is not much different than Fred Ott's Sneeze, an 1893 kinetoscope film produced by Thomas Edison showing one of his employees sneezing.25

The failure of video, or new video, to move beyond a static, linear storytelling device does not mean Web video is doomed. It has a healthy future, as experimentation with formats continues and more members of the public learn to use cameras, editing software, and Internet publishing tools. In addition, video is the best tool to accomplish certain tasks or tell certain stories, such as documenting nature, showing news events, and recording living people.

Video will also benefit from sophisticated applications that use metadata, descriptive pieces of information assigned to individual pieces of content by humans or software programs. For instance, a video clip stored on my computer may

25. Mary Hanlon, “Movie Audiences, Movie Myth: Early Cinema as Invention, Entertainment, Instruction.” Early Films, The Silent Western: Early Movie Myths of the American West. Available from movie.htm


have metadata that identifies the make and model of camera used to shoot it, the date it was created, and the dimensions of the frame in pixels. I may further “tag” it with simple descriptive labels that help me categorize it: “home,” “kids,” and “Fido.” If I post it on the Web, friends and family members may add their own tags: “funny,” “cute,” “Golden Retriever.” This data may help other people’s Web searches and online activities. For instance, someone searching for pictures of Golden Retrievers may find my clip, and then republish it in her blog post about cute Golden Retrievers. It is through the power of metadata and tagging that a YouTube clip of a new electronic gadget can get hundreds of thousands of views in just a few days.

Two emerging metadata applications that will help audiences more precisely find and use video content are geotagging and autotagging. Geotags are geospatial data in a file or software program that identify the location of some object, such as a building or person. Some cameras with built-in Global Positioning System (GPS) devices can automatically geotag images, which can later be associated with addresses on a map or searched more effectively (“find all photos taken in the 02166 ZIP code”). Autotagging is the automatic application of descriptions to a piece of content, without human review. Penn State researchers Jia Li and James Z. Wang have developed software that can be trained to automatically recognize the objects in images, and apply metadata to them.26 Once this technology is applied to moving images, it will be possible to more effectively organize video content and design online applications that let people view and use video in a profoundly different manner than we use it now. This goes beyond simply entering a term in a search engine and finding the most closely matching videos; it will enable video

26. Jia Li and James Z. Wang, Automatic Linguistic Indexing of Pictures - Real Time. Available from

to be more precisely integrated into the other software applications that will operate in our homes, classrooms, and places of work. Imagine generating a personalized report for a family trip to the zoo that displays recent amateur video of new exhibits and live camera feeds of the traffic situations along two potential routes. Metadata will make this possible.

Nevertheless, I believe a family of graphics technologies will eventually overshadow video and realize the true interactive potential of moving images accessed via the Internet. The technologies employ three-dimensional computer-generated environments. These environments are not science fiction or obscure laboratory experiments; they are already widely used in certain industries, as well as in the home and over the Internet. In addition, they rival video for clarity and visual beauty, allow creative options not possible with video, can be customized according to audience preferences and situational factors, and can enable social interaction, cooperation, and competition. In the coming years, new formats and tools will be made available to audiences and content creators, further accelerating the adoption of computer-generated environments and ensuring their dominance over other Internet media formats.

What are computer-generated environments? “Virtual reality” is an alternate term that many people know, but I am reluctant to use it here. It carries with it several misleading connotations, and does not necessarily include some of the formats that I believe will play an important role in the future. The concept dates to the 1950s and 1960s, when Ivan Sutherland envisioned computer technologies being used to “render sensations that would seem real to their recipients.”27 Programmer Jaron Lanier coined


the term “virtual reality” in the 1980s, but admits that virtual reality is a “somewhat broad idea” with no fixed definition.28 Thanks to a wave of media hype and imaginative Hollywood fictionalizations in the early 1990s, many members of the public associate virtual reality with special goggles and wired gloves that allow people to manipulate virtual objects in a digitally rendered, three-dimensional space. However, this excludes other 3D environments that do not require gloves and headgear. Edward Castronova additionally noted that “virtual” and “reality” are themselves loaded terms when used to describe simulated environments that are driving “real” experiences based on artificial sensory input. In his book, he avoided this semantic minefield, instead using the term “synthetic worlds” to describe persistent, interactive 3D spaces simultaneously accessed by large numbers of people.29 I prefer “computer-generated environments.” It is more inclusive than “synthetic world” or “virtual reality.” It encompasses any graphics technology that displays computer-generated 3D representations of real, simulated, or imagined environments, and allows users to control motion, perspective, and elements within these environments. It also avoids the semantic issues that Castronova described. However, computer-generated environments do not include certain types of computer-generated imagery (CGI) and 3D effects, such as static 3D images (e.g., a 3D model of a car engine displayed as a still

27. Edward Castronova, Synthetic Worlds: The Business and Culture of Online Games (Chicago: University of Chicago Press, 2005), 287.
28. Janice J. Heiss, “The Future of Virtual Reality: Part Two of a Conversation with Jaron Lanier.” Articles and Tips, The Sun Developer Network. Available from
29. Castronova, 287-294.

image) or linear narratives made with 3D animation, such as Shrek, M&Ms television advertisements, and some children’s television programs.

There are many examples of computer-generated environments in the workplace. The U.S. military has been one of the most active users of such tools. Tank crews in the early 1980s learned how to target their cannons using a simple 3D simulator based on the popular Battlezone arcade game. Pilots have used flight simulators for decades, and the Army offers a free 3D video game called “America’s Army” over the Internet as an interactive recruiting advertisement and simple training tool.30

In the business world, computer-generated environments are widely used in architecture and industrial design, as well as in several science-related fields. Since the 1980s, the drafting program AutoCAD has been used to design buildings, vehicle parts, and other products, with current versions supporting “3D walkthroughs” and various methods of viewing objects from multiple perspectives.31 General Motors used AutoCAD and several other construction-oriented 3D modeling applications to build a 2.4-million-square-foot plant in Lansing Delta Township, Michigan. The software tools helped GM complete construction 5% to 8% under budget and 25% ahead of schedule, by letting architects, builders, and plant managers plan the layout of the facility and all of the equipment and infrastructure before the foundation was poured.32

30. Harold Kennedy, “Computer Games Liven Up Military Recruiting, Training.” National Defense, November 2002. Available from http://www.nationaldefensemagazine.org/issues/2002/Nov/Computer_Games.htm.
31. Shaan Hurley, Unofficial AutoCAD History Pages. Available from



Outside of the workplace, video games are the oldest and most popular type of computer-generated environment used by the public. A significant portion of the population has grown up with them, and among younger people (the so-called “digital natives” who have never known life without personal computers, broadband Internet connections, and 3D games) game play is pervasive. As noted by John Palfrey, not all young people can be considered digital natives, but people who are “born digital” are more likely to interact with such technologies as digital natives.33

Battlezone, mentioned earlier, let players control a tank in a simple 3D environment consisting of green polyhedrons, a distant mountain range, and a never-ending assault of enemy tanks. Other 3D games from the early 1980s let players wander through dungeons or castles, killing monsters and gathering treasure. The graphics of these games, while not sophisticated, introduced millions of people to computer-generated environments and the concept of doing things (manipulating objects, completing missions, and sometimes cooperating with others) in simulated, three-dimensional spaces.

In the mid-1990s, players were exposed to more sophisticated graphics and networked play, either over local-area networks or the Internet. Another important development during this period was the rise of “modding,” which let players of 3D game titles such as Doom modify characters and missions to suit personal preferences, or make gameplay more interesting. Game studios or talented

32. Robert Mitchell, “Field Report: GM builds on 3-D model.” Computerworld, September 11, 2006. Available from command=viewArticleBasic&articleId=112739.
33. John Palfrey, “Born Digital.” John Palfrey from the Berkman Center at the Harvard Law School, October 28, 2007. Available from palfrey/2007/10/28/born-digital/.


player/programmers developed the modding software, which could be downloaded from official game websites or fan sites.

Now, an estimated 38% of U.S. adults and 81% of children ages four to 17 play video games,34 ranging from 3D games based on sports (Madden NFL ’08) to futuristic combat (Halo 3) and even real life (The Sims). While these games are popular as single-player pastimes or entertainment for small groups of people, an interesting new game format has begun to attract large numbers of players. Massively multiplayer online role-playing games (MMORPGs) allow thousands of geographically dispersed people to simultaneously play in a persistent, shared, online world, usually built around a medieval setting with lots of group campaigns and missions.

These environments allow a high degree of independence, creativity, and customization. In Battlezone, the player was a standard tank, it was always night, and the starting level and location were always the same. Now, a World of Warcraft player can choose his or her sex, race, class, continent, gaming server, default language, guild, and numerous other variables, all explained in great detail in a 208-page guide.35 There are more than nine million active World of Warcraft subscriptions.36

Second Life, a socially oriented virtual world accessed through the Internet, gives “residents” even more extensive options to shape their own characters and in-world

34 Alexander Wolfe, “Who's The Child Now, Or Wii (Why) Most Adults Don't Play Video Games.” Wolfe’s Den, Information Week, December 2, 2007.

35 World of Warcraft Game Manual. Blizzard Entertainment, 2004.

36 “World Of Warcraft Surpasses 9 Million Subscribers Worldwide.” Press Release. Blizzard Entertainment, July 24, 2007. Available from press/070724.shtml.

experiences. Using simple 3D building tools, they can model buildings, clothing, vehicles, furniture, landscapes, plants, animals, and other objects. If it is nighttime in their part of Second Life, and they cannot see, they can “force sun” to make it daylight — even if others around them still see the same nighttime features. They can also customize the appearance of their “avatars,” or personal 3D characters. Changing one’s face to have a big nose, red eyes, and a mullet involves clicking through a few menus and adjusting sliders that control nose size, eye color, and rear hair length. A resident can even change his or her head to that of a cat, dog, or other animal.37 Both World of Warcraft and Second Life encourage socialization and cooperation through shared missions or shared interests, whether it is conquering a monster-filled cave system in World of Warcraft and splitting the treasures within, or building shops and other virtual facilities for a Brazilian community in Second Life. In most game-oriented virtual worlds, it is impossible to reach certain areas of the gaming world and achieve high point levels without cooperating with other players and developing teams that most effectively draw upon the various skills of different types of players. In Second Life, ambitious building projects require groups of avatars, and enjoyment is often derived from interaction with friends and strangers. As with the text-based Internet, interaction in virtual worlds is not required, but it makes for richer and more rewarding social experiences. In addition, the mechanics of socialization in these worlds parallel the tools used in the text-based Internet, such as buddy lists and simple text messages. For someone who has already been exposed to 3D games, instant messaging, and social


37 These avatars are referred to as “furries.”

networking, it is not difficult to make the leap to using an avatar, communicating in group chats, and joining a guild in an MMORPG.

The 3D graphics for World of Warcraft look cartoonish, and Second Life’s graphics look even more primitive: avatars move stiffly, textures look blurred, and walls and other features often do not render at peak times or in locations where lots of avatars congregate. These issues will gradually disappear as the technical infrastructure of such services improves, and more advanced 3D hardware and software enters the marketplace. Moore’s Law, first put forth in 1965 by Gordon Moore, who later co-founded Intel, and revised by him in 1975, holds that the number of transistors on a chip doubles roughly every two years.38 It is most often cited to describe the growing power of central processing units (CPUs), but it can also be applied to advances in the abilities of graphics processing units (GPUs) produced by specialized manufacturers such as nVidia. Every few years a new generation of CPUs and GPUs is released to market, increasing the processing power of desktop computers, gaming consoles, and portable devices. These advances allow game designers to strive for the Holy Grail of the gaming industry, advanced 3D effects that approach photorealism:

The goal for many developers was now to create an experience identical to reality: rippling waters, flowing hair, shifting wind, dynamic moving lights, reflections on moving objects, facial lip syncing, varied character animation and emotions, and real physics and collisions.39

38 Gordon E. Moore, “Cramming More Components Onto Integrated Circuits.” Electronics, Volume 38, Number 8, April 19, 1965.

39 John Hight, Jeannie Novak, Game Development Essentials: Game Project Management (Clifton Park, NY: Thomson Delmar Learning, 2008), 17.


The drive to photorealism in computer-generated environments potentially involves sampling real-life objects. This is already done for 3D textures: instead of painstakingly recreating the rough ochre color of a brick, a designer can take digital photographs of the six sides of a brick and map them onto a 3D mesh in a software application. There are also technologies for capturing real-life actions, such as human movements, and applying them to models in 3D animation or computer-generated environments. Microsoft is now developing software called Photosynth that pastes geotagged photographs of a building or object onto a 3D model associated with the same geospatial coordinates. An application called Fotowoosh turns 2D pictures into simulated 3D images. Such applications open up the possibility of computer-generated environments or game worlds that mirror real-world places and people.40

Another gaming technology that should be considered in any discussion about the development of computer-generated environments is the narrow application of artificial intelligence used to drive the behavior of monsters, enemies, and non-player characters (NPCs) that populate video games. For years, game AI has been based on programmed logic: for example, if an avatar in World of Warcraft opens a certain dungeon door, a troll will launch an attack. In recent years, developers have been experimenting with more complex game AI that actually “learns” from environmental variables, or is trained by observing the behavior of human players. Jeff Orkin, a game developer and researcher at the MIT Media Lab, has developed an online 3D game called The Restaurant Game that teaches a game AI how to interact with human players, by recording the interactions of

40 Ian Lamont, “Transforming 2D photos into 3D models.” The Digital Media Machine, Computerworld, April 24, 2007. Available from http://blogs.computerworld.com/node/5418.


thousands of real human volunteers playing the game online. Orkin says that this technology can potentially be applied to virtual worlds, as a way to make the actions of NPCs more realistic to human players or residents.41

Besides gaming and virtual worlds, another popular application of computer-generated environments involves simulations of buildings and representations of real-world locations. It is now possible for potential homeowners to “tour” a 3D simulation of a condominium development. Millions of vehicles in the United States have small computers that capture location data from GPS satellites, and display a live, three-dimensional representation of their locations and nearby streets. Google Earth, a software program that uses geospatial information, satellite images, and 3D graphics, lets users simulate flying over or through cities and geographical features. Google Earth users can also geotag two-dimensional photographs, and map them on a corresponding Internet-accessible 3D map. As of mid-2006, an estimated 72 million Americans had taken “virtual tours” of another location online, with more than five million taking such tours on a typical day.42

The computer-generated environments described above, and the functionality available to users within them, are impossible to recreate with standard video technologies. And why should they? Computer-generated environments and video are oriented toward different applications. However, this may soon change, as the digital

41 “The Restaurant Game: New forms of Artificial Intelligence for Immersive Education.” Jeff Orkin, MIT Media Lab. Presented at Immersive Education Day, Harvard Interactive Media Group/Harvard Graduate School of Education, December 8, 2007.

42 Xingpu Yuan and Mary Madden, “Virtual Space is the Place.” Pew Internet & American Life Project, November 27, 2006.


natives begin to enter maturity, 3D graphics achieve photorealism, and new Internet-based software tools open up an expanded universe of online experiences that overlap with those currently provided by video. Audiences and content creators will discover that computer-generated environments can not only duplicate many types of video programming, but also can provide customization, interactivity, and even social options that amplify the ability of moving images to entertain and inform.

In recent years, there have been a number of experiments that indicate the direction in which computer-generated environments are heading, and how they will compete with and eventually displace video. Machinima, short for “machine animation” or “machine cinema,” is one example. It involves the use of 3D animation tools to make dramas, music videos, and other entertainment-oriented content. Professional CGI and 3D animation tools have been used by Hollywood studios for decades, but machinima is largely a grassroots phenomenon that relies upon inexpensive technologies to create and distribute content. The content creators are individuals or small teams, the tools are free or cheap game modding engines or games, and the distribution platform is usually the Internet. One example is Red vs. Blue. Starting in 2003, and ending 100 episodes later in 2007, a small team of writers and programmers used the Halo game engine to create a comedy series depicting the hapless antics of two opposing squads of soldiers.43 The humor was juvenile, the voice actors were amateurs, and the 3D graphics were simple, but the series became a cult hit on the Internet and was eventually distributed via DVD.

43 “Red vs. Blue: A Machinima Series Based on Halo.” Available from


Another machinima, The French Democracy, was created in 2005 by Alex Chan, a French industrial designer who had no previous experience with video production. He wanted to explain the causes of the urban riots that tore through France that autumn, and he used The Movies (a $70 PC game) to create a drama that described the conditions and factors that he believed were responsible for the riots. The quality of the animation was primitive, and Chan had to rely on subtitles and music instead of voice actors for audio, but the message was powerful. The 13-minute long clip was downloaded by tens of thousands of viewers44 and generated a great deal of mainstream press reaction.

Machinima has barely made an impact on public awareness, but that will eventually change as the quality of the graphics in machinima productions approaches photorealism, high-quality speech synthesizers are developed, software tools improve, and amateur writers/content creators become more skilled at scripting and programming. Further, while current machinima are like video in that they tell linear narratives, the descendants of this technology will allow customization and interaction. For instance, a machinima might let viewers preselect the appearance of the avatar stars, the sounds of their voices, the location of the dramas, and other plot elements. So, I may opt to watch a soap opera machinima in the default mode: a standard plot involving a love triangle between two men and a woman in Los Angeles. However, another viewer may want to see a love triangle with two women and a man in a small town in the Rockies, change the name of the lead male character to “Walter,” set the appearance of both of the women to

44 The French Democracy had 31,102 views in QuickTime format, and 14,451 views in Windows Movie format as of January 8, 2007, on the website.


blondes, and restrict close-up shots to less than 3% of the total plot length. A third viewer in Japan may transfer the story to Tokyo, and have all of the characters speaking in Japanese. Such options will be possible with the more advanced development tools and user interfaces.

Another possibility for future machinima is to let viewers bring their own avatars into the story. Most 3D games already have a background story and a plot that players are supposed to follow. With the exception of some MMORPGs, it is seldom possible to deviate from the pre-arranged mission, let alone play a part in a love triangle. Flexible game AIs and plot templates could make interactive machinima a reality. Conceivably, groups of friends could join each other in a drama, helping to support the plot in some way, for instance by distracting or disabling a character who seeks to harm the protagonist. Or, a historical drama or documentary depicting the start of the American Revolution could also serve as a virtual classroom that lets elementary school students explore the Boston of the 1770s. The application could also let students interact with the period avatars, whose reactions would be partially driven by advanced game AIs.

In terms of news and documentaries, computer-generated environments cannot replace compelling video footage of live events, natural phenomena, and recordings of personal moments. However, computer-generated environments may be used for realistic simulations when video is not available. For instance, they may let viewers see a two-car accident from multiple angles (including the points of view from each of the drivers’ seats starting five seconds before the collision) based on the geospatial data gathered from the police report and other sources. Or, they could let students in an astronomy class see a simulated asteroid impact on the moon in visible light or infrared light, from 50,


100, or 1,000 kilometers away. Author and inventor Ray Kurzweil predicts computer-generated environments may one day be overlaid upon our real-world views, through eyewear that displays text, icons and other information corresponding to objects in our field of view:

… If you look at someone, little pop-ups will appear in your field of view, reminding you of who that is, giving you information about them, reminding you that it’s their birthday next Tuesday. If you look at buildings, it will give you information, it will help you walk around. If it hears you stumbling over some information that you can’t quite think of, it will just pop up without you having to ask.

The “augmented reality” described by Kurzweil would reduce our dependence upon information delivered through computer monitors and small liquid crystal displays on mobile phones. It would also rely upon geotags, facial recognition software, speech recognition technologies, and a brain-machine interface that lets people input information or commands into these systems without speaking or pressing keys.

Computer-generated environments could also replace live human anchors and newsrooms. Most anchors simply read from scripts that either describe the footage that is being shown on the screen or introduce segments by reporters in the field. This method of presenting news is expensive and inflexible. Anchors are expensive. They can only work at pre-arranged times throughout the day. They can get sick. Some viewers do not like the appearance of a certain anchor. Avatars and software can remedy these shortcomings, and allow a viewer to customize the appearance of his or her anchor, the type of news the anchor narrates, and the time the newscast starts and finishes. Developers at Northwestern University’s Intelligent Information Laboratory have created a prototype


application called News At Seven that features an avatar anchor reading news from eight different categories: Business, Entertainment, Health, Politics, Science, Tech, U.S., and World news. News At Seven is delivered over the Web, and is automated. Scripts, still images, and video are pulled from other online news sources.45 It can be launched at any time of the day or night. Until October of 2007, News At Seven used avatars from the Half-Life 2 game engine, but the high processing requirements associated with generating a talking, 3D avatar on demand forced the designers to switch to simple, 2D avatars for the limited beta launch of the application.46

In the future, similar news applications could allow 3D avatars to be customized to mimic real news anchors (Walter Cronkite, Katie Couric, Jack Williams), other real people (someone’s father, a favorite teacher, a politician), characters based on a set of self-selected attributes, or one’s own avatar. The avatars might be seated in a simulated newsroom, or could be moved to a computer-generated environment that mirrors the real-life location where the news that they are describing took place. The environment might be based upon geotags and other metadata that were generated by the original reports and video footage. The news itself can also be fine-tuned, based on specific categories, locations, times, and keywords chosen by the viewer. I may choose to have the first half of my newscast consist of developments relating to the New York Stock Exchange in the previous 24 hours. For the second half, I may restrict my anchor to

45 Kate Goodloe, “Broadcast News Goes Human-Free.” The Wall Street Journal, January 6, 2007. Available from SB116803755568668612-7IG7wBl1Wpezld0friGmB0x1ONM_20070113.html.

46 “News at Seven Beta Launch!” News At Seven Blog, October 29, 2007. Available from


reading reports that mention “China” or “Beijing” in the lede and have accompanying video footage sourced from any clip taken in Beijing or Shanghai within the past six hours. Detailed metadata would be crucial to creating such a report.

Approximately 10 years ago, Stephens prophesied a mass media environment that would be increasingly dominated by video. Noting the failure of CD-ROMs and other early interactive video technologies such as Time Warner’s Qube,47 he foresaw computers playing largely supportive roles, such as adding graphical flavor and creating distribution channels for new video. Even in the current Web 2.0 age, characterized by text-based media such as social networking websites and blogs, many observers still believe video will eventually triumph, thanks to its solid broadcasting track record, strong advertising revenue, and the popularity of online video. Other experts acknowledge that computer-generated environments will be important, but many are unsure what such formats will look like or how people will use them.48

While predicting the future is difficult, it is possible to identify trends based on quantitative research and an understanding of recent developments in computer software, hardware, and networking technologies. I believe many of the predictions outlined above, far from being the realm of science fiction, provide valid insights into the future of mass media. Computer-generated environments and other Internet technologies will not only change the ways in which we interact with each other, they will change the way in which we see our world.

47 Stephens, 169, 174.

48 Janna Quitney Anderson, Lee Rainie, “The Future of the Internet II.” Pew Internet & American Life Project, September 24, 2006.
