Marc Levoy Stanford University
Editors: Lawrence Rosenblum and Simon Julier
The principles of photography have remained largely unchanged since its invention by Joseph Nicéphore Niépce in the 1820s. A lens focuses light from the scene onto a photosensitive plate, which records this information directly to form a picture. Because this picture is a simple copy of the optical image reaching the plate, improvements in image quality have been achieved primarily by refining the optics and the recording method. These refinements have been dramatic over the past few decades, particularly with the switchover from film to digital sensors, but they've been incremental.

Computational photography challenges this view. It instead considers the image the sensor gathers to be intermediate data, and it uses computation to form the picture. Often, it requires multiple images to be captured, which are combined in some way to produce the final picture. Representative techniques include high-dynamic-range (HDR) imaging, flash/no-flash imaging, coded-aperture and coded-exposure imaging, photography under structured illumination, multiperspective and panoramic stitching, digital photomontage, all-focus imaging, and light-field imaging.

In this article, I look at the lack of experimental platforms for computational photography (that is, cameras that are programmable), and I talk about one possible solution—the Frankencamera architecture designed in my laboratory at Stanford as part of our Camera 2.0 project (http://graphics.stanford.edu/projects/camera-2.0).
A Bit about Computational Photography

The term "computational photography" has been reinvented several times over the past 20 years. Its current definition stems from a symposium I co-organized in 2005 with Frédo Durand of the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory and Richard Szeliski of Microsoft Research (http://scpv.csail.mit.edu). Since then, the field has grown enormously. Of the 439 papers submitted to Siggraph 2009, nearly 40 percent were about 2D imaging applied to photography or video, making this area larger than the traditional areas of modeling, animation, and rendering. The field has also evolved; it now spans computer vision and applied-optics topics, some of which were active research areas before anyone applied the term computational photography to them. Since 2009, the field has had its own yearly conference with peer-reviewed papers—the International Conference on Computational Photography—and it will soon have its first monograph.1
Why Can’t I Buy a Computational Camera?
Despite this buzz, a demon haunts the computational photography community. Besides panoramic stitching, few of these techniques have found their way into commercial cameras. Why is this so? I believe four factors are involved.
First, the camera industry is secretive to an extent unfamiliar to those of us who grew up in computer graphics. There's little communication between companies and universities, as there has been, for example, between Silicon Graphics, Nvidia, ATI, and US academic graphics labs. Also, aside from image compression formats (such as JPEG) and color spaces (such as sRGB), the camera industry has few open standards or scholarly publications. Some of this secrecy is driven by culture, but it's also quietly acknowledged that camera companies are violating each other's patents. If you don't disclose your technology, nobody can sue you. While pursuing the Camera 2.0 project, my colleagues and I have run into fear-driven secrecy among not only camera manufacturers but also the companies that make camera processing chips. This secrecy makes it hard to insert open source components into the value chain because these chip makers won't reveal their hardware interfaces.
It's worth noting that computational photography results are being published in journals and either placed in the public domain or patented by universities, who often issue nonexclusive licenses. Even so, the tangle of underlying patents, some of them quite broad, also constitutes a barrier to entry for new camera companies. When a patent war erupts, as it has between Nokia, Google, and Apple in the mobile phone space, it typically ends with negotiations and cross-licensing of patents, further raising the entry barrier. Starting a new camera company isn't impossible, but profit margins are razor-thin, partly due to licensing fees, which typically involve multiple vendors. So, unless you introduce technology that lets you charge significantly more for your camera, making money is difficult.

Hardware versus Software

Second, traditional camera manufacturers are primarily hardware, not software, companies. Of course, digital cameras contain a tremendous amount of software, but these companies treat software as a necessity, not a point of departure. To them, computational photography represents a threat because its value comes largely from algorithms. The open source software community advocates exactly the opposite strategy—"release early, release often."2 This strategy yields fast innovation and (eventually) high quality, as has been proven with Linux, the Apache Web server, the Thunderbird mail client, and most iPhone applications, even though the latter are not open source.

Traditional camera companies are also uncomfortable with Internet ecosystems, which permit (or encourage) user-generated content and thrive or falter on the quality of their user interfaces. It's worth noting that, as of this writing, no traditional camera manufacturer runs a photo-sharing site with significant penetration into Euro-American markets, although several are trying (for example, Nikon's My Picturetown or Kodak's Gallery). Also, not a single point-and-shoot or single-lens reflex (SLR) camera offers a 3G connection (although a few offer Wi-Fi). This will change soon, driven by market pressure from the increasingly powerful and connected cameras on mobile devices. Frédo Durand likes to say that "software is the next optics"—meaning that future optical systems will be hybrids of glass and image processing. Besides placing more emphasis on software, these trends run counter to the camera industry's hardware-intensive, secrecy-based structure.

Branding and Conservatism

Third, traditional camera companies are inherently conservative; new technologies are refined for years before they're introduced in a commercial product. This strategy yields reliable products but slow innovation. In addition, every component in a Nikon or Canon SLR is treated by the company as reflecting on the brand's quality as a whole. When I proposed to a prominent camera manufacturer recently that they open their platform to user-generated plug-ins, they worried that if a photographer took a bad picture using a plug-in, he or she might return the camera to the store for warranty repair. Although this attitude was understandable 20 years ago, it's now antiquated; iPhone users know that if a third-party application crashes, it isn't Apple's fault.

As an example of this conservatism, although algorithms for HDR imaging have existed since the mid-1990s, as of this writing no camera manufacturer offers an HDR camera, except for a few two-exposure models from Fuji and Sony. Instead, these manufacturers have been locked in a "megapixel war," leading to cameras with more pixels than most consumers need. This war is finally winding down, so companies are casting about for a new feature on which to compete. Once one of them offers such a feature (which might be HDR imaging), the rest will join in. They'll compete fiercely on that feature, to the exclusion of others that might be ready for commercialization and could be useful to consumers.

Research Gaps

Finally, although published computational photography techniques might appear ready for commercialization, they often aren't robust, and key steps are sometimes missing. For example, although HDR imaging has a long history, the research community has never addressed the question of automatically deciding which exposures to capture—that is, metering for HDR. This omission undoubtedly arises from the lack of a programmable camera with access to a light meter. Flash/no-flash imaging is still an active research area partly because existing techniques produce visible artifacts in many common photographic situations. I believe progress has been slow in these areas for lack of a portable, programmable camera.
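To make the metering gap concrete, here is a minimal sketch of exposure selection for HDR capture, assuming the camera could report the darkest and brightest scene luminances it meters (exactly the information today's firmware doesn't expose to programmers). The function name and inputs are hypothetical, and real HDR metering would need to be far more robust than this arithmetic.

#include <algorithm>
#include <cmath>
#include <vector>

// Given a base exposure time from ordinary metering and the darkest and
// brightest luminances a live light meter could report, choose exposure
// times, spaced two stops apart and centered on the base exposure, whose
// count is sufficient to cover the scene's dynamic range.
std::vector<double> bracketExposures(double baseExposure,
                                     double minLum, double maxLum)
{
    const double stopsPerShot = 2.0;                  // 4x spacing between shots
    double rangeInStops = std::log2(maxLum / minLum); // scene dynamic range
    int shots = std::max(1, (int)std::ceil(rangeInStops / stopsPerShot));
    std::vector<double> exposures;
    for (int i = 0; i < shots; ++i) {
        double offset = (i - (shots - 1) / 2.0) * stopsPerShot;
        exposures.push_back(baseExposure * std::pow(2.0, offset));
    }
    return exposures; // e.g., {t/4, t, 4t} for a scene spanning ~6 stops
}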
Other Problems

Although most SLR cameras offer a software development toolkit (SDK), these SDKs give you no more control than the buttons on the camera. You can change the aperture, shutter speed, and ISO, but you can't change the metering or focusing algorithms or modify the pipeline that performs demosaicing, denoising, sharpening, white balancing, and image compression. Hackers have managed to run scripts on some cameras (see http://chdk.wikia.com/wiki/CHDK_for_Dummies), but these scripts mainly just fiddle with the user interface. Of course, you can always buy a machine vision camera (from Point Grey, Elphel, or others) and program everything yourself, but you won't enjoy trekking to Everest Base Camp tethered to a laptop. Another potential source of programmable cameras is development kits for the processing chips embedded in all cameras. Unfortunately, these kits (and the hardware interfaces to the underlying chips) are either completely secret (as with Canon and Nikon) or carefully protected by nondisclosure agreements (such as Zoran and Ambarella), except for Texas Instruments' OMAP (Open Multimedia Application Platform) platform, on which our cameras are based.

Frankencamera: An Architecture for Programmable Cameras

My laboratory's interest in building programmable cameras grew out of computational photography courses we've offered at Stanford since 2004. In these courses, I found it frustrating that students could perform computations on photographs but couldn't perform them in a camera or use these computations' results to control the camera. In a town hall meeting at the 2005 symposium I mentioned earlier, I raised this Achilles' heel of computational photography, but, at the time, it wasn't clear how to address it.

The pivotal event for us was a 2006 visit to my laboratory by Kari Pulli, a Nokia research manager. He pointed out that over the past five years, the cameras in cell phones had dramatically improved in resolution, optical quality, and photographic functionality—enough for everyday photography. Moreover, these platforms run real operating systems, which vendors have begun opening to third-party developers. Perhaps most important, camera phones offer features that dedicated cameras don't—wireless connectivity, a high-resolution display, 3D graphics, and high-quality audio.

Early Experiments

With Nokia funding, Kari, Mark Horowitz (chair of Stanford's Department of Electrical Engineering), and I began developing computational photography applications for commercially available cell phones. Among our early experiments in this area was a real-time, in-camera algorithm for aligning successive frames captured by a Nokia N95 smartphone's video camera.3 Real-time image alignment is a low-level tool with many immediate applications, such as automated panorama capture and frame-averaged low-light photography.
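To give a flavor of what such an alignment step involves, here is a deliberately naive sketch: a brute-force translation search between two downsampled viewfinder frames. The published algorithm3 is considerably more sophisticated (and fast enough to run per frame on the phone); everything below, including the function names, is illustrative only.

#include <climits>
#include <cstdint>
#include <cstdlib>

struct Offset { int dx, dy; };

// Find the integer translation that best aligns 'cur' to 'prev' by
// minimizing the sum of absolute differences over a small search window.
// Run this on a heavily downsampled grayscale copy of the viewfinder frame;
// accumulating the offsets across frames supports panorama capture or
// frame averaging in low light.
Offset align(const uint8_t* prev, const uint8_t* cur,
             int width, int height, int radius) // radius: e.g., 8 pixels
{
    Offset best = {0, 0};
    long bestCost = LONG_MAX;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            long cost = 0;
            for (int y = radius; y < height - radius; ++y)
                for (int x = radius; x < width - radius; ++x)
                    cost += std::abs(cur[y * width + x] -
                                     prev[(y + dy) * width + (x + dx)]);
            if (cost < bestCost) { bestCost = cost; best.dx = dx; best.dy = dy; }
        }
    return best;
}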
In my spring 2008 computational photography course, we loaned N95s to every student. The resulting projects (see http://graphics.stanford.edu/courses/cs448a-08-spring) demonstrated the value of a programmable camera platform and the added value of having that platform portable and self-powered. Figure 1 shows two of them.

Figure 1. Two student projects with programmable cameras. (a) Abe Davis's system captured, transmitted, and displayed a 4D light field. (b) Derek Chen mounted five camera phones facing backward on his car's ceiling, creating a virtual rearview mirror with no blind spots.

Abe Davis developed a system for capturing, transmitting, and displaying a 4D light field (see Figure 1a). He waved a phone around a physical object to capture a light field. He then transmitted the light field to a second phone, on which you could view it by waving that phone. The second phone computed its pose in real time using its camera, displaying the appropriate slice from the light field so that the object appeared stationary behind the viewfinder.

Derek Chen mounted five N95s facing backward on his car's ceiling (see Figure 1b). He combined the video streams to form a synthetic aperture picture that could, in principle, be displayed live on the rearview mirror. Because the aperture's baseline was large, Chen could blur out the posts supporting the car's roof, thereby providing a virtual rearview mirror with no blind spots. In the experiment shown in Figure 1b, he blurred out wide strips of blue tape arranged on the rear window in a way that partly obscured his view; note that you can barely see the tape in these synthetic pictures.
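The synthetic aperture step in Chen's project reduces to shifting each view so the chosen focal plane lines up and then averaging. A minimal sketch follows; the per-view shifts would come from calibrating the five cameras, and I've simplified to horizontal parallax and grayscale images, so the function and its inputs are hypothetical.

#include <cstdint>
#include <vector>

// Average several views after shifting each so that objects on the chosen
// focal plane (here, the traffic behind the car) coincide. Objects off that
// plane, such as roof posts or tape on the rear window, misalign from view
// to view and are blurred away. |shiftX[v]| must not exceed 'margin'.
void syntheticAperture(const std::vector<const uint8_t*>& views,
                       const std::vector<int>& shiftX, // per-view, from calibration
                       int width, int height, int margin,
                       uint8_t* out)
{
    for (int y = 0; y < height; ++y)
        for (int x = margin; x < width - margin; ++x) {
            int sum = 0;
            for (size_t v = 0; v < views.size(); ++v)
                sum += views[v][y * width + x + shiftX[v]];
            out[y * width + x] = uint8_t(sum / views.size());
        }
}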
A Succession of Prototypes

Despite these successes, there are computational photography experiments that we can't implement on today's camera phones, because either the cameras' sensors or optics aren't good enough, the computing resources aren't powerful enough, or the APIs connecting the camera to the computing are too restrictive. So, in 2007, we planned to define a programmable-camera architecture, then build a reference platform for it. Stanford PhD student Kayvon Fatahalian advised us to first build some cameras and see what their problems were before trying to define an architecture. This approach seemed prudent. So, we began building a succession of cameras. Because we stitched together our early prototypes from parts of other devices, often dead ones, we called them "Frankencameras," after Frankenstein's monster in Mary Shelley's novel. Figure 2 shows these prototypes.

Figure 2. A gallery of Frankencameras, arranged in chronological order. (a) Frankencamera F1, (b) Frankencamera F2, and (c) a Nokia N900 F—a retail smartphone with a custom software stack.

Frankencamera F1 (see Figure 2a) employed an Elphel 353 network camera, a Nokia 800 Internet tablet as a viewfinder, and a Canon 10–22 mm EOS lens attached to a Birger EF-232 lens controller. The Elphel contains an Aptina MT9P031 sensor, the same 5-megapixel chip used in Nokia N95 cell phones. Unfortunately, separating the viewfinder from the camera forced us to burn most of the system's bandwidth feeding video to the viewfinder screen. To address this problem, Frankencamera F2 (see Figure 2b) employed a Texas Instruments OMAP 3 system-on-a-chip mounted on a Mistral EVM (evaluation module). Attached to the Mistral is the Elphel 353's sensor tile and hence the same Aptina sensor. The Mistral has an integral touch screen display, thus solving the bandwidth problem and letting us encase the entire camera in a laser-cut plastic body. The third camera (see Figure 2c) was a retail Nokia N900 smartphone with a Toshiba ET8EK8 5-megapixel CMOS (complementary metal-oxide semiconductor) sensor and a custom software stack.
The Frankencamera Architecture

After two years of building cameras and forcing them on the students in my computational photography courses, the key elements of our architecture began to take shape. Our most important insight was that to provide a live electronic viewfinder, digital cameras must have a pipeline of images in flight—the sensor is being configured to capture one image while the previous image is still exposing and the one before that is being postprocessed. However, current digital cameras have only a single state representing the camera's settings. This means that any change in settings will take effect at some unpredictable time in the future, as anyone who has driven an SLR from a laptop knows painfully well. To capture a burst of images with different known settings, you must clear the pipeline, set up for the first image, trigger a frame, wait for it to pass through the pipeline, and set up for the next image. This incurs a latency equal to the pipeline's length for each image in the burst, and there's no way to know which image was captured with which settings.

Our solution was to retain a pipeline but treat the camera as stateless. Instead, each frame in the pipeline specifies the recommended settings for producing that frame, a list of actions to synchronize to the shutter—such as if and when the flash should fire—and a frame ID. This makes it easy to program a burst of images having different exposures, ISOs, focus settings, and even different spatial resolutions or regions of interest in the field of view. This, in turn, facilitates many computational photography algorithms.
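Concretely, a burst under this architecture looks roughly like the following sketch, written against the FCam API I describe next.4 I've based the class and field names on the published API, but treat the details as approximate rather than authoritative.

#include <vector>
#include <FCam/N900.h>

int main()
{
    FCam::N900::Sensor sensor;

    // Three capture requests, each carrying its own settings through the
    // pipeline; there is no global camera state to set and restore.
    std::vector<FCam::Shot> burst(3);
    for (int i = 0; i < 3; ++i) {
        burst[i].exposure = 1000 << (2 * i); // 1, 4, and 16 ms, in microseconds
        burst[i].gain = 1.0f;                // base ISO
        burst[i].image = FCam::Image(640, 480, FCam::UYVY);
    }
    sensor.capture(burst);

    while (sensor.shotsPending() > 0) {
        FCam::Frame frame = sensor.getFrame();
        // Each returned frame is tagged with the settings that actually
        // produced it, so there's no guessing which exposure we're holding.
    }
    return 0;
}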
In the end, our architecture consisted of a hardware specification, a software stack based on Linux, and an API with bindings for C++, called FCam.4 To demonstrate the architecture's viability, we implemented it on the Frankencamera F2 and our modified Nokia N900. We then programmed six applications: HDR viewfinding and capture, low-light viewfinding and capture, automated acquisition of extended-dynamic-range panoramas, foveal imaging, gyroscope-based hand-shake detection, and rephotography.4

FCam's Pros and Cons

So, how useful is FCam? In my 2010 computational photography course, we loaned Nokia N900 Fs to every student and asked them to replace the autofocus algorithm. (The phone's camera has a movable lens.) We graded the assignment on the accuracy with which they could focus on a test scene we provided and on focusing speed in milliseconds. Two students submitted algorithms that were better than Nokia's; we presented these in Finland at the course's conclusion. An assignment such as this would have been impossible before FCam.
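The assignment boils down to a contrast-detection loop: sweep the lens, score each frame's sharpness, and return to the best position. Here is a bare-bones sketch, with FCam usage again based on my reading of the published API. The winning student entries were smarter about search order, early termination, and metering subregions, and a real loop would also wait for the lens to settle between samples.

#include <FCam/N900.h>

// Sharpness proxy: sum of squared differences between adjacent bytes.
// (The frame is UYVY-interleaved; a real implementation would extract the
// luma channel and score only a focus region, but this suffices for a sketch.)
static long long focusScore(const unsigned char* p, int bytes)
{
    long long s = 0;
    for (int i = 1; i < bytes; ++i) {
        int d = p[i] - p[i - 1];
        s += (long long)d * d;
    }
    return s;
}

// Contrast-detection autofocus: FCam expresses focus in diopters, from
// farFocus() (infinity) to nearFocus() (closest focusable distance).
void autofocus(FCam::N900::Sensor &sensor, FCam::N900::Lens &lens)
{
    FCam::Shot vf;
    vf.exposure = 30000; // 30-ms viewfinder frames
    vf.gain = 1.0f;
    vf.image = FCam::Image(640, 480, FCam::UYVY);

    float best = lens.farFocus();
    long long bestScore = -1;
    for (float f = lens.farFocus(); f <= lens.nearFocus(); f += 0.5f) {
        lens.setFocus(f);
        sensor.capture(vf);
        FCam::Frame frame = sensor.getFrame();
        long long s = focusScore(frame.image()(0, 0), 640 * 480 * 2);
        if (s > bestScore) { bestScore = s; best = f; }
    }
    lens.setFocus(best); // snap back to the sharpest position found
}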
That said, my favorite application was none of these six. Instead, it was a weekend hack in which PhD student David Jacobs mounted two Canon flash units on the Frankencamera F2 (see Figure 3a) and programmed them to obtain bizarre lighting effects on playing cards thrown into the air (see Figure 3b). What's cool about this application is that Jacobs took only a few hours to assemble the hardware and write the code. It's exactly the sort of ad hoc experimentation we had hoped our architecture would enable.

Figure 3. A weekend hack. (a) Frankencamera F2 with two flash units attached. (b) Using our FCam API, David Jacobs programmed one flash to strobe repeatedly during a one-second exposure, producing the card trails. Programming the other, more powerful flash to fire once at the end of the exposure produced the three cards' crisp images.
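In the architecture, the two flashes are just actions scheduled against a single one-second shot. A sketch of what such code might look like follows; the flash classes, timings, and brightness values here are illustrative assumptions, not Jacobs's actual code.

#include <FCam/F2.h>

int main()
{
    FCam::Shot shot;
    shot.exposure = 1000000; // one second, in microseconds
    shot.gain = 1.0f;
    shot.image = FCam::Image(2592, 1944, FCam::RAW);

    FCam::F2::Flash weak, strong; // hypothetical handles to the two units

    // The weak flash strobes throughout the exposure, producing card trails.
    for (int i = 0; i < 16; ++i) {
        FCam::Flash::FireAction pop(&weak);
        pop.time = i * 60000;                   // one pop every 60 ms
        pop.brightness = weak.maxBrightness() / 4;
        shot.addAction(pop);                    // actions are copied into the shot
    }

    // The strong flash fires once at the end, freezing the cards crisply.
    FCam::Flash::FireAction finale(&strong);
    finale.time = shot.exposure - 10000;        // 10 ms before the shutter closes
    finale.brightness = strong.maxBrightness();
    shot.addAction(finale);

    FCam::F2::Sensor sensor;
    sensor.capture(shot); // every action fires at its offset within this frame
    return 0;
}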
FCam isn't perfect. While implementing it, we ran up against limitations in our reference platforms' hardware and low-level software. For example, although our API supports changing spatial resolution (the total number of pixels) on every frame in a video stream, the Video for Linux 2 (V4L2) software layer on which our system is built has a pipeline with fixed-size buffers. Changing resolutions requires flushing and resetting this pipeline, making it slower than we wanted.

Bootstrapping a World of Open Source Cameras

Defining a new architecture isn't a goal; it's only one step on a journey. When we finish building Frankencamera F3 (based on the LUPA-4000 sensor), we'll make it available at cost to anyone who wants one—with luck, by December 2010. By the time this article appears in print, we'll have published our architecture and released the source code for our implementation on the Nokia N900. We'll have also released a binary you can download to any retail N900 (without bricking the phone), thereby making its camera programmable. To seed this growth, beginning in 2011 my laboratory will loan a Frankencamera F3 and 10 Nokia N900s to any US university that uses them to teach a computational photography course. (This program, although small, is supported by a grant from the US National Science Foundation and a matching gift from Nokia.)

Our goal is to create a community of photographer programmers who develop algorithms, applications, and hardware for computational cameras. Photography is a US$50 billion business—as large as moviemaking and video gaming combined. My hope is that, once programmable cameras are widely available to the research community, camera technology can be studied at universities, published in conferences and journals, and fiddled with by hobbyists and photographers. This should foster a computational photography software industry, including plug-ins for cameras, photography asset management systems, and schools offering vocational training in computational photography. If we succeed in opening up the industry, then when I walk through the Siggraph 2020 exhibition, I expect to see a dizzying array of cameras, add-ons, and software. Headed out to shoot your daughter's soccer game? Don't forget to download the new "soccer ball focus" application everybody's raving about!

Speaking of vocational training, we're going to need more university-trained engineers in this area, and I'd like to see our Frankencameras used for not only research but also education. In thinking through what it might take to triple the number of universities teaching computational photography, it's interesting to compare this nascent field to the older, larger field of computer graphics. I conjecture that the ubiquity and quality of university computer graphics curricula are due to three factors: the existence of good textbooks (such as Newman and Sproull, and Foley, van Dam, Feiner, and Hughes), experimental platforms with an open API (Silicon Graphics + OpenGL), and a high-quality publication venue (Siggraph). To accomplish this in computational photography, we'll need the same three ingredients. Good publication venues are already in place. The Frankencamera is a platform and an API, but it's only one—we need more. For introductory textbooks, we're starting from zero. There are good photography books, but they don't cover its technical aspects. As a glaring example, not one how-to book on photography contains a formula for depth of field. To find one, you must turn to an optics monograph such as Rudolf Kingslake's classic Optics in Photography (SPIE, 1992). To fill this gap in my courses at Stanford, my students and I have written narrated Flash applets on the technical aspects of photography (see http://graphics.stanford.edu/courses/cs178/applets). However, these aren't an adequate substitute for a textbook. Somebody needs to write one.
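For the record, the missing formula is short enough to quote here, in the standard thin-lens form (my notation, not Kingslake's). For focal length f, f-number N, circle of confusion c, and subject distance s, all in the same units:

H = \frac{f^2}{N c} + f, \qquad
D_{\mathrm{near}} = \frac{s\,(H - f)}{H + s - 2f}, \qquad
D_{\mathrm{far}} = \frac{s\,(H - f)}{H - s} \quad (s < H).

H is the hyperfocal distance; focused at or beyond it, everything out to infinity is acceptably sharp. Otherwise the depth of field is the difference between the far and near limits of acceptable sharpness.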
Grand Challenges

Suppose the community succeeds in making cameras programmable and teaching a generation of students about camera technology. What should these students work on? What are the hard problems "on the dark side of the lens"?5 Here are a few suggestions.

Single-Shot Computational Photography

Many papers on computational photography take this form: "Capture a burst of images varying camera setting X and combine them to produce a single image that exhibits better Y." It's a clever strategy, but once researchers start doing this in a portable camera, they'll discover that few photographic situations lend themselves to this solution—people walk, breezes blow, foliage moves, and hands shake. If we want to make computational photography robust, we must focus on single-frame solutions.

Computational Video

Although this article emphasizes still photography, the computational photography community is also interested in video. Representative techniques here include video stabilization, spatiotemporal upsampling, video summarization, object insertion and removal, light field videography, HDR video, and continuous versions of still-photography algorithms. Here, a special challenge is to ensure that the processing is temporally coherent—that is, it doesn't introduce jarring discontinuities from frame to frame. But why must cameras record the world in discrete frames? Isn't this a holdover from film days? Using novel optics and sensor designs, we can measure the light falling on the focal plane when and where we wish.

Integration of In-Camera and In-Cloud Computing

Although high-end cameras don't yet contain 3G radios, this will change soon. Once cameras are connected to the cloud, you can imagine many applications that, for the foreseeable future, would take too long to run on the camera or burn too much battery power. A good example is video stabilization. I don't need this to run on the camera; just make sure my video is stable when it's posted to YouTube. You can also imagine using images from photo-sharing sites to help control the camera—my photo albums prove that my dog is black, so why does my camera's light meter insist on making her look gray?

Shader Languages for Cameras

When students write camera applications using FCam, they typically do their image processing on the CPU, then complain about speed. It's not their fault. Although cameras and cell phones have other computing resources—GPUs, digital signal processors, and image signal processors—these aren't designed with computational photography in mind, and their hardware interfaces are often secret. Fortunately, we believe this situation will change. Similar to the revolution occurring in graphics chips, we envision photographic image processors that use software-reconfigurable execution stages. To control such a processor, you could design a domain-specific language akin to CUDA (Compute Unified Device Architecture), OpenCL, or Adobe's PixelBender, but better suited to computational photography.

Progress in science and engineering is a dialogue between theory and experimentation. In this dialogue, the experimental platform's quality is paramount. Michael Faraday, one of the greatest experimental scientists, built his own laboratory equipment. Faraday wasn't a fanatic; the devices available to him in the early 19th century were neither flexible enough nor accurate enough for his delicate experiments in electromagnetism. We built our Camera 2.0 project on the premise that a tethered SLR isn't an adequate experimental platform for computational photography. In particular, an SLR's lack of controllability makes it hard to do innovative research, and its lack of transparency makes it hard to do definitive research. (What are the noise characteristics of a Canon 5D Mark II's image sensor? Only Canon knows.) As Thomas Kuhn writes in his seminal study, The Structure of Scientific Revolutions, "Novelty ordinarily emerges only for the man who, knowing with precision what he should expect, is able to recognize that something has gone wrong."6 Kuhn further hypothesizes that intellectual revolutions occur only when a scientist casts aside the tools of the dominant paradigm and begins to invent wild theories and experiment at random. The Camera 2.0 project's primary goal is to facilitate this sort of wild experimentation.
References

1. R. Raskar and J. Tumblin, Computational Photography: Mastering New Techniques for Lenses, Lighting, and Sensors, AK Peters, 2010.
2. E. Raymond, The Cathedral and the Bazaar, O'Reilly, 2001.
3. A. Adams, N. Gelfand, and K. Pulli, "Viewfinder Alignment," Proc. Eurographics, Eurographics Assoc., 2008, pp. 597–606.
4. A. Adams et al., "The Frankencamera: An Experimental Platform for Computational Photography," ACM Trans. Graphics, vol. 29, no. 4, 2010, article 29.
5. N. Goldberg, Camera Technology: The Dark Side of the Lens, Academic Press, 1992.
6. T. Kuhn, The Structure of Scientific Revolutions, Univ. of Chicago Press, 1962.

Marc Levoy is a professor in Stanford University's Computer Science Department. Contact him at levoy@cs.stanford.edu.