Rendering Synthetic Objects into Legacy Photographs

Kevin Karsch    Varsha Hedau    Derek Hoiem    David Forsyth
University of Illinois at Urbana-Champaign
{karsch1, vhedau2, daf, dhoiem}@uiuc.edu

Figure 1: With only a small amount of user interaction, our system allows objects to be inserted into legacy images so that perspective, occlusion, and lighting of inserted objects adhere to the physical properties of the scene. Our method works with only a single LDR photograph, and no access to the scene is required.

Abstract

We propose a method to realistically insert synthetic objects into existing photographs without requiring access to the scene or any additional scene measurements. With a single image and a small amount of annotation, our method creates a physical model of the scene that is suitable for realistically rendering synthetic objects with diffuse, specular, and even glowing materials while accounting for lighting interactions between the objects and the scene. We demonstrate in a user study that synthetic images produced by our method are confusable with real scenes, even for people who believe they are good at telling the difference. Further, our study shows that our method is competitive with other insertion methods while requiring less scene information. We also collected new illumination and reflectance datasets; renderings produced by our system compare well to ground truth. Our system has applications in the movie and gaming industry, as well as home decorating and user content creation, among others.

CR Categories: I.2.10 [Computing Methodologies]: Artificial Intelligence—Vision and Scene Understanding; I.3.6 [Computing Methodologies]: Computer Graphics—Methodology and Techniques

Keywords: image-based rendering, computational photography, light estimation, photo editing

Links: DL PDF WEB

1 Introduction

Many applications require a user to insert 3D meshed characters, props, or other synthetic objects into images and videos. Currently, to insert objects into a scene, some scene geometry must be manually created, and lighting models may be produced by photographing mirrored light probes placed in the scene, taking multiple photographs of the scene, or even modeling the sources manually. Either way, the process is painstaking and requires expertise.

We propose a method to realistically insert synthetic objects into existing photographs without requiring access to the scene, special equipment, multiple photographs, time lapses, or any other aids. Our approach, outlined in Figure 2, is to take advantage of small amounts of annotation to recover a simplistic model of geometry and of the position, shape, and intensity of light sources. First, we automatically estimate a rough geometric model of the scene, and ask the user to specify (through image-space annotations) any additional geometry that synthetic objects should interact with. Next, the user annotates light sources and light shafts (strongly directed light) in the image, and our system automatically generates a physical lighting model of the scene from these annotations. The models created by our method are suitable for realistically rendering synthetic objects with diffuse, specular, and even glowing materials while accounting for lighting interactions between the objects and the scene.
In addition to our overall system, our primary technical contribution is a semiautomatic algorithm for estimating a physical lighting model from a single image. Our method can generate a full lighting model that is demonstrated to be physically meaningful through ground truth evaluation. We also introduce a novel image decomposition algorithm that uses geometry to improve lightness estimates, and we show in another evaluation that it is state-of-the-art for single-image reflectance estimation. We demonstrate with a user study that the results of our method are confusable with real scenes, even for people who believe they are good at telling the difference. Our study also shows that our method is competitive with other insertion methods while requiring less scene information.

This method has become possible through advances in recent literature. In the past few years, we have learned a great deal about extracting high-level information from indoor scenes [Hedau et al. 2009; Lee et al. 2009; Lee et al. 2010], and detecting shadows in images has become relatively straightforward [Guo et al. 2011]. Grosse et al. [2009] have also shown that simple lightness assumptions lead to powerful surface estimation algorithms; Retinex remains among the best methods.

Figure 2: Our method for inserting synthetic objects into legacy photographs. From an input image (top left), initial geometry is estimated and a user annotates other necessary geometry (top middle) as well as light positions (top right). From this input, our system automatically computes a 3D scene, including a physical light model, surface materials, and camera parameters (bottom left). After a user places synthetic objects in the scene (bottom middle), objects are rendered and composited into the original image (bottom right). Objects appear naturally lit and adhere to the perspective and geometry of the physical scene. From our experience, the markup procedure takes only a minute or so, and the user can begin inserting objects and authoring scenes in a matter of minutes.

2 Related work

Debevec's work [1998] is most closely related to ours. Debevec shows that a light probe, such as a spherical mirror, can be used to capture a physically accurate radiance map for the position where a synthetic object is to be inserted. This method requires a considerable amount of user input: HDR photographs of the probe, converting these photos into an environment map, and manual modeling of scene geometry and materials. More robust methods exist at the cost of more setup time (e.g., the plenopter [Mury et al. 2009]). Unlike these methods and others (e.g., [Fournier et al. 1993; Alnasser and Foroosh 2006; Cossairt et al. 2008; Lalonde et al. 2009]), we require no special equipment, measurements, or multiple photographs. Our method can be used with only a single LDR image, e.g., from Flickr, or even historical photos that cannot be recaptured.

Image-based Content Creation. Like us, Lalonde et al. [2007] aim to allow a non-expert user to populate an image with objects. Objects are segmented from a large database of images, which they automatically sort to present the user with source images that have similar lighting and geometry. Insertion is simplified by automatic blending and shadow transfer, and the object region is resized as the user moves the cursor across the ground. This method is only suitable if an appropriate exemplar image exists, and even in that case, the object cannot participate in the scene's illumination.
Similar methods exist for translucent and refractive objects [Yeung et al. 2011], but in either case, inserted objects cannot reflect light onto other objects or cast caustics. Furthermore, these methods do not allow for mesh insertion, because scene illumination is not calculated. We avoid these problems by using synthetic objects (3D textured meshes, now plentiful and mostly free on sites like Google 3D Warehouse and turbosquid.com) and physical lighting models.

Single-view 3D Modeling. Several user-guided [Liebowitz et al. 1999; Criminisi et al. 2000; Zhang et al. 2001; Horry et al. 1997; Kang et al. 2001; Oh et al. 2001; Sinha et al. 2008] or automatic [Hoiem et al. 2008; Saxena et al. 2008] methods are able to perform 3D modeling from a single image. These works are generally interested in constructing 3D geometric models for novel view synthesis. Instead, we use the geometry to help infer illumination and to handle perspective and occlusion effects. Thus, we can use simple box-like models of the scene [Hedau et al. 2009] with planar billboard models [Kang et al. 2001] of occluding objects. The geometry of background objects can be safely ignored. Our ability to appropriately resize 3D objects and place them on supporting surfaces, such as a table-top, is based on the single-view metrology work of Criminisi [2000], also described by Hartley and Zisserman [2003]. We recover the focal length and automatically estimate three orthogonal vanishing points using the method of Hedau et al. [2009], which is based on Rother's technique [2002].

Materials and Illumination. We use an automatic decomposition of the image into albedo, direct illumination, and indirect illumination terms (intrinsic images [Barrow and Tenenbaum 1978]). Our geometric estimates are used to improve these terms and material estimates, similar to Boivin and Gagalowicz [2001] and Debevec [1998], but our method improves the efficiency of our illumination inference algorithm and is sufficient for realistic insertion (as demonstrated in Sections 5 and 6). We must work with a single legacy image, and we wish to capture a physical light source estimate so that our method can be used in conjunction with any physical rendering software. Representations such as an irradiance volume do not apply [Greger et al. 1998]. Yu et al. show that when a comprehensive model of geometry and luminaires is available, scenes can be relit convincingly [Yu et al. 1999]. We differ from them in that our estimate of geometry is coarse, and we do not require multiple images. Illumination in a room is not strongly directed and cannot be encoded with a small set of point light sources, so the methods of Wang and Samaras [2003] and Lopez-Moreno et al. [2010] do not apply. As we show in our user study, point light models fail to achieve the realism that physical models do. We also cannot rely on having a known object present [Sato et al. 2003]. In the past, we have seen that people are unable to detect perceptual errors in lighting [Lopez-Moreno et al. 2010]. Such observations allow for high-level image editing using rough estimates (e.g., materials [Khan et al. 2006] and lighting [Kee and Farid 2010]). Lalonde and Efros [2007] consider the color distribution of images to differentiate real and fake images; our user study provides human assessment on this problem as well.

There are standard computational cues for estimating intrinsic images. Albedo tends to display sharp, localized changes (which result in large image gradients), while shading tends to change slowly. These rules of thumb inform the Retinex method [Land and McCann 1971] and important variants [Horn 1974; Blake 1985; Brelstaff and Blake 1987]. Sharp changes of shading do occur at shadow boundaries or normal discontinuities, but cues such as chromaticity [Funt et al. 1992] or differently lit images [Weiss 2001] can control these difficulties, as can methods that classify edges into albedo or shading [Tappen et al. 2005; Farenzena and Fusiello 2007]. Tappen et al. [2006] assemble example patches of intrinsic image, guided by the real image, and exploit the constraint that patches join up. Recent work by Grosse et al. demonstrates that the color variant of Retinex is state-of-the-art among single-image decomposition methods [Grosse et al. 2009].
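To make these rules of thumb concrete, the sketch below implements a generic color-Retinex-style baseline: log-intensity gradients that are large, or that coincide with a chromaticity change, are labeled as albedo edges, and a least-squares integration of those gradients recovers an albedo layer. This illustrates only the classic heuristic, not the geometry-aware decomposition introduced in Section 3; the thresholds, function names, and use of NumPy/SciPy are assumptions for illustration.

import numpy as np
from scipy.sparse import diags, eye, kron, vstack
from scipy.sparse.linalg import lsqr

def forward_diffs(h, w):
    # Sparse forward-difference operators for an h-by-w image, flattened row-major.
    def d1(n):
        return diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))
    return kron(eye(h), d1(w)), kron(d1(h), eye(w))

def retinex_decompose(rgb, grad_thresh=0.075, chroma_thresh=0.02):
    # rgb: H x W x 3 image in [0, 1]. Threshold values are illustrative.
    rgb = np.clip(rgb.astype(np.float64), 1e-4, 1.0)
    intensity = rgb.mean(axis=2)
    chroma = rgb / rgb.sum(axis=2, keepdims=True)          # chromaticity cue
    h, w = intensity.shape
    Dx, Dy = forward_diffs(h, w)

    gx = Dx @ np.log(intensity).ravel()                    # log-intensity gradients
    gy = Dy @ np.log(intensity).ravel()
    cx = np.linalg.norm(np.asarray(Dx @ chroma.reshape(-1, 3)), axis=1)
    cy = np.linalg.norm(np.asarray(Dy @ chroma.reshape(-1, 3)), axis=1)

    # Heuristic: large gradients, or gradients with a chromaticity change, are albedo edges.
    ax = np.where((np.abs(gx) > grad_thresh) | (cx > chroma_thresh), gx, 0.0)
    ay = np.where((np.abs(gy) > grad_thresh) | (cy > chroma_thresh), gy, 0.0)

    # Least-squares integration: find the log-albedo whose gradients match the albedo-labeled ones.
    A = vstack([Dx, Dy])
    b = np.concatenate([ax, ay])
    log_albedo = lsqr(A, b, atol=1e-8, btol=1e-8)[0].reshape(h, w)
    albedo = np.exp(log_albedo - log_albedo.max())         # resolve the unknown scale
    shading = intensity / np.maximum(albedo, 1e-4)
    return albedo, shading

Because the integration only fixes the log-albedo up to an additive constant, the sketch normalizes by the maximum; any other scale convention would serve equally well.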
3 Modeling

To render synthetic objects realistically into a scene, we need estimates of geometry and lighting. At present, there are no methods for obtaining such information accurately and automatically; we incorporate user guidance to synthesize sufficient models.

Our lighting estimation procedure is the primary technical contribution of our method. With a bit of user markup, we automatically decompose the image with a novel intrinsic image method, refine initial light sources based on this decomposition, and estimate light shafts using a shadow detection method. Our method can be broken into three phases. The first two phases interactively create models of geometry and lighting, respectively, and the final phase renders and composites the synthetic objects into the image. An overview of our method is sketched in Algorithm 1.

Figure 3: Overview of our interior lighting algorithm. For an input image (a), we use the modeled geometry (3D scene boundaries visualized as a colored wireframe mesh, (b)) to decompose the image into albedo (c) and direct reflected light (d). The user defines initial lighting primitives in the scene (e), and the light parameters are re-estimated (f). The effectiveness of our lighting algorithm is demonstrated by comparing a composited result (g) using the initial light parameters to another composited result (h) using the optimized light parameters. Our automatic lighting refinement enhances the realism of inserted objects. Lights are initialized away from the actual sources to demonstrate the effectiveness of our refinement.

3.1 Estimating geometry and materials

To realistically insert objects into a scene, we only need enough geometry to faithfully model lighting effects. We automatically obtain a coarse geometric representation of the scene using the technique of Hedau et al. [2009], and estimate vanishing points to recover the camera pose automatically. Our interface allows a user to correct errors in those estimates, and also to create simple geometry (tables and/or near-flat surfaces) through image-space annotations. If necessary, other geometry can be added manually, such as complex objects near inserted synthetic objects. However, we have found that in most cases our simple models suffice for creating realistic results; all results in this paper require no additional complex geometry. Refer to Section 4.1 for implementation details.
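The camera recovery just mentioned rests on a standard single-view metrology relation: for a camera with square pixels and a known principal point, three orthogonal vanishing points determine the focal length and rotation (cf. Hartley and Zisserman [2003]). The sketch below illustrates that textbook relation with hypothetical inputs; it is not the specific estimator of Hedau et al. [2009] used in Section 4.1.

import numpy as np

def camera_from_vanishing_points(vps, image_size):
    # vps: three orthogonal vanishing points in pixel coordinates, e.g. [(x1, y1), (x2, y2), (x3, y3)].
    # image_size: (width, height). Assumes square pixels and the principal point at the image center.
    w, h = image_size
    p = np.array([w / 2.0, h / 2.0])
    v1, v2, v3 = [np.asarray(v, dtype=float) for v in vps]

    # For orthogonal directions, (vi - p) . (vj - p) = -f^2 for any pair i != j.
    f_sq = -np.dot(v1 - p, v2 - p)
    if f_sq <= 0:                                          # fall back to another pair if degenerate
        f_sq = -np.dot(v1 - p, v3 - p)
    f = np.sqrt(f_sq)

    K = np.array([[f, 0.0, p[0]],
                  [0.0, f, p[1]],
                  [0.0, 0.0, 1.0]])

    # Columns of the rotation are the normalized back-projected vanishing directions.
    Kinv = np.linalg.inv(K)
    r1 = Kinv @ np.append(v1, 1.0); r1 /= np.linalg.norm(r1)
    r2 = Kinv @ np.append(v2, 1.0); r2 /= np.linalg.norm(r2)
    r3 = np.cross(r1, r2)                                  # enforce a right-handed orthonormal frame
    R = np.column_stack([r1, r2, r3])
    return K, R

In practice the three vanishing-point pairs give redundant estimates of the focal length, which can be averaged or solved in a least-squares sense for robustness.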
3.2 Estimating illumination

Estimating physical light sources automatically from a single image is an extremely difficult task. Instead, we describe a method to obtain a physical lighting model that, when rendered, closely resembles the original image. We wish to reproduce two different types of lighting: interior lighting, emitters present within the scene, and exterior lighting, shafts of strongly directed light that lie outside of the immediate scene (e.g., sunlight).

Interior lighting. Our geometry is generally coarse, and our lighting model should account for this: light sources should be modeled such that renderings of the scene look similar to the original image. This step should be transparent to the user. We ask the user to mark intuitively where light sources should be placed, and then we refine the sources so that the rendered image best matches the original image. Intensity and color cast can also be difficult to estimate, and we correct these automatically (see Figure 3).

Initializing light sources. To begin, the user clicks polygons in the image corresponding to each source. These polygons are projected onto the geometry to define an area light source. Out-of-view sources are specified with 3D modeling tools.

LEGACYINSERTION(img, USER)
    Model geometry and auto-estimate materials (Secs. 4.1, 4.2):
    geometry ← DETECTBOUNDARIES(img)
    geometry ← USER("Correct boundaries")
    geometry ← USER("Annotate/add additional geometry")
    geometry.mat ← ESTMATERIALS(img, geometry)   [Eq. 3]
    Refine initial lights and estimate shafts (Sec. 3.2):
    lights ← USER("Annotate lights/shaft bounding boxes")
    lights ← REFINELIGHTS(img, geometry)   [Eq. 1]
    lights ← DETECTSHAFTS(img)
    Insert objects, render and composite (Sec. 3.3):
    scene ← CREATESCENE(geometry, lights)
    scene ← USER("Add synthetic objects")
    return COMPOSITE(img, RENDER(scene))   [Eq. 4]

Algorithm 1: Our method for rendering objects into legacy images.

Improving light parameters. Our technique is to choose light parameters that minimize the squared pixel-wise differences between the rendered image (with estimated lighting and geometry) and the target image (e.g., the original image). Denoting R(L) as the rendered image parameterized by the current lighting parameter vector L, R* as the target image, and L0 as the initial lighting parameters, we seek to minimize the objective

    argmin_L  Σ_i α_i (R_i(L) − R*_i)²  +  Σ_j w_j (L_j − L_{0,j})²,   subject to 0 ≤ L_j ≤ 1,   (1)

where w is a weight vector that constrains the lighting parameters to lie near their initial values, and α is a per-pixel weight that places less emphasis on pixels near the ground. Our geometry estimates will generally be worse near the bottom of the scene, since we may not have geometry for objects near the floor. In practice, we set α_i = 1 for all pixels above the spatial midpoint of the scene (height-wise), and α decreases quadratically from 1 to 0 at floor pixels. Also, in our implementation, L contains six scalars per light source: RGB intensity and 3D position. More parameters could also be optimized. For all results, we normalize each light parameter to the range [0, 1], and set the corresponding values of w to 10 for spatial parameters and 1 for intensity parameters. A user can also modify these weights depending on the confidence of their manual source estimates. To render the synthetic scene and determine R, we must first estimate materials for all geometry in the scene. We use our own intrinsic image decomposition algorithm to estimate surface reflectance (albedo), and the albedo is then projected onto the scene geometry as a diffuse texture map, as described in Section 4.2.
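For concreteness, the sketch below shows one way an objective of the form of Equation 1 could be minimized with an off-the-shelf bounded optimizer. The renderer callback, the parameter packing, and the choice of L-BFGS-B are assumptions for illustration, not the implementation described in Section 4.

import numpy as np
from scipy.optimize import minimize

def refine_lights(L0, render_scene, target, alpha, w):
    # L0: initial light parameters, normalized to [0, 1] (six per source: RGB intensity, 3D position).
    # render_scene: hypothetical callback returning an H x W x 3 rendering R(L) for parameters L.
    # target: the original image R*; alpha: per-pixel weights (H x W); w: per-parameter prior weights.
    def objective(L):
        rendered = render_scene(L)
        pixel_term = np.sum(alpha[..., None] * (rendered - target) ** 2)
        prior_term = np.sum(w * (L - L0) ** 2)             # keep parameters near the user's initialization
        return pixel_term + prior_term

    bounds = [(0.0, 1.0)] * len(L0)                        # the constraint 0 <= L_j <= 1 of Eq. 1
    result = minimize(objective, np.asarray(L0, dtype=float), method="L-BFGS-B", bounds=bounds)
    return result.x

def ground_weights(height, width):
    # Example alpha: 1 above the vertical midpoint, falling quadratically to 0 at the bottom (floor) row.
    rows = np.arange(height)
    alpha_rows = np.ones(height)
    lower = rows >= height // 2
    t = (rows[lower] - height // 2) / max(height - 1 - height // 2, 1)
    alpha_rows[lower] = (1.0 - t) ** 2
    return np.tile(alpha_rows[:, None], (1, width))

Since every objective evaluation triggers a render of the scene, a loop like this is only practical with low-resolution preview renders or some other cheap approximation of R(L).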
Intrinsic decomposition. Our decomposition method exploits our geometry estimates. First, indirect irradiance is computed by gathering radiance values at each 3D patch of geometry that a pixel projects onto. The gathered radiance values are obtained by sampling observed pixel values from the original image, which are projected onto the geometry along the camera's viewpoint. We denote this indirect irradiance image as Γ; this term is equivalent to the integral in the radiosity equation. Under the typical Lambertian assumptions, we assume that the original image B can be expressed both as the product of albedo ρ and shading S and as the sum of reflected direct light D and reflected indirect light I. Furthermore, reflected gathered irradiance is equivalent to reflected indirect light under these assumptions. This leads to the equations

    B = ρS,   B = D + I,   I = ρΓ,   B = D + ρΓ.

We use the last equation as a constraint in our optimization below. We have developed an objective function to decompose an image B into albedo ρ and direct light D by solving

    argmin_{ρ,D}  Σ_i γ_i (∇ρ_i)² + λ₁ m_i ρ_i² + λ₂ (D_i − D_{0,i})² + λ₃ (∇D_i)²,   subject to B = D + ρΓ and 0 ≤ ρ ≤ 1.
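To illustrate how a decomposition with this structure could be computed, the sketch below substitutes the constraint D = B − ρΓ into the objective, which reduces the problem to a sparse linear least-squares solve for the albedo. The per-pixel weights γ and m, the initial direct estimate D0, and the λ values are taken as given inputs with placeholder defaults; the sketch is an illustration under these assumptions, not the solver used in the paper.

import numpy as np
from scipy.sparse import diags, eye, kron, vstack
from scipy.sparse.linalg import lsqr

def forward_diffs(h, w):
    # Sparse forward-difference operators for an h-by-w image, flattened row-major.
    def d1(n):
        return diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))
    return kron(eye(h), d1(w)), kron(d1(h), eye(w))

def decompose(B, Gamma, D0, gamma, m, lam1=0.5, lam2=1.0, lam3=0.5):
    # B: observed image; Gamma: gathered indirect irradiance; D0: initial direct-light estimate;
    # gamma, m: per-pixel weights. All inputs are H x W (a single channel, for brevity).
    h, w = B.shape
    n = h * w
    Dx, Dy = forward_diffs(h, w)

    # Row weights for the albedo-smoothness terms (one weight per gradient, taken at the left/top pixel).
    Wx = diags(np.sqrt(gamma[:, :w - 1].ravel()))
    Wy = diags(np.sqrt(gamma[:h - 1, :].ravel()))
    G = diags(Gamma.ravel())

    # Stack every quadratic term as rows of a sparse least-squares system in rho,
    # using D = B - rho * Gamma to eliminate the constraint.
    A = vstack([
        Wx @ Dx,                                           # gamma_i * (grad rho)_i^2
        Wy @ Dy,
        np.sqrt(lam1) * diags(np.sqrt(m.ravel())),         # lam1 * m_i * rho_i^2
        np.sqrt(lam2) * G,                                 # lam2 * (D_i - D0_i)^2, rho-dependent part
        np.sqrt(lam3) * (Dx @ G),                          # lam3 * (grad D)_i^2, rho-dependent part
        np.sqrt(lam3) * (Dy @ G),
    ])
    b = np.concatenate([
        np.zeros(Dx.shape[0]),
        np.zeros(Dy.shape[0]),
        np.zeros(n),
        np.sqrt(lam2) * (B - D0).ravel(),
        np.sqrt(lam3) * (Dx @ B.ravel()),
        np.sqrt(lam3) * (Dy @ B.ravel()),
    ])
    rho = lsqr(A, b)[0].reshape(h, w)
    rho = np.clip(rho, 0.0, 1.0)                           # keep the albedo physically plausible
    D = B - rho * Gamma                                    # recover the direct term from the constraint
    return rho, D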