Animation: "The Entertainer"
CS184 Final Project · Fall 2008 Isaac Salier-Hellendag (cs), Richard Mar (an), Trevor Lee (ci), Shendy Kurnia (cq)
About the Project
Project Goal Our objective in this project was to produce a short animation with frames rendered in an NPR (Non-photorealistic) renderer. The renderer would have two major features: cel-shading and contour outlining, to provide a "toon" shading look for each frame. Further, we decided to create original models and animations for the project, a player piano with keys moving in synch with music, on display via a dynamic camera. We would then export each frame of animation into .obj and .mtl files, and combine individually rendered frames into a complete animation. The finished animation (with sound) is available in high definition (1280 x 720) at: http://www.vimeo.com/2486431 Additionally, we have included it on the disc submitted with this paper, in .mov format. The images within the paper are also included, in high resolution, on the disc. Project Approach We discussed implementing our renderer using an OpenGL shader, with contour outlines created using either OpenGL wireframe lines or the line drawing algorithm described in Shirley 9.3.1. However, this approach seemed too simplistic for the scope of the course and the project. Merely rendering our animation by letting OpenGL doing all the work would be too easy. With that in mind, we decided to use a modified raytracer to perform the rendering work. We decided to render each frame individually and compile the frames into the animation after rendering. First, to create a better "cartoonish" look for our models, we eliminated reflections and shadows, as they would both negatively impact the appearance of the scene and significantly increase rendering time. With post-processing contour outlining and rendering efficiency in mind, we also chose not to use multi-sample anti-aliasing. Our approach was inspired by several online articles on NPR shading that recommended the following basic method for frame-by-frame rendering: 1. 
Calculate cel-shading based on material color and an array of grayscale values using modified raytracer 2. Calculate edges for contour outlines 3. Overlay edges onto the cel-shaded image to produce the final image
Salier-Hellendag, Mar, Lee, Kurnia 2
We then proceeded to gut and rebuild the raytracer from Assignment 4 into a raytracer optimized for NPR rendering, throwing out the old code for calculating Phong shading, reflections, and shadows. There were three things that needed to be done: 1) the NPR shader to produce the "cartoonish" cel-shaded look, 2) post-process edge detection for the contour outlining, and 3) a new acceleration structure to cut down on render time.
The NPR Renderer
Cel-Shading

Commonly found in comic books and hand-drawn animations, cel-shading -- or "toon" shading -- is a form of shading intended to make objects look hand-drawn. As with our Phong-shaded raytracer, we calculate the following for each eye ray (in our case, 1280 x 720 = ~920,000 rays per HD frame):

• The closest polygon intersection, using hierarchical bounding boxes to accelerate tracing
• The normalized direction vectors to light sources from that point (note: we opted to use a single white point light source for our scene)
• The diffuse RGB value (Kd) at the intersection point
• The surface normal of the intersected polygon
As we would when calculating the diffuse component for Phong shading, we need the dot product of the light vector and the surface normal to obtain the cosine of the angle between the two, a value between 0 and 1. At this point, we depart from Phong shading. Instead of using the cosine value itself, we use an array of 16 grayscale values. In our case, we used values recommended by a cel-shading article on GameDev.net:
The 16-entry array contains only three distinct grayscale values: 0.5, 0.75, and 1.0. Our polygon's diffuse material value, Kd, is multiplied by one of these values. Determining which is simple:
int P_grayscale = min(int((l * n) * 16), 15);  // clamp so a cosine of exactly 1.0 stays in range
Then the cel-shaded value is simply:
vec3 cel = Kd * grayscaleArray[P_grayscale];
Note that given the grayscale array described above, this means that any cosine value greater than 0.5 would result in the pixel being colored with cel-shading equal to Kd, since the grayscale value would be 1.0.
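The lookup described above can be sketched as follows. This is an illustrative Python sketch (our actual renderer is C++), and the exact split between the 0.5 and 0.75 bands in the lower half of the table is an assumption; only the three levels and the "1.0 above a cosine of 0.5" behavior are fixed by the text.

```python
# 16-entry lookup table with three grayscale levels. Entries 8-15 are 1.0,
# so any cosine above 0.5 leaves Kd unchanged; the 0.5/0.75 split below
# that point is our assumption.
GRAYSCALE = [0.5] * 4 + [0.75] * 4 + [1.0] * 8

def cel_shade(kd, cos_theta):
    """Scale the diffuse color Kd by a banded grayscale level."""
    idx = min(int(max(cos_theta, 0.0) * 16), 15)  # clamp so cos = 1.0 stays in range
    return tuple(c * GRAYSCALE[idx] for c in kd)
```

A cosine of 0.9 leaves the color untouched, while a cosine of 0.1 darkens it to half intensity, producing the hard banding characteristic of toon shading.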
We save the resulting RGB value for this pixel and move on to the next pixel. We do not use ambient or specular components, as they are unnecessary for cel-shading. We continue this process through every pixel. See Figure 1 for a sample resulting cel-shaded image. Afterward, the only task remaining is to add in contour outlines.
Figure 1. Cel-shaded piano without contour lines
Contour Outlines

In order to enhance the hand-drawn appearance of the images produced by our renderer, our program traces the major figures and edges in the rendered images with bold black contour lines before it outputs the final image. Our program finds where to draw the contour lines by applying an edge detection algorithm that uses differential analysis of the depth and surface orientation features of the raytraced scene. Overall, the contour outlining process has three main phases:

1. The generation of feature maps describing the sampled depth and surface normal attributes of the scene.
2. Edge detection by applying a bi-dimensional Sobel filter algorithm to both feature maps.
3. Overlaying the detected edges as black contour lines on the color image.

Depth Value and Normal Vector Maps

During raytracing, for each eye ray that collides with an object in the scene, our program records in separate arrays both the distance the ray traveled before its collision, i.e. the depth value, and the normal vector of the object surface at the point of collision. After all of the eye rays have been traced, the depth value map is normalized and converted to 8-bit grayscale values by dividing each entry by the maximum depth value and multiplying by 255. The depth value map
can then be visualized as a grayscale image, with lighter grayscale values corresponding linearly to higher depth values (Fig. 2). Likewise, the normal vector map can be visualized as a color image by multiplying the x, y and z coordinates of each of the normalized vectors by 255 to generate a corresponding 8-bit RGB color vector that represents the normal vector's orientation (Fig. 3).
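The depth normalization step can be sketched as follows (illustrative Python with a hypothetical function name; the renderer itself is C++):

```python
def depth_to_grayscale(depth_map):
    """Normalize a 2-D map of raw depth values to 8-bit grayscale as
    described above: divide each entry by the maximum depth and multiply
    by 255, so greater depth maps to a lighter value."""
    max_depth = max(max(row) for row in depth_map)
    return [[int(d / max_depth * 255) for d in row] for row in depth_map]
```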
Figure 2. Linear 8-bit grayscale representation of depth values
Figure 3. Color representation of surface normal vectors.
Edge Detection

Black and white edge line images are produced by applying a Sobel filter to the grayscale depth value map and to the normal vector map. The Sobel filter calculates approximate one-dimensional intensity gradient values (i.e. partial derivatives) at each pixel in the x- and y-dimensions and then marks the pixel as an edge if the resulting gradient magnitude exceeds a predetermined threshold value. In the Sobel filter algorithm used by our program, the values or vectors in the feature map are discretely convolved with the x-dimension Sobel kernel and then with the y-dimension Sobel kernel. The results for each dimension at each pixel are squared and summed, and the square root of the resulting value is compared against an input edge-detection threshold. If the computed value exceeds the threshold, the pixel is determined to lie on an edge, and a black pixel is stored in the corresponding position of an edge line image. Separate Sobel threshold values were used for depth value and normal vector edge-detection, in order to tune overall edge-detection sensitivity. Our final animation used threshold values of 8 and 4 for the depth value and normal vector edge-detection procedures, respectively.
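The per-pixel test can be sketched as follows. This is illustrative Python (the renderer is C++) operating on a scalar feature map such as the grayscale depth map; border pixels are simply skipped in this sketch.

```python
import math

# Standard 3x3 Sobel kernels for the x- and y-dimension gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]]

def sobel_edges(feature, threshold):
    """Mark a pixel as an edge where sqrt(gx^2 + gy^2) > threshold."""
    h, w = len(feature), len(feature[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):            # convolve the 3x3 neighborhood
                for i in range(3):
                    v = feature[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * v
                    gy += SOBEL_Y[j][i] * v
            if math.sqrt(gx * gx + gy * gy) > threshold:
                edges[y][x] = True
    return edges
```

A sharp step in depth (for example, the silhouette of the piano against the background) produces a large x-gradient and is marked as an edge, while flat regions produce zero gradient and stay unmarked.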
Figure 4. Edge line image produced from depth value edge detection
Figure 5. Edge line image produced from normal vector edge detection using a bi-dimensional Sobel filter. Threshold value is 4.
Edge Line Merging

Once we have the edge line images from both depth value edge detection and normal vector edge detection, we merge the two into a complete edge line image and blur the black pixels with neighboring pixels to produce softer edges. The resulting image is then composited with the cel-shaded image by multiplying the RGB values from the edge line image with the cel-shaded image. White pixels in the edge line image leave the cel-shaded pixels unchanged, whereas gray edge line pixels from the blurring darken the cel-shaded pixels, and black edge line pixels completely mask the cel-shaded pixels.

Hierarchical Bounding Boxes

Quick rendering time is a necessity for producing the number of frames we wanted within a reasonable timeframe. To accomplish this, we worked to create an effective structure for hierarchical bounding boxes as described in Shirley, 10.9.2. Instead of limiting the number of polygons within a parent box to only two as suggested in the text, however, we experimented with different values. We settled on a maximum of ten polygons per leaf node, as this seemed to bring about near-optimal rendering times. With too few polygons per leaf, the tree search had to descend too deep to be efficient; with too many, examining a leaf node meant looping through too many unnecessary polygons within that node. Our improved bounding box structure brought our rendering times from 10-15 minutes down to 10-15 seconds per frame. This was a major step in improving the overall speed of animation production, and allowed us to execute our renderings within a matter of a few hours across multiple machines.

Edge Blurring

Because our edge detection implementation creates images composed only of black and white pixels, jagged contour and silhouette lines were impossible to avoid (Fig. 6). This resulted in final images that were not as crisp as we had hoped.
Thus, we added a blurring routine to the outline image to create some "pseudo" anti-aliasing and smooth the edges of the outlines. Our blurring implementation consists of looping through pixels of the combined edge image (with both depth value edges and normal value edges already added together to produce our complete outline overlay). For each pixel, we compute a weighted average of the RGB values of the pixel itself and its immediate neighbors. We then place the averaged RGB (grayscale) value into a new array. When all pixels have been calculated, we use the final blurred outline image as our contour outline overlay.
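The merge, blur, and composite steps can be sketched as follows (illustrative Python; the 3x3 blur weights are our assumption, since the text specifies only a weighted average of a pixel and its immediate neighbors):

```python
# Weighted-average blur kernel; weights sum to 16. The exact weights are
# an assumption for this sketch.
BLUR_KERNEL = [[1, 2, 1],
               [2, 4, 2],
               [1, 2, 1]]

def merge_blur_composite(depth_edges, normal_edges, cel):
    """Union the two boolean edge maps, blur the resulting outline image,
    and multiply it into the cel-shaded image (RGB values in [0, 1])."""
    h, w = len(cel), len(cel[0])
    # Merge: a pixel is black (0.0) if either detector marked it, else white (1.0).
    outline = [[0.0 if (depth_edges[y][x] or normal_edges[y][x]) else 1.0
                for x in range(w)] for y in range(h)]
    # Blur: weighted average of each interior pixel with its 8 neighbors.
    blurred = [row[:] for row in outline]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for j in range(3):
                for i in range(3):
                    acc += BLUR_KERNEL[j][i] * outline[y + j - 1][x + i - 1]
            blurred[y][x] = acc / 16.0
    # Composite: white leaves the cel pixel unchanged, gray darkens it,
    # black masks it completely.
    return [[tuple(c * blurred[y][x] for c in cel[y][x]) for x in range(w)]
            for y in range(h)]
```

Because the composite is a multiplication, the blurred gray fringe around each outline darkens the cel-shaded colors gradually, which is exactly the "pseudo" anti-aliasing effect described above.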
Figure 6. Cel-shaded image with contour edges, pre-blurring. Note the jagged edges.
Bulk Processing

Due to the large number of frames needed to produce our animation, it seemed sensible to include a bulk rendering feature. With all relevant obj and mtl files named for their respective frame numbers, we created a loop that accepts arguments of the form "[anim_name] [start_frame_#] [end_frame_#]" and automatically renders frames start_frame_# through end_frame_#, producing PNG files as output. In our case, producing all 2,229 frames is as simple as passing "keys 1 2229" as arguments.

Edge Image Output

We wanted to ensure that our edge detection implementation would provide the best possible contour and silhouette outlines for the objects in the scene. We thus decided to include a feature to output the black-and-white edge images to PNG format, just as we do with our final images. This enabled us to get a strong understanding of which specific outlines were produced by the depth and normal buffers, thereby allowing us to tweak our edge detection thresholds to optimal levels. See Figures 2 through 5 for examples.

Quick .obj Parsing

We found a useful obj loader at http://kixor.net/dev/objloader/ that meets our requirements. It is able to parse essentially all the information an obj file can contain, such as vertices, texture coordinates, normals, 3- or 4-vertex faces, and .mtl files.
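The bulk rendering loop described above can be sketched as follows (illustrative Python; the exact per-frame file-naming scheme and the render callback are assumptions, since our actual renderer is a C++ executable):

```python
def bulk_render(anim_name, start_frame, end_frame, render=None):
    """Render every frame in [start_frame, end_frame]. Each frame's scene
    is assumed to live in <anim_name><frame#>.obj (with a matching .mtl)
    and to be written out as <anim_name><frame#>.png. `render` is a
    hypothetical callback standing in for the raytracer; invoked e.g. as
    bulk_render("keys", 1, 2229, render=raytrace_frame)."""
    outputs = []
    for frame in range(start_frame, end_frame + 1):
        obj_file = "%s%d.obj" % (anim_name, frame)
        png_file = "%s%d.png" % (anim_name, frame)
        if render is not None:
            render(obj_file, png_file)   # raytrace obj_file -> png_file
        outputs.append(png_file)
    return outputs
```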
Camera Input

The obj file parser we utilize contains support for camera data, using two vertex points, a vertex normal value, and a "face" line that references the vertex points. The two points are the eye point and the gaze direction, and the normal is the camera "up" vector. For example:
# Camera start
v 16.270052 26.948549 25.997269    # Eye
v 15.863070 26.491430 25.206442    # Gaze direction
vn -0.419904 0.862497 -0.282451    # Up vector
c 1 2 1
# Camera end
Using this feature, each obj file includes camera data from Blender, and camera movement through the animation becomes trivial. Taking the cross product of the gaze direction and the up vector gives the view plane's horizontal axis, from which the four corner points of the view plane are easily determined. We set up the view plane in the Camera class:
vec3 viewDir = this->gaze - this->eye;
float t = 1.0;
vec3 center = this->eye + (t * viewDir);        // Parametric equation to get center of view
float ratio = (float)this->h / (float)this->w;  // For customizable resolutions
vec3 view_plane_normal = viewDir ^ this->up;    // Cross product (overloaded ^)
view_plane_normal.normalize();
vec3 upVec = ratio * this->up;
this->UR = center + upVec + view_plane_normal;
this->LR = center + view_plane_normal - upVec;
this->UL = center + upVec - view_plane_normal;
this->LL = center + (view_plane_normal + upVec) * -1;
After parsing the obj file, these values are stored in our Camera object, the view plane is set for the current frame, and from here we can execute our raytracing algorithm.
Modeling and Animation
Modeling

We decided to create a player piano for our animation, with animated keys, player scroll, and pedals. We came to this decision for several reasons:

• The simplicity of the animations. Keys and pedals are animated with simple rotations along a single axis. Our expertise in Blender is limited, so simpler animations would be better for our purposes.
• The high number of straight lines (keys, sharp edges of the piano, etc.) would provide a great opportunity to show off our contour outlining.
• A piano animation would be ideal for syncing music -- an extra challenge, but well worth it for producing a high-quality animation.
We modeled the entire scene in Blender, the "free open source 3D content creation suite" available at www.blender.org. In total, the polygon count for the entire scene is approximately 35,000. Sample .obj and .mtl files are available on our project website.
Figure 7. The finished piano model
Animating the Keys

Several distinct movements are visible in the final animation: the motion of the keys for different note types (eighth note, quarter note, half note, etc.), the upward spinning of the player scroll, and the movement of the foot pedals.
The movements of the keys are simple rotations about the X-axis, with the back end of each key as a pivot. The major difference between the note animations is the length of time for which the rotation is held, just as on a real piano. Each key was linked to an armature bone (Fig. 8), and each note type was assigned an animation action that could be easily reused where necessary.
Figure 8. A white key with armature bone
The major challenge in animating the keys was matching the chosen music ("The Entertainer" by Scott Joplin, a ragtime classic) to the movement of the keys. Since we aimed to create an animation of about one to two minutes in length, this meant animating hundreds of individual notes. To expedite this procedure, we wrote a Python script using Blender's Python API. The script takes as input a text file listing the notes to "play" and the keyframes at which to insert the appropriate animations. Each line of the text file corresponds to a half-beat (eighth note). After inserting key animations at a given keyframe, the script advances to the next insertion keyframe. (In our case, we wanted four keyframes per half-beat.) These input files were written manually. The following is some sample input:
E:w26,w28,w30
Q:w14,b12,w20
E:w33
E:w15
E:w26,w28,b12
E:w20
E:w30,w37|H:w16,w19,w21,w23
Empty
Q:w38
Empty
Q:w39
E:w18,w20
Q:w40|E:w17,w18
E:w20,b12,b13
E:w16,w18,w20,w34,w4
"E" stands for an eighth note, "Q" for a quarter note, and so on. "w20" corresponds to white key #20, "b12" for black key #12, etc.
"Empty" lines would mean that no animations were necessary for the keyframe corresponding to that line of input. The animation would then skip that keyframe. The animation script is available on our project website. Using this script, it would be a simple matter to program an animation for any piece of music on this piano model. Animating the Pedals and Scroll The movement of the scroll is a cyclic rotation about the X-axis. The pedals are also rotated, with the bottom of each pedal as the pivot. Animating the Camera Blender makes it simple to move the camera around a scene using keyframes and interpolated movement. We decided to move the camera around to show different parts of the piano at different angles, and to show better views of the animations. We would then utilize the view plane calculations described earlier to parse obj camera data into the correct camera values. Each frame (each obj file) would thus have its own unique camera values and the camera would appear to move throughout the scene during animation. Figure 9 shows the movement and rotation curves of the camera throughout the animation.
Figure 9. Camera movement and rotation throughout the animation
The default obj exporter in Blender does not include a way to export camera data, so we modified the script to include camera information from the Blender API. We added the following snippet of code to perform this task.
camera = Blender.Object.Get("*Camera")
file.write('# Camera start\n')
objmatrix = camera.matrix
eyeV = Mathutils.Vector([0, 0, 0, 1])
targetV = Mathutils.Vector([0, 0, -1, 1])
upV = Mathutils.Vector([0, 1, 0, 0])

eyeV = eyeV * objmatrix
eyeV = eyeV * mat_xrot90
file.write('v %.6f %.6f %.6f\n' % (eyeV[0], eyeV[1], eyeV[2]))

targetV = targetV * objmatrix
targetV = targetV * mat_xrot90
file.write('v %.6f %.6f %.6f\n' % (targetV[0], targetV[1], targetV[2]))

upV = upV * objmatrix
upV = upV * mat_xrot90
file.write('vn %.6f %.6f %.6f\n' % (upV[0], upV[1], upV[2]))

file.write('c 1 2 1\n')
file.write('# Camera end\n')
Blender provides an "object matrix" for each object in a scene, including the camera. This matrix includes all rotation, scaling, and translation performed on that object. Simply applying this matrix allows us to determine the exact eye and target points, as well as the "up" vector. (Note: Blender's obj exporter performs a 90 degree rotation about the X-axis by default, so we must apply the same rotation to the camera vectors, hence the matrix multiplication by "mat_xrot90".)

Compiling the Animation

Due to the large number of frames, we divided the rendering tasks into groups of 500 frames and spread them across four computers. The rendering was complete within several hours. The groups of frames were compiled into .mov files using QuickTime Pro's image sequence feature, arranged in iMovie, and synched with the Joplin piece.
Though our raytracer successfully rendered all the frames for our animation in a timely fashion, it is not entirely bug-free. We ran into several stumbling blocks along the way, with the most vexing being an issue with the edge detection algorithm. If we set the threshold too high, we lost all the detail in the piano's edge trim. If we set the threshold too low, large portions of the piano were shaded black. This was particularly true for the depth value edge detection: the low threshold needed to capture the detail in the piano face also blacked out the sides of the piano. Aside from aesthetic issues, we encountered a memory leak that crashed the raytracer after rendering about 750 consecutive frames. This was because we mishandled the C++ vectors
containing the polygons and their materials data, and forgot to clear the copies created during the render process; the memory leak has since been fixed. Also, our original acceleration structure was flawed -- the original raytracer took over ten minutes to render the Utah teapot, despite the implementation of a fast bounding box intersection test. We spent a significant amount of time trying to optimize the old tree, but finally opted to write a new hierarchical bounding box tree, which ultimately cut the old render times of roughly 15 minutes down to a mere 15 seconds. In the end, we successfully produced a working raytracer implementing cel-shading with contour and silhouette outlining, as well as a full animation to demonstrate our renderer's capabilities.
Our renderer code and executable, Python scripts, and music inputs are available on our project website, http://inst.eecs.berkeley.edu/~cs184-cs/finalproject.
CS184 course textbook: Peter Shirley, Fundamentals of Computer Graphics, Second Edition, 2005.
Edge detection (depth and normal values): http://www.cs.utexas.edu/~msuwandi/cs384g/CS384G_Final_Project.html
Cel-shading with contour outlines: http://en.wikipedia.org/wiki/Cel-shaded_animation
Cel-shading approaches: http://www.gamedev.net/reference/programming/features/celshading/
Edge detection (Sobel filter): http://www.jasonokane.com/tu/351cos/sobel/
Wavefront (obj) loader: http://kixor.net/dev/objloader/
Fast bounding box intersection algorithm: http://jgt.akpeters.com/papers/MahovskyWyvill04/
Scott Joplin's "The Entertainer" (sheet music): http://www.8notes.com/scores/594.asp?ftype=gif
Blender tutorials: http://en.wikibooks.org/wiki/Blender_3D:_Noob_to_Pro
Simple Blender armature animation: http://www.blender3dclub.com/index.php?name=News&file=article&sid=24