Computer Graphics and Multimedia
CONTENT
Unit 1
Raster Scan
Pixel
Frame buffer
Vector and character generation
Polygon scan conversion
Line drawing algorithms
Display Devices
Boundary Fill Algorithm
Flood Fill Algorithm
Unit 2
Transformations
Homogeneous coordinates
Line Clipping
The Cohen-Sutherland Line-Clipping Algorithm
Clipping Polygons
Weiler-Atherton Algorithm
Unit 3
3D Transformations
Parallel and Perspective Projection
Z-buffering
Bézier curve
B-splines
Painter's algorithm
Unit 4
Diffuse Reflection
Specular reflection
Ray tracing
Gouraud Shading
Phong Shading
Color Models in Computer Graphics
Unit 5
Introduction
Multimedia Hardware
Audio in Multimedia
Data Compression
Lossless versus lossy Compression
Video
Unit I
Raster Scan
A raster scan, or raster scanning, is the rectangular pattern of image capture and
reconstruction in television. By analogy, the term is used for raster graphics, the pattern
of image storage and transmission used in most computer bitmap image systems. The
word raster comes from the Latin word rastrum (a rake), which is derived from radere (to
scrape); see also rastrum, an instrument for drawing musical staff lines. The pattern left
by the tines of a rake, when drawn straight, resembles the parallel lines of a raster: this
line-by-line scanning is what creates a raster. It's a systematic process of covering the
area progressively, one line at a time. Although often a great deal faster, it's similar in the
most-general sense to how one's gaze travels when one reads text.
In a raster scan, an image is subdivided into a sequence of (usually horizontal) strips
known as "scan lines". Each scan line can be transmitted in the form of an analog signal
as it is read from the video source, as in television systems, or can be further divided into
discrete pixels for processing in a computer system. This ordering of pixels by rows is
known as raster order, or raster scan order. Analog television has discrete scan lines
(discrete vertical resolution), but does not have discrete pixels (horizontal resolution) – it
instead varies the signal continuously over the scan line. Thus, while the number of scan
lines (vertical resolution) is unambiguously defined, the horizontal resolution is more
approximate, according to how quickly the signal can change over the course of the scan
line.
Scanning pattern
Interlaced scanning
To obtain flicker-free pictures, analog CRT TVs write only odd-numbered scan lines on
the first vertical scan; then, the even-numbered lines follow, placed ("interlaced")
between the odd-numbered lines. This is called interlaced scanning. (In this case,
positioning the even-numbered lines does require precise position control; in old analog
TVs, trimming the Vertical Hold adjustment made scan lines space properly. If slightly
misadjusted, the scan lines would appear in pairs, with spaces between.) Modern high-
definition TV displays use data formats like progressive scan in computer monitors (such
as "1080p", 1080 lines, progressive), or interlaced (such as "1080i").
Pixel
In digital imaging, a pixel (or picture element) is a single point in a raster image. The
pixel is the smallest addressable screen element; it is the smallest unit of picture that can
be controlled. Each pixel has its own address. The address of a pixel corresponds to its
coordinates. Pixels are normally arranged in a 2-dimensional grid, and are often
represented using dots or squares. Each pixel is a sample of an original image; more
samples typically provide more accurate representations of the original. The intensity of
each pixel is variable. In color image systems, a color is typically represented by three or
four component intensities such as red, green, and blue, or cyan, magenta, yellow, and
black.
In some contexts (such as descriptions of camera sensors), the term pixel is used to refer
to a single scalar element of a multi-component representation (more precisely called a
photosite in the camera sensor context, although the neologism sensel is also sometimes
used to describe the elements of a digital camera's sensor), while in others the term may
refer to the entire set of such component intensities for a spatial position. In color systems
that use chroma subsampling, the multi-component concept of a pixel can become
difficult to apply, since the intensity measures for the different color components
correspond to different spatial areas in such a representation.
A pixel is generally thought of as the smallest single component of a digital image. The
definition is highly context-sensitive. For example, there can be "printed pixels" in a
page, or pixels carried by electronic signals, or represented by digital values, or pixels on
a display device, or pixels in a digital camera (photosensor elements). This list is not
exhaustive, and depending on context, there are several terms that are synonymous in
particular contexts, such as pel, sample, byte, bit, dot, spot, etc. The term "pixels" can be
used in the abstract, or as a unit of measure, in particular when using pixels as a measure
of resolution, such as: 2400 pixels per inch, 640 pixels per line, or spaced 10 pixels apart.
The measures dots per inch (dpi) and pixels per inch (ppi) are sometimes used
interchangeably, but have distinct meanings, especially for printer devices, where dpi is a
measure of the printer's density of dot (e.g. ink droplet) placement. For example, a high-
quality photographic image may be printed with 600 ppi on a 1200 dpi inkjet printer.
Even higher dpi numbers, such as the 4800 dpi quoted by printer manufacturers since
2002, do not mean much in terms of achievable resolution.
The more pixels used to represent an image, the closer the result can resemble the
original. The number of pixels in an image is sometimes called the resolution, though
resolution has a more specific definition. Pixel counts can be expressed as a single
number, as in a "three-megapixel" digital camera, which has a nominal three million
pixels, or as a pair of numbers, as in a "640 by 480 display", which has 640 pixels from
side to side and 480 from top to bottom (as in a VGA display), and therefore has a total
number of 640 × 480 = 307,200 pixels or 0.3 megapixels.
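The pixel-count arithmetic can be checked with a few lines of Python (an illustrative sketch; the variable names are ours):

```python
# Pixel-count arithmetic for a "640 by 480" display.
width, height = 640, 480
total_pixels = width * height          # 307,200 pixels
megapixels = total_pixels / 1_000_000  # about 0.3 megapixels
```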
The pixels, or color samples, that form a digitized image (such as a JPEG file used on a
web page) may or may not be in one-to-one correspondence with screen pixels,
depending on how a computer displays an image. In computing, an image composed of
pixels is known as a bitmapped image or a raster image. The word raster originates from
television scanning patterns, and has been widely used to describe similar halftone
printing and storage techniques.
The number of distinct colors that can be represented by a pixel depends on the number
of bits per pixel (bpp). A 1 bpp image uses 1-bit for each pixel, so each pixel can be
either on or off. Each additional bit doubles the number of colors available, so a 2 bpp
image can have 4 colors, and a 3 bpp image can have 8 colors:
• 1 bpp, 2^1 = 2 colors (monochrome)
• 2 bpp, 2^2 = 4 colors
• 3 bpp, 2^3 = 8 colors
...
• 8 bpp, 2^8 = 256 colors
• 16 bpp, 2^16 = 65,536 colors ("High color")
• 24 bpp, 2^24 ≈ 16.8 million colors ("True color")
For color depths of 15 or more bits per pixel, the depth is normally the sum of the bits
allocated to each of the red, green, and blue components. High color, usually meaning 16
bpp, normally has five bits for red and blue, and six bits for green, as the human eye is
more sensitive to errors in green than in the other two primary colors. For applications
involving transparency, the 16 bits may be divided into five bits each of red, green, and
blue, with one bit left for transparency. A 24-bit depth allows 8 bits per component. On
some systems, 32-bit depth is available: this means that each 24-bit pixel has an extra 8
bits to describe its opacity (for purposes of combining with another image).
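These relationships are easy to verify in code. The following Python sketch (the helper names are ours, not from any library) computes the color count for a given depth and packs a 5-6-5 high-color pixel:

```python
# Number of distinct colors representable at a given bit depth:
# each additional bit doubles the count.
def colors(bpp):
    return 2 ** bpp

# 16 bpp "high color": 5 bits red, 6 bits green, 5 bits blue.
def pack_rgb565(r, g, b):
    """Pack 5-bit red, 6-bit green, 5-bit blue into one 16-bit value."""
    return (r << 11) | (g << 5) | b
```

For example, `pack_rgb565(31, 63, 31)` (all components at maximum) yields 0xFFFF, white.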
Frame buffer
A frame buffer is a video output device that drives a video display from a memory
buffer containing a complete frame of data.
The information in the memory buffer typically consists of color values for every pixel
(point that can be displayed) on the screen. Color values are commonly stored in 1-bit
monochrome, 4-bit palettized, 8-bit palettized, 16-bit highcolor and 24-bit truecolor
formats. An additional alpha channel is sometimes used to retain information about pixel
transparency. The total amount of the memory required to drive the framebuffer depends
on the resolution of the output signal, and on the color depth and palette size.
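As a sketch of that memory calculation (the function name is ours), the framebuffer size follows directly from resolution and color depth:

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Memory needed for one full frame, ignoring any palette storage."""
    return width * height * bits_per_pixel // 8

# A 640x480 truecolor (24 bpp) framebuffer:
vga_truecolor = framebuffer_bytes(640, 480, 24)  # 921,600 bytes (900 KiB)
```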
Framebuffers differ significantly from the vector graphics displays that were common
prior to the advent of the framebuffer. With a vector display, only the vertices of the
graphics primitives are stored. The electron beam of the output display is then
commanded to move from vertex to vertex, tracing an analog line across the area between
these points. With a framebuffer, the electron beam (if the display technology uses one) is
commanded to trace a left-to-right, top-to-bottom path across the entire screen, the way a
television renders a broadcast signal. At the same time, the color information for each
point on the screen is pulled from the framebuffer, creating a set of discrete picture
elements (pixels).
The term "framebuffer" has also entered into colloquial usage to refer to a backing store of graphical information. In this usage, the key feature that differentiates a framebuffer from other memory used to store graphics, namely that it drives the output device, is lost.
Vector and character generation
A vector generator is used in computer graphics systems of the type in which the beam of a CRT is deflected by analogue deflection signals to form an image composed of a number of discrete vectors. Each vector is defined, in each of two orthogonal deflection axes, by first and second digital words that respectively specify the initial position and the length of the vector. The vector generator includes circuitry that ensures each vector is drawn with substantially uniform brightness regardless of its length. Additional circuitry provides an instantaneous indication of a light-pen strike or the occurrence of edge violations; this circuitry also yields continuous digital information concerning the current beam position.
Polygon scan conversion
This term encompasses a range of algorithms in which polygons are rendered, normally one at a time, into a frame buffer. The term scan comes from the fact that an image on a CRT is made up of scan lines. Examples of polygon scan conversion algorithms are the painter's algorithm, the z-buffer, and the A-buffer. In this course we will generally assume that polygon scan conversion (PSC) refers to the z-buffer algorithm or one of its derivatives.
The advantage of polygon scan conversion is that it is fast. Polygon scan conversion
algorithms are used in computer games, flight simulators, and other applications where
interactivity is important. To give a human the illusion that they are interacting with a 3D
model in real time, you need to present the human with animation running at 10 frames
per second or faster. Polygon scan conversion can do this. The fastest hardware
implementations of PSC algorithms can now process millions of polygons per second.
One problem with polygon scan conversion is that it can only support simplistic lighting
models, so images do not necessarily look realistic. For example: transparency can be supported, but refraction requires the use of an advanced and time-consuming technique called "refraction mapping"; reflections can be supported, at the expense of duplicating all of the polygons on the "other side" of the reflecting surface; and shadows can be produced, though by a more complicated method than in ray tracing. The other limitation is that it only has a
single primitive: the polygon, which means that everything is made up of flat surfaces.
This is especially unrealistic when modeling natural objects such as humans or animals.
An image generated using a polygon scan conversion algorithm, even one which makes
heavy use of texture mapping, will tend to look computer generated.
Line drawing algorithms
a) Digital Differential Analyzer (DDA)
Basic idea: Calculate the (x, y)-coordinates of the pixels for the line using the parametric
representation for the line and scan t from 0 to 1.
(x(t), y(t)) = (P1x, P1y) + t * (P2x - P1x, P2y - P1y)
             = (x1, y1) + t * (x2 - x1, y2 - y1)
x(t) = x1 + t * ΔX
y(t) = y1 + t * ΔY = y1 + m * (t * ΔX)
where m = ΔY/ΔX.
Use suitable incremental values for t, calculate (x(t), y(t)), and round off to integers.
Simplified version:
If |m| ≤ 1, then increment x ← x + 1; that is, scan over t_n such that t_n * ΔX = 0, 1, 2, ... are integers, and increment y by m.
If |m| > 1, then increment y ← y + 1; that is, scan over t_n such that m * t_n * ΔX = 0, 1, 2, ... are integers, and increment x by 1/m.
Potential problems:
• floating point division at start up
• floating point additions when incrementing
• rounding at every increment
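The DDA idea above can be sketched in Python as follows (an illustrative version, function name ours), stepping along the major axis so that one of the two increments is ±1:

```python
def dda_line(x1, y1, x2, y2):
    """Rasterize a line with the DDA method: step along the major axis
    and round the interpolated coordinate on the minor axis."""
    dx, dy = x2 - x1, y2 - y1
    steps = max(abs(dx), abs(dy))
    if steps == 0:
        return [(x1, y1)]
    x_inc, y_inc = dx / steps, dy / steps   # one of these has magnitude 1
    x, y = float(x1), float(y1)
    points = []
    for _ in range(steps + 1):
        points.append((round(x), round(y)))  # rounding at every increment
        x += x_inc
        y += y_inc
    return points
```

Note that the floating-point division, additions, and per-pixel rounding visible here are exactly the costs listed above.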
b) Bresenham’s Algorithm
For |m| < 1, x-increments are integer-valued. The idea is to eliminate rounding of floating-point values by detecting when rounding gives a value different from the previously rounded value.
Advantages:
– No floating point divisions
– No floating point numbers
– Only integer multiplication by 2, i.e. shift left.
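A minimal sketch of Bresenham's algorithm for the first octant (0 ≤ slope ≤ 1), using only integer additions and multiplications by 2 (the function name is ours):

```python
def bresenham_line(x1, y1, x2, y2):
    """Bresenham's line algorithm, first octant only: integer arithmetic."""
    dx, dy = x2 - x1, y2 - y1
    assert 0 <= dy <= dx, "this sketch handles 0 <= slope <= 1 only"
    d = 2 * dy - dx          # decision variable
    y = y1
    points = []
    for x in range(x1, x2 + 1):
        points.append((x, y))
        if d > 0:            # midpoint below the line: step up
            y += 1
            d += 2 * (dy - dx)
        else:                # midpoint above the line: stay
            d += 2 * dy
    return points
```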
2nd Order Curves
Curves which are algebraically of second order (circle, ellipse, parabola, some splines)
have special algorithms. Example: for circles use octant symmetry.
Assume that the circle is centered at the origin (0, 0).
• Let (x_i, y_i) be the last drawn pixel.
• Now study the next pixel (x_{i+1}, y_{i+1}), where x_{i+1} = x_i + 1, and decide whether (x_i + 1, y_i) or (x_i + 1, y_i - 1) will be chosen.
• Circle equation: F(x, y) = x^2 + y^2 - R^2 = 0
• For any point (x, y): if inside the circle, then F(x, y) < 0; if outside the circle, F(x, y) > 0.
Introduce a decision variable p_i, evaluated at (x_i, y_i), to decide where the next pixel will be.
• p_i evaluates F(x, y) at the midpoint between y_i and y_i - 1 for x_{i+1} = x_i + 1:
p_i = F(x_i + 1, y_i - 0.5) = (x_i + 1)^2 + (y_i - 0.5)^2 - R^2
• If p_i < 0, then the midpoint is inside the circle, and we choose the pixel above the midpoint: y_{i+1} = y_i.
• If p_i ≥ 0, then the midpoint is outside the circle, and we choose the pixel below the midpoint: y_{i+1} = y_i - 1.
Really neat trick: calculate p_{i+1} using values already calculated for p_i:
p_{i+1} = F(x_{i+1} + 1, y_{i+1} - 0.5)
        = (x_{i+1} + 1)^2 + (y_{i+1} - 0.5)^2 - R^2
p_{i+1} - p_i = 2(x_i + 1) + 1 + (y_{i+1}^2 - y_i^2) - (y_{i+1} - y_i)
If p_i < 0, then y_{i+1} = y_i, and p_{i+1} = p_i + 2x_{i+1} + 1.
If p_i ≥ 0, then y_{i+1} = y_i - 1, and p_{i+1} = p_i + 2x_{i+1} + 1 - 2y_{i+1}.
We need only p_0 for the very first pixel, drawn at (0, R):
p_0 = F(1, R - 0.5) = 1.25 - R
Note 1: p_i is incremented/decremented by integer values; hence round off p_0 = round(p_0). If R is an integer, then start with p_0 = 1 - R.
Note 2: p_i is incremented with 2x_i and 2y_i; hence 2x_{i+1} = 2x_i + 2, and 2y_{i+1} = 2y_i - 2 or 2y_i.
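The decision-variable scheme above translates almost line for line into code. Here is a Python sketch (our own function, assuming an integer radius; it computes one octant only, which would be mirrored into the other seven to complete the circle):

```python
def midpoint_circle(radius):
    """Midpoint circle algorithm: pixels of one octant, from (0, R)."""
    x, y = 0, radius
    p = 1 - radius            # p0 = 1 - R for integer R
    octant = []
    while x <= y:
        octant.append((x, y))
        x += 1
        if p < 0:             # midpoint inside: keep y
            p += 2 * x + 1
        else:                 # midpoint outside: step y down
            y -= 1
            p += 2 * x + 1 - 2 * y
    return octant
```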
Display Devices
Computer monitor
Performance measurements
• Aspect ratio is the ratio of the horizontal length to the vertical length. For example, 4:3 is the standard aspect ratio, so a screen with a width of 1024 pixels will have a height of 768 pixels. If a widescreen display has an aspect ratio of 16:9, a display that is 1024 pixels wide will have a height of 576 pixels.
• Display resolution is the number of distinct pixels in each dimension that can be
displayed. Maximum resolution is limited by dot pitch.
• Dot pitch is the distance between subpixels of the same color in millimeters. In
general, the smaller the dot pitch, the sharper the picture will appear.
• Response time is the time a pixel in a monitor takes to go from black to white and back to black again, measured in milliseconds. Lower numbers mean faster transitions and therefore fewer visible image artifacts.
• Contrast ratio is the ratio of the luminosity of the brightest color (white) to that of
the darkest color (black) that the monitor is capable of producing.
• Power consumption is measured in watts.
• Viewing angle is the maximum angle at which images on the monitor can be
viewed, without excessive degradation to the image. It is measured in degrees
horizontally and vertically.
Problems
Phosphor burn-in
Phosphor burn-in is localized aging of the phosphor layer of a CRT screen where it has
displayed a static bright image for many years. This results in a faint permanent image on
the screen, even when turned off. In severe cases, it can even be possible to read some of
the text, though this only occurs where the displayed text remained the same for years.
This was once a common phenomenon in single purpose business computers. It can still
be an issue with CRT displays when used to display the same image for years at a time,
but modern computers are not normally used this way anymore, so the problem is not a
significant issue. The only systems that suffered the defect were ones displaying the same
image for years, and with these the presence of burn-in was not a noticeable effect when
in use, since it coincided with the displayed image perfectly. It only became a significant
issue in three situations:
• when some heavily used monitors were reused at home,
• when they were reused for display purposes, or
• in some high-security applications (but only those where the high-security data displayed did not change for years at a time).
Screen savers were developed as a means to avoid burn-in, but are unnecessary for CRTs
today, despite their popularity.
Phosphor burn-in can be gradually removed on damaged CRT displays by displaying an
all-white screen with brightness and contrast turned up full. This is a slow procedure, but
is usually effective.
Plasma burn-in
Burn-in re-emerged as an issue with early plasma displays, which are more vulnerable to
this than CRTs. Screen savers with moving images may be used with these to minimize
localized burn. Periodic change of the color scheme in use also helps.
Glare
Glare is a problem caused by the relationship between lighting and screen, or by using
monitors in bright sunlight. Matte-finish LCDs and flat-screen CRTs are less prone to reflected glare than conventional curved CRTs or glossy LCDs. Aperture-grille CRTs, which are curved on one axis only, are also less prone to it than other CRTs curved on both axes.
If the problem persists despite moving the monitor or adjusting lighting, a filter using a
mesh of very fine black wires may be placed on the screen to reduce glare and improve
contrast. These filters were popular in the late 1980s. They do also reduce light output.
Such a filter will only work against reflective glare; direct glare (such as sunlight) will completely wash out most monitors' internal lighting, and can only be dealt with by use of a hood or a transflective LCD.
Color misregistration
With exceptions of correctly aligned video projectors and stacked LEDs, most display
technologies, especially LCD, have an inherent misregistration of the color channels, that
is, the centers of the red, green, and blue dots do not line up perfectly. Sub-pixel
rendering depends on this misalignment; technologies making use of this include the
Apple II from 1976, and more recently Microsoft (ClearType, 1998) and XFree86 (X
Rendering Extension).
Incomplete spectrum
RGB displays produce most of the visible color spectrum, but not all. This can be a
problem where good color matching to non-RGB images is needed. This issue is common
to all monitor technologies with three color channels.
Note: the 4-fill and 8-fill (flood fill) algorithms involve heavy recursion, which may consume memory and time. Better algorithms are faster but more complex; they make use of pixel runs (horizontal groups of pixels).
UNIT II
Transformations
2D Transformations
Given a point cloud, polygon, or sampled parametric curve, we can use transformations
for several purposes:
1. Change coordinate frames (world, window, viewport, device, etc).
2. Compose objects of simple parts with local scale/position/orientation of one part
defined with regard to other parts. For example, for articulated objects.
3. Use deformation to create new shapes.
4. Useful for animation.
There are three basic classes of transformations:
Homogeneous coordinates
Homogeneous coordinates are another way to represent points that simplifies the way in which we express affine transformations. Normally, bookkeeping would become tedious when affine transformations of the form A·p + t are composed. With homogeneous coordinates, affine transformations become matrices, and composition of transformations is as simple as matrix multiplication. In later sections of the course we exploit this in much more powerful ways.
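As a concrete sketch (all names ours), here is how 2D affine transformations become 3x3 matrices acting on homogeneous coordinates (x, y, 1), with composition reduced to matrix multiplication:

```python
# Homogeneous coordinates: an affine map A·p + t becomes one 3x3 matrix.
def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

def matmul(a, b):
    """Compose two transformations by 3x3 matrix multiplication."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, x, y):
    """Apply a homogeneous transform to the point (x, y, 1)."""
    xh, yh, w = (sum(m[i][k] * v for k, v in enumerate((x, y, 1)))
                 for i in range(3))
    return xh / w, yh / w
```

For example, scaling by 2 and then translating by (2, 3) is the single matrix `matmul(translate(2, 3), scale(2, 2))`.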
Concept
Clipping restricts the effect of graphics primitives to a subregion of the canvas, in order to protect other portions of the canvas. All primitives are clipped to the boundaries of this clipping rectangle; that is, primitives lying outside the clip rectangle are not drawn.
lying outside the clip rectangle are not drawn.
The default clipping rectangle is the full canvas (the screen), and it is obvious that we cannot see any graphics primitives outside the screen.
A simple example of line clipping can illustrate this idea: the display window is the canvas and also the default clipping rectangle, so all line segments inside the canvas are drawn. The red box in the figure is the clipping rectangle we will use later, and the dotted lines are the extensions of the four edges of the clipping rectangle.
Line Clipping
This section treats clipping of lines against rectangles. Although there are specialized
algorithms for rectangle and polygon clipping, it is important to note that other graphic
primitives can be clipped by repeated application of the line clipper.
Clipping Individual Points
Before we discuss clipping lines, let's look at the simpler problem of clipping
individual points.
If the x coordinate boundaries of the clipping rectangle are Xmin and Xmax, and
the y coordinate boundaries are Ymin and Ymax, then the following inequalities
must be satisfied for a point at (X,Y) to be inside the clipping rectangle:
Xmin < X < Xmax
and Ymin < Y < Ymax
If any of the four inequalities does not hold, the point is outside the clipping
rectangle.
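As a one-function sketch of the point test (the strict inequalities match the text, so points exactly on the boundary are treated as outside; the function name is ours):

```python
def point_inside(x, y, xmin, ymin, xmax, ymax):
    """A point survives clipping only if all four inequalities hold."""
    return xmin < x < xmax and ymin < y < ymax
```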
The Cohen-Sutherland Line-Clipping Algorithm
The more efficient Cohen-Sutherland Algorithm performs initial tests on a line to
determine whether intersection calculations can be avoided.
1. End-point pairs are checked for trivial acceptance or trivial rejection using outcodes.
2. If the line can be neither trivially accepted nor trivially rejected, it is divided into two segments at a clip edge.
3. The segments are iteratively clipped, testing for trivial acceptance or rejection and dividing at clip edges, until each segment is either completely inside or trivially rejected.
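The trivial tests rely on a 4-bit outcode per endpoint. The following Python sketch uses one common bit assignment (the specific bit values and names are a convention we chose, not fixed by the algorithm):

```python
# One bit per side of the clip rectangle.
LEFT, RIGHT, BOTTOM, TOP = 1, 2, 4, 8

def outcode(x, y, xmin, ymin, xmax, ymax):
    """Classify a point against the four clip-rectangle edges."""
    code = 0
    if x < xmin:
        code |= LEFT
    elif x > xmax:
        code |= RIGHT
    if y < ymin:
        code |= BOTTOM
    elif y > ymax:
        code |= TOP
    return code

def trivial_accept(c1, c2):
    return c1 == 0 and c2 == 0     # both endpoints inside

def trivial_reject(c1, c2):
    return (c1 & c2) != 0          # both endpoints off the same side
```

Only when both tests fail does the algorithm compute an intersection with a clip edge and recurse on the pieces.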
Clipping Polygons
An algorithm that clips a polygon must deal with many different cases. One case is particularly noteworthy: a concave polygon may be clipped into two separate polygons. All in all, the task of clipping seems rather complex. Each edge of the polygon must be tested against each edge of the clip rectangle; new edges must be added, and existing edges must be discarded, retained, or divided. Multiple polygons may result from clipping a single polygon. We need an organized way to deal with all these cases.
The following example illustrates a simple case of polygon clipping.
Weiler-Atherton Algorithm
The Weiler-Atherton algorithm is capable of clipping a concave polygon with interior
holes to the boundaries of another concave polygon, also with interior holes. The polygon
to be clipped is called the subject polygon (SP) and the clipping region is called the clip
polygon (CP). The new boundaries created by clipping the SP against the CP are identical
to portions of the CP. No new edges are created. Hence, the number of resulting polygons
is minimized.
The algorithm describes both the SP and the CP by a circular list of vertices. The exterior
boundaries of the polygons are described clockwise, and the interior boundaries or holes
are described counter-clockwise. When traversing the vertex list, this convention ensures
that the inside of the polygon is always to the right. The boundaries of the SP and the CP
may or may not intersect. If they intersect, the intersections occur in pairs. One of the
intersections occurs when the SP edge enters the inside of the CP and one when it leaves.
Fundamentally, the algorithm starts at an entering intersection and follows the exterior
boundary of the SP clockwise until an intersection with a CP is found. At the intersection
a right turn is made, and the exterior of the CP is followed clockwise until an intersection
with the SP is found. Again, at the intersection, a right turn is made, with the SP now
being followed. The process is continued until the starting point is reached. Interior
boundaries of the SP are followed counter-clockwise.
A more formal statement of the algorithm is as follows [3]:
• Determine the intersections of the subject and clip polygons. Add each intersection to the SP and CP vertex lists. Tag each intersection vertex and establish a bidirectional link between the SP and CP lists for each intersection vertex.
• Process non-intersecting polygon borders. Establish two holding lists: one for boundaries which lie inside the CP and one for boundaries which lie outside. Ignore CP boundaries which are outside the SP. CP boundaries inside the SP form holes in the SP; consequently, a copy of the CP boundary goes on both the inside and the outside holding lists. Place the boundaries on the appropriate holding list.
• Create two intersection vertex lists. One, the entering list, contains only the intersections for the SP edge entering the inside of the CP. The other, the leaving list, contains only the intersections for the SP edge leaving the inside of the CP. The intersection type alternates along the boundary, so only one determination is required for each pair of intersections.
• Perform the actual clipping.
Polygons inside the CP are found using the following procedure.
o Remove an intersection vertex from the entering list. If the list is empty,
the process is complete.
o Follow the SP vertex list until an intersection is found. Copy the SP list
up to this point to the inside holding list.
o Using the link, jump to the CP vertex list.
o Follow the CP vertex list until an intersection is found. Copy the CP
vertex list up to this point to the inside holding list.
o Jump back to the SP vertex list.
o Repeat until the starting point is again reached. At this point, the new
inside polygon has been closed.
Polygons outside the CP are found using the same procedure, except that the
initial intersection vertex is obtained from the leaving list and the CP vertex list is
followed in the reverse direction. The polygon lists are copied to the outside
holding list.
Steps of Sutherland-Hodgman's polygon-clipping algorithm
• Polygons can be clipped against each edge of the window one at a time. Window/edge intersections, if any, are easy to find since the X or Y coordinates are already known.
• Vertices which are kept after clipping against one window edge are saved for clipping against the remaining edges.
• Note that the number of vertices usually changes and will often increase.
• We are using the divide-and-conquer approach.
Four Cases of polygon clipping against one edge
The clip boundary determines a visible and invisible region. The edges from vertex i to
vertex i+1 can be one of four types:
• Case 1: wholly inside the visible region; save the endpoint.
• Case 2: exiting the visible region; save the intersection.
• Case 3: wholly outside the visible region; save nothing.
• Case 4: entering the visible region; save the intersection and the endpoint.
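The four cases map directly onto code. Here is a sketch of one Sutherland-Hodgman pass against a single edge, parameterized by an `inside` predicate and an `intersect` routine supplied by the caller (both names, and this factoring, are ours):

```python
def clip_against_edge(polygon, inside, intersect):
    """Sutherland-Hodgman: clip a vertex list against one window edge.
    `inside(p)` tests a vertex; `intersect(p, q)` returns the point where
    the segment p->q crosses the edge."""
    out = []
    for i, p in enumerate(polygon):
        q = polygon[(i + 1) % len(polygon)]   # edge from vertex i to i+1
        if inside(p):
            if inside(q):
                out.append(q)                  # case 1: wholly inside
            else:
                out.append(intersect(p, q))    # case 2: exiting
        else:
            if inside(q):
                out.append(intersect(p, q))    # case 4: entering
                out.append(q)
            # case 3: wholly outside -- save nothing
    return out
```

Clipping against the full window is then four such passes, one per window edge, each feeding its output vertex list to the next.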
Scaling
Rotation about z axis
Rotation about x axis
Rotation about y axis
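As a sketch of the transforms named above, here are the standard 4x4 homogeneous matrices (assuming the usual right-handed textbook convention, with angles in radians; the function names are ours):

```python
import math

def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

def rotation_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def rotation_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def rotation_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]
```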
Parallel and Perspective Projection
This is one of the final steps before anything is actually blitted to the screen. A lot of programmers choose to include volume clipping in their projection functions, because it would be a waste of time to do projection on points or objects that are out of the viewing area. We are covering two types of projection, parallel and perspective. Most games use perspective projection to give the user a real view of the world around them, while parallel projection is typically used for graphics that don't need that sense of realism.
Parallel Projection
Parallel projection is just a cheap imitation of what the real world would look like. This is
because it simply ignores the z extent of all points. It is just concerned with the easiest
way of getting the point to the screen. For these reasons, parallel projection is very easy
to do and is good for modeling where perspective would distort construction, but it is not
used as much as perspective projection. For the rest of this tutorial, assume that we are
dealing with members of our point class that we developed. Notice that our test object to
the right after projection each edge is perfectly aligned just as it was in the world. Each
edge that was parallel in the world appears parallel in the projection. The transition from
world coordinates to screen coordinates is very simple.
Remember that our world is designed to be viewed from the center (0,0,0). If we just used the world coordinates, our view would be incorrect! This is because the upper left of the screen is (0,0), and this is NOT aligned with what we want. To correct this, we must add half the screen width and half the screen height to our world coordinates so that the views are the same. Also notice that we are taking -1*wy. By now you should be able to guess why! It is because in our world, as the object moves up, its y extent increases. This is completely opposite of our screen, where moving down increases the y extent. Multiplying the y extent by -1 will correct this problem as well! I told you it was simple!
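The whole parallel projection step can be sketched in a few lines (our own function; `screen_w` and `screen_h` are the display dimensions):

```python
def parallel_project(wx, wy, wz, screen_w, screen_h):
    """Parallel projection as described: drop z, recenter to the screen,
    and flip y (screen y grows downward)."""
    sx = wx + screen_w / 2
    sy = -1 * wy + screen_h / 2
    return sx, sy
```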
Perspective Projection
This type of projection is a little more complicated than parallel projection, but the results are well worth it. We are able to realistically display our world as if we were really standing there looking around! The same display issues face us as before:
1. We need to align the screen (0,0) with the world (0,0,0)
2. We need to correct the y directional error.
We correct the two problems as we did before. We add half the screen width and height
to our point, and then reverse the sign of the y extent. Here's a generalized equation for
the actual perspective projection.
The only real difference between these equations and our parallel relatives are the
addition of the XSCALE,YSCALE and One Over Z variables. Firstly we define
OneOverZ (double) which stores 1 divided by the z extent of the point. We need to
multiply our points by this because by its nature, as points move further away, they will
move closer to 0. This is exactly what we can notice in the "real" world. If we look into a
plastic tube, the sides look as if they are moving closer to the center. The longer the tube,
the more this effect is noticed. We then define XSCALE and YSCALE. These are used to
adjust our Field Of View. The larger our FOV, the more squished together objects
appear. This makes sense since we have a finite viewing area, and in order to
accommodate a larger FOV, we will have to squish stuff together. I've found that when
displaying objects, each video mode needs a different FOV factor to make it look truly realistic. Play with the numbers to see what you like the best! Most vary from 40-150.
Adjusting one of the FOV variables while leaving the other unchanged will make the display appear to stretch along that axis. Look at our cube now with a high FOV: the edges are far from being parallel!
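The projection described above can be sketched in code. The variable names (XSCALE, YSCALE, OneOverZ) follow the text; the default FOV factors and the screen-centering convention are illustrative assumptions:

```python
def project_perspective(wx, wy, wz, screen_w, screen_h, xscale=120.0, yscale=120.0):
    """Project a world point (wx, wy, wz) onto the screen."""
    one_over_z = 1.0 / wz                            # farther points shrink toward the centre
    sx = wx * xscale * one_over_z + screen_w / 2     # recentre on screen (0,0)
    sy = -wy * yscale * one_over_z + screen_h / 2    # flip y: screen y grows downward
    return sx, sy
```

Doubling wz halves a point's distance from the screen centre, which is the "plastic tube" effect described above.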
Z-buffering
In computer graphics, z-buffering is the management of image depth coordinates in three-
dimensional (3-D) graphics, usually done in hardware, sometimes in software. It is one
solution to the visibility problem, which is the problem of deciding which elements of a
rendered scene are visible, and which are hidden. The painter's algorithm is another
common solution which, though less efficient, can also handle non-opaque scene
elements. Z-buffering is also known as depth buffering.
When an object is rendered by a 3D graphics card, the depth of a generated pixel (z
coordinate) is stored in a buffer (the z-buffer or depth buffer). This buffer is usually
arranged as a two-dimensional array (x-y) with one element for each screen pixel. If
another object of the scene must be rendered in the same pixel, the graphics card
compares the two depths and chooses the one closer to the observer. The chosen depth is
then saved to the z-buffer, replacing the old one. In the end, the z-buffer will allow the
graphics card to correctly reproduce the usual depth perception: a close object hides a
farther one. This is called z-culling.
The granularity of a z-buffer has a great influence on the scene quality: a 16-bit z-buffer
can result in artifacts (called "z-fighting") when two objects are very close to each other.
A 24-bit or 32-bit z-buffer behaves much better, although the problem cannot be entirely
eliminated without additional algorithms. An 8-bit z-buffer is almost never used since it
has too little precision.
Given: a list of polygons {P1, P2, ..., Pn}
Output: a COLOR array, which holds the intensity of the visible polygon surfaces.
Initialize:
(note: z-depth and z-buffer(x,y) are positive)
z-buffer(x,y) = max depth; and
COLOR(x,y) = background color.
Begin:
for (each polygon P in the polygon list) do {
    for (each pixel (x,y) that intersects P) do {
        calculate z-depth of P at (x,y)
        if (z-depth < z-buffer(x,y)) then {
            z-buffer(x,y) = z-depth;
            COLOR(x,y) = intensity of P at (x,y);
        }
    }
}
display COLOR array.
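The pseudocode above can be turned into a small runnable sketch. Here each polygon is represented as a precomputed mapping from pixel to (depth, intensity); real implementations generate these pixels by scan-converting the polygon, which is assumed away to keep the example short:

```python
MAX_DEPTH = float("inf")

def zbuffer_render(polygons, width, height, background=0):
    """Resolve visibility per pixel: the smallest depth (closest) wins."""
    zbuf = [[MAX_DEPTH] * width for _ in range(height)]
    color = [[background] * width for _ in range(height)]
    for poly in polygons:
        for (x, y), (depth, intensity) in poly.items():
            if depth < zbuf[y][x]:       # closer than what is stored: overwrite
                zbuf[y][x] = depth
                color[y][x] = intensity
    return color
```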
Bézier curve
Definition: A Bézier curve is a curved line or path defined by mathematical equations. It was named after
Pierre Bézier, a French mathematician and engineer who developed this method of computer drawing in the
late 1960s while working for the car manufacturer Renault. Most graphics software includes a pen tool for
drawing paths with Bézier curves.
The most basic Bézier curve is made up of two end points and control handles attached to
each node. The control handles define the shape of the curve on either side of the
common node. Drawing Bézier curves may seem baffling at first; it's something that
requires some study and practice to grasp the geometry involved. But once mastered,
Bézier curves are a wonderful way to draw!
A Bézier curve with three nodes. The center node is selected and the control handles are visible.
A Bézier curve with three nodes. The second node (from left) is selected and the control handles are visible.
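The curve traced by two end points and two control handles can be evaluated with the standard cubic Bernstein formula; the tuple-based point representation here is just for illustration:

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bézier at parameter t in [0, 1].

    p0 and p3 are the end points; p1 and p2 are the control-handle
    points that shape the curve on either side."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return x, y
```

At t = 0 and t = 1 the curve passes exactly through the end points, which is why moving a handle reshapes the curve without detaching it from its nodes.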
B-splines
Definition: B-splines are not used very often in 2D graphics software but are used quite
extensively in 3D modeling software. They have an advantage over Bézier curves in that
they are smoother and easier to control. B-splines consist entirely of smooth curves, but
sharp corners can be introduced by joining two spline curve segments. The continuous
curve of a B-spline is defined by control points. While the curve is shaped by the control points, it generally does not pass through them.
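The "shaped by, but not passing through, the control points" behavior can be seen in the uniform cubic B-spline basis. The sketch below evaluates one curve segment from four consecutive control points; real modelers use the more general de Boor recurrence:

```python
def bspline_point(c0, c1, c2, c3, t):
    """Point on one uniform cubic B-spline segment at t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6.0
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0
    b3 = t**3 / 6.0
    x = b0 * c0[0] + b1 * c1[0] + b2 * c2[0] + b3 * c3[0]
    y = b0 * c0[1] + b1 * c1[1] + b2 * c2[1] + b3 * c3[1]
    return x, y
```

At t = 0 the weights are 1/6, 4/6, 1/6, 0, so the segment starts near c1 but not on it: the control points pull the curve without anchoring it.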
Creature House Expression and Ulead PhotoImpact are two 2D graphics programs that
offer spline curve drawing tools. Drawing with a spline tool is generally more intuitive
and much easier for computer drawing beginners to understand compared to drawing
with a Bézier tool. Because a Bézier curve often changes form as each new curve
segment is placed, it is difficult to predict the outcome unless you understand the
underlying geometry. On the other hand, when drawing splines, the shape of the curve
can be previewed as the pointer is moved and the curve does not change form as new
segments are placed.
Originally, a spline was a thin flexible strip of metal or rubber used by draftsmen to aid in drawing curved lines.
The solid line represents an open path created with Expression's B-Spline tool. The points
on the dotted line represent mouse clicks. By moving the control points on the dotted
line, the path can be reshaped.
The solid line represents a closed path created with Expression's B-Spline tool.
Painter's algorithm
The painter's algorithm, also known as a priority fill, is one of the simplest solutions to
the visibility problem in 3D computer graphics. When projecting a 3D scene onto a 2D
plane, it is necessary at some point to decide which polygons are visible, and which are
hidden.
The name "painter's algorithm" refers to the technique employed by many painters of painting distant parts of a scene before parts which are nearer, thereby covering some areas of the distant parts. The painter's algorithm sorts all the polygons in a scene by their
depth and then paints them in this order, farthest to closest. It will paint over the parts that
are normally not visible — thus solving the visibility problem — at the cost of having
painted redundant areas of distant objects.
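Reduced to its core, the algorithm is a depth sort followed by back-to-front painting. In the sketch below each polygon is a (depth, pixels, color) tuple with its covered pixels precomputed, an assumption that stands in for real scan conversion:

```python
def painters_render(polygons, width, height, background=0):
    """Paint polygons farthest-first so nearer ones overwrite farther ones."""
    canvas = [[background] * width for _ in range(height)]
    # larger depth = farther from the viewer in this convention
    for depth, pixels, color in sorted(polygons, key=lambda p: p[0], reverse=True):
        for x, y in pixels:
            canvas[y][x] = color     # may paint over an earlier, farther polygon
    return canvas
```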
The algorithm can fail in some cases, including cyclic overlap or piercing polygons. In
the case of cyclic overlap, as shown in the figure to the right, Polygons A, B, and C
overlap each other in such a way that it is impossible to determine which polygon is
above the others. In this case, the offending polygons must be cut to allow sorting.
Newell's algorithm, proposed in 1972, provides a method for cutting such polygons.
Numerous methods have also been proposed in the field of computational geometry.
The case of piercing polygons arises when one polygon intersects another. As with cyclic
overlap, this problem may be resolved by cutting the offending polygons.
In basic implementations, the painter's algorithm can be inefficient. It forces the system
to render each point on every polygon in the visible set, even if that polygon is occluded
in the finished scene. This means that, for detailed scenes, the painter's algorithm can
overly tax the computer hardware.
UNIT 4
Diffuse Reflection
Diffuse reflection is the reflection of light from a surface such that an incident ray is reflected at many angles, rather than at just one precise angle as in the case of specular reflection. If a surface is completely non-specular, the reflected light will be evenly scattered over the hemisphere surrounding the surface.
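In computer graphics this behavior is modeled with Lambert's cosine law: the diffusely reflected intensity depends only on the angle between the surface normal and the light direction, never on the viewpoint. A minimal sketch, where kd is the diffuse reflection coefficient:

```python
import math

def _unit(v):
    length = math.sqrt(sum(a * a for a in v))
    return tuple(a / length for a in v)

def lambert_diffuse(normal, light_dir, kd=1.0, light_intensity=1.0):
    """Lambertian diffuse term: kd * I * max(0, N . L)."""
    n, l = _unit(normal), _unit(light_dir)
    cos_theta = max(0.0, sum(a * b for a, b in zip(n, l)))  # clamp back-facing light
    return kd * light_intensity * cos_theta
```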
A surface built from a non-absorbing powder, such as plaster, or from fibers, such as
paper, or from a polycrystalline material, such as marble, can be a nearly perfect diffuser.
At the opposite extreme, calm water and glass objects give only specular reflection (though not a strong one), while, among common materials, only polished metals can reflect light specularly with great efficiency (in fact, the reflecting material of mirrors is usually aluminium or silver). All other materials, even when perfectly polished, usually give no more than perhaps 5-10% specular reflection, except in particular conditions, such as the total internal reflection of a glass prism, or when structured in a complex, purposely made configuration, such as the silvery skin of many fish species. A white material, instead, such as snow, can reflect back all the light it receives, but diffusely, not specularly.
Mechanism
Diffuse reflection from a solid surface is generally not due, as one might naively think, to the roughness of its surface: a flat surface is indeed required to give specular reflection, but it does not prevent diffuse reflection. We can grind and polish a piece of white marble at will, but it continues to be white and will never become a mirror: it will simply give a small specular reflection, while the remaining light continues to be diffusely reflected.
The most general mechanism by which a surface gives diffuse reflection does not involve only the surface itself: most of the light is contributed by internal scattering centers beneath the surface. If we imagine that the figure represents snow, and that the polygons are its (transparent) ice crystallites, then an impinging ray is partially reflected (a few percent) by the first particle, enters it, is again reflected by the interface with the second particle, enters that one, impinges on the third, and so on, generating a series of "primary" scattered rays in random directions, which, in turn, through the same
mechanism, generate a large number of "secondary" scattered rays, which in turn generate "tertiary" rays. All these rays travel through the snow crystallites, which do not absorb light, until they arrive at the surface and exit in random directions. The result is that all the light we sent in comes back in all directions, so that we can say that snow is white, in spite of the fact that it is made of transparent objects (ice crystals).
For simplicity, here we have spoken of "reflections", but more generally the interface
between the small particles that constitute many materials is irregular on a scale
comparable with light wavelength, so diffuse light is generated at each interface, rather
than a single reflected ray, but the story can be told the same way.
This mechanism is very general, because almost all common materials are made of "small things" held together. Mineral materials are generally polycrystalline: we can describe them as made of a 3-D mosaic of small, irregularly shaped, defective crystals.
Organic materials are usually composed of fibers or cells, with their membranes and their
complex internal structure. And each interface, inhomogeneity or imperfection can
deviate, reflect or scatter light, reproducing the above mechanism.
A few materials do not follow this mechanism: among them are metals, which do not allow light to enter; gases, liquids, and glass (which has a liquid-like amorphous microscopic structure); single crystals, such as some gems or a salt crystal; and some very special materials, such as the tissues which make up the cornea and the lens of an eye. These materials can reflect diffusely, however, if their surface is microscopically rough, as in frosted glass (figure 2), or, of course, if their homogeneous structure deteriorates, as in the eye lens.
A surface may also exhibit both specular and diffuse reflection, as is the case, for example, with the glossy paints used in home painting, which also give a fraction of specular reflection, while matte paints give almost exclusively diffuse reflection.
Colored objects
We have so far spoken of white objects, which do not absorb light. But the above scheme remains valid when the material is absorbent. In this case, diffused rays will lose some wavelengths as they travel through the material, and will emerge colored.
Moreover, diffusion substantially affects the color of objects, because it determines the average path of light in the material, and hence the extent to which the various wavelengths are absorbed. Red ink looks black while it sits in its bottle. Its vivid color is only perceived when it is placed on a scattering material (paper). This is so because light's path through the paper fibers (and through the ink) is only a fraction of a millimeter long. Light coming from the bottle, instead, has crossed centimeters of ink, and has been heavily absorbed, even at its red wavelengths.
And, when a colored object has both diffuse and specular reflection, usually only the
diffuse component is colored. A cherry reflects diffusely red light, absorbs all other
colors and has a specular reflection which is essentially white. This is quite general,
because, except for metals, the reflectivity of most materials depends on their refraction
index, which varies little with the wavelength (though it is this variation that causes the
chromatic dispersion in a prism), so that all colors are reflected nearly with the same
intensity. Reflections of a different origin, instead, may be colored: metallic reflections, such as in gold or copper, or interferential reflections: iridescences, peacock feathers, butterfly wings, beetle elytra, or the antireflection coating of a lens.
Interreflection
Diffuse interreflection is a process whereby light reflected from an object strikes other
objects in the surrounding area, illuminating them. Diffuse interreflection specifically
describes light reflected from objects which are not shiny or specular. In real life terms
what this means is that light is reflected off non-shiny surfaces such as the ground, walls,
or fabric, to reach areas not directly in view of a light source. If the diffuse surface is
colored, the reflected light is also colored, resulting in similar coloration of surrounding
objects.
In 3D computer graphics, diffuse interreflection is an important component of global
illumination. There are a number of ways to model diffuse interreflection when rendering
a scene. Radiosity and photon mapping are two commonly used methods.
Specular reflection
Unlike diffusion, specular reflection is viewpoint dependent. By the law of reflection, light striking a specular surface will be reflected at an angle which mirrors the incident light angle, which makes the viewing angle very important. Specular reflection forms tight, bright highlights, making the surface appear glossy (Figure 10.3, "Specular Reflection").
Figure 10.3. Specular Reflection.
In reality, diffusion and specular reflection are generated by exactly the same process of light scattering. Diffusion dominates on a surface which has so much small-scale roughness, with respect to wavelength, that light is reflected in many different directions from each tiny bit of the surface, with tiny changes in surface angle.
33
Computer Graphics and Multimedia
Specular reflection, on the other hand, dominates on a surface which is smooth, with
respect to wavelength. This implies that the scattered rays from each point of the surface
are directed almost in the same direction, rather than being diffusely scattered. It's just a
matter of the scale of the detail. If the surface roughness is much smaller than the
wavelength of the incident light it appears flat and acts as a mirror.
Like Diffusion, Specular reflection has a number of different implementations, or
specular shaders. Again, each of these implementations shares two common parameters:
the specular colour and the energy of the specularity, in the [0-2] range. This effectively allows more energy to be shed as specular reflection than there is incident energy. As a
result, a material has at least two different colors, a diffuse, and a specular one. The
specular color is normally set to pure white, but it can be set to different values to obtain
interesting effects.
The four specular shaders are:
- CookTorr - This was Blender's only specular shader up to version 2.27. Indeed, up to that version it was not possible to separately set diffuse and specular shaders, and there was just one plain material implementation. Besides the two standard parameters, this shader uses a third, hardness, which regulates the width of the specular highlights. The lower the hardness, the wider the highlights.
- Phong - This is a different mathematical algorithm used to compute specular highlights. It is not very different from CookTorr, and it is governed by the same three parameters.
- Blinn - This is a more 'physical' specular shader, designed to match the Oren-Nayar diffuse one. It is more physical because it adds a fourth parameter, an index of refraction (IOR), to the aforementioned three. This parameter is not actually used to compute refraction of rays (a ray-tracer is needed for that), but to correctly compute specular reflection intensity and extension via Snell's Law. The hardness and specular parameters give additional degrees of freedom.
- Toon - This specular shader matches the Toon diffuse one. It is designed to produce the sharp, uniform highlights of toons. It has no hardness, but rather a Size and Smooth pair of parameters which dictate the extension and sharpness of the specular highlights.
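The classic Phong specular term behind shaders like these reflects the light direction about the normal and raises its alignment with the view direction to a hardness exponent; higher hardness gives tighter highlights. This is a sketch of the general technique, not Blender's actual implementation:

```python
import math

def _unit(v):
    length = math.sqrt(sum(a * a for a in v))
    return tuple(a / length for a in v)

def phong_specular(normal, light_dir, view_dir, ks=1.0, hardness=32):
    """Phong specular term: ks * max(0, R . V) ** hardness."""
    n, l, v = _unit(normal), _unit(light_dir), _unit(view_dir)
    n_dot_l = sum(a * b for a, b in zip(n, l))
    r = tuple(2 * n_dot_l * a - b for a, b in zip(n, l))   # reflect L about N
    r_dot_v = max(0.0, sum(a * b for a, b in zip(r, v)))
    return ks * r_dot_v ** hardness
```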
Thanks to this flexible implementation, which keeps separate the diffuse and specular
reflection phenomena, Blender allows us to easily control how much of the incident light
striking a point on a surface is diffusely scattered, how much is reflected as specularity,
and how much is absorbed. This, in turn, determines in what directions (and in what
amounts) the light is reflected from a given light source; that is, from what sources (and
in what amounts) the light is reflected toward a given point on the viewing plane.
It is very important to remember that the material color is just one element in the
rendering process. The color is actually the product of the light color and the material
color.
Ray tracing
Ray tracing has the tremendous advantage that it can produce realistic looking images.
The technique allows a wide variety of lighting effects to be implemented. It also permits
a range of primitive shapes which is limited only by the ability of the programmer to
write an algorithm to intersect a ray with the shape.
Ray tracing works by firing one or more rays from the eye point through each pixel. The
colour assigned to a ray is the colour of the first object that it hits, determined by the
object's surface properties at the ray-object intersection point and the illumination at that
point. The colour of a pixel is some average of the colours of all the rays fired through it.
The power of ray tracing lies in the fact that secondary rays are fired from the ray-object
intersection point to determine its exact illumination (and hence colour). This spawning
of secondary rays allows reflection, refraction, and shadowing to be handled with ease.
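The per-shape intersection routine mentioned above is the heart of a ray tracer. Ray-sphere intersection, for example, reduces to solving a quadratic in the ray parameter t:

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Distance t along the ray to the first sphere hit, or None on a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                       # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / (2 * a)  # nearer of the two roots
    return t if t > 0 else None           # ignore hits behind the ray origin
```

Firing one such ray per pixel, shading the nearest hit, and spawning secondary rays from the intersection point gives the process described above.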
Ray tracing's big disadvantage is that it is slow. It takes minutes, or hours, to render a
reasonably detailed scene. Until recently, ray tracing had never been implemented in
hardware. A Cambridge company, Advanced Rendering Technologies, is trying to do just
that, but they will probably still not get ray tracing speeds up to those achievable with
polygon scan conversion.
Ray tracing is used where realism is vital. Example application areas are high quality
architectural visualisations, and movie or television special effects.
Gouraud Shading:
In Gouraud Shading, the intensity at each vertex of the polygon is first calculated by
applying equation 1.7. The normal N used in this equation is the vertex normal which is
calculated as the average of the normals of the polygons that share the vertex. This is an
important feature of the Gouraud Shading and the vertex normal is an approximation to
the true normal of the surface at that point. The intensities at the edges of each scan line are calculated from the vertex intensities, and the intensities along the scan line from these edge values.
The interpolation equations (2.3) are as follows:

Ia = I1 + (I2 - I1) * (ys - y1) / (y2 - y1)
Ib = I1 + (I3 - I1) * (ys - y1) / (y3 - y1)
Ip = Ia + (Ib - Ia) * (xp - xa) / (xb - xa)

where I1, I2 and I3 are the vertex intensities, Ia and Ib are the intensities at the two edges of scan line ys, and Ip is the intensity at pixel xp on that scan line.
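The scan-line interpolation described above can be written directly in code; the names (vertex intensities, edge intensities Ia and Ib, pixel intensity) follow the text:

```python
def lerp(a, b, t):
    return a + t * (b - a)

def gouraud_pixel(i1, i2, i3, y1, y2, y3, ys, xa, xb, xp):
    """Intensity at pixel (xp, ys): interpolate down the edges, then across."""
    ia = lerp(i1, i2, (ys - y1) / (y2 - y1))    # along edge 1-2
    ib = lerp(i1, i3, (ys - y1) / (y3 - y1))    # along edge 1-3
    return lerp(ia, ib, (xp - xa) / (xb - xa))  # across the scan line
```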
Phong Shading:
Phong Shading overcomes some of the disadvantages of Gouraud Shading and specular
reflection can be successfully incorporated in the scheme. The first stage in the process is
the same as for the Gouraud Shading - for any polygon we evaluate the vertex normals.
For each scan line in the polygon we evaluate by linear interpolation the normal vectors at the ends of the line. These two vectors, Na and Nb, are then used to interpolate Ns. We thus derive a normal vector for each point or pixel on the polygon that is an approximation to the real normal on the curved surface approximated by the polygon. Ns, the interpolated normal vector, is then used in the intensity calculation. The vector interpolation tends to restore the curvature of the original surface that has been approximated by a polygon mesh. We have:

Na = N1 + (N2 - N1) * (ys - y1) / (y2 - y1)
Nb = N1 + (N3 - N1) * (ys - y1) / (y3 - y1)
Ns = Na + (Nb - Na) * (xs - xa) / (xb - xa)
These are vector equations that would each be implemented as a set of three equations,
one for each of the components of the vectors in world space. This makes the Phong
Shading interpolation phase three times as expensive as Gouraud Shading. In addition there is an application of the Phong model intensity equation at every pixel. Incremental computation can likewise be used for the intensity interpolation.
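A per-pixel sketch of the scheme: interpolate the normal across the scan line, renormalize it, then evaluate a lighting model. The diffuse-plus-specular model below is a simplified stand-in for the full Phong intensity equation:

```python
import math

def _unit(v):
    length = math.sqrt(sum(a * a for a in v))
    return tuple(a / length for a in v)

def phong_shade_pixel(na, nb, xa, xb, xp, light_dir, kd=0.8, ks=0.2, n_exp=16):
    """Interpolate Na..Nb to get Ns at pixel xp, then light with Ns."""
    t = (xp - xa) / (xb - xa)
    ns = _unit(tuple(a + t * (b - a) for a, b in zip(na, nb)))  # interpolated normal
    l = _unit(light_dir)
    n_dot_l = max(0.0, sum(p * q for p, q in zip(ns, l)))
    return kd * n_dot_l + ks * n_dot_l ** n_exp
```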
Color Models in Computer Graphics
YIQ
YIQ is the color space used by the NTSC color TV system, employed mainly in North and Central America, and Japan. In the U.S., it is federally mandated for analog over-the-air TV broadcasting by FCC rules and regulations part 73, the "TV transmission standard".
I stands for in-phase, while Q stands for quadrature, referring to the components used in
quadrature amplitude modulation. Some forms of NTSC now use the YUV color space,
which is also used by other systems such as PAL.
The Y component represents the luma information, and is the only component used by
black-and-white television receivers. I and Q represent the chrominance information. In
YUV, the U and V components can be thought of as X and Y coordinates within the color
space. I and Q can be thought of as a second pair of axes on the same graph, rotated 33°;
therefore IQ and UV represent different coordinate systems on the same plane.
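The Y, I and Q components are linear combinations of R, G and B. A conversion sketch using the standard NTSC coefficients, with inputs assumed in [0, 1]:

```python
def rgb_to_yiq(r, g, b):
    """Convert RGB to YIQ: Y carries luma, I and Q carry chrominance."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma weights sum to 1
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q
```

For any grey input (r = g = b) the I and Q terms cancel, which is why a black-and-white receiver can use Y alone.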
The YIQ system is intended to take advantage of human color-response characteristics.
The eye is more sensitive to changes in the orange-blue (I) range than in the purple-green
range (Q) — therefore less bandwidth is required for Q than for I. Broadcast NTSC limits
I to 1.3 MHz and Q to 0.4 MHz. I and Q are frequency interleaved into the 4 MHz Y
signal, which keeps the bandwidth of the overall signal down to 4.2 MHz. In YUV
systems, since U and V both contain information in the orange-blue range, both
components must be given the same amount of bandwidth as I to achieve similar color
fidelity.
Very few television sets perform true I and Q decoding, due to the high cost of such an implementation. Compared to the cheaper R-Y and B-Y decoding, which requires only one filter, I and Q each require a different filter to satisfy the bandwidth differences between them. These bandwidth differences also require that the 'I' filter include a time delay to match the longer delay of the 'Q' filter. The Rockwell Modular
Digital Radio (MDR) was one I and Q decoding set, which in 1997 could operate in
frame-at-a-time mode with a PC or in realtime with the Fast IQ Processor (FIQP). Some
RCA "Colortrak" home TV receivers made circa 1985 not only used I/Q decoding, but
also advertised its benefits along with its comb filtering benefits as full "100 percent
processing" to deliver more of the original color picture content. Earlier, more than one
brand of color TV (RCA, Arvin) used I/Q decoding in the 1954 or 1955 model year on
models utilizing screens about 13 inches (measured diagonally). The original Advent
projection television used I/Q decoding. Around 1990, at least one manufacturer
(Ikegami) of professional studio picture monitors advertised I/Q decoding.
RGB
There are many models used to measure and describe color. The RGB color model is
based on the theory that all visible colors can be created using the primary additive colors
red, green and blue. These colors are known as primary additives because when
combined in equal amounts they produce white. When two or three of them are combined
in different amounts, other colors are produced. For example, combining red and green in
equal amounts creates yellow, green and blue creates cyan, and red and blue creates magenta. As you change the amounts of red, green and blue you are presented with new colors. Additionally, when all of these primary additive colors are absent you get black.
RGB Color in Graphic Design
Types of RGB Color Spaces
Within the RGB model are different color spaces, and the two most common are sRGB
and Adobe RGB. When working in a graphics software program such as Adobe
Photoshop or Illustrator, you can choose which setting to work in.
sRGB: The sRGB space is best to use when designing for the web, as it is what most
computer monitors use.
Adobe RGB: Because the Adobe RGB space contains a larger selection of colors that are
not available in the sRGB space, it is best to use when designing for print. It is also recommended for use with photos taken with professional digital cameras (as opposed to consumer-level ones), because high-end cameras often use the Adobe RGB space.
CMY and CMYK Color in Graphic Design
Although in principle the whole color gamut can be created in CMY, black is often not produced by this system, since the combination of the three inks does not yield a true black. The reason for this is that colored inks always contain minor impurities.
The CMYK color model (process color, four color) is a subtractive color model used in color printing, and is also used to describe the printing process itself. CMYK refers to the four inks used in some color printing: cyan, magenta, yellow, and key (black). Though it varies by print house, press operator, press manufacturer and press run, ink is typically applied in the order of the abbreviation.
The "K" in CMYK stands for key, since in four-color printing the cyan, magenta, and yellow printing plates are carefully keyed, or aligned, with the key of the black key plate. Some sources suggest that the "K" comes from the last letter of "black" and was chosen because B already means blue.[1][2] However, this explanation, though plausible and useful as a mnemonic, is incorrect.[3]
The CMYK model works by partially or entirely masking colors on a lighter, usually
white, background. The ink reduces the light that would otherwise be reflected. Such a
model is called subtractive because inks “subtract” brightness from white.
In additive color models such as RGB, white is the “additive” combination of all primary
colored lights, while black is the absence of light. In the CMYK model, it is the opposite:
white is the natural color of the paper or other background, while black results from a full
combination of colored inks. To save money on ink, and to produce deeper black tones, unsaturated and dark colors are produced by using black ink instead of the combination of cyan, magenta and yellow.
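The substitution of black ink for overlapping cyan, magenta and yellow can be sketched as a conversion from RGB. The grey-component replacement below is the simplest textbook form; real print workflows use ICC profiles instead:

```python
def rgb_to_cmyk(r, g, b):
    """Convert RGB in [0, 1] to CMYK with simple black generation."""
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b   # subtractive complements
    k = min(c, m, y)                      # the shared darkness goes to black ink
    if k == 1.0:                          # pure black: only the key plate prints
        return 0.0, 0.0, 0.0, 1.0
    return (c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k
```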
UNIT 5
Introduction
Multimedia is media and content that uses a combination of different content forms. The
term can be used as a noun (a medium with multiple content forms) or as an adjective
describing a medium as having multiple content forms. The term is used in contrast to media which only use traditional forms of printed or hand-produced material. Multimedia includes a combination of text, audio, still images, animation, video, and interactivity content forms.
Multimedia is usually recorded and played, displayed or accessed by information content
processing devices, such as computerized and electronic devices, but can also be part of a
live performance. Multimedia (as an adjective) also describes electronic media devices
used to store and experience multimedia content. Multimedia is distinguished from mixed
media in fine art; by including audio, for example, it has a broader scope. The term "rich media" is synonymous with interactive multimedia. Hypermedia can be considered one particular multimedia application.
Usage
Multimedia finds its application in various areas including, but not limited to, advertisements, art, education, entertainment, engineering, medicine, mathematics, business, scientific research and spatio-temporal applications. Several examples are as follows:
Creative industries
Creative industries use multimedia for a variety of purposes ranging from fine arts, to
entertainment, to commercial art, to journalism, to media and software services provided
for any of the industries listed below. An individual multimedia designer may cover the spectrum throughout their career. Requests for their skills range from technical to analytical to creative.
Commercial
Much of the electronic old and new media used by commercial artists is multimedia.
Exciting presentations are used to grab and keep attention in advertising. Business-to-business and interoffice communications are often developed by creative services firms into advanced multimedia presentations, beyond simple slide shows, to sell ideas or liven up training. Commercial multimedia developers may be hired to design for governmental
services and nonprofit services applications as well.
Entertainment and fine arts
In addition, multimedia is heavily used in the entertainment industry, especially to
develop special effects in movies and animations. Multimedia games are a popular
pastime and are software programs available either as CD-ROMs or online. Some video games also use multimedia features. Multimedia applications that allow users to actively participate, instead of just sitting by as passive recipients of information, are called interactive multimedia.
Education
In education, multimedia is used to produce computer-based training courses (popularly called CBTs) and reference books like encyclopedias and almanacs. A CBT lets the user
go through a series of presentations, text about a particular topic, and associated
illustrations in various information formats. Edutainment is an informal term used to
describe combining education with entertainment, especially multimedia entertainment.
Engineering
Software engineers may use multimedia in Computer Simulations for anything from
entertainment to training, such as military or industrial training. Multimedia for software interfaces is often created as a collaboration between creative professionals and software engineers.
Industry
In the industrial sector, multimedia is used as a way to help present information to shareholders, superiors and coworkers. Multimedia is also helpful for providing employee training, advertising and selling products all over the world via virtually unlimited web-based technology.
Multi Media Hardware
VIDEO CAMERAS
With the right adapters, software, and hardware, camcorders and digital video cameras
can be used to capture full-motion images. Although regular camcorders store video on tape, digital video cameras store images as digital data. This enables the digital images to be transferred directly into the product being created. Digital video cameras range in price from under a hundred dollars for small desktop cameras like the Connectix QuickCam, to thousands of dollars for higher-end equipment.
Digital video cameras offer an inexpensive means of getting images into your computer;
however, you should be aware that the resolution is often quite low and the color is
sometimes questionable.
DIGITAL CAMERAS
Digital cameras allow you to take pictures just as you would with a regular camera, but without film developing and processing. Unlike a regular camera, a digital camera stores photographs not on film but in a digital format on magnetic disk or in internal memory. The photographs can be immediately read by the computer and added to any multimedia product.
SCANNERS
Scanners digitize already-developed images, including photographs, drawings, and pages of text. By converting these images to a digital format, they can be interpreted and recognized by the microprocessor of the computer. A better way of scanning larger images is to use a page or flatbed scanner. These scanners look like small photocopiers. Page scanners are either grayscale scanners, which work well with black-and-white photographs, or color scanners, which can record millions of colors.
GRAPHICS TABLET
MICROPHONES
As is true with most equipment, all microphones are not created equal. If you are
planning to use a microphone for input, you will want to purchase a superior, high-quality
microphone because your recordings will depend on its quality.
Next to the original sound, the microphone is the most important factor in any sound
system. The microphone is designed to pick up and amplify incoming acoustic waves or
harmonics precisely and correctly and convert them to electrical signals. Depending on
its sensitivity, the microphone will pick up the sound of someone's voice, sound from a
musical instrument, and any other sound that comes to it. Regardless of the quality of the
other audio-system components, the true attributes of the original sound are forever lost if
the microphone does not capture them.
Macintosh computers come with a built-in microphone, and more and more PCs that
include Sound Blaster sound cards also include a microphone. These microphones are
generally adequate for medium-quality sound recording of voice-overs and narration.
These microphones are not adequate for recording music.
MIDI HARDWARE
MIDI (Musical Instrument Digital Interface) is a standard that was agreed upon by the
major manufacturers of musical instruments. The MIDI standard was established so
musical instruments could be hooked together and could thereby communicate with one
another.
To communicate, MIDI instruments have an "in" port and an "out" port that enable them
to be connected to one another. Some MIDI instruments also have a "through" port that
allows several MIDI instruments to be daisy-chained together.
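As a hedged illustration of what travels over those ports: a MIDI message is just a few bytes. For example, a standard Note On message is a status byte (0x90 plus the channel number) followed by a note number and a velocity. The helper name below is our own, not part of any particular library:

```python
def midi_note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a standard 3-byte MIDI Note On message.

    channel: 0-15, note: 0-127 (60 = middle C), velocity: 0-127.
    """
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("value out of MIDI range")
    # Status byte 0x90 carries the channel in its low 4 bits.
    return bytes([0x90 | channel, note, velocity])

msg = midi_note_on(0, 60, 100)  # middle C at moderate velocity on channel 1
```

Every instrument in a daisy chain interprets the same byte stream, which is what the shared MIDI standard makes possible.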
STORAGE
Multimedia products require much greater storage capacity than text-based data. All
multimedia authors soon learn that huge drives are essential for the enormous files used
in multimedia and audiovisual creation. Floppy diskettes really aren't useful for storing
multimedia products. Even small presentations will quickly consume the 1.44 MB of
storage allotted to a high-density diskette.
In addition to a hefty storage capacity, a fast drive is also important. This is because large
files, even if they are compressed, take a long time to load and a long time to save and
back up. Consequently, if the drive is slow, frustration and lost productivity will
undoubtedly follow. When purchasing a storage medium, consider the speed of the
device (how fast it can retrieve and save large files) as well as the size of its storage
capacity.
OPTICAL DISKS
Optical storage offers much higher storage capacity than magnetic storage. This makes it
a much better medium for storing and distributing multimedia products that are full of
graphics, audio, and video files. In addition, reading data with lasers is more precise.
Therefore, when working with multimedia, optical storage media such as magneto-optical
(MO) disks and CD-ROMs are more common than magnetic media. Digital
Versatile Disk (DVD), a newer optical storage medium with even greater storage
capacity than a CD, will probably take the place of these other optical media within the
next few years.
CDs
CD-ROM stands for compact disk read-only memory. A CD-ROM can hold about
650 MB of data. Because CDs provide so much storage capacity, they are ideal for storing
large data files, graphics, sound, and video. Entire references such as encyclopedias,
complete with text and graphics as well as audio and video to further enhance the
information, can be stored on one CD-ROM. In addition, interactive components that
enable the user to respond to and control the medium ensure that the user will be even
more attentive and likely to retain information. For these reasons, CDs have been the
medium of choice for publishing multimedia applications.
Because the CD-ROM is the most common type of optical disk, computers sold today
include a CD-ROM drive as standard equipment. In fact, in order to have a multimedia
personal computer based on the standards set by the MPC, you must have a CD-ROM
drive. Therefore, when considering the purchase of a multimedia computer, the important
consideration in regard to the CD-ROM drive is the speed of transfer.
CD-ROM speed is measured in kilobytes (KB) per second. This refers to the speed at
which data is transferred from the CD to the computer processor or monitor. Double-speed
(2x) CD-ROM drives can transfer data at a rate of 300 KB per second, quadruple-speed
(4x) drives can transfer data at a rate of 600 KB per second, and so on up to 24x and
higher.
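The "x" ratings scale the original single-speed rate of 150 KB per second, which is why 2x gives 300 KB/s and 4x gives 600 KB/s. A minimal sketch (the function name is our own):

```python
BASE_RATE_KBPS = 150  # single-speed (1x) CD-ROM transfer rate, in KB per second

def cdrom_transfer_rate(x_rating: int) -> int:
    """Transfer rate in KB/s for an Nx CD-ROM drive."""
    return x_rating * BASE_RATE_KBPS

cdrom_transfer_rate(2)   # 300 KB/s
cdrom_transfer_rate(4)   # 600 KB/s
cdrom_transfer_rate(24)  # 3600 KB/s
```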
DVDs
DVD, which in some places stands for Digital Versatile Disk but officially doesn't stand
for anything, is the newest and most promising multimedia storage and distribution
technology. DVD technology offers the greatest potential to multimedia because its
storage capacity is extensive.
DVDs are the same size as CDs, but they offer much more storage capacity. DVDs are
either single- or double-sided. A double-sided DVD is actually two single-sided DVDs
glued together. By using more densely packed data pits together with more closely spaced
tracks, DVDs can store tremendous amounts of data. DVD disk types and capacities
include the following four:
• DVD-5: one layer, one side; max. capacity about 4.7 GB.
• DVD-9: two layers, one side; max. capacity about 8.5 GB.
• DVD-10: one layer, two sides; max. capacity about 9.4 GB.
• DVD-18: two layers, two sides; max. capacity about 17 GB.
MONITORS
The image displayed on the computer monitor depends on the quality of the monitor and
software, as well as the capability of the video adapter card. For multimedia applications,
it is critical that all of these elements work together to create high-quality graphic images.
However, because all display systems are not the same, you will have very little control
over how your images appear on other people's systems. Consequently, it is a good idea
to test your projects on different display systems to see how your digital images appear.
When purchasing a computer monitor to be used with multimedia applications, you will
want to consider purchasing a larger screen. Screen sizes are measured along the diagonal
and range in size from eight to more than 50 inches. You will probably want at least a
17-inch monitor. Though this larger monitor will cost you a bit more, it will prove well
worth it if you intend to spend any time at all either designing or otherwise working with
multimedia applications. In fact, after you have spent some time working with
multimedia applications, you may even want to consider purchasing two monitors if you
are using a Macintosh or PC setup that will support two monitors.
The number of colors that the monitor can display is also important. The number of
colors is dependent on the amount of memory installed on the video board as well as on
the monitor itself. The number of colors a monitor can display varies as listed below:
A 4-bit system will display 16 different colors.
An 8-bit system will display 256 different colors.
A 16-bit system will display 65,536 different colors.
A 24-bit system will display more than 16 million different colors.
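These counts follow directly from the bit depth: an n-bit display system can address 2^n distinct colors. A quick sketch:

```python
def displayable_colors(bit_depth: int) -> int:
    """Number of distinct colors an n-bit display system can show (2**n)."""
    return 2 ** bit_depth

for bits in (4, 8, 16, 24):
    print(f"{bits}-bit: {displayable_colors(bits):,} colors")
# 4-bit: 16, 8-bit: 256, 16-bit: 65,536, 24-bit: 16,777,216
```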
Most monitors can display at least 256 colors (8-bit), which is probably adequate for
multimedia presentations, particularly if the presentation is delivered via the Web, but it
may not be adequate for video. Eight-bit images are the most compatible across multiple
platforms and they also take up very little disk space. Computer monitors capable of
displaying thousands of colors (16-bit) are quickly becoming the multimedia standard.
Images on these displays not only look better, they also display much faster.
Data and File format standards
Graphic images may be stored in a wide variety of file formats. In choosing a format, you
should consider how and where the image will be used. This is because the application
must support the file format you select. Some formats are proprietary while others have
become universally supported by the graphics industry. Though proprietary formats may
function perfectly in their own environment, their lack of compatibility with other
systems can create problems.
In the Macintosh environment, the PICT format, a vector-based file format, is the image
format supported by almost all Macintosh applications. More recently, the Windows
environment has standardized on the BMP file format. Prior to this, there were multiple
file formats under DOS that made it difficult to transfer graphic files from one application
to another. The most common file formats are described below.
TIFF (TAGGED IMAGE FILE FORMAT)
The TIFF file format is probably the most widely used bitmapped file format. Image
editing applications, scanning software, illustration programs, page-layout programs, and
even word processing programs support TIFF files. The TIFF format works for all types
of images and supports bit depths from 1 to 32 bits. In addition, TIFF is cross-platform:
versions are available for Mac, PC, and UNIX systems. The TIFF file format is often
used when the output is printed.
BMP (SHORT FOR BITMAP)
The BMP format has been adopted as the standard bitmapped format on the Windows
platform. It is a very basic format supported by most Windows applications. It is also the
most efficient format to use with Windows.
GIF (GRAPHICS INTERCHANGE FORMAT)
CompuServe created this format, so you may see it listed as CompuServe GIF. It is one of
two standard formats used on the Web without plug-ins and is a common method of
storing bitmaps on the Web. The GIF format supports only up to 256 colors.
PICT/PICT2 (SHORT FOR PICTURE)
These are formats for the Macintosh. They are generally used only for screen display.
Some Mac programs can only import images saved as either PICT or EPS. Unlike the
EPS format, PICT does not provide information for separations, which means graphics
saved with this file format will be smaller than EPS files. PICT2 added additional levels
of color to the PICT format.
JPEG (JOINT PHOTOGRAPHIC EXPERTS GROUP)
This format creates a very compact file. Because of its small file size, it is easy to
transmit across networks. Consequently, it is one of only two graphic file formats
supported by the World Wide Web without plug-ins. Do keep in mind that in order to
make the file so small, lossy compression is used when a file is saved or converted to this
format. This means some image information is discarded and cannot be recovered. JPEG
files are bitmapped images.
CD (PHOTO CD)
This is Kodak's Photo CD graphics file format. It is a bitmapped format that contains
each photograph at several different image sizes.
This is only a small sample of all of the different graphic file formats available. There are
also many proprietary formats. If you plan to transfer files from application to
application, consider using the most common file format supported by all of your
applications. If you get stuck because an application does not support a graphics file
format, graphic conversion software is available to help you change the file format so that
you can import and export graphic images from almost any application to another.
Keep in mind that, in most situations, commercial image providers are only selling the
rights to use the image; they are not selling the image itself. In other words, they may sell
you the right to use the image in one multimedia application, but the image does not
become your property. If you want to use it again in a different multimedia application,
you may very well have to pay another royalty. The agreements vary depending on the
image, the original artist, and the commercial image provider. Take caution and read the
licensing agreement carefully before you include an image from a CD or the Web in
a multimedia application. Just because you purchased the CD or were given access to the
image on the Web, that doesn't necessarily mean you own it.
Audio in Multi Media
Audio has long fought for equal billing with video. With the acceptance of stereo
sound, home theaters, and surround sound, audio has made great strides in the traditional
video world. But the battle is being fought all over again in the multimedia world of
QuickTime and Video for Windows. We worry about the smallest detail of video
compression methods, data rates, and color palettes, but all too often handle the audio as
an afterthought. While there are some limitations to what we can achieve with audio in
multimedia applications, proper care can yield far better results than the default case we
often settle for.
The familiar audio CD is composed of 16-bit samples at a 44.1 KHz sample rate. While
this rate is supported by the latest computer sound cards, handling that much audio data
can tax even high-end systems and reduce video performance. Remember that when
establishing the target data rate for compressed video, the audio data rate must be
subtracted from the total, with whatever is left over available for video. Every bit of extra
quality we give to the audio side comes straight out of the video side. Simply throwing
more data at the audio to improve its quality is not a good solution. The stereo CD-quality
audio we all want needs 176.4 KB/second, more than we usually allocate to
the combination of audio and video together!
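The 176.4 KB/second figure can be checked directly: the uncompressed data rate is simply sample rate × bytes per sample × channels. A minimal sketch (the function name is our own):

```python
def audio_data_rate_bytes(sample_rate_hz: int, bits_per_sample: int,
                          channels: int) -> int:
    """Uncompressed audio data rate, in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

cd_rate = audio_data_rate_bytes(44_100, 16, 2)
# 44,100 samples/s * 2 bytes * 2 channels = 176,400 bytes/s = 176.4 KB/s
```

The same arithmetic shows why mono 11.025 KHz 8-bit audio is so much cheaper: it needs only 11,025 bytes per second, one sixteenth of the CD rate.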
The most common sample rates used for audio in the multimedia environment are 22.050
KHz and 11.025 KHz, both submultiples of the 44.1 KHz CD rate. Lower rates are also
available, but are really only useful for low-quality voice. The sample rate we choose
determines the maximum frequency that can be reproduced. Sampling theory tells us that
the maximum accurately reproducible frequency can be no more than half the sample
rate. This "half the sample rate" frequency is known as the Nyquist limit. Keep in mind,
however, that this is the theoretical maximum; in the real world many factors conspire to
keep you from actually getting to that limit.
The sample size will be either 8 or 16 bits. The most universal is 8-bit, but most sound
cards sold today are 16-bit, slowly pushing the 8-bit cards out of the installed base. The
sample size determines both the maximum dynamic range and the signal-to-noise ratio of
the sample. While 16-bit audio has a respectable 98 dB theoretical SNR, 8 bits yields less
than 50 dB SNR.
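Both rules of thumb are easy to sketch: the Nyquist limit is half the sample rate, and the theoretical SNR of ideal n-bit quantization is approximately 6.02n + 1.76 dB, which gives roughly 98 dB at 16 bits and just under 50 dB at 8 bits:

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Theoretical maximum reproducible frequency for a given sample rate."""
    return sample_rate_hz / 2

def theoretical_snr_db(bits: int) -> float:
    """Ideal quantization SNR for n-bit samples, in dB (6.02n + 1.76 rule)."""
    return 6.02 * bits + 1.76

nyquist_limit_hz(44_100)  # 22050.0 Hz
theoretical_snr_db(16)    # about 98.1 dB
theoretical_snr_db(8)     # about 49.9 dB
```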
The quality of your audio digitizing card is also important. Many sound cards add the
audio input as an afterthought and have serious distortion in their input stages. Also,
placing audio gear into a computer box filled with digital signals invites all sorts of
interference and noise problems, particularly if working with microphone level inputs.
Choose a digitizing card that has proper shielding and a good audio input section, or you
will limit your results before you even get something digitized. If you are digitizing from
a microphone, you will be better off using an external preamp to boost the signal to line
level before feeding the digitizer card.
So what can be done when we are forced to use mono, 11 KHz, 8-bit samples because of
data rate limitations, when we know that we will be limited to a 5.5 KHz frequency range
and a tiny dynamic range? Not to worry. With proper care in producing the audio, we can
get surprisingly good results. And if we can allow ourselves to move to a 22 KHz sample
rate, we can get something darn good.
The first thing to do is to make sure that no frequencies above the Nyquist limit are ever
sampled. This means inserting a low-pass filter into the audio before we ever get to the
digitizing card. Adjust it so that nothing above the Nyquist limit will be passed through.
Remember that audio filters are analog devices, so set the cutoff frequency somewhat
below the Nyquist limit to allow for the slope of the filter.
Next we need to work on the dynamic range of the material. Remember that digital audio
has no "headroom". Once you hit 0 VU there is no more room to encode audio. If you
think tape saturation sounds bad, try listening to digital clipping! To avoid clipping, use
an audio compressor/limiter to compress the audio signal, reducing the dynamic range,
and to limit it, making sure we never exceed the maximum signal level. Keep in mind
that this is analog "compression" and is not related to the digital "compression" that we
do on digitized video and audio data.
Unlike video, compression of the digital audio data is not common. This is partly because
the payoff for compression is not as great, what with audio having less data to compress,
and also because most compression algorithms (A-law, mu-law, etc.) came from the
telephony industry and were designed for voice. Most high-fidelity compression
algorithms have been proprietary.
Data Compression
In computer science and information theory, data compression or source coding is the
process of encoding information using fewer bits (or other information-bearing units)
than an unencoded representation would use, through use of specific encoding schemes.
As with any communication, compressed data communication only works when both the
sender and receiver of the information understand the encoding scheme. For example,
this text makes sense only if the receiver understands that it is intended to be interpreted
as characters representing the English language. Similarly, compressed data can only be
understood if the decoding method is known by the receiver.
Compression is useful because it helps reduce the consumption of expensive resources,
such as hard disk space or transmission bandwidth. On the downside, compressed data
must be decompressed to be used, and this extra processing may be detrimental to some
applications. For instance, a compression scheme for video may require expensive
hardware for the video to be decompressed fast enough to be viewed as it is being
decompressed (the option of decompressing the video in full before watching it may be
inconvenient, and requires storage space for the decompressed video). The design of data
compression schemes therefore involves tradeoffs among various factors, including the
degree of compression, the amount of distortion introduced (if using a lossy compression
scheme), and the computational resources required to compress and uncompress the data.
Lossless versus lossy compression
Lossless compression algorithms usually exploit statistical redundancy in such a way as
to represent the sender's data more concisely without error. Lossless compression is
possible because most realworld data has statistical redundancy. For example, in English
text, the letter 'e' is much more common than the letter 'z', and the probability that the
letter 'q' will be followed by the letter 'z' is very small. Another kind of compression,
called lossy data compression or perceptual coding, is possible if some loss of fidelity is
acceptable. Generally, a lossy data compression will be guided by research on how
people perceive the data in question. For example, the human eye is more sensitive to
subtle variations in luminance than it is to variations in color. JPEG image compression
works in part by "rounding off" some of this lessimportant information. Lossy data
compression provides a way to obtain the best fidelity for a given amount of
compression. In some cases, transparent (unnoticeable) compression is desired; in other
cases, fidelity is sacrificed to reduce the amount of data as much as possible.
Lossless compression schemes are reversible so that the original data can be
reconstructed, while lossy schemes accept some loss of data in order to achieve higher
compression.
However, lossless data compression algorithms will always fail to compress some files;
indeed, any compression algorithm will necessarily fail to compress any data containing
no discernible patterns. Attempts to compress data that has been compressed already will
therefore usually result in an expansion, as will attempts to compress all but the most
trivially encrypted data.
In practice, lossy data compression will also come to a point where compressing again
does not work, although an extremely lossy algorithm, like for example always removing
the last byte of a file, will always compress a file up to the point where it is empty.
An example of lossless vs. lossy compression is the following string:
25.888888888
This string can be compressed as:
25.[9]8
Interpreted as "twenty-five point nine eights", the original string is perfectly recreated,
just written in a smaller form. In a lossy system, using
26
instead, the exact original data is lost, at the benefit of a smaller file.
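The bracketed-count idea in the lossless example above is a simple run-length encoding. A minimal sketch of it, using our own [count]char notation to match the example:

```python
from itertools import groupby

def rle_encode(s: str, min_run: int = 4) -> str:
    """Collapse runs of min_run or more repeated characters into [count]char."""
    out = []
    for ch, group in groupby(s):
        n = len(list(group))
        # Only encode runs long enough for the notation to save space.
        out.append(f"[{n}]{ch}" if n >= min_run else ch * n)
    return "".join(out)

rle_encode("25.888888888")  # -> "25.[9]8"
```

Because every run count is preserved exactly, decoding recreates the original string bit for bit, which is what makes the scheme lossless.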
Example for Lossy and Lossless
Lossy
Lossy image compression is used in digital cameras, to increase storage capacities with
minimal degradation of picture quality. Similarly, DVDs use the lossy MPEG-2 video
codec for video compression.
In lossy audio compression, methods of psychoacoustics are used to remove non-audible
(or less audible) components of the signal. Compression of human speech is often
performed with even more specialized techniques, so that "speech compression" or "voice
coding" is sometimes distinguished as a separate discipline from "audio compression".
Different audio and speech compression standards are listed under audio codecs. Voice
compression is used in Internet telephony for example, while audio compression is used
for CD ripping and is decoded by audio players.
Lossless
The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for
lossless storage. DEFLATE is a variation on LZ which is optimized for decompression
speed and compression ratio; therefore, compression can be slow. DEFLATE is used in
PKZIP, gzip and PNG. LZW (Lempel-Ziv-Welch) is used in GIF images. Also
noteworthy are the LZR (LZ-Renau) methods, which serve as the basis of the Zip
method. LZ methods utilize a table-based compression model where table entries are
substituted for repeated strings of data. For most LZ methods, this table is generated
dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g.
SHRI, LZX). A current LZ-based coding scheme that performs well is LZX, used in
Microsoft's CAB format.
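The table-based model described above can be sketched with LZW, the variant used in GIF: the table starts with all single bytes and grows with each new string seen, so repeated substrings are replaced by short codes. A minimal encoder:

```python
def lzw_compress(data: bytes) -> list[int]:
    """Minimal LZW encoder: emit table codes for repeated byte strings."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                  # keep extending the current match
        else:
            codes.append(table[w])  # emit the code for the longest match
            table[wc] = len(table)  # add the new string to the table
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes

lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")
# 24 input bytes become 16 codes; codes >= 256 stand for repeated strings
```

Note that the decoder can rebuild exactly the same table from the code stream alone, which is why no dictionary needs to be transmitted.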
The very best compressors use probabilistic models, in which predictions are coupled to
an algorithm called arithmetic coding. Arithmetic coding, invented by Jorma Rissanen
and turned into a practical method by Witten, Neal, and Cleary, achieves superior
compression to the better-known Huffman algorithm, and lends itself especially well to
adaptive data compression tasks where the predictions are strongly context-dependent.
Arithmetic coding is used in the bi-level image-compression standard JBIG and the
document-compression standard DjVu. The text entry system Dasher is an inverse
arithmetic coder.
Video
AVI
Audio Video Interleave, known by its acronym AVI, is a multimedia container format
introduced by Microsoft in November 1992 as part of its Video for Windows technology.
AVI files can contain both audio and video data in a file container that allows
synchronous audio-with-video playback. Like the DVD video format, AVI files support
multiple audio and video streams, although these features are seldom used. Most AVI
files also use the file format extensions developed by the Matrox OpenDML group in
February 1996. These files are supported by Microsoft, and are unofficially called "AVI
2.0".
AVI is a derivative of the Resource Interchange File Format (RIFF), which divides a
file's data into blocks, or "chunks." Each "chunk" is identified by a FourCC tag. An AVI
file takes the form of a single chunk in a RIFF formatted file, which is then subdivided
into two mandatory "chunks" and one optional "chunk".
The first sub-chunk is identified by the "hdrl" tag. This sub-chunk is the file header and
contains metadata about the video, such as its width, height, and frame rate. The second
sub-chunk is identified by the "movi" tag. This chunk contains the actual audio/visual
data that make up the AVI movie. The third, optional sub-chunk is identified by the
"idx1" tag, which indexes the offsets of the data chunks within the file.
By way of the RIFF format, the audio/visual data contained in the "movi" chunk can be
encoded or decoded by software called a codec, which is an abbreviation for
(en)coder/decoder. Upon creation of the file, the codec translates between raw data and
the (compressed) data format used inside the chunk. An AVI file may carry audio/visual
data inside the chunks in virtually any compression scheme, including Full Frame
(Uncompressed), Intel Real Time (Indeo), Cinepak, Motion JPEG, Editable MPEG,
VDOWave, ClearVideo/RealVideo, QPEG, and MPEG-4 Video.
3GP
3GP (3GPP file format) is a multimedia container format defined by the Third
Generation Partnership Project (3GPP) for 3G UMTS multimedia services. It is used on
3G mobile phones but can also be played on some 2G and 4G phones.
3G2 (3GPP2 file format) is a multimedia container format defined by the 3GPP2 for 3G
CDMA2000 multimedia services. It is very similar to the 3GP file format, but has some
extensions and limitations in comparison to 3GP.
3GP is defined in the ETSI 3GPP technical specification. 3GP is a required file format for
video and associated speech/audio media types and timed text in ETSI 3GPP technical
specifications for IP Multimedia Subsystem (IMS), Multimedia Messaging Service
(MMS), Multimedia Broadcast/Multicast Service (MBMS) and Transparent end-to-end
Packet-switched Streaming Service (PSS).
3G2 is defined in the 3GPP2 technical specification.
The 3GP and 3G2 file formats are both structurally based on the ISO base media file
format defined in ISO/IEC 14496-12 (MPEG-4 Part 12), but older versions of the 3GP
file format did not use some of its features. 3GP and 3G2 are container formats similar to
MPEG-4 Part 14 (MP4), which is also based on MPEG-4 Part 12. The 3GP and 3G2 file
formats were designed to decrease storage and bandwidth requirements in order to
accommodate mobile phones.
3GP and 3G2 are similar standards, but with some differences:
• The 3GPP file format was designed for GSM-based phones and may have the filename
extension .3gp
• The 3GPP2 file format was designed for CDMA-based phones and may have the
filename extension .3g2
The 3GP file format stores video streams as MPEG-4 Part 2, H.263 or MPEG-4 Part 10
(AVC/H.264), and audio streams as AMR-NB, AMR-WB, AMR-WB+, AAC-LC, HE-AAC
v1 or Enhanced aacPlus (HE-AAC v2). 3GPP allowed use of AMR and H.263
codecs in the ISO base media file format (MPEG-4 Part 12), because 3GPP specified the
usage of the Sample Entry and template fields in the ISO base media file format as well
as defining new boxes to which codecs refer. These extensions were registered by the
registration authority for code points in ISO base media file format ("MP4 Family"
files). For the storage of MPEG-4 media-specific information in 3GP files, the 3GP
specification refers to MP4 and the AVC file format, which are also based on the ISO
base media file format. The MP4 and the AVC file format specifications describe the
usage of MPEG-4 content in the ISO base media file format.
The 3G2 file format can store the same video streams and most of the audio streams used
in the 3GP file format. In addition, 3G2 stores audio streams as EVRC, EVRC-B, EVRC-WB,
13K (QCELP), SMV or VMR-WB, which were specified by 3GPP2 for use in the ISO
base media file format. The 3G2 specification also defined some enhancements to 3GPP
Timed Text. The 3G2 file format does not store Enhanced aacPlus (HE-AAC v2) or AMR-WB+
audio streams. For the storage of MPEG-4 media (AAC audio, MPEG-4 Part 2
video, MPEG-4 Part 10 H.264/AVC) in 3G2 files, the 3G2 specification refers to the
MP4 file format and the AVC file format specification, which describe the usage of this
content in the ISO base media file format. For the storage of H.263 and AMR content,
the 3G2 specification refers to the 3GP file format specification.
MPEG
The Moving Picture Experts Group (MPEG) is a working group of experts that was
formed by the ISO to set standards for audio and video compression and transmission. It
was established in 1988 and its first meeting was in May 1988 in Ottawa, Canada. As of
late 2005, MPEG has grown to include approximately 350 members per meeting from
various industries, universities, and research institutions. MPEG's official designation is
ISO/IEC JTC1/SC29 WG11 Coding of moving pictures and audio (ISO/IEC Joint
Technical Committee 1, Subcommittee 29, Working Group 11).
The MPEG compression methodology is considered asymmetric, as the encoder is more
complex than the decoder. The encoder needs to be algorithmic or adaptive, whereas the
decoder is "dumb" and carries out fixed actions. This is considered advantageous in
applications such as broadcasting, where the number of expensive complex encoders is
small but the number of simple inexpensive decoders is large. MPEG's (ISO's)
approach to standardization is novel, because it is not the encoder that is standardized, but
the way a decoder interprets the bitstream. A decoder that can successfully interpret the
bitstream is said to be compliant. The advantage of standardizing the decoder is that over
time encoding algorithms can improve, yet compliant decoders continue to function with
them. The MPEG standards give very little information regarding the structure and operation
of the encoder, and implementers can supply encoders using proprietary algorithms. This
gives scope for competition between different encoder designs, which means better
designs can evolve and users have greater choice, because encoders of different levels of
cost and complexity can exist, yet a compliant decoder operates with all of them.
MPEG also standardizes the protocol and syntax under which it is possible to combine or
multiplex audio data with video data to produce a digital equivalent of a television
program. Many such programs can be multiplexed and MPEG defines the way such
multiplexes can be created and transported. The definitions include the metadata used by
decoders to demultiplex correctly.
Image Processing