
Chapter Four

Color in Image and Video

Topics:
• Fundamental Concepts in Video
• Basics of Digital Audio

Prepared by Temesgen T.(MSc)


Color Science
• Light and Spectra: Light is an electromagnetic wave.
• Its color is characterized by the wavelength content of the light.
• a) Laser light consists of a single wavelength: e.g., a ruby laser
produces a bright, scarlet-red beam.
• b) Most light sources produce contributions over many wavelengths.
• c) However, humans cannot detect all light, just contributions that fall
in the visible wavelengths.
• d) Short wavelengths produce a blue sensation, long wavelengths
produce a red one.
Cont’d…
• Spectrophotometer: device used to measure visible light, by
reflecting light from a diffraction grating (a ruled surface) that spreads
out the different wavelengths.
• Visible light is an electromagnetic wave in the range 400 nm to 700
nm (where nm stands for nanometer, 10^-9 meters).
Visible Light
Human Vision
• The eye works like a camera, with the lens focusing an image onto the retina
(upside-down and left-right reversed).
• The retina consists of an array of rods and three kinds of cones.
• The rods come into play when light levels are low and produce an image in
shades of gray ("all cats are gray at night!").
• For higher light levels, the cones each produce a signal. Because of their
differing pigments, the three kinds of cones are most sensitive to red (R),
green (G), and blue (B) light.
• It seems likely that the brain makes use of differences R-G, G-B, and B-R, as
well as combining all of R, G, and B into a high-light-level achromatic channel.
Spectral Sensitivity of the Eye
• The eye is most sensitive to light in the middle of the visible spectrum.
• The sensitivity of our receptors is also a function of wavelength (Fig. 4.2
below).
• The Blue receptor sensitivity is not shown to scale because it is much
smaller than the curves for Red or Green; Blue is a late addition in
evolution.
• Statistically, Blue is the favorite color of humans, regardless of
nationality, perhaps for this reason:
• Blue is a latecomer and thus is a bit surprising!
(Figure: image formation graph)
Cont’d…
• Fig. 4.2 shows the overall sensitivity as a dashed line.
• This important curve is called the luminous-efficiency function.
• It is usually denoted V(λ) and is formed as the sum of the response
curves for Red, Green, and Blue.
• The rod sensitivity curve looks like the luminous-efficiency function
V(λ) but is shifted toward the blue end of the spectrum.
• The achromatic channel produced by the cones is approximately
proportional to 2R + G + B/20
Cont’d…
• The eye has about 6 million cones, but the proportions of R, G, and B
cones are different.
• They are likely present in the ratios 40:20:1.
• The achromatic channel produced by the cones is thus something like
2R + G + B/20 (the 40:20:1 ratios scaled down by a factor of 20).
• These spectral sensitivity functions are usually denoted by some
letters other than R, G, and B, so here let us denote them by the
vector function q(λ), with components qR(λ), qG(λ), and qB(λ).
Cont’d…
• The response in each color channel in the eye is proportional to the
number of neurons firing.
• Again thinking of these sensitivities as continuous functions, we can
succinctly write down this idea in the form of an integral:
• R = ∫E(λ) qR(λ) dλ, G = ∫E(λ) qG(λ) dλ, B = ∫E(λ) qB(λ) dλ
Image Formation
• Surfaces reflect different amounts of light at different wavelengths,
and dark surfaces reflect less energy than light ones.
Cont’d…
• Light from the illuminant with SPD E(λ) impinges on a surface, with
surface spectral reflectance function S(λ), is reflected, and then is
filtered by the eye's cone functions q(λ).
• The function C(λ) is called the color signal and consists of the product
of E(λ), the illuminant, times S(λ), the reflectance: C(λ) = E(λ) S(λ)
Cont’d…
• The terms E(λ) and S(λ) represent the spectral power and the spectral
reflectance, respectively.
• The spectral power distribution E(λ) gives the amount of light energy
at each wavelength λ, while the spectral reflectance S(λ) gives the
fraction of light the surface reflects at each wavelength λ.
Equation for image formation
• The equations that take into account the image formation model are:
• R = ∫E(λ) S(λ) qR(λ) dλ
• G = ∫E(λ) S(λ) qG(λ) dλ
• B = ∫E(λ) S(λ) qB(λ) dλ
• The terms qR(λ), qG(λ), and qB(λ) represent the spectral color-matching
functions, which describe how the spectral energy at each wavelength λ is
mapped to the RGB values.
• The integral sign and the differential term dλ indicate that the equation
involves an integration over the entire visible spectrum of light, which is
typically between 400 and 700 nm.
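• As an illustrative sketch (not from the slides), the three integrals can be approximated numerically by sampling the spectra on a wavelength grid; the illuminant, reflectance, and cone curves below are made-up stand-ins, not measured data:

    import numpy as np

    # Wavelength grid across the visible range (nm)
    lam = np.linspace(400, 700, 301)

    # Made-up stand-ins: a flat illuminant, a reddish surface, and
    # Gaussian cone sensitivities peaking near the B, G, R regions.
    E = np.ones_like(lam)                 # illuminant SPD E(lambda)
    S = (lam - 400) / 300.0               # reflectance S(lambda), rising toward red

    def gauss(center, width):
        return np.exp(-0.5 * ((lam - center) / width) ** 2)

    qR, qG, qB = gauss(600, 40), gauss(550, 40), gauss(450, 25)

    C = E * S                             # color signal C(lambda) = E(lambda) S(lambda)

    # R = integral of E(lambda) S(lambda) qR(lambda) d(lambda), etc. (trapezoidal rule)
    R = np.trapz(C * qR, lam)
    G = np.trapz(C * qG, lam)
    B = np.trapz(C * qB, lam)
    print(R, G, B)                        # reddish surface: R > G > B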
Camera Systems
• Camera systems are made in a similar fashion; a studio quality camera
has three signals produced at each pixel location (corresponding to a
retinal position).
• Analog signals are converted to digital, truncated to integers, and
stored. If the precision used is 8-bit, then the maximum value for any
of R; G; B is 255, and the minimum is 0.
• However, the light entering the eye of the computer user is emitted
by the screen, which is essentially a self-luminous source.
• Therefore we need to know the light E(λ) entering the eye.
Gamma Correction
• The light emitted is in fact roughly proportional to the voltage raised to a power; this
power is called gamma, with symbol γ.
• (a) Thus, if the file value in the red channel is R, the screen emits light proportional to
R^γ, with SPD equal to that of the red phosphor paint on the screen that is the target of
the red channel electron gun.
• The value of gamma is around 2.2.
• (b) It is customary to append a prime to signals that are gamma-corrected by raising to
the power (1/γ) before transmission. Thus we arrive at linear signals:
• R → R′ = R^(1/γ) ⇒ (R′)^γ → R
• In RGB colors, gamma affects each of the three primary colors (red, green, and blue)
differently.
• Gamma correction is used to adjust the brightness of each color channel to produce a
more accurate and consistent display of colors on the screen.
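• A minimal sketch of the round trip described above, assuming γ = 2.2 and signals normalized to [0, 1]:

    gamma = 2.2

    def gamma_encode(linear):
        # R -> R' = R**(1/gamma): applied before storage/transmission
        return linear ** (1.0 / gamma)

    def gamma_decode(encoded):
        # (R')**gamma -> R: what the display effectively does
        return encoded ** gamma

    x = 0.5
    x_prime = gamma_encode(x)     # about 0.73
    print(gamma_decode(x_prime))  # recovers about 0.5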
Color-Matching Functions
• Color-matching functions (CMFs) are mathematical functions that describe the
sensitivity of the human eye to different wavelengths of light.
• These functions are used to predict how a particular color stimulus will be
perceived by the human eye.
• The particular set of three basic lights used in an experiment is called the set of
color primaries.
• To match a given color, a subject is asked to separately adjust the brightness of
the three primaries using a set of controls until the resulting spot of light most
closely matches the desired color.
• The basic situation is shown in the following figure.
• A device for carrying out such an experiment is called a colorimeter.
Cont’d…
• The human eye has three types of color receptors, or cones, that are
sensitive to different parts of the visible spectrum.
• These cones are commonly referred to as S-cones (short-wavelength),
M-cones (medium-wavelength), and L-cones (long-wavelength).
• Color-matching functions describe the amount of each primary color
(red, green, and blue) that is required to match a particular color
stimulus.
• These functions are typically represented as a set of three curves, one
for each type of cone.
Color Models in Images
• Color models and spaces are used for stored, displayed, and printed
images.
• RGB Color Model for Cathode Ray Tube(CRT) Displays
• We expect to be able to use 8 bits per color channel for color that is
accurate enough.
• However, in fact we have to use about 12 bits per channel to avoid an
aliasing effect in dark image areas: contour bands that result from
gamma correction.
Cont’d…
• For images produced from computer graphics, we store integers
proportional to intensity in the frame buffer.
• A look-up table (LUT) is a method used to implement gamma
correction in digital image processing.
• The LUT contains a set of predetermined gamma values that are used
to map the input image data to the output image data.
• If gamma correction is applied to floats before quantizing to integers,
before storage in the frame buffer, then in fact we can use only 8 bits
per channel and still avoid contouring artifacts.
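• A sketch of such a gamma-correction LUT for 8-bit data: the 256 output codes are computed once, and mapping a whole image is then a single table lookup (the random toy image is just for illustration):

    import numpy as np

    gamma = 2.2
    codes = np.arange(256)

    # Precompute gamma correction once for all 256 possible input codes
    lut = np.round(255.0 * (codes / 255.0) ** (1.0 / gamma)).astype(np.uint8)

    # Applying the LUT to an image is then a single indexing operation
    image = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)  # toy image
    corrected = lut[image]
    print(corrected)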
Subtractive Color: CMY Color Model
• Additive color: when two light beams impinge on a target, their colors add;
when two phosphors on a CRT screen are turned on, their colors add.
• But for ink deposited on paper, the opposite situation holds: yellow ink
subtracts blue from white illumination, but reflects red and green; it
appears yellow.
• Instead of red, green, and blue primaries, we need primaries that amount
to -red, -green, and -blue. I.e., we need to subtract R, or G, or B.
• These subtractive color primaries are Cyan (C), Magenta (M) and Yellow (Y ).
• In this model, colors are created by subtracting (or absorbing) certain colors
from white light.
Cont’d…
Transformation from RGB to CMY
• Simplest model we can invent to specify what ink density to lay down
on paper, to make a certain desired RGB color:
• C = 1 - R, M = 1 - G, Y = 1 - B (with R, G, B in the range [0, 1])
• Fig.: color combinations that result from combining primary colors
available in the two situations, additive color and subtractive color.
Cont’d…
Example
• Suppose you have an RGB color with the values R=200, G=100, B=50.
To transform this color to CMY, you would use the following formula:
• C = 255 - R = 255 - 200 = 55
• M = 255 - G = 255 - 100 = 155
• Y = 255 - B = 255 - 50 = 205
• Therefore, the CMY values for this color are C=55, M=155, and Y=205.
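• The same arithmetic as a tiny function, using the 8-bit convention of the example:

    def rgb_to_cmy(r, g, b):
        # C = 255 - R, M = 255 - G, Y = 255 - B for 8-bit channels
        return 255 - r, 255 - g, 255 - b

    print(rgb_to_cmy(200, 100, 50))  # (55, 155, 205)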
Color Models in Video
• Video Color Transforms
• When you press the video record button on the camera, the sensor
starts to record a continuous stream of images.
• Video color transforms largely derive from older analog methods of
coding color for TV.
• Luminance(brightness) is separated from color information.
• For example, a matrix transform method called YIQ is used to transmit
TV signals in North America and Japan.
• This coding also makes its way into VHS video tape coding in these
countries, since video tape technologies also use YIQ.
Cont’d…
• In Europe, video tape uses the PAL or SECAM codings, which are
based on TV that uses a matrix transform called YUV.
• Finally, digital video mostly uses a matrix transform called YCbCr that
is closely related to YUV.
YUV Color Model
• YUV codes a luminance signal (for gamma-corrected signals) equal to
Y′, called the "luma".
• Chrominance refers to the difference between a color and a reference
white at the same luminance → use color differences U, V:
• U = B′ - Y′
• V = R′ - Y′
• Fig. 4.18 shows the decomposition of a color image into its Y′, U, V
components. Since both U and V go negative, the images displayed
are in fact shifted and rescaled.
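• A sketch of the transform, assuming the standard Rec. 601 luma weights (0.299, 0.587, 0.114) and the U, V scaling factors quoted later in the YIQ slide:

    def rgb_to_yuv(r_p, g_p, b_p):
        # Inputs are gamma-corrected R', G', B' in [0, 1]
        y = 0.299 * r_p + 0.587 * g_p + 0.114 * b_p  # luma Y'
        u = 0.492111 * (b_p - y)                     # scaled B' - Y'
        v = 0.877283 * (r_p - y)                     # scaled R' - Y'
        return y, u, v

    print(rgb_to_yuv(1.0, 1.0, 1.0))  # white (a gray pixel) gives U = V = 0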
How is this achieved?
• To obtain luminance from RGB, the R, G, and B intensities are
weighted so as to yield the standard luminosity curve.
• Since Y carries the luminance (brightness), a black-and-white TV
selects only the Y component from YIQ.
• The I component carries orange-cyan hue information.
• The Q component carries magenta-green hue information.
Cont’d…
YIQ Color Model
• YIQ is used in NTSC color TV broadcasting.
• It is used in analog color television systems.
• Again, gray pixels generate zero (I, Q) chrominance signal.
• I and Q are a rotated version of U and V.
• Y in YIQ is the same as in YUV; U and V are rotated by 33°:
• I = 0.877283(R′ - Y′) cos 33° - 0.492111(B′ - Y′) sin 33°
• Q = 0.877283(R′ - Y′) sin 33° + 0.492111(B′ - Y′) cos 33°
• One of the main advantages of the YIQ color model is that it is designed to be
compatible with black-and-white television sets, allowing color broadcasts to be
received on both color and black-and-white sets without affecting the image
quality on the latter.
Cont’d…
YCbCr Color Model
• The Rec. 601 standard for digital video uses another color space, Y′CbCr,
often simply written YCbCr, closely related to the YUV transform.
• YUV is changed by scaling, such that Cb is U but with a coefficient of 0.5
multiplying B′.
• In some software systems, Cb and Cr are also shifted such that values
are between 0 and 1.
• This makes the equations as follows:
• Cb = ((B′ - Y′)/1.772) + 0.5
• Cr = ((R′ - Y′)/1.402) + 0.5
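• A sketch combining these two equations with the standard Rec. 601 luma weights; inputs are gamma-corrected values in [0, 1]:

    def rgb_to_ycbcr(r_p, g_p, b_p):
        # Gamma-corrected inputs in [0, 1]; Cb and Cr are shifted into [0, 1]
        y = 0.299 * r_p + 0.587 * g_p + 0.114 * b_p
        cb = (b_p - y) / 1.772 + 0.5   # 1.772 = 2 * (1 - 0.114)
        cr = (r_p - y) / 1.402 + 0.5   # 1.402 = 2 * (1 - 0.299)
        return y, cb, cr

    print(rgb_to_ycbcr(0.5, 0.5, 0.5))  # gray: Y' = 0.5, Cb = Cr = 0.5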
Rationale
• Cathode ray tube displays are driven by red, green, and blue voltage
signals, but these RGB signals are not efficient as a representation for
storage and transmission, since they have a lot of redundancy.
• Y′CbCr is used to separate out a luma signal (Y′) that can be stored
with high resolution or transmitted at high bandwidth, and two
chroma components (CB and CR) that can be bandwidth-reduced,
subsampled, compressed, or otherwise treated separately for
improved system efficiency.
Cont’d…
• YCbCr is a color model used in digital video and image processing.
• It is a form of YUV color space, where Y stands for luma or brightness
and Cb and Cr represent chroma or color difference signals.
• The Cb and Cr components are calculated by subtracting the luma
value from the blue and red color values respectively, and then
subsampling them to reduce the amount of data required to
represent the image or video.
• The YCbCr color model is used in digital video compression, such as
MPEG and H.264.
HSL/HSV Color Model
• HSL stands for Hue, Saturation, and Lightness, while HSV stands for
Hue, Saturation, and Value.
• These color models are used to define colors based on their
perceptual qualities, such as hue, saturation, and brightness.
• HSL/HSV is useful in video editing, as it allows for easy adjustment of
color and brightness levels.
Fundamental Concepts in Video
• Digital video is represented as a sequence of digital images.
• Types of Video signals
• Video signals can be organized in three different ways: Component
video, Composite video, and S-video
• Component Video
• Higher-end video systems, such as for studios, make use of three
separate video signals for the red, green, and blue image planes.
• This kind of system has three wires (and connectors) connecting the
camera or other devices to a TV or monitors.
Cont’d…
• We can form three signals via a luminance-chrominance
transformation of the RGB signals - for example, YIQ or YUV.
• In contrast, most computer systems use component video, with
separate signals for R, G, and B.
• For any color separation scheme, component video gives the best
color reproduction, since there is no "crosstalk" between the three
different channels, unlike composite video or S-video.
• Component video, however, requires more bandwidth and good
synchronization of the three components.
Composite Video
• In composite video, color ("chrominance") and intensity ("luminance")
signals are mixed into a single carrier wave.
• Chrominance is a composite of two color components (I and Q, or U and
V).
• This type of signal is used by broadcast color TVs; it is downward
compatible with black-and-white TV.
• In NTSC TV, I and Q are combined into a chroma signal, and a color sub-
carrier then puts the chroma signal at the higher frequency end of the
channel shared with the luminance signal.
• The chrominance and luminance components can be separated at the
receiver end, and the two color components can be further recovered.
S-Video
• As a compromise, S-video (separated video, or super-video, as used in
S-VHS) uses two wires: one for luminance (brightness or intensity) and
another for a composite chrominance signal (color information, or the
saturation of the image).
• As a result, there is less crosstalk between the color information and the crucial
gray-scale information.
• The reason for placing luminance into its own part of the signal is that black-
and-white information is crucial for visual perception.
• As noted in the previous slide, humans are able to differentiate spatial
resolution in grayscale images much better than for the color part of color
images.
Analog Video
• Most TV is still sent and received as an analog signal.
• Once the electrical signal is received, we may assume that brightness
is at least a monotonic function of voltage, if not necessarily linear,
because of gamma correction.
• An analog signal f(t) samples a time-varying image. So-called
progressive scanning traces through a complete picture (a frame) row-
wise for each time interval.
• A high-resolution computer monitor typically uses a time interval of
1/72 second.
Cont’d…
• In TV and in some monitors and multimedia standards, another
system, interlaced scanning, is used.
• Here, the odd-numbered lines are traced first, then the even-
numbered lines.
• This results in "odd" and "even" fields - two fields make up one frame.
• In fact, the odd lines (starting from 1) end up at the middle of a line at
the end of the odd field, and the even scan starts at a half-line.
Cont’d…
Cont’d…
• Interlacing was invented because, when standards were being
defined, it was difficult to transmit the amount of information in a full
frame quickly enough to avoid flicker.
• The double number of fields presented to the eye reduces perceived
flicker.
• Because of interlacing, the odd and even lines are displaced in time
from each other.
• This is generally not noticeable except when fast action is taking place
onscreen, when blurring may occur.
National Television System
Committee (NTSC) Video
• The NTSC TV standard is mostly used in North America and Japan.
• The video signal is transmitted using a method called interlacing, where every
other line of the image is transmitted in one frame, and the remaining lines are
transmitted in the next frame.
• This helps to reduce flicker on the screen.
• It uses a familiar 4:3 aspect ratio (i.e., the ratio of picture width to height)
and 525 scan lines per frame at 30 frames per second.
• More exactly, for historical reasons NTSC uses 29.97 fps - or, in other
words, 33.37 msec per frame.
• NTSC follows the interlaced scanning system, and each frame is divided into two
fields, with 262.5 lines/field.
PAL Video (Simultaneous)
• PAL (Phase Alternating Line) is a TV standard originally invented by
German scientists.
• It uses 625 scan lines per frame, at 25 frames per second (or 40
msec/frame), with a 4:3 aspect ratio and interlaced fields.
• Its broadcast TV signals are also used in composite video.
• This important standard is widely used in Western Europe, China, India
and many other parts of the world.
• PAL uses the YUV color model with an 8 MHz channel, allocating a
bandwidth of 5.5 MHz to Y and 1.8 MHz each to U and V.
• The color subcarrier frequency is fsc ≈ 4.43 MHz.
SECAM Video (Sequential)
• SECAM, which was invented by the French, is the third major broadcast TV
standard.
• SECAM stands for Système Électronique Couleur Avec Mémoire.
• SECAM also uses 625 scan lines per frame, at 25 frames per second, with a
4:3 aspect ratio and interlaced fields.
• The original design called for a higher number of scan lines (over 800), but
the final version settled for 625.
• SECAM and PAL are similar, differing slightly in their color coding scheme.
• In SECAM, U and V signals are modulated using separate color subcarriers
at 4.25 MHz and 4.41 MHz, respectively.
1 frame = 2 fields
Digital Video
• The advantages of digital representation for video are many. It
permits:
• Storing video on digital devices or in memory, ready to be processed
(noise removal, cut and paste, and so on) and integrated into various
multimedia applications
• Direct access, which makes nonlinear video editing simple
• Repeated recording without degradation of image quality
• Ease of encryption and better tolerance to channel noise
Cont’d…
• In earlier Sony or Panasonic recorders, digital video was in the form of
composite video.
• Modern digital video generally uses component video, although RGB
signals are first converted into a certain type of color opponent space,
such as YUV. The usual color space is YCbCr.
Cont’d…
• First-generation HDTV was based on an analog technology developed by
Sony and NHK in Japan in the late 1970s.
• HDTV successfully broadcast the 1984 Los Angeles Olympic Games in Japan.
• MUltiple sub-Nyquist Sampling Encoding (MUSE) was an improved NHK
HDTV with hybrid analog/digital technologies that was put in use in the
1990s.
• It has 1,125 scan lines, interlaced (60 fields per second), and a 16:9 aspect
ratio.
• It uses satellite broadcast, quite appropriate for Japan, which can be
covered with one or two satellites.
• The Direct Broadcast Satellite (DBS) channels used have a bandwidth of
24 MHz.
Cont’d…
• In general, terrestrial broadcast, satellite broadcast, cable, and
broadband networks are all feasible means for transmitting HDTV as
well as conventional TV.
• The Federal Communications Commission (FCC) has planned to
replace all analog broadcast services with digital TV broadcasting by
the year 2006.
• Consumers with analog TV sets will still be able to receive signals via
an 8-VSB (8-level vestigial sideband) demodulation box.
• 8-VSB converts a binary stream into an octal representation by 
amplitude-shift keying a sinusoidal carrier to one of eight levels.
Cont’d…
• The services provided will include:
• Standard Definition TV (SDTV) - the current NTSC TV or higher
• Enhanced Definition TV (EDTV) - 480 active lines or higher
• High Definition TV (HDTV) - 720 active lines or higher.
• So far, the popular choices are 720P (720 lines, progressive, 30 fps)
and 1080I (1,080 lines, interlaced, 30 fps or 60 fields per second).
• The latter provides slightly better picture quality but requires much
higher bandwidth.
Basics of Digital Audio
• Audio information is crucial for multimedia presentations and, in a
sense, is the simplest type of multimedia data.
• Sound is a wave phenomenon like light, but it is macroscopic and
involves molecules of air being compressed and expanded under the
action of some physical device.
• For example, a speaker in an audio system vibrates back and forth and
produces a longitudinal pressure wave that we perceive as sound.
Cont’d…
• Without air there is no sound - for example, in space. Since sound is a
pressure wave, it takes on continuous values, as opposed to digitized
ones with a finite range.
• Nevertheless, if we wish to use a digital version of sound waves, we
must form digitized representations of audio information.
Cont’d…
• The figure shows the one-dimensional nature of sound.
• Values change over time in amplitude: the pressure increases or
decreases with time.
• The amplitude value is a continuous quantity.
• Since we are interested in working with such data in computer
storage, we must digitize the analog signals (i.e., continuous-valued
voltages) produced by a microphone.
• Digitization means conversion to a stream of numbers - preferably
integers for efficiency.
Cont’d…
• Since the graph in Figure is two-dimensional, to fully digitize the signal
shown we have to sample in each dimension - in time and in
amplitude.
• Sampling means measuring the quantity we are interested in, usually
at evenly spaced intervals.
• The first kind of sampling - using measurements only at evenly spaced
time intervals - is simply called sampling (surprisingly), and the rate at
which it is performed is called the sampling frequency
Cont’d…
• For audio, typical sampling rates are from 8 kHz (8,000 samples per
second) to 48 kHz.
• The human ear can hear from about 20 Hz (a very deep rumble) to as
much as 20 kHz; above this level, we enter the range of ultrasound.
• The human voice can reach approximately 4 kHz and we need to
bound our sampling rate from below by at least double this frequency
(see the discussion of the Nyquist sampling rate, below).
• Thus we arrive at the useful range of about 8 to 40 or so kHz.
Nyquist Theorem
• Signals can be decomposed into a sum of sinusoids, if we are willing
to use enough sinusoids.
• The figure shows how weighted sinusoids can build up quite a
complex signal.
• Whereas frequency is an absolute measure, pitch is a perceptual,
subjective quality of sound - generally, pitch is relative.
• Note that the true frequency and its alias are located symmetrically
on the frequency axis with respect to the Nyquist frequency
pertaining to the sampling rate used.
Cont’d…
• For this reason, the Nyquist frequency associated with the sampling
frequency is often called the "folding" frequency.
• That is to say, if the sampling frequency is less than twice the true
frequency but greater than the true frequency, then the alias
frequency equals the sampling frequency minus the true frequency.
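• A small sketch of this folding rule; the function is a hypothetical helper, not part of any standard library:

    def alias_frequency(f_true, f_sampling):
        # Fold f_true into [0, f_sampling / 2] about the Nyquist ("folding") frequency
        f = f_true % f_sampling           # aliases repeat every f_sampling
        return min(f, f_sampling - f)     # reflect down past Nyquist

    # A 5.5 kHz tone sampled at 8 kHz appears at 8 - 5.5 = 2.5 kHz
    print(alias_frequency(5500, 8000))    # 2500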
Signal-to-Noise Ratio (SNR)
• In any analog system, random fluctuations produce noise added to
the signal, and the measured voltage is thus incorrect.
• The ratio of the power of the correct signal to the noise is called the
signal-to-noise ratio (SNR).
• Therefore, the SNR is a measure of the quality of the signal.
• The SNR is usually measured in decibels (dB), where 1 dB is a tenth of
a bel.
• The SNR value, in units of dB, is defined in terms of base-10 logarithms
of squared voltages:
• SNR = 10 log10(V_signal^2 / V_noise^2) = 20 log10(V_signal / V_noise)
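• A one-line computation of this definition:

    import math

    def snr_db(v_signal, v_noise):
        # SNR = 10 log10(Vsignal^2 / Vnoise^2) = 20 log10(Vsignal / Vnoise)
        return 20.0 * math.log10(v_signal / v_noise)

    print(snr_db(10.0, 1.0))  # 20 dB: signal voltage ten times the noise voltage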
Signal-to-Quantization-Noise Ratio (SQNR)
• For digital signals, we must take into account the fact that only
quantized values are stored.
• For a digital audio signal, the precision of each sample is determined
by the number of bits per sample, typically 8 or 16 bits.
• Aside from any noise that may have been present in the original
analog signal, additional error results from quantization.
• That is, if voltages are in the range of 0 to 1 but we have only 8 bits in
which to store values, we effectively force all continuous values of
voltage into only 256 different values.
Cont’d…
• Inevitably, this introduces a roundoff error.
• Although it is not really "noise," it is called quantization noise (or quantization
error).
• The association with the concept of noise is that such errors will essentially occur
randomly from sample to sample.
• The quality of the quantization is characterized by the signal-to-quantization-
noise ratio (SQNR).
• Quantization noise is defined as the difference between the value of the analog
signal, for the particular sampling time, and the nearest quantization interval
value.
• At most, this error can be as much as half of the interval.
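• A sketch that measures SQNR empirically for an 8-bit uniform quantizer applied to a full-scale sine wave (the test signal and step size are illustrative choices, not from the slides):

    import numpy as np

    bits = 8
    t = np.linspace(0, 1, 10000, endpoint=False)
    signal = np.sin(2 * np.pi * 5 * t)            # full-scale test tone in [-1, 1]

    # Uniform quantization to 2**bits levels over [-1, 1]
    step = 2.0 / (2 ** bits)
    quantized = np.round(signal / step) * step
    noise = signal - quantized                    # quantization error, within +/- step/2

    sqnr = 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))
    print(sqnr)   # close to 6.02 * bits + 1.76 = 49.9 dB for a full-scale sine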
Quantization and Transmission of Audio
• To be transmitted, sampled audio information must be digitized, and
here we look at some of the details of this process.
• Once the information has been quantized, it can then be transmitted or
stored.
• Coding of Audio
• Quantization and transformation of data are collectively known as
coding of the data.
• For audio, the μ-law technique for companding audio signals is usually
combined with a simple algorithm that exploits the temporal
redundancy present in audio signals.
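• A sketch of μ-law companding using the standard μ = 255 formula (the compress/expand pair below is the textbook formula, not a specific codec implementation):

    import numpy as np

    MU = 255.0  # value used in North American / Japanese telephony

    def mu_law_compress(x):
        # x in [-1, 1]; small amplitudes get proportionally more resolution
        return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

    def mu_law_expand(y):
        # Inverse of the compressor
        return np.sign(y) * ((1 + MU) ** np.abs(y) - 1) / MU

    x = np.array([-0.5, -0.01, 0.0, 0.01, 0.5])
    print(mu_law_expand(mu_law_compress(x)))  # recovers x up to float error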
Cont’d…
• Differences in signals between the present and a previous time can
effectively reduce the size of signal values and, most important,
concentrate the histogram of sample values (differences, now) into a
much smaller range.
• Pulse Code Modulation
• Audio is analog - the waves we hear travel through the air to reach our
eardrums.
• We know that the basic techniques for creating digital signals from analog
ones consist of sampling and quantization.
• Sampling is invariably done uniformly - we select a sampling rate and
produce one value for each sampling time.
Cont’d…
• In the magnitude direction, we digitize by quantization, selecting
breakpoints in magnitude and remapping any value within an interval
to one representative output level.
• The set of interval boundaries is sometimes called the set of decision
boundaries, and the representative values are called reconstruction
levels.
• Every compression scheme has three stages:
• Transformation. The input data is transformed to a new
representation that is easier or more efficient to compress; in
predictive coding, for example, we predict each sample from previous
ones and transmit the prediction error.
Cont’d…
• Loss: Quantization is the main lossy step.
• Here we use a limited number of reconstruction levels, fewer than in
the original signal.
• Therefore, quantization necessitates some loss of information.
• Coding. Here, we assign a codeword (thus forming a binary bitstream)
to each output level or symbol.
• This could be a fixed-length code or a variable-length code, such as
Huffman coding.
Digital Audio
• Audio is often stored not in simple PCM but in a form that exploits
differences.
• For a start, differences will generally be smaller numbers and hence offer
the possibility of using fewer bits.
• If a time-dependent signal has some consistency over time (temporal
redundancy), the difference signal - subtracting the current sample from the
previous one - will have a more peaked histogram, with a maximum around
zero.
• Consequently, if we then go on to assign bitstring codewords to differences,
we can assign short codes to prevalent values and long codewords to rarely
occurring ones.
Lossless Predictive Coding
• Predictive coding simply means transmitting differences - we predict
the next sample as being equal to the current sample and send not
the sample itself but the error involved in making this assumption.
• That is, if we predict that the next sample equals the previous one,
then the error is just the difference between previous and next.
• Our prediction scheme could also be more complex.
• The goal of lossless audio compression is to achieve a smaller file size
while maintaining the exact same audio quality as the original,
uncompressed audio file.
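• A minimal sketch of this predict-the-previous-sample scheme; the encoder and decoder are hypothetical helpers:

    def predictive_encode(samples):
        # Predict each sample as the previous one; send first sample + errors
        errors = [samples[0]]
        for i in range(1, len(samples)):
            errors.append(samples[i] - samples[i - 1])
        return errors

    def predictive_decode(errors):
        samples = [errors[0]]
        for e in errors[1:]:
            samples.append(samples[-1] + e)
        return samples

    x = [100, 102, 105, 105, 103]
    e = predictive_encode(x)           # [100, 2, 3, 0, -2]: values cluster near zero
    assert predictive_decode(e) == x   # lossless: exact reconstruction
    print(e)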
Cont’d…
• DPCM
• Differential Pulse Code Modulation is exactly the same as Predictive
Coding, except that it incorporates a quantizer step.
• Quantization is as in PCM and can be uniform or nonuniform.
• One scheme for analytically determining the best set of nonuniform
quantizer steps is the Lloyd-Max quantizer, named for Stuart Lloyd
and Joel Max, which is based on a least squares minimization of the
error term.
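• A sketch of DPCM with a uniform quantizer in the loop; note that the encoder predicts from the reconstructed value, so encoder and decoder stay in sync (the step size 4 is an arbitrary illustrative choice):

    def quantize(e, step):
        # Uniform quantizer applied to the prediction error
        return step * round(e / step)

    def dpcm_encode(samples, step=4):
        codes, recon = [], 0
        for s in samples:
            e_hat = quantize(s - recon, step)  # quantized prediction error
            codes.append(e_hat)
            recon += e_hat   # predict from the reconstructed value, as the decoder will
        return codes

    def dpcm_decode(codes):
        out, recon = [], 0
        for e_hat in codes:
            recon += e_hat
            out.append(recon)
        return out

    x = [10, 14, 22, 25, 23]
    print(dpcm_decode(dpcm_encode(x)))  # close to x, but not exact: quantization is lossy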
DM
• DM stands for Delta Modulation, a much-simplified version of DPCM
often used as a quick analog-to-digital converter.
• ADPCM: Adaptive DPCM takes the idea of adapting the coder to suit
the input much further.
• Basically, two pieces make up a DPCM coder: the quantizer and the
predictor.
• Above, in Adaptive DM, we adapted the quantizer step size to suit the
input.
• In ADPCM, we can adaptively modify the quantizer, by changing the
step size as well as the decision boundaries in a nonuniform quantizer.
Cont’d…
• We can carry this out in two ways: using the properties of the input
signal (called forward adaptive quantization), or using the properties
of the quantized output.
• If quantized errors become too large, we should change the
nonuniform Lloyd-Max quantizer (this is called backward adaptive
quantization).
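• A sketch of basic (non-adaptive) delta modulation, i.e., one-bit DPCM with a fixed step size delta:

    def dm_encode(samples, delta=2):
        bits, recon = [], 0
        for s in samples:
            bit = 1 if s > recon else 0        # a single bit per sample
            recon += delta if bit else -delta  # staircase tracks the input
            bits.append(bit)
        return bits

    def dm_decode(bits, delta=2):
        out, recon = [], 0
        for bit in bits:
            recon += delta if bit else -delta
            out.append(recon)
        return out

    x = [0, 3, 6, 8, 8, 7, 4]
    print(dm_decode(dm_encode(x)))  # staircase approximation of x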
MUSICAL INSTRUMENT DIGITAL
INTERFACE
• MIDI, which dates from the early 1980s, is an acronym that stands for
Musical Instrument Digital Interface.
• It forms a protocol adopted by the electronic music industry that
enables computers, synthesizers, keyboards, and other musical
devices to communicate with each other.
• A synthesizer produces synthetic music and is included on sound
cards, using one of the two methods discussed above.
• The MIDI standard is supported by most synthesizers, so sounds
created on one can be played and manipulated on another and sound
reasonably close.
Cont’d…
• Computers must have a special MIDI interface, but this is
incorporated into most sound cards.
• The sound card must also have both DA and AD converters.
• MIDI is a scripting language - it codes "events" that stand for the
production of certain sounds.
• Therefore, MIDI files are generally very small. For example, a MIDI
event might include values for the pitch of a single note, its duration,
and its volume.
Cont’d…
• Terminology. A synthesizer was, and still may be, a stand-alone sound
generator that can vary pitch, loudness, and tone color. (The pitch is
the musical note the instrument plays - a C, as opposed to a G, say.)
• It can also change additional music characteristics, such as attack and
decay time.
• A good (musician's) synthesizer often has a microprocessor, keyboard,
control panels, memory, and so on.
• However, inexpensive synthesizers are now included on PC sound
cards.
Cont’d…
• Units that generate sound are referred to as tone modules or sound
modules.
• A sequencer started off as a special hardware device for storing and
editing a sequence of musical events, in the form of MIDI data.
• Now it is more often a software music editor on the computer.
• A MIDI keyboard produces no sound, instead generating sequences of
MIDI instructions called MIDI messages.
The End!!!
