Multimedia Cheatsheet

<Properties of Emerging Computing and Communication Applications> Spatiotemporal and live data streams are the norm rather than the exception; A holistic picture of an event or situation is more important than silos of isolated data; Users want insights and information that are independent of the medium and data source; Only relevant information counts; Exploration (browsing) is the predominant mode of interaction, instead of the relational database querying that was the primary interaction in previous-generation applications.
<Difference of human and computer in perception> Current information environments often work against human-machine synergy: Humans are good at conceptual and perceptual analysis; Computers are exactly the opposite: they are efficient at mathematical and logical analysis.
<Experiential environment> is an environment where users directly use their senses to observe data and information of interest related to an event, and interact naturally with the data based on their particular set of interests in the context of that event.
<Properties of Experiential Environments> Direct and intuitive: provides a holistic picture of an event without using unfamiliar metaphors and commands; Provides the same query and presentation spaces; Considers both the user state and context, because people operate best when they are in known contexts. A good experiential environment should promote perceptual analysis and exploration.
<Why Multimedia Computing> We need a good experiential system to improve human-machine synergy in current information environments; Experiential systems are technically implemented as multimedia systems (they integrate, process and output data from different sensory modalities).
<Formal Definition of Multimedia Computing> Consider a system equipped with multiple sensors working in a physical environment; Multimedia differs from monomedia fields like computer vision or audio processing in that it is closely related to how humans experience the world; Partial information from multiple media sources is correlated and combined to get complete information about the environment; Multimedia computing and communication is fundamentally about combining information from multiple sources in the context of the problem being solved.
<Objective of Multimedia System> Process different types of data streams simultaneously as one correlated set of streams that represents information and knowledge of interest for solving a problem.
<Challenge of Multimedia System> Discover the correlations that exist in a set of multimedia data; Combine partial information from disparate sources to build holistic information in a given context.
<Experience and Information> Experience is the direct observation of or participation in events as a basis of knowledge (Webster’s dictionary). Information is an efficient but abstract communication of experience. Communication is the process of sharing experiences and information with others.
<Perception> is the process of understanding sensory signals to recover useful information; Understanding sensory information is an important step in many multimedia systems; Without any knowledge, the system cannot produce any information; Perception is often described as a controlled hallucination process.
<Neisser’s Perceptual Cycle> An agent continuously interacts with the environment using its sensory mechanisms and builds a model of the environment; The system decides what further information is required to complete the task and how that information could be acquired. Cycle: Environment (available information) -modifies-> Schema -directs-> Exploration -samples-> Environment.
<Semantic gap> The semantic gap is the difference between the information that a machine can extract from (perceptual) data and the interpretation that a user in a given situation has for the same data.
<Types of Sensors> Sensors that imitate human senses; Sensors that do not imitate human senses; Sensors that measure physical facts about the environment.
<Sensors Limitations> Limited dynamic range; Offset, bias; Nonlinearity; Hysteresis (the output depends on the history of the input, not only on its current value).
<Types of Touch Sensors> Resistive touch screens work on the principle of pressure. They consist of two conductive layers separated by a small gap: a flexible top layer (typically polyester coated with a conductive material such as indium tin oxide) and a rigid bottom layer. When a user presses on the screen, the two layers make contact at the point of touch, creating an electrical circuit; Capacitive touch screens work by sensing the electrical properties of the human body. They are constructed with a layer of glass coated with a transparent conductor (such as indium tin oxide). When a finger touches the screen, it disturbs the screen's electrostatic field, which is measurable as a change in capacitance.
<Types of microphones> Dynamic microphones operate on the principle of electromagnetic induction. They consist of a diaphragm attached to a coil of wire, placed within the magnetic field of a magnet. When sound waves hit the diaphragm, it moves, causing the coil to move within the magnetic field. This movement generates an electrical current that is analogous to the sound waves; Condenser microphones, also known as capacitor microphones, use a capacitor to convert acoustic energy into electrical energy. The main components are a diaphragm and a backplate, which together form a capacitor. When sound waves hit the diaphragm, the distance between the diaphragm and the backplate changes, causing variations in capacitance. These changes are translated into an electrical signal. To function, condenser microphones require power, either from a battery or phantom power provided by a mixer or audio interface.
<Speaker Characteristics> Rated power; Impedance; Crossover frequencies; Frequency range.
<Human Audible Frequency Range> The hearing range is roughly 20 Hz to 20 kHz; Professional audio-recording equipment usually employs sampling frequencies of at least 44.1 kHz; Digital filtering and machine learning might use frequencies above the audible range.
<Masking in hearing> Frequency masking: the phenomenon where a loud sound in one frequency band masks a softer sound in another frequency band, which would have been audible without the loud sound, is known as auditory masking. Temporal masking: after a loud sound ceases, a quiet sound may remain inaudible for a brief duration, because the ear temporarily reduces its sensitivity and needs time to readjust its gain to normal levels.
<Produce Virtual Surround Sound> The Interaural Level Difference (ILD, volume) and Interaural Time Difference (ITD, delay) can be analysed by our brain to determine whether a sound came from the left or the right; In addition, the human brain can use a sound wave's reflections off the pinna (auricle) of the ear to determine the sound's location. The changes in volume and sound quality caused by these reflections are described by Head-Related Transfer Functions (HRTFs); To produce virtual surround sound, recreate the ILD, ITD and HRTFs based on the speakers available.
<Sinusoid Signal> $x(t) = A\cos(2\pi B t + \theta)$, where $B = 1/T$ is the frequency and $T$ the period.
<Digitization> is a process for converting a continuous signal into a discrete set of its samples. Digitized sample values are much less susceptible to distortions than analog signals. Quantization introduces quantization noise; sampling at a finite rate introduces discretization error.
<Nyquist frequency> If a function $x(t)$ contains no frequencies higher than $B$ hertz, $x(t)$ is completely determined by its ordinates at a series of points spaced $1/(2B)$ seconds apart, i.e. a sampling rate of $2B$ samples per second suffices.
<Sound Pressure Level> Sound Pressure Level (SPL), or $L_p$, is defined as $L_p = 20\log_{10}(p/p_{\mathrm{ref}})$ dB, where $p$ is the RMS sound pressure and $p_{\mathrm{ref}}$ is a reference sound pressure (20 µPa in air). It is measured on a logarithmic scale using decibels because the sound pressure perceived by the human ear is non-linear.
<A-weighting scheme> Sound-pressure measurements are often frequency-weighted to match perception. A-weighting is applied to instrument-measured sound levels in an effort to account for the relative loudness perceived by the human ear, as the ear is less sensitive to low audio frequencies. It is applied by arithmetically adding a table of values, listed by octave or third-octave bands, to the measured sound pressure levels in dB. Sound-pressure levels weighted by the A-weighting scheme are usually labeled dBA or dB(A).
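The <Sinusoid Signal>, <Digitization> and <Nyquist frequency> entries can be checked numerically. The sketch below is a minimal illustration, not part of the original notes: the helper names, the mid-rise quantizer and the test frequencies are assumptions. It samples $x(t) = A\cos(2\pi B t + \theta)$, shows that a 7 Hz tone sampled at 10 Hz (below its 14 Hz Nyquist rate) yields the same samples as a 3 Hz tone, and measures the quantization noise of an $n$-bit uniform quantizer, which for a full-scale sine should come out near the textbook $6.02n + 1.76$ dB.

```python
import math

def sample_sinusoid(amplitude, freq_hz, phase, fs_hz, n_samples):
    """Sample x(t) = A*cos(2*pi*B*t + theta) at t = k / fs."""
    return [amplitude * math.cos(2 * math.pi * freq_hz * k / fs_hz + phase)
            for k in range(n_samples)]

def quantize(samples, bits, full_scale=1.0):
    """Uniform mid-rise quantizer with 2**bits levels over [-full_scale, full_scale]."""
    levels = 2 ** bits
    step = 2 * full_scale / levels
    out = []
    for x in samples:
        index = math.floor(x / step)
        # Clip to the valid index range so that +full_scale maps to the top level.
        index = max(-levels // 2, min(levels // 2 - 1, index))
        out.append((index + 0.5) * step)  # reconstruct at the middle of the cell
    return out

def snr_db(original, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    signal_power = sum(x * x for x in original) / len(original)
    noise_power = sum((x - q) ** 2 for x, q in zip(original, quantized)) / len(original)
    return 10 * math.log10(signal_power / noise_power)

# Aliasing: 7 Hz sampled at 10 Hz violates the 1/(2B) spacing and looks like 3 Hz.
fast = sample_sinusoid(1.0, 7.0, 0.0, 10.0, 20)
slow = sample_sinusoid(1.0, 3.0, 0.0, 10.0, 20)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # only floating-point noise

# Quantization noise of an 8-bit quantizer on a full-scale 997 Hz tone at 48 kHz.
tone = sample_sinusoid(1.0, 997.0, 0.0, 48000.0, 48000)
print(round(snr_db(tone, quantize(tone, 8)), 1))  # should be close to 6.02*8 + 1.76
```

Each extra bit halves the quantization step, which is where the roughly 6 dB per bit rule comes from.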
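A minimal sketch of the <Sound Pressure Level> formula, assuming pressure samples in pascals and the 20 µPa reference for air; `rms` and `spl_db` are hypothetical helper names. A sine of amplitude $\sqrt{2}$ Pa has an RMS value of 1 Pa, which corresponds to roughly 94 dB SPL, and doubling the pressure adds $20\log_{10}2 \approx 6.02$ dB.

```python
import math

P_REF_AIR = 20e-6  # reference sound pressure in air: 20 micropascals

def rms(samples):
    """Root-mean-square value of a list of pressure samples (Pa)."""
    return math.sqrt(sum(p * p for p in samples) / len(samples))

def spl_db(p_rms, p_ref=P_REF_AIR):
    """Sound pressure level L_p = 20 * log10(p / p_ref) in dB."""
    return 20 * math.log10(p_rms / p_ref)

# One second of a 100 Hz tone with amplitude sqrt(2) Pa, sampled at 10 kHz.
tone = [math.sqrt(2) * math.sin(2 * math.pi * 100 * k / 10000) for k in range(10000)]
print(round(rms(tone), 3))                  # 1.0
print(round(spl_db(rms(tone)), 1))          # 94.0
print(round(spl_db(2.0) - spl_db(1.0), 2))  # 6.02
```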
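The <A-weighting scheme> entry describes adding a per-band table of corrections. Those table values can be generated from the commonly used analytic A-weighting curve (the form given in IEC 61672, with pole frequencies 20.6 Hz, 107.7 Hz, 737.9 Hz and 12194 Hz). The sketch below assumes octave-band SPLs as input; the band levels in the example are made-up numbers and the helper names are hypothetical.

```python
import math

def a_weighting_db(f_hz):
    """A-weighting gain in dB at frequency f_hz (analytic curve, IEC 61672 form)."""
    f2 = f_hz * f_hz
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00  # +2.00 dB normalizes the gain to 0 dB at 1 kHz

def a_weighted_total(band_levels_db):
    """Combine per-band SPLs (dict: centre frequency -> dB) into one dB(A) value."""
    # Step 1: add the A-weighting correction to each band level (the "table" step).
    weighted = [level + a_weighting_db(f) for f, level in band_levels_db.items()]
    # Step 2: levels add as energies, not as decibel numbers.
    return 10 * math.log10(sum(10 ** (lw / 10) for lw in weighted))

# Hypothetical octave-band measurement (dB SPL per band).
bands = {125: 70.0, 250: 68.0, 500: 65.0, 1000: 62.0, 2000: 58.0, 4000: 52.0}
for f in bands:
    print(f, "Hz:", round(a_weighting_db(f), 1), "dB correction")
print("total:", round(a_weighted_total(bands), 1), "dB(A)")
```

The low-frequency bands receive large negative corrections, which is how the scheme models the ear's reduced sensitivity there.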
<Music> Overtone; Timbre; Spectrum; ADSR envelope.
<Units of Light> Candela (cd): measures luminous intensity, i.e. the luminous power emitted by a light source in a particular direction (per unit solid angle); Lumen (lm): measures luminous flux, i.e. the total light an object emits, 1 lm = 1 cd·sr; Lux (lx): luminous flux per unit area, 1 lx = 1 lm/m²; Nit: unit of luminance ($L_v$), 1 nit = 1 cd/m²; Steradian (sr): unit of solid angle.
<CCD/CMOS>
<Lights> Subtractive: CMYK; Additive: RGB.
<Transformation in documents> Audio and video technologies changed the book metaphor for documents; Hyperlinks in electronic documents and the introduction of hyperlinked pages on the web allow documents to be read in an order that the reader finds appropriate rather than the order the author intended; A document is no longer a compact and closed physical artifact, but a dynamically and organically growing result of collaborative authoring that presents multimedia in all its forms.
<Evolution of Documents> Type of Media: Text remains an important component of documents, but documents now use different media as the author sees fit; Authors can combine different media in space and time to communicate their ideas in the most compelling manner; The same content segments in electronic form can be reused as often as necessary. Non-Linear Flow: With the advent of electronic media and the ability to create links, the limitation of linearity can be easily overcome; This provides a very flexible method of authoring documents that may be customized for different types of audiences.
<Stages in Document Creation> Data acquisition and organization; Data selection; Editing.
<Synchronization> means establishing, specifying and then performing the coordination of multiple media sources precisely in space and time.
<Desirable Encoding Properties> The code must be unambiguously decodable: one coded message corresponds to exactly one decoded message; The code should be easily decodable: able to find symbol endings and the end of the message easily, and able to decode online, that is, each symbol can be decoded as it comes in, without knowing the entire coded message; The code should be compact.
<Huffman code> A block code: each source symbol is mapped into a fixed sequence of code symbols; Instantaneously decodable: each code word in a string of code symbols can be decoded without referencing succeeding symbols; Uniquely decodable: any string of code symbols can be decoded in only one way by examining the individual symbols of the string from left to right. Limitation: each symbol can only be encoded using an integer number of bits.
<VLC vs. FLC> Huffman code is a common form of Variable-Length Coding (VLC), as opposed to Fixed-Length Coding (FLC): assign shorter code words to higher-probability symbols, and vice versa.
<Coding Algorithms> Truncated Huffman, Shift, Lempel-Ziv, Arithmetic.
<Variants of LZ77> LZR, LZH, DEFLATE (gzip). <Variants of LZ78> LZC, LZW.
<Arithmetic coding> Unlike the variable-length codes described previously, arithmetic coding generates non-block codes: a one-to-one correspondence between source symbols and code words does not exist; An entire sequence of source symbols is assigned a single arithmetic code word; The code word itself defines an interval of real numbers between 0 and 1; As the number of symbols in the message increases, the interval used to represent it becomes smaller and smaller; A termination symbol is usually required. Technical challenge: precision. Accurate representation of the interval boundaries may not be possible due to the limited number of bits (32/64) used to represent numbers.
<Weakness of Entropy-based Compression Methods for Multimedia data> The compression obtained with these entropy compression routines is usually a factor of two or less because: Entropy is usually higher for multimedia signals than for text data; Sampled signals contain noise, so the chance of finding ideal repeating patterns is low; Multimedia data is usually multidimensional.
<Lossy compression> leverages the fact that multimedia data can be gracefully degraded in quality by increasingly losing more information. Challenge: the tradeoff between quality and cost.
<Vector Quantization> is similar to scalar quantization, but instead of working on a single-valued function, it works on tuples, triples, or higher-dimensional vectors. Common approaches: linear quantization, K-means, X-means.
<Linear Quantization> The easiest way to map n vectors to k vectors.
<Distance metric> Nonnegativity ($d(x,y) \ge 0$); Reflexivity ($d(x,y) = 0 \iff x = y$); Symmetry ($d(x,y) = d(y,x)$); Triangle inequality ($d(x,z) \le d(x,y) + d(y,z)$).
<X-means> Traverses candidate values of K instead of fixing K in advance; a good scoring mechanism for comparing the candidates is difficult to determine.
<Perceptual Quantization> Even though it is highly desirable to reproduce content as accurately as possible, "accuracy" is not defined by simple mathematical distance; Two signals are accurately reproduced if they are perceived the same, even if they differ in many bits. The simplest and yet most widely used perceptual quantization techniques for audio data are A-law and µ-law.
<A/µ-law> µ = 255, A = 87.6. Both are conceptually similar; µ-law has become a quasi-standard for low-bandwidth voice.
<Visual Quantization> The coexistence of a lip-synchronous audio track generally allows for lower frame rates, because the human brain is good at filling in missing information across modalities.
<Differential coding> leverages global knowledge about the properties of the signal to encode; it is widely used for compression and scales very well from lossless to entirely lossy. The main component of a differential encoder is the predictor; Lossy differential coding can be achieved by quantizing the difference values; Difference encoding is usually combined with entropy encoding for better compression performance. The idea behind difference encoding of audio is that sound other than noise follows a rather predictable wave pattern. The PNG image format uses a differential encoding step before an LZ-derived entropy encoder: the algorithm predicts the color of each pixel from the colors of previous neighboring pixels. Example: Adaptive Differential Pulse Code Modulation (ADPCM).
<Video compression> Standards: H.261, H.263, H.264, H.265, H.266; MPEG-1, MPEG-2, MPEG-4. How to achieve it? Exploit redundancies in video sequences (spatial, temporal); Remove irrelevant data according to a "psycho-visual" model.
<I/P-Frames> I-frames provide access points: they serve as recovery points when some frames are lost, and as probe points to support fast-forward/rewind operations.
<I-Frame Coding> Block-based: each frame is partitioned into non-overlapping blocks; each 16x16 block is called a macroblock. Each macroblock contains 4 luminance (Y) 8x8 blocks and 2 chrominance (1 Cr and 1 Cb) 8x8 blocks. Exploits only spatial redundancies: each frame is coded independently. Compression is achieved by a lossy stage (DCT + quantization) followed by an entropy stage (RLE + Huffman encoding). The DCT transforms the digitized signal from the spatial domain into a frequency-domain representation. Quantization removes some "details" in the frequency domain, with graceful degradation in the spatial domain. RLE transforms the quantized coefficients into a short sequence of (run, level) pairs, which can then be compressed more easily by a Huffman encoder.
<Quantization in H.261> The DC coefficient is quantized uniformly with step size 8 and no deadzone; all other (AC) coefficients are quantized uniformly with a deadzone and with the same step size. Zigzag scanning: the quantized coefficients are scanned in zig-zag order so that non-zero coefficients tend to be grouped together. More compression means more quality loss: blocky effect, mosquito noise.
<P-Frame Coding> Block-based, with the same macroblock structure as in I-frame coding. Each frame is predicted from the previous I- or P-frame, so P-frames exploit both spatial and temporal redundancies: motion estimation + motion compensation. Residual: the motion-compensated difference.
<MPEG-1> Display order: IBBPBBPBB. Encoding order: IPBBPBB… (each B-frame is sent after both reference frames it is predicted from). Increased memory requirement: two reference frames for B-predictions, plus holding pending frames for prediction. Increased computation requirement: two motion estimations for each block in a B-frame. B-frames allow a higher compression ratio but induce higher video latency, so they are not good for live video.
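The <Huffman code> and <VLC vs. FLC> entries can be illustrated with a compact sketch built on Python's standard `heapq` module: the two least probable subtrees are merged repeatedly, and decoding emits a symbol as soon as a code word matches, which works because the code is prefix-free (instantaneously decodable). Tie-breaking between equal weights is an arbitrary choice here, so the exact code words may differ from a hand-built table, but the total encoded length is the same optimum.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix-free Huffman code table {symbol: bitstring} for text."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: a single distinct symbol still needs 1 bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial_code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least probable subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def encode(text, table):
    return "".join(table[s] for s in text)

def decode(bits, table):
    """Instantaneous decoding: emit a symbol as soon as a code word matches."""
    reverse = {c: s for s, c in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

message = "abracadabra"
table = huffman_code(message)
bits = encode(message, table)
print(table)
print(len(bits), "bits vs", 8 * len(message), "bits with an 8-bit FLC")
```

For "abracadabra" the Huffman code needs 23 bits, against 88 bits for an 8-bit fixed-length code; the most frequent symbol `a` gets the shortest code word.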
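<Variants of LZ78> lists LZW. The following is a minimal LZW sketch, assuming the input contains only characters with code points below 256 (the initial dictionary holds exactly those single characters). The decoder rebuilds the same dictionary on the fly, including the special case where a code refers to the entry that is being defined in the same step.

```python
def lzw_compress(text):
    """LZW (an LZ78 variant): emit dictionary indices of the longest known phrases."""
    dictionary = {chr(i): i for i in range(256)}  # start with all single characters
    w, codes = "", []
    for c in text:
        wc = w + c
        if wc in dictionary:
            w = wc                            # keep extending the current phrase
        else:
            codes.append(dictionary[w])       # emit the longest known phrase
            dictionary[wc] = len(dictionary)  # learn the new phrase
            w = c
    if w:
        codes.append(dictionary[w])
    return codes

def lzw_decompress(codes):
    """Rebuild the dictionary while decoding; no dictionary is transmitted."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):         # code defined in this very step
            entry = w + w[0]
        else:
            raise ValueError(f"bad LZW code {code}")
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return "".join(out)

message = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
print(codes)
print(len(codes), "codes for", len(message), "characters")
```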
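A sketch of the <Arithmetic coding> interval narrowing, using Python's `fractions.Fraction` so that interval boundaries are exact rationals. This deliberately sidesteps the 32/64-bit precision problem mentioned in the notes, at the cost of ever-growing numerators and denominators; practical coders instead use fixed-precision integer arithmetic with renormalization. The symbol probabilities and the `!` terminator are assumptions for the example.

```python
from fractions import Fraction

def cumulative_model(probabilities):
    """Map each symbol to its sub-interval [low, high) of [0, 1)."""
    model, low = {}, Fraction(0)
    for symbol, p in probabilities.items():
        model[symbol] = (low, low + p)
        low += p
    assert low == 1, "probabilities must sum to 1"
    return model

def arithmetic_encode(message, model):
    """Narrow [0, 1) once per symbol; any number in the final interval is the code."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        s_low, s_high = model[symbol]
        low, high = low + width * s_low, low + width * s_high
    return low, high

def arithmetic_decode(value, model, terminator):
    """Repeatedly find the sub-interval containing value until the terminator."""
    out = []
    low, high = Fraction(0), Fraction(1)
    while True:
        width = high - low
        target = (value - low) / width  # position of value inside the current interval
        for symbol, (s_low, s_high) in model.items():
            if s_low <= target < s_high:
                break
        else:
            raise ValueError("value lies outside every sub-interval")
        out.append(symbol)
        low, high = low + width * s_low, low + width * s_high
        if symbol == terminator:
            return "".join(out)

probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "!": Fraction(1, 8)}
model = cumulative_model(probs)
low, high = arithmetic_encode("abca!", model)
print(low, high, high - low)  # width 1/1024 = product of the symbol probabilities
print(arithmetic_decode((low + high) / 2, model, terminator="!"))  # abca!
```

The final interval width is the product of the symbol probabilities (here $1/2 \cdot 1/4 \cdot 1/8 \cdot 1/2 \cdot 1/8 = 1/1024$), so roughly $-\log_2(1/1024) = 10$ bits identify a number inside it, with no integer-bits-per-symbol restriction as in Huffman coding.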
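The <Vector Quantization>, <Distance metric> and <X-means> entries fit together as follows: a codebook of K vectors is trained with K-means (Lloyd's iterations), and each input vector is then encoded as the index of its nearest code vector under a metric distance (Euclidean here, via `math.dist`, Python 3.8+). The deterministic initialization and the toy 2-D data are assumptions for this sketch; X-means would additionally search over K, which is not done here.

```python
import math

def nearest(v, codebook):
    """Index of the closest code vector (math.dist is the Euclidean metric)."""
    return min(range(len(codebook)), key=lambda i: math.dist(v, codebook[i]))

def kmeans_codebook(vectors, k, iterations=20):
    """Train a VQ codebook with Lloyd's k-means iterations."""
    # Deterministic initialization: k vectors spread through the data (an assumption).
    codebook = [vectors[i * len(vectors) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        new_codebook = []
        for centroid, members in zip(codebook, clusters):
            if members:  # move the centroid to the mean of its members
                dim = len(members[0])
                new_codebook.append(tuple(sum(m[d] for m in members) / len(members)
                                          for d in range(dim)))
            else:        # keep an empty cluster's centroid unchanged
                new_codebook.append(centroid)
        if new_codebook == codebook:  # converged
            break
        codebook = new_codebook
    return codebook

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = kmeans_codebook(data, k=2)
indices = [nearest(v, codebook) for v in data]  # what a VQ encoder would transmit
print(codebook)
print(indices)
```

Only the indices are sent per vector; the decoder looks them up in the shared codebook, so each 2-D vector here costs 1 bit.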
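A sketch of the <A/µ-law> companding curves with µ = 255 and A = 87.6, using the standard continuous formulas for normalized samples in [-1, 1]. Real telephony codecs (ITU-T G.711) use a segmented 8-bit approximation of these curves, so this is illustrative only. The example compares the 8-bit quantization error for a quiet sample with and without µ-law companding; `uniform_quantize` is a hypothetical helper.

```python
import math

MU = 255.0
A = 87.6

def mu_law_compress(x, mu=MU):
    """µ-law: F(x) = sgn(x) * ln(1 + mu|x|) / ln(1 + mu), for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse µ-law: x = sgn(y) * ((1 + mu)^|y| - 1) / mu."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

def a_law_compress(x, a=A):
    """A-law: linear segment below 1/A, logarithmic segment above."""
    ax = abs(x)
    if ax < 1 / a:
        y = a * ax / (1 + math.log(a))
    else:
        y = (1 + math.log(a * ax)) / (1 + math.log(a))
    return math.copysign(y, x)

def uniform_quantize(y, bits=8):
    """Uniform mid-rise quantizer on [-1, 1]."""
    step = 2 / 2 ** bits
    index = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, math.floor(y / step)))
    return (index + 0.5) * step

quiet = 0.01
linear_error = abs(uniform_quantize(quiet) - quiet)
mu_error = abs(mu_law_expand(uniform_quantize(mu_law_compress(quiet))) - quiet)
print("linear 8-bit error:", linear_error)
print("µ-law 8-bit error: ", mu_error)
```

Companding spends more quantization levels on small amplitudes, which is why quiet speech survives 8-bit coding better with µ-law than with linear quantization.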
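A minimal closed-loop DPCM sketch for the <Differential coding> entry. The predictor is assumed to be the previous reconstructed sample (the simplest choice; ADPCM would also adapt the step size), and the signal values are made up. With `step=1` on integer samples the scheme is lossless; a larger step quantizes the differences, and because the encoder tracks the decoder's reconstruction, the error per sample stays within `step/2`.

```python
def dpcm_encode(samples, step):
    """Closed-loop DPCM: predict each sample by the previous reconstructed sample."""
    codes, prediction = [], 0
    for x in samples:
        residual = x - prediction    # difference to the predictor's guess
        q = round(residual / step)   # quantize the difference (lossy if step > 1)
        codes.append(q)
        prediction += q * step       # track exactly what the decoder will reconstruct
    return codes

def dpcm_decode(codes, step):
    """Accumulate the dequantized differences to rebuild the signal."""
    out, prediction = [], 0
    for q in codes:
        prediction += q * step
        out.append(prediction)
    return out

signal = [100, 102, 105, 107, 108, 108, 106, 103, 99, 96]
lossless = dpcm_encode(signal, step=1)
print(lossless)  # [100, 2, 3, 2, 1, 0, -2, -3, -4, -3]
lossy = dpcm_encode(signal, step=4)
print(dpcm_decode(lossy, step=4))
```

After the first sample the codes are small numbers clustered around zero, which is what makes the subsequent entropy-coding stage effective.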
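The <I-Frame Coding> and <Quantization in H.261> entries describe a DCT, quantization, zig-zag and (run, level) pipeline for one 8x8 block. The sketch below follows those steps in plain Python; the orthonormal DCT-II, the truncating deadzone quantizer for AC coefficients and the `ac_step` value are simplifying assumptions, not the exact H.261 reconstruction rules, and the Huffman stage is omitted.

```python
import math

N = 8

def dct_2d(block):
    """Orthonormal 8x8 DCT-II: spatial samples -> frequency coefficients."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = c(u) * c(v) * total
    return coeffs

def quantize_block(coeffs, ac_step, dc_step=8):
    """Simplified H.261-style quantizer: uniform DC (no deadzone), deadzone for AC."""
    levels = [[0] * N for _ in range(N)]
    levels[0][0] = round(coeffs[0][0] / dc_step)
    for u in range(N):
        for v in range(N):
            if (u, v) != (0, 0):
                c = coeffs[u][v]
                # Truncation toward zero: |c| < ac_step falls into the deadzone -> 0.
                levels[u][v] = int(math.copysign(abs(c) // ac_step, c))
    return levels

def zigzag_order(n=N):
    """(row, col) pairs in zig-zag order: walk anti-diagonals, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level(values):
    """(run of zeros, non-zero level) pairs; trailing zeros are left to an EOB code."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

# A smooth 8x8 block: a horizontal ramp from 100 to 135 (made-up test data).
block = [[100 + 5 * y for y in range(N)] for x in range(N)]
levels = quantize_block(dct_2d(block), ac_step=16)
scan = [levels[r][c] for r, c in zigzag_order()]
print("DC level:", scan[0])
print("AC (run, level) pairs:", run_level(scan[1:]))
```

For a smooth block nearly all AC levels fall into the deadzone, so the zig-zag scan ends in a long run of zeros and only a few (run, level) pairs remain to be Huffman-coded.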
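For <P-Frame Coding>, motion estimation can be sketched as a full search that minimizes the sum of absolute differences (SAD) over a small window. The SAD criterion, the 4x4 block size, the ±2 search range and the synthetic frames are assumptions for illustration (the notes use 16x16 macroblocks, and real encoders use faster search strategies). The residual is the motion-compensated difference from the notes.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(size) for j in range(size))

def motion_search(cur, ref, bx, by, size=4, search=2):
    """Full search over [-search, search]^2; returns (SAD, dx, dy) of the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block would fall outside the frame.
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

def residual(cur, ref, bx, by, dx, dy, size=4):
    """Motion-compensated difference that a P-frame encoder would transform and code."""
    return [[cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j] for j in range(size)]
            for i in range(size)]

# Reference frame with a non-repeating pattern; the current frame is the reference
# shifted right by 1 pixel and down by 2 pixels (synthetic test data).
W = H = 12
ref = [[(7 * x + 13 * y) % 31 for x in range(W)] for y in range(H)]
cur = [[ref[y - 2][x - 1] if y >= 2 and x >= 1 else 0 for x in range(W)] for y in range(H)]
cost, dx, dy = motion_search(cur, ref, bx=4, by=4)
print(cost, dx, dy)  # the block content came from (bx + dx, by + dy) in ref
print(residual(cur, ref, 4, 4, dx, dy))
```

The encoder transmits the motion vector (dx, dy) plus the residual; for pure translation the residual is all zeros and costs almost nothing after quantization and RLE.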
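The <MPEG-1> display-to-encoding reordering can be sketched in a few lines: every B-frame is buffered until the next I- or P-frame (its future reference) has been emitted. The function name and the extra trailing I-frame, standing in for the start of the next GOP, are assumptions for the example.

```python
def coding_order(display_order):
    """Reorder frame types from display order to coding (transmission) order.

    Each B-frame is held back until the next I- or P-frame has been sent,
    which is why B-frames add memory (pending frames) and latency.
    """
    coded, pending_b = [], []
    for index, frame_type in enumerate(display_order):
        if frame_type == "B":
            pending_b.append((frame_type, index))  # wait for the future reference
        else:                                     # I or P: a reference frame
            coded.append((frame_type, index))
            coded.extend(pending_b)
            pending_b = []
    coded.extend(pending_b)  # stream ended without a closing reference frame
    return coded

display = "IBBPBBPBBI"  # one IBBPBBPBB GOP plus the next GOP's I-frame
order = coding_order(display)
print("".join(t for t, _ in order))  # IPBBPBBIBB
print([i for _, i in order])         # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8]
```

The display index list makes the latency visible: frame 1 cannot be sent until frame 3 has been captured and encoded, which is why B-frames are avoided for live video.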
