
UNIT III AUDIO FUNDAMENTALS AND REPRESENTATION

Audio- Signals representing sound and speech, which travel through a medium to transfer data, are called audio. The speed of the signal depends on the medium, which is normally air or metal. Sound signals are of two types: 1) Analog sound- It travels as a continuous wave which changes from the lowest value to the highest value and takes on all the values in between. 2) Digital sound- Digital sound consists of discrete values which store only fixed levels, at the simplest the highest and the lowest.

Characteristics of sound signal

1) Period (Oscillation)- Regular intervals of the sound wave which repeat multiple times to construct the sound signal are called periods.

2) Frequency- The number of periods that occur in one second is called the frequency of the audio signal. It is measured in Hertz (Hz).

3) Amplitude- It is the maximum strength or intensity of the sound signal.

4) Pitch- Pitch is a subjective quality which reflects the relative frequency of a sound signal. Frequency (the number of cycles per second) is an objective quantity and can be measured directly, while pitch is judged by comparing periodic signals with one another. Pitch is mostly used for musical notes.

5) Bandwidth- Bandwidth is the total frequency range of a medium, from its lowest frequency to its highest frequency, measured in Hertz; the corresponding data-handling capacity of the medium is expressed in bits per second (bps).

6) Wavelength- The total distance covered in one cycle by a wave is called its wavelength. If the frequency of the sound is f and its speed is c, then the wavelength is

    Wavelength = c / f

7) Decibel system- Acoustics is the branch of science which studies sound. The decibel system is used to measure sound pressure or loudness. The decibel (dB) is a logarithmic unit that indicates the ratio of a physical quantity (usually power or intensity) relative to a specified or implied reference level. A ratio in decibels is ten times the logarithm to base 10 of the ratio of two power quantities, i.e. L(dB) = 10 log10(P1/P0). It was first used to measure electrical loss in wires, but is now mostly used for measuring sound intensity or radio signal strength. It is written as dB. Sometimes it is combined with other reference units: power relative to a kilowatt is expressed in dBk, while voltage is expressed in dBV. The comfortable range of audible sound for humans extends up to about 80 dB.

Computers represent sound in digital format, while sound normally travels in analog format, so the analog sound signal has to be converted into digital format; this is called digitalization of sound.

Analog to digital conversion- Analog signals are converted into digital signals for use in computers. Pulse Code Modulation (PCM) is used to convert an analog signal to a digital signal. A PCM stream is a digital representation of an analog signal in which the magnitude of the analog signal is sampled regularly at uniform intervals, with each sample being quantized to the nearest value within a range of digital steps. The basic steps of pulse code modulation are:
1) Sampling
2) Quantization
3) Binary encoding
4) Line encoding

1) Sampling- In this stage samples of the amplitude are taken at fixed time intervals. Keeping the interval between samples short increases the sampling rate, and the quality of the digital signal increases with this higher sampling rate.
2) Quantization- The samples are valued against a fixed scale: each sample is assigned the value on the scale that it is closest to.
3) Binary encoding- The values of the quantized signal are then converted from decimal to binary. This gives a continuous string of bits, called the binary-encoded signal.
4) Line encoding- The bit string obtained from binary encoding is then converted into a digital waveform. Here 1 is the maximum level while 0 is the minimum level.
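The following is a minimal sketch of these PCM steps in Python. The 440 Hz sine source, the 8 kHz sampling rate, and the 8-bit depth are illustrative assumptions, not values fixed by the text.

```python
# Minimal PCM sketch: sample an analog signal, quantize, and binary-encode.
import math

SAMPLE_RATE = 8000          # samples per second (assumed)
BIT_DEPTH = 8               # bits per sample -> 256 quantization levels
LEVELS = 2 ** BIT_DEPTH

def analog(t):
    """Stand-in for the analog input: a 440 Hz sine wave in [-1, 1]."""
    return math.sin(2 * math.pi * 440 * t)

# 1) Sampling: read the amplitude at fixed time intervals (1 second here).
samples = [analog(n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

# 2) Quantization: map each sample to the nearest of LEVELS discrete steps.
quantized = [round((s + 1) / 2 * (LEVELS - 1)) for s in samples]

# 3) Binary encoding: convert each quantized value to a fixed-width bit string.
bitstream = "".join(format(q, "08b") for q in quantized)

# 4) Line encoding (conceptually): 1 -> high level, 0 -> low level.
print(len(samples), "samples,", len(bitstream), "bits")
```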

Advantages of digital signal over analog signal:
1) Digital sound can be stored easily on digital media such as CD, DVD, etc.
2) Digital sound is less sensitive to interference than analog sound.
3) Digital sound can be easily regenerated.
4) Editing operations such as cutting a track, copying it, or adding an echo can be easily applied to digital sound.

Types of Sound- Sound can be categorized into two types:

i) Periodic sound- Sound which is generated at a fixed time interval is called periodic sound. It has constant amplitude and frequency over time.

ii) Aperiodic sound- Frequency and amplitude change with time in this type of sound, so it is also called sound generated at varying time intervals.

(Figure: a periodic signal compared with an aperiodic signal)

Data Rate- Data rate is the number of bits transferred per unit time. It is commonly measured in megabits per second or kilobits per second. When the data rate is measured in bits per second it is also called the bit rate. The bit rate is used to determine the size of an audio file; for this we also need the total playing time of the file. The formula for calculating the file size is:

    File size = bit rate * play time

Here the bit rate is in bits per second, the play time is the total time taken to play the track, and the file size is in bits.

In digital multimedia, the bit rate shows the amount of information stored in a file per unit time. It depends on the following factors:
i) Sampling rate of the original data
ii) Number of bits used per sample
iii) Data encoding scheme
iv) Data compression technique/algorithm

The bit rate can be controlled by changing the sample rate. Reducing the bit rate decreases the file size while the sound quality is only negligibly affected, so decreasing the bit rate is used to reduce file size.
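As a quick check of the file-size formula above, here is a small sketch; the 128 kbit/s bit rate and the 3-minute play time are example values only.

```python
# File size = bit rate * play time.
bit_rate = 128_000          # bits per second (example value)
play_time = 3 * 60          # seconds (example value)

size_bits = bit_rate * play_time
size_megabytes = size_bits / 8 / 1_000_000
print(f"{size_bits} bits = {size_megabytes:.2f} MB")   # ~2.88 MB
```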

Audio File Formats

Audio Interchange File Format (AIFF)
1. AIFF is a proprietary file format of Apple.
2. It can use both mono and stereo channels for transferring data.
3. Its file extension is .aif or .aiff.
4. An AIFF file stores raw audio data, channel information, bit depth, sample rate, and an application-specific data area.
5. It does not support data compression, but an alternative format that does support compression is available, called the AIFF compressed (AIFF-C) file format.

Wave Format (WAV)
1. It is the standard format of Microsoft and IBM PCs.
2. It has the extension .wav.
3. It is also called "Audio for Windows".
4. It is based on the RIFF format (Resource Interchange File Format).
5. Audio can be easily edited in this file format.
6. It uses an uncompressed format where data is stored using Linear Pulse Code Modulation (LPCM).
7. It limits the file size to less than 4 GB.

Adaptive Multi-Rate (AMR)
1. AMR was adopted as the standard speech codec by 3GPP in October 1999 and is now widely used in GSM and UMTS.
2. The common filename extension is .amr.
3. AMR is also a file format for storing spoken audio using the AMR codec.
4. Many modern mobile telephone handsets can store short audio recordings in the AMR format.
5. Both free and proprietary programs exist to convert between this and other formats.
6. Sampling frequency: 8 kHz, 13-bit.

MP3 (MPEG-1 and MPEG-2)
1. MP3 is an audio-specific format that was designed by the Moving Picture Experts Group (MPEG) as part of its MPEG-1 standard.
2. It was later extended in the MPEG-2 standard.
3. MP3 uses a lossy compression algorithm designed to greatly reduce the amount of data required to represent the audio recording.
4. It still sounds like a faithful reproduction of the original uncompressed audio to most listeners.
5. A sample rate of 44.1 kHz is almost always used, because this is also the rate used for CD audio, the main source for creating MP3 files.
6. Typical bit rates represent compression ratios of approximately 11:1, 9:1, and 7:1 respectively.

MP4 (MPEG-4 Part 14)
1. It is most commonly used to store digital video and digital audio streams.
2. It is a multimedia container format standard specified as a part of MPEG-4.
3. It can also be used to store other data such as subtitles and still images.
4. MPEG-4 Part 14 allows streaming over the Internet.
5. The only official filename extension for MPEG-4 Part 14 files is .mp4.

RealAudio (.ra / .rm)
1. It is a proprietary audio format developed by RealNetworks and first released in April 1995.
2. It ranges from low-bit-rate formats that can be used over dial-up modems to high-fidelity formats for music.
3. It can also be used as a streaming audio format that is played at the same time as it is downloaded.
4. RealAudio files were originally identified by the filename extension .ra.
5. The combination of the audio and video formats was called RealMedia and used the file extension .rm.

ASF (Advanced Streaming Format)
1. This format was developed by Microsoft for storing synchronized streaming data.
2. Data can be delivered over different networks with the help of this format.
3. The main goal of this format is to provide industry-wide multimedia interoperability.
4. An ASF file is created by combining one or more media streams.
5. It has a file header which stores all the properties of the whole file.
6. An ASF file synchronizes all the stored data streams on a common timeline before presenting or delivering the information.

AVI (Audio Video Interleaved)
1. It is used to store sound and moving pictures in RIFF format.
2. It stores audio and video interleaved in a single file, frame by frame.
3. It provides better synchronization of audio and video in less space.

COMPRESSION- Compression is used to reduce the size of an audio file. There are three major groups of audio file formats:

- Uncompressed audio formats, such as WAV, AIFF, etc.
- Formats with lossless compression, such as APE, TTA, WMA Lossless, etc.
- Formats with lossy compression, such as MP3, etc.

Uncompressed Audio Format-Uncompressed audio formats encode both sound and silence
with the same number of bits per unit of time. Encoding an uncompressed minute of absolute silence produces a file of the same size as encoding an uncompressed minute of music. This makes them suitable file formats for storing and archiving an original recording.

Lossless compressed audio formats- In a lossless compressed format, however, the music
would occupy a smaller portion of the file and the silence would take up almost no space at all. Lossless compression formats enable the original uncompressed data to be recreated exactly. They provide a compression ratio of about 2:1 (i.e. their files take up half the space of the originals).

Lossy compressed audio formats- Lossy compression enables even greater reductions in file size by removing some of the data. Lossy compression typically achieves far greater compression, but with somewhat reduced quality, compared to lossless compression, by simplifying the complexities of the data. The popular MP3 format is probably the best-known example. Most formats offer a range of degrees of compression, generally measured in bit rate: the lower the rate, the smaller the file and the more significant the quality loss.

TRANSFER OF AUDIO OVER THE INTERNET- To send audio over the Internet, a specific protocol called Voice over IP (VoIP) is used. Audio transmission is done in the form of data packets, where the receiver receives the data packets from the IP data network. Downloading and playback can be done using two methods:
i) File downloading and then playing, where the complete file is first downloaded and then played.
ii) Progressive downloading, where the media is played while it is still being downloaded.

Voice over IP- It is used to send audio over the Internet. For this, the following steps are performed:
1) While recording, the sampled sound is compressed according to the data format, and the audio recording frequency is limited according to the recording.
2) A compression/decompression algorithm (CODEC) is used for compression.
3) The sampled sound is then collected and converted into data packets; this process is known as packetization.
4) The packets are then sent over the IP network.
5) After transmission, the packets are rearranged on the receiver side according to their packet numbers.
6) At the time of rearrangement, lost packets are regenerated using a gap-filling algorithm.
7) Sometimes packets are sent multiple times to prevent packet loss; this is called redundancy.
8) A forward error correction mechanism is also used to recover from packet loss. In this method every packet carries some information about the previous packet, which is checked at the time of rearrangement.
9) Delayed packets are treated as lost. Variation in delay is called jitter. To overcome this problem a buffering queue is used.

Some other audio transmission protocols are also used to locate the receiver and to synchronize sender and receiver. Voice over IP is used within the TCP/IP protocol suite.
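A toy sketch of the packetization and reordering steps described above. The 20 ms packet size, the sequence-number scheme, and the "repeat previous payload" gap filling are illustrative assumptions; real VoIP stacks use RTP with timestamps and jitter buffers.

```python
# Toy packetization/reassembly sketch (not a real RTP implementation).
SAMPLES_PER_PACKET = 160     # 20 ms of 8 kHz audio (assumed)

def packetize(samples):
    packets = []
    for seq, start in enumerate(range(0, len(samples), SAMPLES_PER_PACKET)):
        packets.append({"seq": seq,
                        "payload": samples[start:start + SAMPLES_PER_PACKET]})
    return packets

def reassemble(received):
    # Rearrange by sequence number; a missing packet is "regenerated" here
    # by repeating the previous payload (a simple gap-filling strategy).
    received.sort(key=lambda p: p["seq"])
    stream, expected = [], 0
    for pkt in received:
        while expected < pkt["seq"]:             # lost packet detected
            stream.extend(stream[-SAMPLES_PER_PACKET:] or
                          [0] * SAMPLES_PER_PACKET)
            expected += 1
        stream.extend(pkt["payload"])
        expected += 1
    return stream

audio = list(range(800))                          # fake sample values
packets = packetize(audio)
del packets[2]                                    # simulate one lost packet
print(len(reassemble(packets)), "samples reconstructed")
```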

FUNCTIONS PERFORMED ON SOUND
1) Trimming- It is used to remove blank space at the beginning or end of a recording. This blank space shows the gap between two consecutive recordings, but it can be removed by trimming to reduce the file size.
2) Slicing- It is used to extract a sound segment from the recording at the desired position. It is also used to remove an unwanted sound segment from a recording.
3) Reassembling- Rearranging multiple small sound portions to construct a new sound piece is called reassembling.
4) Resampling- Changing the sampling rate affects both file quality and file size. By decreasing the sampling rate we can decrease the file size; this is called resampling. However, it also has a negative effect on sound quality.
5) Volume control- Changing the volume of a sound segment over time is called volume control.
6) Special effects- Effects are used to enhance the sound. The most commonly used effects are:
   - Echo: a softer copy of the same sound is inserted along with the original sound.
   - Reverse: the sound is played back in reverse.
   - Mix: two or more sound tracks are combined into one.
   - Fade in / fade out: at the start or end of a sound segment the volume is gradually raised or lowered to give a soft beginning or ending.
7) Equalization- In this process long sections are smoothed using fade-in/fade-out effects. This process is used to balance a recording.
8) Time stretching- This is used to increase the overall length/time of a recording without changing the sound pitch. It also increases the sound file size without changing the sound itself.

SYNTHESIZER
Periodic electric signals can be converted into sound by amplifying them and driving a loudspeaker with them. A sound synthesizer (often abbreviated as "synthesizer" or "synth") is an electronic instrument capable of producing a wide range of sounds. Synthesizers generate electric signals (waveforms), which are finally converted to sound through loudspeakers or headphones. Modern sound synthesis makes increasing use of MIDI for sequencing and communication between devices.

Types of Synthesis-
- Additive synthesis builds sounds by adding together waveforms (which are usually harmonically related), adding various amplitudes of the harmonics of a chosen pitch until the desired timbre is obtained. To implement real-time additive synthesis, wavetable synthesis is useful for reducing the required hardware/processing power, and is commonly used in low-end MIDI instruments (such as educational keyboards) and low-end sound cards.
- Subtractive synthesis is based on filtering harmonically rich waveforms: it starts with geometric waves, which are rich in harmonic content, and filters the harmonics to produce a new sound.
- FM synthesis (frequency modulation synthesis) is a process that usually involves the use of at least two signal generators (sine-wave oscillators, commonly referred to as "operators" in FM-only synthesizers) to create and modify a voice. Often this is done through the analog or digital generation of a signal that modulates the tonal and amplitude characteristics of a base carrier signal.
- Resynthesis is the modification of digitally sampled sounds before playback. Analysis/resynthesis is a form of synthesis that uses a series of band-pass filters or Fourier transforms to analyze the harmonic content of a sound; the resulting analysis data is then used in a second stage to resynthesize the sound using a bank of oscillators.
- Granular synthesis combines several small sound segments (grains) into a new sound.
- Physical modelling synthesis is the synthesis of sound by using a set of equations and algorithms to simulate a real instrument, or some other physical source of sound.
- Sample-based synthesis is one of the simplest synthesis approaches: record a real instrument as a digitized waveform, and then play back its recordings at different speeds to produce different tones. This is the technique used in "sampling".
- Imitative synthesis uses sound synthesis to mimic acoustic sound sources. Generally, a sound that does not change over time includes a fundamental partial or harmonic and any number of other partials. Synthesis may attempt to mimic the amplitude and pitch of the partials in an acoustic sound source.
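A small sketch of additive synthesis as described above: harmonically related sine waves are summed with chosen amplitudes. The 220 Hz fundamental, the harmonic amplitudes, and the 44.1 kHz sample rate are assumptions for illustration.

```python
# Additive synthesis sketch: sum harmonics of a fundamental frequency.
import math

SAMPLE_RATE = 44100
FUNDAMENTAL = 220.0                              # Hz (example)
HARMONIC_AMPLITUDES = [1.0, 0.5, 0.25, 0.125]    # harmonics 1..4 (example)

def additive_tone(duration_s):
    n_samples = int(duration_s * SAMPLE_RATE)
    norm = sum(HARMONIC_AMPLITUDES)
    tone = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        value = sum(a * math.sin(2 * math.pi * FUNDAMENTAL * (k + 1) * t)
                    for k, a in enumerate(HARMONIC_AMPLITUDES))
        tone.append(value / norm)                # keep samples in [-1, 1]
    return tone

print(len(additive_tone(0.5)), "samples of a 4-harmonic tone")
```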

MIDI

Musical Instrument Digital Interface (MIDI) is a data transfer protocol which is widely used with music synthesizers. MIDI (Musical Instrument Digital Interface) is an industry specification for encoding, storing, synchronizing, and transmitting the musical performance and control data of electronic musical instruments (synthesizers, drum machines, computers) and other electronic equipment (MIDI controllers, sound cards, samplers). It uses a serial data connection with five leads. It uses two basic message types: channel and system. Channel messages can be sent from machine to machine over any one of 16 channels to control an instrument's voice parameters or to control the way the instrument responds to voice messages. System messages can be directed to all devices in the system (called "common" messages) or can be directed to a specific machine (exclusive). Within the MIDI protocol, a basic set of standards has been developed called the General MIDI specification, or just GM. It attempts to standardize common practices within MIDI and make it more accessible to the general user. MIDI composition takes advantage of MIDI 1.0 and General MIDI (GM) technology to allow MIDI data files to be shared between multiple devices, eliminating compatibility issues by using a standard set of commands and parameters. A MIDI file is just a digital representation of a sequence of notes, with information about pitch, duration, voice, etc., and it takes much less memory than a digitally recorded version of the complex sound.

The current MIDI specification includes:


- A hardware scheme for physically connecting electronic musical instruments and associated electronic equipment together (MIDI interface, MIDI adapter, MIDI cable).
- A data encoding scheme for storage and transmission of musical performance and control event data as messages. Typical message types include musical notation, pitch and velocity, control signals for parameters (such as volume, vibrato, panning), cues, and clock signals (MIDI messages, MIDI file).
- Communication protocols for transmitting and synchronizing musical performance and control event data (MIDI Machine Control, MIDI Show Control, MIDI time code, Song Position Pointer).
- Schemes for categorizing instrument and percussive sounds or timbres, also referred to as patches or programs (General MIDI, General MIDI Level 2).

MIDI files are typically created using computer-based sequencing software (or sometimes a hardware-based MIDI instrument or workstation) that organizes MIDI messages into one or more parallel "tracks" for independent recording and editing.

MIDI Basics

MIDI information is transmitted in "MIDI messages", which can be thought of as instructions which tell a music synthesizer how to play a piece of music. The synthesizer receiving the MIDI data must generate the actual sounds. The MIDI data stream is a unidirectional asynchronous bit stream at 31.25 kbit/s, with 10 bits transmitted per byte (a start bit, 8 data bits, and one stop bit). The MIDI interface on a MIDI instrument will generally include three different MIDI connectors, labelled IN, OUT, and THRU. The MIDI data stream is usually originated by a MIDI controller, such as a musical instrument keyboard, or by a MIDI sequencer. A MIDI controller is a device which is played as an instrument, and it translates the performance into a MIDI data stream in real time (as it is played). A MIDI sequencer is a device which allows MIDI data sequences to be captured, stored, edited, combined, and replayed. The MIDI data output from a MIDI controller or sequencer is transmitted via the device's MIDI OUT connector.
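As a concrete illustration of a channel message, the sketch below builds the three bytes of a standard MIDI "Note On" message (status byte 0x90 combined with the channel number, then the note number and velocity); the specific note and velocity values are arbitrary examples.

```python
# Build a 3-byte MIDI "Note On" channel message.
# Status byte = 0x90 | channel (0-15); then note number and velocity (0-127).
def note_on(channel, note, velocity):
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    return bytes([0x90 | channel, note, velocity])

msg = note_on(channel=0, note=60, velocity=100)   # middle C, example velocity
print(msg.hex())                                  # "903c64"
```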

MIDI SYSTEM

Figure shows a simple MIDI system, consisting of a MIDI keyboard controller and a MIDI sound module. Note that many MIDI keyboard instruments include both the keyboard controller and the MIDI sound module functions within the same unit. In these units, there is an internal link between the keyboard and the sound module which may be enabled or disabled by setting the "local control" function of the instrument to ON or OFF respectively.

The single physical MIDI Channel is divided into 16 logical channels by the inclusion of a 4 bit Channel number within many of the MIDI messages. A musical instrument keyboard can generally be set to transmit on any one of the sixteen MIDI channels. A MIDI sound source, or sound module, can be set to receive on specific MIDI Channel(s). In the system depicted in Figure 1, the sound module would have to be set to receive the Channel which the keyboard controller is transmitting on in order to play sounds. Figure 2 shows a more elaborate MIDI system. In this case, a MIDI keyboard controller is used as an input device to a MIDI sequencer, and there are several sound modules connected to the sequencer's MIDI OUT port. A composer might utilize a system like this to write a piece of music consisting of several different parts, where each part is written for a different instrument. The composer would play the individual parts on the keyboard one at a time, and these individual parts would be captured by the sequencer. The sequencer would then play the parts back together through the sound modules. Each part would be played on a different MIDI Channel, and the sound modules would be set to receive different channels. For example, Sound module number 1 might be set to play the part received on Channel 1 using a piano sound, while module 2 plays the information received on Channel 5 using an acoustic bass sound, and the drum machine plays the percussion part received on MIDI Channel 10.

Unit IV IMAGES
An image (from Latin: imago) is an artifact, for example a two-dimensional picture, that has a similar appearance to some subject, usually a physical object or a person.

Types
i) A volatile image is one that exists only for a short period of time. This may be a
reflection of an object by a mirror, a projection of a camera obscura, or a scene displayed on a cathode ray tube.

ii) . A fixed image, also called a hard copy, is one that has been recorded on a material
object, such as paper or textile by photography or digital processes.

iii) A mental image exists in an individual's mind: something one remembers or


imagines. The subject of an image need not be real; it may be an abstract concept, such as a graph, function, or "imaginary" entity.

iv) Range imaging is the name for a collection of techniques which are used to
produce a 2D image showing the distance to points in a scene from a specific point, normally associated with some type of sensor device.

v) Intensity images measure the amount of light impinging on a photosensitive device.


The input to the photosensitive device, typically a camera, is the incoming light, which enters the camera's lens and hits the image plane.

On the Basis Of Motion Images Are 2 Typesi) A still image is a single static image, as distinguished from a kinetic image (see
below). This phrase is used in photography, visual media and the computer industry to emphasize that one is not talking about movies, or in very precise or pedantic technical writing such as a standard.

ii) A film still is a photograph taken on the set of a movie or television program during
production, used for promotional purposes.

Digital image representation- For representing an image in digital media we have to convert it into numeric or binary form. For representing a 2D image, an array with 3 values per entry can be used, which stores each image pixel as an X coordinate, a Y coordinate, and an intensity. A pixel is the smallest individual element in an image, holding a quantized value that represents the brightness of a given color at that specific point. The construction of this integer array during the digital representation of an image is called raster scanning, and the array itself is called a raster map.
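A minimal sketch of the raster-map idea: quantized intensity values stored in a 2D array indexed by (x, y). The 4-by-4 size and the 8-bit intensity range are illustrative assumptions.

```python
# Raster map sketch: pixel intensities stored in a 2D array indexed by (x, y).
WIDTH, HEIGHT = 4, 4          # illustrative image size
MAX_INTENSITY = 255           # 8-bit quantized brightness

# Build a simple horizontal gradient as the "scanned" image.
raster_map = [[round(x * MAX_INTENSITY / (WIDTH - 1)) for x in range(WIDTH)]
              for y in range(HEIGHT)]

x, y = 2, 1
print(f"intensity at ({x}, {y}) =", raster_map[y][x])
```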

Properties of image

i) Scalability generally refers to a quality reduction achieved by manipulation of the bit stream or file (without decompression and re-compression). Other names for scalability are progressive coding or embedded bit streams. Despite its contrary nature, scalability may also be found in lossless codecs, usually in the form of coarse-to-fine pixel scans. Scalability is especially useful for previewing images while downloading them (e.g., in a web browser) or for providing variable-quality access to, e.g., databases. There are several types of scalability:
- Quality progressive or layer progressive: the bit stream successively refines the reconstructed image.
- Resolution progressive: first encode a lower image resolution, then encode the difference to higher resolutions.
- Component progressive: first encode grey, then color.

ii) Meta information Compressed data may contain information about the image which
may be used to categorize, search, or browse images. Such information may include color and texture statistics, small preview images, and author or copyright information.

iii) Region of interest coding Certain parts of the image are encoded with higher quality
than others. This may be combined with scalability (encode these parts first, others later).

iv) Processing power Compression algorithms require different amounts of processing


power to encode and decode. Some high compression algorithms require high processing power.

v) Image Quality- Image quality is not a single factor but is a composite of at least five factors: contrast, blur, noise, artifacts, and distortion.

Image Contrast: Contrast means difference. In an image, contrast can be in the form of different shades of grey, light intensities, or colors. Contrast is the most fundamental characteristic of an image. The physical contrast of an object must represent a difference in one or more object characteristics. When a value is assigned to contrast, it refers to the difference between a specific structure or object in the image and the area around it or its background.

Blur: Each imaging method has a limit as to the smallest object that can be imaged, and thus on visibility of detail. Visibility of detail is limited because all imaging methods introduce blurring into the process. The primary effect of image blur is to reduce the contrast and visibility of small objects or detail. The amount of blur in an image can be quantified in units of length; this value represents the width of the blurred image of a small object.

Noise: Image noise, sometimes referred to as image mottle, gives an image a textured or grainy appearance. The source and amount of image noise depend on the imaging method. Noise affects the boundary between visible and invisible objects: the general effect of increasing image noise is to reduce object visibility, especially for low-contrast objects. Noise becomes less pronounced as the tones become brighter, because brighter regions have a stronger signal due to more light, resulting in a higher overall SNR. This means that images which are underexposed will have more visible noise even if you brighten them up to a more natural level afterwards. On the other hand, overexposed images will have less noise and can actually be advantageous, assuming that you can darken them later and that no region has become solid white where there should be texture.

Artifacts: Most imaging methods can create image features that do not represent a body structure or object. These are image artifacts. In many situations an artifact does not significantly affect object visibility and diagnostic accuracy, but artifacts can obscure a part of an image or may be interpreted as an anatomical feature.

Distortion: An image should not only make internal objects visible, but should also give an accurate impression of their size, shape, and relative positions.

vi) Pixel bit depth is the number of bits that have been made available in the digital system to represent each pixel in the image. As an example, consider using only four bits; this is smaller than would be used in any actual medical image, because with four bits a pixel would be limited to having only 16 different values (brightness levels or shades of grey).

vii) Pixel size: When an image is in digital form, it is actually blurred by the

size of the pixel. This is because all anatomical detail within an individual pixel is "blurred together" and represented by one number. The physical size of a pixel, relative to the anatomical objects, is the amount of blurring added to the imaging process by the digitizing of the image. Here we see that an image with small pixels (less blurring) displays much more detail than an image made up of larger pixels. The size of a pixel (and image detail) is determined by the ratio of the actual image size and the size of the image matrix.
viii) Image resolution: This is an umbrella term that describes the detail an image holds. The term applies to raster digital images, film images, and other types of images. Higher resolution means more image detail. Image resolution can be measured in various ways. Basically, resolution quantifies how close lines can be to each other and still be visibly resolved. Resolution units can be tied to physical sizes (e.g. lines per mm, lines per inch), to the overall size of a picture (lines per picture height, also known simply as lines, TV lines, or TVL), or to angular subtense. Line pairs are often used instead of lines; a line pair comprises a dark line and an adjacent light line, while a line is either a dark line or a light line. A resolution of 10 lines per millimetre means 5 dark lines alternating with 5 light lines, or 5 line pairs per millimetre (5 LP/mm). Photographic lens and film resolution are most often quoted in line pairs per millimetre. The resolution of digital images can be described in many different ways. Resolution of an image can be of the following types:

i) Pixel resolution- The pixel resolution is given by a pair of positive integers, where the first number is the number of pixel columns (width) and the second is the number of pixel rows (height), for example 640 by 480.

ii) Spatial resolution- The measure of how closely lines can be resolved in an image is called spatial resolution, and it depends on properties of the system creating the image, not just the pixel resolution in pixels per inch (ppi). In effect, spatial resolution refers to the number of independent pixel values per unit length.

iii) Spectral resolution- Color images distinguish light of different spectra. Multi-spectral images resolve even finer differences of spectrum or wavelength than is needed to reproduce color; that is, they have higher spectral resolution, which describes how narrow each captured band is.

iv) Radiometric resolution- Radiometric resolution determines how finely a system can represent or distinguish differences of intensity, and is usually expressed as a number of levels or a number of bits, for example 8 bits or 256 levels, which is typical of computer image files. The higher the radiometric resolution, the better subtle differences of intensity or reflectivity can be represented, at least in theory. In practice, the effective radiometric resolution is typically limited by the noise level, rather than by the number of bits of representation.

ix) Numeric size- The numerical size (number of bits) of an image is the product of two factors:
- The number of pixels, which is found by multiplying the pixel length and width of the image.
- The bit depth (bits per pixel). This is usually in the range of 8-16 bits, or 1-2 bytes, per pixel.

The level of compression is the factor by which the numerical size is reduced. It depends on the compression method and the selected level of compression. Lossless compression involves no loss of image quality and is commonly used in many medical applications. Lossy compression results in some loss of image quality and must be used with care for diagnostic images.
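A short sketch of the numeric-size calculation just described; the 1024x1024 matrix, the 12-bit depth, and the 3:1 compression factor are example values only.

```python
# Numeric size of an image = pixels * bit depth; compression divides it.
width, height = 1024, 1024     # pixel matrix (example)
bit_depth = 12                 # bits per pixel (example)
compression_factor = 3         # e.g. 3:1 compression (example)

raw_bits = width * height * bit_depth
raw_megabytes = raw_bits / 8 / 1_000_000
compressed_megabytes = raw_megabytes / compression_factor
print(f"raw: {raw_megabytes:.2f} MB, compressed: {compressed_megabytes:.2f} MB")
```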

x) Image compression is the process of reducing the numerical size of digital images.

Image Formats

Image file formats are standardized means of organizing and storing digital images. Image files are composed of pixels, vector (geometric) data, or a combination of the two.

PNG (Portable Network Graphics)


The PNG file format was created as the free, open-source successor to the GIF. The PNG file format supports truecolor (16 million colors) while the GIF supports only 256 colors. The PNG file excels when the image has large, uniformly colored areas. The lossless PNG format is best suited for editing pictures, and lossy formats like JPG are best for the final distribution of photographic images, because in this case JPG files are usually smaller than PNG files. PNG provides a patent-free replacement for GIF and can also replace many common uses of TIFF. Indexed-color, greyscale, and truecolor images are supported, plus an optional alpha channel. PNG is designed to work well in online viewing applications like web browsers, so it is fully streamable with a progressive display option. PNG is robust, providing both full file integrity checking and simple detection of common transmission errors. PNG can store gamma and chromaticity data for improved color matching on heterogeneous platforms.

GIF (Graphics Interchange Format)


GIF is limited to an 8-bit palette, or 256 colors. This makes the GIF format suitable for storing graphics with relatively few colors such as simple diagrams, shapes, logos and cartoon style images. The GIF format supports animation and is still widely used to provide image animation effects. It also uses a lossless compression that is more effective when large areas have a single color, and ineffective for detailed images or dithered images.

BMP (Windows bitmap)


The BMP file format handles graphics files within the Microsoft Windows OS. BMP files are uncompressed, hence they are large. The advantage is their simplicity and wide acceptance in Windows programs.

TIFF (Tagged Image File Format)


The TIFF format is a flexible format that normally saves 8 bits or 16 bits per color (red, green, blue) for 24-bit and 48-bit totals, respectively. Usually using either the TIFF or TIF filename extension. TIFF's flexibility can be both an advantage and disadvantage, since a reader that reads every type of TIFF file does not exist. TIFFs can be lossy and lossless. TIFF image format is not widely supported by web browsers. TIFF remains widely accepted as a photograph file standard in the printing business. TIFF can handle device-specific color spaces, such as the CMYK defined by a particular set of printing press inks. OCR (Optical Character Recognition) software packages commonly generate some (often monochromatic) form of TIFF image for scanned text pages.

EXIF (Exchangeable image file format)


The Exif format is a file standard similar to the JFIF format with TIFF extensions. It is incorporated in the JPEG-writing software used in most cameras. Its purpose is to record and to standardize the exchange of images with image metadata between digital cameras and editing and viewing software.

The metadata are recorded for individual images and include such things as camera settings, time and date, shutter speed, exposure, image size, compression, name of camera, color information.

When images are viewed or edited by image editing software, all of this image information can be displayed. It stores Meta information.

JPEG (Joint Photographic Experts Group)


JPEG-compressed images are usually stored in the JFIF (JPEG File Interchange Format) file format. JPEG compression is (in most cases) lossy compression. The JPEG/JFIF filename extension is .jpg or .jpeg. It supports 8 bits per color (red, green, blue) for a 24-bit total, producing relatively small files. JPEG files suffer generational degradation when repeatedly edited and saved. The JPEG/JFIF format is also used as the image compression algorithm in many PDF files. It uses the following steps for compression:
1. Image encoding (color space transform)
2. DCT (Discrete Cosine Transform)
3. Quantization
4. Scanning and compression (entropy coding)

Let's take an image as an example. First, the image is arranged in a rectangular grid of pixels whose dimensions are 250 by 375, giving a total of 93,750 pixels. The color of each pixel is determined by specifying how much of the colors red, green, and blue should be mixed together. Each color component is represented as an integer between 0 and 255 and so requires one byte of computer storage. Therefore, each pixel requires three bytes of storage, implying that the entire image should require 93,750 * 3 = 281,250 bytes.

However, the JPEG image is only 32,414 bytes; in other words, the image has been compressed by a factor of roughly nine. The JPEG compression algorithm works as follows. First, the image is divided into 8 by 8 blocks of pixels. Since each block is processed without reference to the others, we'll concentrate on a single block. We may think of the color of each pixel as represented by a three-dimensional vector (R, G, B) consisting of its red, green, and blue components. In a typical image, there is a significant amount of correlation between these components. For this reason, we will use a color space transform to produce a new vector whose components represent luminance, Y, and blue and red chrominance, Cb and Cr.

The luminance describes the brightness of the pixel while the chrominance carries information about its hue. These three quantities are typically less correlated than the (R, G, B) components. When we apply this transformation to each pixel in our block, we obtain three new blocks, one corresponding to each component. The Discrete Cosine Transform: instead of recording the individual values of the components, we could record, say, the average values and how much each pixel differs from this average value. This is the essence of the Discrete Cosine Transform (DCT), which will now be explained. We will first focus on one of the three components in one row of our block and imagine that the eight values are represented by f0, f1, ..., f7. We would like to represent these values in a way that makes the variations more apparent. The resulting coefficients are stored in another 8 by 8 block.

The coefficients F(u,v) are real numbers, which will be stored as integers. This means that we will need to round the coefficients; as we'll see, we do this in a way that facilitates greater compression. Rather than simply rounding the coefficients F(u,v), we first divide by a quantizing factor and record round(F(u,v) / Q(u,v)). When a JPEG file is created, the algorithm asks for a parameter to control the quality of the image and how much the image is compressed. The entry in the upper left corner essentially represents the average over the block. Moving to the right increases the horizontal frequency, while moving down increases the vertical frequency. What is important here is that there are lots of zeroes. We then order the coefficients so that the lower frequencies appear first; instead of recording all the zeroes, we can simply say how many appear. Reconstructing the image from this information is rather straightforward: the quantization matrices are stored in the file so that approximate values of the DCT coefficients may be recomputed. From here, the (Y, Cb, Cr) vector is found through the Inverse Discrete Cosine Transform, and then the (R, G, B) vector is recovered by inverting the color space transform.
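A small numerical sketch of the DCT-and-quantization step described above. The flat quantization table and the smooth test block are illustrative assumptions; real JPEG encoders use standard quantization tables scaled by the quality parameter.

```python
# JPEG-style 2D DCT + quantization of one 8x8 block (illustrative only).
import numpy as np

def dct_2d(block):
    """Type-II 2D DCT of an 8x8 block, values first centred on zero."""
    N = 8
    shifted = block - 128.0
    coeffs = np.zeros((N, N))
    for u in range(N):
        for v in range(N):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0
            cv = 1 / np.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (shifted[x, y]
                          * np.cos((2 * x + 1) * u * np.pi / (2 * N))
                          * np.cos((2 * y + 1) * v * np.pi / (2 * N)))
            coeffs[u, v] = 0.25 * cu * cv * s
    return coeffs

QUANT = np.full((8, 8), 16)                          # flat table (assumed)
block = np.fromfunction(lambda x, y: 96 + 8 * x + 4 * y, (8, 8))  # smooth block
quantized = np.round(dct_2d(block) / QUANT).astype(int)
print(quantized)   # most high-frequency entries quantize to zero
```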

COLOR MODEL - A color model is an abstract mathematical model describing the way colors can be represented as tuples of numbers, typically as three or four values or color components. When this model is associated with a precise description of how the components are to be interpreted (viewing conditions, etc.), the resulting set of colors is called a color space. This section describes ways in which human color vision can be modelled. The mainly used models are the following:

RGB (Red, Green, Blue) color model- Media that transmit light (such as television) use additive color mixing with primary colors of red, green, and blue, each of which stimulates one of the three types of the eye's color receptors with as little stimulation as possible of the other two. This is called the "RGB" color space. The main purpose of the RGB color model is for the sensing, representation, and display of images in electronic systems, such as televisions and computers, though it has also been used in conventional photography. RGB is a device-dependent color model: different devices detect or reproduce a given RGB value differently.

CMYK (Cyan, Magenta, Yellow, Key or black)- It is possible to achieve a large range of the colors seen by humans by combining cyan, magenta, and yellow transparent dyes/inks on a white substrate. These are the subtractive primary colors. Often a fourth ink, black, is added to improve reproduction of some dark colors. This is called the "CMY" or "CMYK" color space. Cyan is green + blue, magenta is red + blue, and yellow is red + green. Color printers, on the other hand, are not RGB devices but subtractive color devices (typically using the CMYK color model).

Chromaticity model- It shows color on the basis of frequency, saturation, and chrominance. It is a three-dimensional model: it shows color on the x and y axes, and on the third dimension it shows luminance. It is also an additive model, generating colors by combining the x and y axis values. It has limited use, as it is not used by devices such as scanners.

HSB/HSL (Hue, Saturation, Brightness/Lightness)- For describing a light source, hue (the dominant frequency), saturation, and brightness or luminance are used. Saturation defines the intensity of a color. Brightness defines the amount of black or white in the color. Hue can be defined as an angle on the color wheel from 0 to 360 degrees. It is used to define the color of the image.

Color Palettes- A palette is a given, finite set of colors for the management of digital images. It shows the total number of colors that a given system is able to generate or manage; due to video memory limitations, it may not be able to display them all simultaneously. It defines colors on the basis of the number of bits provided per pixel and the number of standard colors in the image:

Color Depth                      Colors Available
1 bit per pixel                  2 colors
4 bits per pixel                 16 colors
8 bits per pixel                 256 colors
16 bits per pixel                65,536 colors
24 bits per pixel (truecolor)    16.7 million colors
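A simple sketch of the RGB-to-CMYK relationship described above (cyan, magenta, and yellow as complements of red, green, and blue, with black pulled out as the common component). Real printer conversions use device color profiles rather than this naive formula.

```python
# Naive RGB -> CMYK conversion (no color management / ICC profiles).
def rgb_to_cmyk(r, g, b):
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)                  # black = what no colored ink can add
    if k == 1:                            # pure black
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k

print(rgb_to_cmyk(255, 0, 0))             # red  -> (0.0, 1.0, 1.0, 0.0)
print(rgb_to_cmyk(0, 128, 128))           # teal -> roughly (1.0, 0.0, 0.0, 0.5)
```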

Halftone- Halftone is the reprographic technique that simulates continuous-tone imagery through the use of dots, varying either in size, in shape, or in spacing. Where continuous-tone imagery contains an infinite range of colors or greys, the halftone process reduces visual reproductions to a binary image that is printed with only one color of ink. This binary reproduction relies on a basic optical illusion. The resolution of a halftone screen is measured in lines per inch (lpi); this is the number of lines of dots in one inch, measured parallel with the screen's angle. Halftoning is also commonly used for printing color pictures. The general idea is the same: the density of the four primary printing colors, cyan, magenta, yellow, and black, is varied. The shape of the dots can be:

1. Round dots: most common, suitable for light images, especially for skin tones. They meet at a tonal value of 70%.

2. Elliptical dots: appropriate for images with many objects. Elliptical dots meet at the tonal values 40% (pointed ends) and 60% (long side), so there is a risk of a pattern.

3. Square dots: best for detailed images, not recommended for skin tones. The corners meet at a tonal value of 50%.

Dithering

Full-color photographs may contain an almost infinite range of color values. Dithering is
the most common means of reducing the color range of images down to the 256 (or fewer) colors seen in 8-bit GIF images.

Dithering is the process of juxtaposing pixels of two colors to create the illusion that a third
color is present. A simple example is an image with only black and white in the color palette. By combining black and white pixels in complex patterns a graphics program like Adobe Photoshop can create the illusion of gray values
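To make the idea concrete, here is a small sketch of error-diffusion (Floyd-Steinberg) dithering that reduces a greyscale image to pure black and white; the tiny gradient image is an assumed example input, and Floyd-Steinberg is only one of several dithering methods a program like Photoshop might use.

```python
# Floyd-Steinberg dithering: reduce greyscale (0-255) to black/white (0 or 255)
# while diffusing the rounding error to neighbouring pixels.
def dither(image):
    h, w = len(image), len(image[0])
    img = [row[:] for row in image]                  # work on a mutable copy
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 255 if old > 127 else 0            # snap to black or white
            img[y][x] = new
            err = old - new                          # rounding error
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return img

gradient = [[int(x * 255 / 7) for x in range(8)] for _ in range(4)]  # example
for row in dither(gradient):
    print("".join("#" if v else "." for v in row))   # illusion of grey levels
```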

White Balance
In photography and image processing, color balance is the global adjustment of the intensities of the colors (typically the red, green, and blue primary colors). An important goal of this adjustment is to render specific colors, particularly neutral colors, correctly; hence, the general method is sometimes called gray balance, neutral balance, or white balance.

Color temperature characterizes the quality of a light source; it is related to the ratio of blue light to red light in the source.

By adjusting this, a color cast can be created or removed, which defines the lighting in an image. The color balance operations in popular image editing applications usually operate directly on the red, green, and blue channel pixel values, without respect to any color sensing or reproduction model. In shooting film, color balance is typically achieved by using color correction filters over the lights or on the camera lens. The term color balance is normally reserved for correction of differences in the ambient illumination conditions.

Dynamic range correction- Dynamic range is a measure of how widely a value can vary; for images it describes the range of luminance. A high-dynamic-range sensor is used to recover detail in the dark areas of a digital photograph. The human eye's dynamic range extends from dim moonlight to bright sunlight, illumination levels that differ by many orders of magnitude, but the range usable in a digital camera is normally smaller than that of the human eye.

Gamma Correction / Gamma Encoding / Gamma Nonlinearity / Gamma-

Gamma is the name of a nonlinear operation used to code and decode luminance in video or still image systems. Gamma correction is, in the simplest cases, defined by the following power-law expression:

    Vout = A * (Vin)^gamma

where A is a constant and the input and output values are non-negative real values; in the common case of A = 1, inputs and outputs are typically in the range 0 to 1. A gamma value < 1 is sometimes called an encoding gamma, and the process of encoding with this compressive power-law nonlinearity is called gamma compression; conversely, a gamma value > 1 is called a decoding gamma, and the application of the expansive power-law nonlinearity is called gamma expansion. Gamma encoding of images is required to compensate for properties of human vision, to maximize the use of the bits or bandwidth relative to how humans perceive light and color. Gamma encoding of floating-point images is not required (and may be counterproductive) because the floating-point format already provides a pseudo-logarithmic encoding.

Photo Retouching- It is the application of image editing techniques to photographs in order to create an illusion (in contrast to mere enhancement or correction), through analog or digital means. There are several subtypes of digital image retouching: technical retouching, creative retouching, etc. It is used in advertising photography, background improvement, etc.
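A short sketch of the power law above applied to an 8-bit pixel value; the gamma of 2.2 (a common display value) and A = 1 are assumptions for illustration.

```python
# Gamma encoding/decoding sketch: Vout = A * Vin**gamma, with A = 1.
GAMMA = 2.2   # common display gamma (assumed)

def gamma_encode(value_8bit):
    v = value_8bit / 255                 # normalize to 0..1
    return round((v ** (1 / GAMMA)) * 255)

def gamma_decode(value_8bit):
    v = value_8bit / 255
    return round((v ** GAMMA) * 255)

print(gamma_encode(64))    # mid-dark pixel is brightened by encoding (~136)
print(gamma_decode(136))   # decoding maps it back near 64
```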

UNIT V - VIDEO AND ANIMATION


Video- Video is the technology of electronically capturing, recording, processing, storing, transmitting, and reconstructing a sequence of still images representing scenes in motion. Videos are mainly of the following types:

Component Video: Higher-end video makes use of three separate video signals for the red, green, and blue image planes. It gives the best color reproduction since there is no crosstalk between the three different channels.

Composite Video: It is also called CVBS (color video baseband signal, or color, video, blanking and sync). It contains luminance (intensity), chrominance (color), and sync information in one signal. This type of signal is used in broadcast color TV.

S-Video: Also called separated video or super video. It uses two wires, one for the luminance (intensity) signal and a second for the chrominance (color) signal. The S-Video connector itself carries only the video signals; stereo audio (left and right) is carried on separate connections.

Characteristics of video streams


Video field- In video, a field is one of the many still images which are displayed sequentially to create the impression of motion on the screen. Two fields comprise one video frame.

Video frame- A video frame is one of the many still (or nearly still) images which compose the complete moving picture.

Frame rate- Frame rate is the number of still pictures per unit of time of video; it ranges from six or eight frames per second (frame/s) for old mechanical cameras to 120 or more frames per second for new professional cameras. The minimum frame rate needed to achieve the illusion of a moving image is about fifteen frames per second.

Video scanning- It can be of two types:
1. Interlaced scanning: Interlacing divides each frame into odd and even lines and then alternately scans and refreshes them at 30 frames per second. The slight delay between odd and even line refreshes creates some distortion or flicker, because only half the lines keep up with the moving image while the other half waits to be refreshed. This type of scanning is commonly seen in traditional CRT monitors.

2. Progressive scanning: With the advent of LCD (Liquid Crystal Display) displays, a more efficient and better way of scanning the image was introduced, known as progressive scanning. Unlike an interlaced system, which scans the odd and even lines alternately (1, 3, 5, ... and 2, 4, 6, ...) every 1/30th of a second, a progressive system scans the lines sequentially (1, 2, 3, ...) every 1/60th of a second and produces a complete and flicker-free picture. Using progressive scanning, a smoother and much more detailed image with finer details can be produced.

Difference between interlaced and progressive scanning:
1) Interlaced scanning scans the odd lines first and then alternates to scan the even lines, whereas a progressive system scans sequentially.
2) Interlaced scanning scans every 1/30th of a second and progressive every 1/60th.
3) Progressive scanning produces much better and finer picture quality.
4) Less native source material is available in the 1080p (progressive) format.

1. Aspect Ratio: Aspect ratio describes the dimensions of video screens and video picture elements. All popular video formats are rectilinear, and so can be described by a ratio between width and height. The screen aspect ratio of a traditional television screen is 4:3, or about 1.33:1. High-definition televisions use an aspect ratio of 16:9, or about 1.78:1. The aspect ratio of a full 35 mm film frame with soundtrack (also known as the Academy ratio) is 1.375:1.

2. Bit rate is a measure of the rate of information content in a video stream. It is quantified using the bit per second (bit/s or bps) unit or megabits per second (Mbit/s). A higher bit rate allows better video quality.

3. The display resolution of a digital television or display device is the number of distinct pixels in each dimension that can be displayed. It is usually quoted as width x height, with the units in pixels: for example, "1024 x 768" means the width is 1024 pixels and the height is 768 pixels.

4. Monitor Refresh Rate: To make and hold the image, the image is reconstructed multiple times in order to maintain its appearance. The number of times the pixels are redrawn on a monitor screen per unit time is called the monitor refresh rate. A higher refresh rate produces flicker-free images.

5. Dot Pitch: It governs picture sharpness in a color monitor. It is the physical distance between two adjacent pixels, measured in mm. By decreasing the dot pitch we can increase the image quality.

Analog video is a video signal transferred as an analog signal. An analog color video signal contains the luminance/brightness (Y) and chrominance (C) of an analog television image. When combined into one channel, it is called composite video. Analog video may be carried in separate channels, as in two-channel S-Video (YC) and multi-channel component video formats. Analog video is used in both consumer and professional television production applications.

Digital video is a type of digital recording system that works by using a digital rather than an analog video signal. Digital video comprises a series of orthogonal bitmap digital images displayed in rapid succession at a constant rate. In the context of video these images are called frames. We measure the rate at which frames are displayed in frames per second (FPS). An example video can have a duration (T) of 1 hour (3600 sec), a frame size of 640x480 (WxH) at a color depth of 24 bits, and a frame rate of 25 fps. This example video has the following properties:

pixels per frame = 640 * 480 = 307,200
bits per frame = 307,200 * 24 = 7,372,800 bits = 7.37 Mbits
bit rate (BR) = 7.37 Mbits * 25 = 184.25 Mbits/sec
video size (VS) = 184 Mbits/sec * 3600 sec = 662,400 Mbits = 82,800 Mbytes = 82.8 Gbytes
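The same example worked in code; the frame size, color depth, frame rate, and duration mirror the numbers above (the result differs slightly from the rounded figures in the text).

```python
# Uncompressed video size for the example above (640x480, 24-bit, 25 fps, 1 h).
width, height = 640, 480
color_depth = 24            # bits per pixel
fps = 25
duration = 3600             # seconds

pixels_per_frame = width * height                    # 307,200
bits_per_frame = pixels_per_frame * color_depth      # 7,372,800
bit_rate = bits_per_frame * fps                      # ~184 Mbit/s
video_size_bits = bit_rate * duration

print(f"bit rate   ~ {bit_rate / 1e6:.1f} Mbit/s")
print(f"video size ~ {video_size_bits / 8e9:.1f} GB (uncompressed)")
```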

Video compression- Video compression can be lossless or lossy.

File formats: AVI, SWF, MOV, DAT, MPEG (using I, P, and B frames); MPEG-1, MPEG-2, MPEG-4, MPEG-21.

Tape formats: Ampex, VERA, U-matic, Betamax, Betacam, VCR, CVC, camcorder, DV, VCD, DVD.

Animation techniques: cel animation, computer animation, morphing.
