You are on page 1of 54

A negative

Main article: Photographic film

A 35 mmfilmstrip.

Film for 135 film cameras comes in long narrow strips of chemical-coated plastic or cellulose acetate. After each image is captured by the camera onto the film strip, the film strip is advanced so that the next image is projected onto unexposed film. When the film is developed, it is a long strip of small negative images. This strip is often cut into sections for easier handling. In larger cameras this piece of film may be as large as a full sheet of paper or even larger, with a single image captured onto one piece. Each of these negative images may be referred to as a negative and the entire strip or set of images may be collectively referred to as negatives. These negative images are the master images, from which all other copies will be made, and thus they are treated with care. [edit]Negative


A positive image is a normal image. A negative image is a total inversion of a positive image, in which light areas appear dark,dark is light and vice versa. A negative color image is additionally color reversed, with red areas appearing cyan, greens appearing magenta and blues appearing yellow. This sometimes can have a reverse effect and cause the greens to appear a reddish brown. Film negatives usually also have much less contrast than the final images. This is compensated by the higher contrast reproduction by photographic paper or by increasing the contrast when scanning and post processing the scanned images. [edit]Negative


Many photographic processes create negative images: the chemicals involved react when exposed to light, and during developing these exposed chemicals are retained and become opaque while the unexposed chemicals are washed away. However, when a negative image is created from a negative image (just like multiplying two negative numbers in mathematics) a positive image results (see Color print film, C-41 process). This makes most chemical based photography a two step process. These are called negative films and processes. Special films and development processes have been devised such that positive images can be created directly from film; these are called positive, or slide, or (perhaps confusingly) reversal film (see Transparency, Black and white reversal film, E-6 process).

Despite the market's evolution away from film, there is still a desire and market for products which allow fine art photographers to produce negatives from digital images for their use in alternative processes such as cyanotypes, gum bichromate, platinum prints, and many others.[1]

Edge detection
From Wikipedia, the free encyclopedia

Edge detection is a fundamental tool in image processing and computer vision, particularly in the areas of feature detection and feature extraction, which aim at identifying points in adigital image at which the image brightness changes sharply or, more formally, has discontinuities. The same problem of finding discontinuities in 1D signals is known as step detection.


Canny edge detection applied to a photograph

The purpose of detecting sharp changes in image brightness is to capture important events and changes in properties of the world. It can be shown that under rather general assumptions for an image formation model, discontinuities in image brightness are likely to correspond to[1][2]:  discontinuities in depth,

  

discontinuities in surface orientation, changes in material properties and variations in scene illumination.

In the ideal case, the result of applying an edge detector to an image may lead to a set of connected curves that indicate the boundaries of objects, the boundaries of surface markings as well as curves that correspond to discontinuities in surface orientation. Thus, applying an edge detection algorithm to an image may significantly reduce the amount of data to be processed and may therefore filter out information that may be regarded as less relevant, while preserving the important structural properties of an image. If the edge detection step is successful, the subsequent task of interpreting the information contents in the original image may therefore be substantially simplified. However, it is not always possible to obtain such ideal edges from real life images of moderate complexity. Edges extracted from non-trivial images are often hampered by fragmentation, meaning that the edge curves are not connected, missing edge segments as well as false edges not corresponding to interesting phenomena in the image – thus complicating the subsequent task of interpreting the image data.[3] Edge detection is one of the fundamental steps in image processing, image analysis, image pattern recognition, and computer vision techniques. During recent years, however, substantial (and successful) research has also been made on computer vision methods[which?] that do not explicitly rely on edge detection as a pre-processing step. [edit]Edge


The edges extracted from a two-dimensional image of a three-dimensional scene can be classified as either viewpoint dependent or viewpoint independent. A viewpoint independent edgetypically reflects inherent properties of the three-dimensional objects, such as surface markings and surface shape. A viewpoint dependent edge may change as the viewpoint changes, and typically reflects the geometry of the scene, such as objects occluding one another. A typical edge might for instance be the border between a block of red color and a block of yellow. In contrast a line (as can be extracted by a ridge detector) can be a small number of pixels of a different color on an otherwise unchanging background. For a line, there may therefore usually be one edge on each side of the line.

Edge detection refers to the process of identifying and locating sharp discontinuities in an image. The discontinuities are abrupt changes in pixel intensity which characterize boundaries of objects in a scene. Classical methods of edge detection involve convolving the image with an operator (a 2-D filter), which is constructed to be sensitive to large gradients in the image while returning values of zero in uniform regions.

There are an extremely large number of edge detection operators available, each designed to be sensitive to certain types of edges. Variables involved in the selection of an edge detection operator include Edge orientation, Noise environment and Edge structure. The geometry of the operator determines a characteristic direction in which it is most sensitive to edges. Operators can be optimized to look for horizontal, vertical, or diagonal edges. Edge detection is difficult in noisy images, since both the noise and the edges contain highfrequency content. Attempts to reduce the noise result in blurred and distorted edges. Operators used on noisy images are typically larger in scope, so they can average enough data to discount localized noisy pixels.

Edge detection is a fundamental tool used in most image processing applications to obtain information from the frames as a precursor step to feature extraction and object segmentation. This process detects outlines of an object and boundaries between objects and the background in the image. An edge-detection filter can also be used to improve the appearance of blurred or anti-aliased image streams. The basic edge-detection operator is a matrix area gradient operation that determines the level of variance between different pixels. The edge-detection operator is calculated by forming a matrix centered on a pixel chosen as the center of the matrix area. If the value of this matrix area is above a given threshold, then the middle pixel is classified as an edge. Examples of gradient-based edge detectors are Roberts, Prewitt, and Sobel operators. All the gradient-based algorithms have kernel operators that calculate the strength of the slope in directions, which are orthogonal to each other, commonly vertical and horizontal. Later, the contributions of the different components of the slopes are combined to give the total value of the edge strength. The Prewitt operator measures two components.

An Introduction to Image Compression
Compressing an image is significantly different than compressing raw binary data. Of course, general purpose compression programs can be used to compress images, but the result is less than optimal. This is because images have certain statistical properties which can be exploited by encoders specifically designed for them. Also, some of the finer details in the image can be sacrificed for the sake of saving a little more bandwidth or storage space. This also means that lossy compression techniques can be used in this area. Lossless compression involves with compressing data which, when decompressed, will be an exact replica of the original data. This is the case when binary data such as executables, documents etc. are compressed. They need to be exactly reproduced when decompressed. On the other hand, images (and music too) need not be reproduced 'exactly'. An approximation of the original image is enough for most purposes, as long as the error between the original and the compressed image is tolerable.
Error Metrics

Two of the error metrics used to compare the various image compression techniques are the Mean Square Error (MSE) and the Peak Signal to Noise Ratio (PSNR). The MSE is the cumulative squared error between the compressed and the original image, whereas PSNR is a measure of the peak error. The mathematical formulae for the two are

MSE = PSNR = 20 * log10 (255 / sqrt(MSE))

a higher value of PSNR is good because it means that the ratio of Signal to Noise is higher. . One example is the Fractal Image Compression technique. But there are exceptions. and as seen from the inverse relation between the MSE and PSNR. The steps involved are 1. based on their importance. either by processing each of the colour planes separately.N are the dimensions of the images. So. The usual steps involved in compressing an image are 1. or by transforming the image from RGB representation to other convenient representations like YUV in which the processing is much easier. 4. Remember. but some latest methods promise to speed up the process. this is how 'most' image compression techniques work. The Outline We'll take a close look at compressing grey scale images.y) is the original image. if you find a compression scheme having a lower MSE (and a high PSNR). the 'signal' is the original image. you can recognise that it is a better one.y) is the approximated version (which is actually the decompressed image) and M. Specifying the Rate (bits available) and Distortion (tolerable error) parameters for the target image. 5. Encode each class separately using an entropy coder and write to the file. Here. A lower value for MSE means lesser error. Logically. where possible self similarity within the image is identified and used to reduce the amount of data required to reproduce the image. The algorithms explained can be easily extended to colour images. I'(x. (reverse of step 5). 2. 3. using an entropy decoder. Dividing the available bit budget among these classes. Traditionally these methods have been time consuming. such that the distortion is a minimum. Literature regarding fractal image compression can be found at <findout>. this translates to a high value of PSNR. Quantize each class separately using the bit allocation information derived in step 3. and the 'noise' is the error in reconstruction. Read in the quantized data from the file. Reconstructing the image from the compressed data is usually a faster process than compression.where I(x. Dividing the image data into various classes.

(November 2011) In computer science and information theory.2 Data differencing . and the computational resources required to compress and uncompress the data. No information is lost in lossless compression. Because compressed data must be decompressed to be used. the amount of distortion introduced (e. source coding. when using lossy data compression).. (reverse of step 4).[1] or bitrate reduction involves encoding information using fewer bits than the original representation. data compression. (reverse of step 2). Dequantize the data.2. Contents [hide] • • • 1 Lossy 2 Lossless 3 Theory o 3. Unsourced material may be challenged and removed.g. For instance. The design of data compression schemes involve trade-offs among various factors. Lossy compression reduces bits by identifying marginally important information and removing it. 3. see Source code. Rebuild the image. For the term in computer programming. the free encyclopedia Jump to: navigation. This article needs additional citations for verification. Lossless compression reduces bits by identifying and eliminating statistical redundancy. Compression can be either lossy or lossless. and the option to decompress the video in full before watching it may be inconvenient or require additional storage. Data compression From Wikipedia. search "Source coding" redirects here.1 Machine learning o 3. Compression is useful because it helps reduce the consumption of resources such as data space or transmission capacity. Please help improve this article by adding citations to reliable sources. this extra processing imposes computational or other costs through decompression. including the degree of compression. a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed.

1. instead of coding "red pixel. there are many schemes to reduce size by eliminating redundancy. detail can be dropped from the data to save storage space. In these schemes. Voice compression is used in Internet telephony for example. Lossy image compression is used in digital cameras.1. Generally. so that "speech compression" or "voice coding" is sometimes distinguished as a separate discipline from "audio compression".1 Coding methods  5. There is a corresponding trade-off between information lost and the size . an image may have areas of colour that do not change over several pixels. For example.1. red pixel. In lossy audio compression. DVDs use the lossy MPEG-2 Video codec for video compression. lossy data compression schemes are guided by research on how people perceive the data in question. the human eye is more sensitive to subtle variations in luminance than it is to variations in color.2..2. Different audio and speech compression standards are listed under audio codecs. to increase storage capacities with minimal degradation of picture quality.2 Video  5.1.1 Encoding theory  5.1.2 Speech encoding  5. This is a simple example of run-length encoding. Lossless compression is possible because most real-world data has statistical redundancy. Depending upon the application. JPEG image compression works in part by "rounding off" less-important visual information. Similarly.1 Audio  5. For example." the data may be encoded as "279 red pixels".1.. Compression of human speech is often performed with even more specialized techniques.2 Timeline 6 See also 7 References 8 External links [edit] Lossy Lossless data compression algorithms usually exploit statistical redundancy to represent data more concisely without losing information.2 History o 5. methods of psychoacoustics are used to remove non-audible (or less audible) components of the signal.1 Lossy audio compression  5. while audio compression is used for CD ripping and is decoded by audio players. . some loss of information is acceptable. [edit] Lossless Lossless data compression is contrasted with lossy data compression.• • • • • 4 Outlook and currently unused potential 5 Uses o 5.

huge versioned document collection. Sequitur and Re-Pair are practical grammar compression algorithms which public codes are available. and Cleary. In a further refinement of these techniques. and lends itself especially well to adaptive data compression tasks where the predictions are strongly context-dependent. biological data collection of same or related species. DEFLATE is used in PKZIP. invented by Jorma Rissanen. A number of popular compression formats exploit these perceptual differences. Neal. for instance.g. Also noteworthy are the LZR (LZ–Renau) methods. Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is. The table itself is often Huffman encoded (e. rather than as the original run. is an inverse-arithmetic-coder. The class of grammar-based codes are recently noticed because they can extremely compress highly-repetitive text. The Burrows–Wheeler transform can also be viewed as an indirect form of statistical modelling. which serve as the basis of the Zip method. statistical predictions can be coupled to an algorithm called arithmetic coding. These fields of study were essentially created by Claude Shannon. gzip and PNG. LZ methods utilize a table-based compression model where table entries are substituted for repeated strings of data. LZW (Lempel–Ziv–Welch) is used in GIF images. who published fundamental papers on the topic in the late 1940s and early 1950s. sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count. Arithmetic coding is used in the bilevel image-compression standard JBIG. such as prediction by partial matching. but compression can be slow. this table is generated dynamically from earlier data in the input. images. and turned into a practical method by Witten. This is . LZX). A current LZ-based coding scheme that performs well is LZX. achieves superior compression to the better-known Huffman algorithm. including those used in music files. and the document-compression standard DjVu. and by rate– distortion theory for lossy compression. Dasher. used in Microsoft's CAB format. The text entry system. The Lempel–Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. For most LZ methods. The idea of data compression is deeply connected with statistical inference. Coding theory is also related. internet archives.reduction. SHRI. etc. and video. Arithmetic coding. DEFLATE is a variation on LZ which is optimized for decompression speed and compression ratio. The very best modern lossless compressors use probabilistic models. The basic task of grammar-based codes is constructing a context-free grammar deriving a single string. [edit] Theory The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) for lossless compression.

The run-length code represents the original 67 characters in only 18. RLE also refers to a little-used image format in Windows 3. which is a Run Length Encoded Bitmap. used to compress the Windows 3. newer compression methods such as DEFLATE often use LZ77-based algorithms. Of course. etc. although JPEG uses it quite effectively on the coefficients that remain after transforming and quantizing image blocks. the actual format used for the storage of images is generally binary rather than ASCII characters like this. a generalization of run-length encoding that can take advantage of runs of strings of characters (such as BWWBWWBWWBWW). Contents [hide] • • • • 1 Example 2 Applications 3 See also 4 External links [edit]Example For example. Let us take a hypothetical single scan line. consider a screen containing plain black text on a solid white background. It is not useful with files that don't have many runs as it could greatly increase the file size. file format specifications often dictate repeated bytes in files as padding space. one B. with B representing a black pixel and W representing white: WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWW WW If we apply the run-length encoding (RLE) data compression algorithm to the above hypothetical scan line. However. There will be many long runs of white pixels in the blank space. twelve Ws. and many short runs of black pixels within the text. line drawings. Even binary data files can be compressed with this method. . three Bs. we get the following: 12W1B12W3B24W1B14W This is to be interpreted as twelve Ws. simple graphic images such as icons.most useful on data that contains many such runs: for example. [edit]Applications Run-length encoding performs lossless data compression and is well suited to palettebased iconic images.x. with the extension rle. It does not work well at all on continuous-tone images such as photographs. and animations. but the principle remains the same.x startup screen.

One of the main types of entropy coding creates and assigns a unique prefix-free code to each unique symbol that occurs in the input. Entropy encoding From Wikipedia. These static codes include universal codes (such as Elias gamma coding or Fibonacci coding) and Golomb codes (such as unary coding or Rice coding). Therefore. the optimal code length for a symbol is −logbP. where b is the number of symbols used to make output codes and P is the probability of the input symbol. PackBits. The length of each codeword is approximately proportional to the negativelogarithm of the probability. the most common symbols use the shortest codes. with occasional interruptions of black. the free encyclopedia In information theory an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium. unknown data is then classified by feeding the uncompressed data to each compressor and seeing which compressor yields the highest compression. [edit]Entropy as a measure of similarity Besides using entropy encoding as a way to compress digital data. Run-length encoding is used in fax machines (combined with other techniques into Modified Huffman coding). PCX and ILBM. an entropy encoder can also be used to measure the amount of similarity between streams of data. According to Shannon's source coding theorem. It is relatively efficient because most faxed documents are mostly white space. The coder with the best compression is probably the coder trained on the data that was most similar to the unknown data. If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression). a simpler static code may be useful. .Common formats for run-length encoded data include Truevision TGA. These entropy encoders then compress data by replacing each fixedlength input symbol by the corresponding variable-length prefix-free output codeword. Two of the most common entropy encoding techniques are Huffman coding and arithmetic coding. This is done by generating an entropy coder/compressor for each class of data.

as opposed to 288 bits if 36 characters of 8 bits were used.Huffman coding From Wikipedia. (January 2011) Huffman tree generated from the exact frequencies of the text "this is an example of a huffman tree". The frequencies and codes of each character are below. Encoding the sentence with this code requires 135 bits. but its sources remain unclear because it has insufficient inline citations. (This assumes that the code tree structure is known to the decoder and thus does not need to be counted as part of the transmitted information. the free encyclopedia This article includes a list of references. Please help to improve this article by introducing more precise citations.) Char Freq Code space 7 111 a 4 010 e 4 000 f 3 1101 h 2 1010 i 2 1000 m 2 0111 n 2 0010 .

Char Freq Code s 2 1011 t 2 0110 l 1 11001 o 1 00110 p 1 10011 r 1 11000 u 1 00111 x 1 10010 In computer science and information theory. Huffman coding uses a specific method for choosing the representation for each symbol. It was developed by David A. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. resulting in a prefix code (sometimes called "prefix-free codes". and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". Huffman while he was a Ph. Huffman coding is an entropy encoding algorithm used for lossless data compression. that is. student at MIT.[citation needed] .D. the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. A method was later found to design a Huffman code in linear time if input probabilities (also known as weights) are sorted.

For a set of symbols with a uniform probability distribution and a number of members which is a power of two.1 Informal description 2. In the case of known independent and identically-distributed random variables. the latter of which is useful when input probabilities are not precisely known or vary significantly within the stream.g. or when the probability mass functions are unknown.. or not independent (e. combining symbols together reduces inefficiency in a way that approaches optimality as the number of symbols combined increases.1 Compression 3. or context-dependent probabilities. Other methods such as arithmetic coding andLZW coding often have better compression capability: both of these methods can combine an arbitrary number of symbols for more efficient coding. it can be used adaptively. not identically distributed. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm. the limitations of Huffman coding should not be overstated.. Contents [hide] • • o o o • o o • • o o o o o 1 History 2 Problem definition 2.2 Adaptive Huffman coding 5.2 Decompression 4 Main properties 5 Variations 5.3 Huffman template algorithm 5. and generally adapt to the actual input statistics.2 Formalized description 2.1 n-ary Huffman coding 5. ASCII coding.5 Huffman coding with unequal letter costs .4 Length-limited Huffman coding 5. Although Huffman's original algorithm is optimal for a symbol-by-symbol coding (i.g. "cat" is more common than "cta"). accommodating unknown.3 Samples 3 Basic technique 3. However. Huffman coding is equivalent to simple binary block encoding. e. a stream of unrelated symbols) with a known input probability distribution. changing. it is not optimal when the symbol-bysymbol restriction is dropped.e.

o o • • • • • 5.7 The canonical Huffman code 6 Applications 7 See also 8 Notes 9 References 10 External links [edit]History In 1951. was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient.e. Code . . the student outdid his professor. The professor. which is the set of the (positive) symbol weights . Robert M. Alphabet Set description .[1] In doing so. assigned a term paper on the problem of finding the most efficient binary code. Output. Find A prefix-free binary code (a set of codewords) with minimum expected codeword length (equivalently. [edit]Formalized Input. which is the set of (binary) codewords. Huffman. Huffman avoided the major flaw of the suboptimalShannon-Fano coding by building the tree from the bottom up instead of from the top down.6 Optimal alphabetic binary trees (Hu-Tucker coding) 5. [edit]Problem [edit]Informal Given definition description A set of symbols and their weights (usually proportional to probabilities). a tree with minimum weighted path length from the root). Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. David A. . who had worked with information theory inventor Claude Shannon to develop a similar code. which is the symbol alphabet of size n. (usually proportional to probabilities). i. unable to prove any codes were the most efficient. Fano.

the code is termed a complete code.00 Optimality Information content (in bits) 3. . meaning that the code is uniquely decodeable.60 0. as a result. Goal. .10 0.32 0.74 2. you can always derive an equivalent code by adding extra symbols (with associated null probabilities).15 0.58 L(C) = 2. If this is not the case. to make the code complete while keeping it biunique.518 H(A) = 2.29 =1 a b c d e Sum Codewords (ci) 010 011 11 00 10 Output C Codeword length (in bits) (li) 3 3 2 2 2 Weighted path length (li wi ) 0.30 0. the sum of the probability budgets across all symbols is always less than or equal to one.423 0. Let Condition: for any code be the weighted path length of code C.45 0.16 0.411 0.32 2.74 1.30 0.332 0.64 1.205 For any code that is biunique. the sum is strictly equal to one.521 0.where ci is the codeword for . In this example. [edit]Samples Symbol (ai) Input (A.25 Probability budget (2-li) 1/8 1/8 1/4 1/4 1/4 = 1. W) Weights (wi) 0.79 (−log2 wi) ≈ Entropy (−wi log2 wi) 0.

The tree can then be read backwards.05}.0. but it is always one of the codes minimizing L(C). So for simplicity. across all symbols ai with non-zero probability wi. symbols with zero probability can [edit]Basic technique [edit]Compression A source generates 4 different symbols {a1. a Huffman code need not be unique.35. but it is very close to the theoretical limit established by Shannon.2. So not only is this code optimal in the sense that no other feasible code performs better.) As a consequence of Shannon's source coding theorem. the weighted average codeword length is 2. from right to left.4. of the information content of each symbol: (Note: A symbol with zero probability has zero contribution to the entropy. A binary tree is generated from left to right taking the two least probable symbols and putting them together to form another equivalent symbol having a probability that equals the sum of the two symbols.a4} with probability{0. The final Huffman code is: . since be left out of the formula above.205 bits per symbol. the entropy is a measure of the smallest codeword length that is theoretically possible for the given alphabet with associated weights.0. in general.As defined by Shannon (1948). the information content h (in bits) of each symbol ai with nonnull probability is The entropy H (in bits) is the weighted sum. only slightly larger than the calculated entropy of 2. assigning different bits to different branches.a2.25 bits per symbol. In this example. Note that. The process is repeated until there is just one symbol.a3.0.

the Huffman tree. then a new node whose children are the 2 nodes with smallest probability is created. A Huffman tree that omits unused symbols produces the most optimal code lengths. which contain the symbol itself.74 bits/symbol. it is still far from the theoretical limit because the probabilities of the symbols are different from negative powers of two. such that the new node's probability is equal to the sum of the children's probability. . and with the new node being now considered. but the entropy of the source is 1. Initially. Internal nodes contain symbol weight. As a common convention. These can be stored in a regular array. n. bit '0' represents following the left child and bit '1' represents following the right child. If this Huffman code is used to represent the signal. A node can be either a leaf node or an internal node. links to two child nodes and the optional link to a parent node.85 bits/symbol. The technique works by creating a binary tree of nodes. a link to a parent node which makes it easy to read the code (in reverse) starting from a leaf node. all nodes are leaf nodes. the size of which depends on the number of symbols. The process essentially begins with the leaf nodes containing the probabilities of the symbol they represent. Create a leaf node for each symbol and add it to the priority queue. theweight (frequency of appearance) of the symbol and optionally. With the previous 2 nodes merged into one node (thus not considering them anymore). The simplest construction algorithm uses a priority queue where the node with lowest probability is given highest priority: 1.Symbol Code a1 0 a2 10 a3 110 a4 111 The standard way to represent a signal made of 4 symbols is by using 2 bits/symbol. then the average length is lowered to 1. the procedure is repeated until only one node remains. A finished tree has up to n leaf nodes and n − 1 internal nodes.

Since efficient priority queue data structures require O(log n) time per insertion. Start with as many leaves as there are symbols. this algorithm operates in O(n log n) time. where n is the number of symbols. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities. this is not actually the case because the symbols need to be sorted by probability before-hand. a process that takes O(n log n) time in itself.2. Create a new internal node. 2. 2. and combined weights (along with pointers to the trees) being put in the back of the second queue. 3. The remaining node is the root node. The remaining node is the root node and the tree is complete. Remove the two nodes of highest priority (lowest probability) from the queue 2. 3. with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight. 4. the first one containing the initial weights (along with pointers to the associated leaves). and a tree with n leaves has 2n−1 nodes. . there is a linear-time (O(n)) method to create a Huffman tree using two queues. Although this algorithm may appear "faster" complexity-wise than the previous algorithm using a priority queue. Dequeue the two nodes with the lowest weight by examining the fronts of both queues. Add the new node to the queue. Enqueue the new node into the rear of the second queue. Enqueue all leaf nodes into the first queue (by probability in increasing order so that the least likely item is in the head of the queue). the tree has now been generated. If the symbols are sorted by probability. 3. While there is more than one node in the queue: 1. 3. This assures that the lowest weight is always kept at the front of one of the two queues: 1. While there is more than one node in the queues: 1.

0 ≤ n < 1. Normally. as in the ASCII code. a string of characters such as the words "hello there" is represented using a fixed number of bits per character. This modification will retain the mathematical optimality of the Huffman coding while both minimizing variance and minimizing the length of the longest character code. It is generally beneficial to minimize the variance of codeword length. the free encyclopedia Arithmetic coding is a form of variable-length entropy encoding used in lossless data compression. simply break ties between queues by choosing the item in the first queue. time complexity is not very important in the choice of algorithm here. Here's an example using the French subject string "j'aime aller sur le bord de l'eau les jeudis ou les jours impairs": Arithmetic coding From Wikipedia. since n here is the number of symbols in the alphabet. arithmetic coding encodes the entire message into a single number.2 Defining a model 1. When a string is converted to arithmetic encoding. frequently used characters will be stored with fewer bits and not-so-frequently occurring characters will be stored with more bits.0).3 Encoding and decoding: overview .In many cases. resulting in fewer bits used in total.1 Equal probabilities 1. a communication buffer receiving Huffman-encoded data may need to be larger to deal with especially long symbols if the tree is especially unbalanced. which is typically a very small number (compared to the length of the message to be encoded). Contents [hide] • o o o 1 Implementation details and examples 1. For example. Arithmetic coding differs from other forms of entropy encoding such as Huffman coding in that rather than separating the input into component symbols and replacing each with a code. whereas complexity analysis concerns the behavior when n grows to be very large. a fraction n where (0. To minimize variance.

For example. The next step is to encode this ternary number using a fixed-point binary number of sufficient precision to recover it. A.1 Theoretical limit of compressed message 5 Connections with other compression methods 5. each equally likely to occur. consider a sequence taken from a set of three symbols.4 Encoding and decoding: example 1. round to 6 digits. 25% smaller than the naïve block encoding. This is feasible for long sequences because there are efficient. A more efficient solution is to represent the sequence as a rational number between 0 and 1 in base 3. To decode the value. the probability of each symbol occurring is equal. B.1 Huffman coding 5.o o • • • o • o o • • • • • • 1. see source coding .0112013.5 Sources of inefficiency 2 Adaptive arithmetic coding 3 Precision and renormalization 4 Arithmetic coding as a generalized change of radix 4. in-place algorithms for converting the base of arbitrarily precise numbers. arithmetic coders can produce near-optimal output for any given set of symbols and probabilities (the optimal value is −log2P bits for each symbol of probability P.2 Range encoding 6 US patents 7 Benchmarks and other technical characteristics 8 Teaching aid 9 See also 10 References 11 External links [edit]Implementation [edit]Equal details and examples probabilities In the simplest case.0010110012 — this is only 9 bits. knowing the original string had length 6. [edit]Defining a model In general. and C. such as 0. For example. where each digit represents a symbol. which is wasteful: one of the bit variations is never used. Simple block encoding would use 2 bits per symbol. and recover the string. the sequence "ABBCAB" could become 0. one can simply convert back to base 3.

1]. Models can even be adaptive. The more accurate this prediction is. static model for describing the output of a particular monitoring instrument over time might be:     60% chance of symbol NEUTRAL 20% chance of symbol POSITIVE 10% chance of symbol NEGATIVE 10% chance of symbol END-OF-DATA.theorem). except for the very last. the encoder has basically just three pieces of data to consider:   The next symbol that needs to be encoded The current interval (at the very start of the encoding process. Example: for the four-symbol model above: . but that will change)  The probabilities the model assigns to each of the various symbols that are possible at this stage (as mentioned earlier. is the same. when this symbol appears in the data stream.) Models can also handle alphabets other than the simple four-symbol set chosen for this example. (The presence of this symbol means that the stream will be 'internally terminated'. the decoder will know that the entire stream has been decoded. Example: a simple.) The encoder divides the current interval into sub-intervals. the closer to optimal the output will be. so that in a model for English text. Compression algorithms that use arithmetic coding start by determining a model of the data – basically a prediction of what patterns will be found in the symbols of the message. More sophisticated models are also possible: higher-order modelling changes its estimation of the current probability of a symbol based on the symbols that precede it (the context). [edit]Encoding and decoding: overview In general. The decoder must have the same model as the encoder. higher-order or adaptive models mean that these probabilities are not necessarily the same in each step. the interval is set to [0. for example. the percentage chance of "u" would be much higher when it follows a "Q" or a "q". Whichever interval corresponds to the actual symbol that is next to be encoded becomes the interval used in the next step. so that they continuously change their prediction of the data based on what the stream actually contains. each step of the encoding process. as is fairly common in data compression. each representing a fraction of the current interval proportional to the probability of that symbol in the current context.

0. In particular.36) -.60% of [0. 1).1). this indicates that the first symbol the encoder read must have been NEUTRAL.8) the interval for NEGATIVE would be [0.) The process starts with the same interval used by the encoder: [0. 0. Anyone who has the same final interval and model that is being used can reconstruct the symbol sequence that must have entered the encoder to result in that final interval.9. 0.538 falls into the sub-interval for NEUTRAL. also assuming that there are only as many digits as needed to decode the message. 0. The region is divided into subregions proportional to symbol frequencies. dividing it into the same four sub-intervals that the encoder must have. then the subregion containing the point is successively subdivided in the same way.8.538 (using decimal for clarity.6).6) . 0. [edit]Encoding and decoding: example A diagram showing decoding of 0. and using the same model. instead of binary. Consider the process for decoding a message encoded with the given four-symbol model. 0. it is only necessary to transmit one fraction that lies within that interval.9) the interval for END-OF-DATA would be [0. The fraction 0.    the interval for NEUTRAL would be [0. The message is encoded in the fraction 0. 0. the resulting interval unambiguously identifies the sequence of symbols that produced it. When all symbols have been encoded. however.538 (the circular point) in the example model. It is not necessary to transmit the final interval. Next divide the interval [0. so this is the first symbol of the message. it is only necessary to transmit enough digits (in whatever base) of the fraction so that all fractions that begin with those digits fall into the final interval.6) into sub-intervals:  the interval for NEUTRAL would be [0. [0.6) the interval for POSITIVE would be [0.6.

36. or else the message would be ambiguous without external information such as compressed stream size.6) Since .) This 8 bit output is larger than the information content. mistakenly reading more symbols from the fraction than were in fact encoded into it. the entropy of the message would be 1.48. the information content of a three-digit decimal is approximately 9. and a binary interval of [1011110. Since it is also the internal termination symbol. therefore.540).48) -. 0.71 bits is caused by the short example message not being able to exercise the coder effectively.516.54). 0. 0. there needs to be some other way to indicate where the stream stops. 6/27). the same message could have been encoded in the binary fraction 0. which is 1. 0.535.10001010 (equivalent to 0.6) the interval for NEGATIVE would be [0. 0. Again divide our current interval into sub-intervals:     the interval for NEUTRAL would be [0. 0. 0. If the intervals are readjusted for these frequencies.   the interval for POSITIVE would be [0. [5/27.1. 1110001). The claimed symbol probabilities were [0.6) the interval for END-OF-DATA would be [0.539. Now .20% of [0. 0. 0.534) the interval for END-OF-DATA would be [0. the decoding process could continue forever. or entropy of the message.537 or 0. 0.48.10% of [0. 0.33]. -.538 is within the interval [0.71 bits. or just 3 bits. This is correct. [1/9. 0. 0.516) the interval for POSITIVE would be [0. it means the decoding is complete. 2/9).6). 0.54.528.33. 0. 0. (The final zero must be specified in the binary fraction.54) -.33. The large difference between the example's 8 (or 7 with external compressed data size information) bits of output and the entropy of 4.6. This . Otherwise. 0. 1/3).534.10% of [0. 0.58 bits and the same NEUTRAL NEGATIVE ENDOFDATA message could be encoded as intervals [0.534. 0.5390625 decimal) at a cost of only 8 bits.48. this must be the next symbol.57 × 3 or 4.536. [edit]Sources of inefficiency The message 0. This suggests that the use of decimal instead of binary introduced some inefficiency. If the stream is not internally terminated.538 in the previous example could have been encoded by the equally short fractions 0. 0.538 falls within the interval of the END-OF-DATA symbol. but the actual frequencies in this example are [0. This could yield an output message of 111.2.528) the interval for NEGATIVE would be [0.966 bits. the second symbol of the message must have been NEGATIVE.1].

source coding. [edit]Adaptive arithmetic coding One advantage of arithmetic coding over other similar methods of data compression is the convenience of adaptation.g. Adaptation is the changing of the frequency (or probability) tables while processing the data. data compression. Contents [hide] . the amount of distortion introduced (e. when usinglossy data compression). Data compression From Wikipedia. Lossy compression reduces bits by identifying marginally important information and removing it. Compression is useful because it helps reduce the consumption of resources such as data space or transmission capacity. The synchronization is. Compression can be either lossy or lossless. including the degree of compression. Please help improve this article by adding citations to reliable sources. the free encyclopedia "Source coding" redirects here. and the computational resources required to compress and uncompress the data. The decoded data matches the original data as long as the frequency table in decoding is replaced in the same way and in the same step as in encoding. Because compressed data must be decompressed to be used. No information is lost in lossless compression. especially if the probability model is off. based on a combination of symbols occurring during the encoding and decoding process. it may be as effective as 2 to 3 times better in the result. a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed.. Lossless compression reduces bits by identifying and eliminating statistical redundancy. This article needs additional citations for verification. Unsourced material may be challenged and removed.[1] or bit-rate reduction involves encoding information using fewer bits than the original representation. For the term in computer programming. (November 2011) In computer science and information theory. usually. this extra processing imposes computational or other costs through decompression. The design of data compression schemes involve trade-offs among various factors. For instance. see Source code. Adaptive arithmetic coding significantly improves the compression ratio compared to static also an example of how statistical coding methods like arithmetic encoding can produce an output message that is larger than the input message. and the option to decompress the video in full before watching it may be inconvenient or require additional storage.

red pixel. so that "speech compression" or "voice coding" is sometimes distinguished as . . Similarly.1. Lossless compression is possible because most real-world data has statistical redundancy. methods of psychoacoustics are used to remove non-audible (or less audible) components of the signal.2 Data differencing 4 Outlook and currently unused potential 5 Uses 5.1.1 Machine learning 3. In lossy audio compression.1.1 Audio     o   • • • 6 See also 7 References 8 External links 5.1 Encoding theory 5. For example.2 Video 5.1.1. instead of coding "red pixel. there are many schemes to reduce size by eliminating redundancy.1 Lossy audio compression 5.2 Speech encoding 5.2 History 5.2..• • • o o • • o 1 Lossy 2 Lossless 3 Theory 3. to increase storage capacities with minimal degradation of picture quality. This is a simple example of run-length encoding.2 Timeline [edit]Lossy Lossless data compression algorithms usually exploit statistical redundancy to represent data more concisely without losing information.1 Coding methods 5. Compression of human speech is often performed with even more specialized techniques.2. DVDs use the lossy MPEG-2 Video codec forvideo compression." the data may be encoded as "279 red pixels".. an image may have areas of colour that do not change over several pixels. Lossy image compression is used in digital cameras.1.

DEFLATE is used in PKZIP. The Lempel–Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. and turned into a practical method by Witten. statistical predictions can be coupled to an algorithm called arithmetic coding. [edit]Lossless Lossless data compression is contrasted with lossy data compression. For most LZ methods. such as prediction by partial matching. the human eye is more sensitive to subtle variations in luminance than it is to variations in color. There is a corresponding trade-off between information lost and the size reduction.a separate discipline from "audio compression". Voice compression is used in Internet telephony for example.g. Arithmetic coding. huge versioned document collection. including those used in musicfiles. which serve as the basis of the Zip method. A current LZbased coding scheme that performs well is LZX. The very best modern lossless compressors use probabilistic models. and video. and Cleary. Also noteworthy are the LZR (LZ–Renau) methods. The table itself is often Huffman encoded (e. gzip and PNG. JPEG image compression works in part by "rounding off" less-important visual information. some loss of information is acceptable. lossy data compression schemes are guided by research on how people perceive the data in question. internet archives. Generally. In these schemes. used in Microsoft's CAB format. LZW (Lempel–Ziv– Welch) is used in GIF images. this table is generated dynamically from earlier data in the input. For example. Arithmetic coding is used in the bilevel image-compression . The class of grammar-based codes are recently noticed because they can extremely compress highlyrepetitive text. biological data collection of same or related species. but compression can be slow. for instance. while audio compression is used for CD ripping and is decoded by audio players. invented by Jorma Rissanen. Neal. A number of popular compression formats exploit these perceptual differences. Sequitur and Re-Pairare practical grammar compression algorithms which public codes are available. The basic task of grammar-based codes is constructing a context-free grammar deriving a single string. detail can be dropped from the data to save storage space. Different audio and speech compression standards are listed under audio codecs. images. and lends itself especially well to adaptive data compression tasks where the predictions are strongly context-dependent. Depending upon the application. DEFLATE is a variation on LZ which is optimized for decompression speed and compression ratio. LZ methods utilize a table-based compression model where table entries are substituted for repeated strings of data. In a further refinement of these techniques. The Burrows–Wheeler transform can also be viewed as an indirect form of statistical modelling. SHRI. etc. LZX). achieves superior compression to the better-known Huffman algorithm.

one may use the term differential compression to refer to data differencing. and the document-compression standard DjVu.standard JBIG. and by rate–distortion theory for lossy compression. who published fundamental papers on the topic in the late 1940s and early 1950s. and decompression consists of producing a target given only a compressed file. while data compression consists of producing a compressed file given a target.Coding theory is also related. [edit]Theory The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) for lossless compression. one can consider data compression as data differencing with empty source data. These fields of study were essentially created by Claude Shannon. [edit]Machine learning See also: Machine learning There is a close connection between machine learning and compression: a system that predicts the posterior probabilities of a sequence given its entire history can be used for optimal data compression (by using arithmetic coding on the output distribution). while an optimal compressor can be used for prediction (by finding the symbol that compresses best. [edit]Outlook and currently unused potential It is estimated that the total amount of the information that is stored on the world's storage devices could be further compressed with existing compression algorithms by a remaining average factor of 4. given the previous history). When one wishes to emphasize the connection. This is the same as considering absolute entropy (corresponding to data compression) as a special case of relative entropy (corresponding to data differencing) with no initial data. The text entry system.5 : 1. with patching producing atarget given a source and a difference. The idea of data compression is deeply connected with statistical inference. Dasher. This equivalence has been used as justification for data compression as a benchmark for "general intelligence". [2] [edit]Data differencing Main article: Data differencing Data compression can be viewed as a special case of data differencing:[3][4] Data differencing consists of producing a difference given a source and a target. is an inverse-arithmetic-coder. Thus. It is estimated that the combined technological capacity of the world to store information . the compressed file corresponding to a "difference from nothing".

The process is reversed upon decompression. or 7 hours of music compressed in the MP3 format at a medium bit rate. Lossy compression depends upon the quality required. reduces the transmission bandwidth and storage requirements of audio data.[5] [edit]Uses [edit]Audio See also: Audio codec Audio data compression. Lossless audio compression produces a representation of digital data that decompresses to an exact digital duplicate of the original audio stream. but when the corresponding content is optimally compressed. either by further compression or for editing. but typically yields files of 5 to 20% of the size of the uncompressed original. Processing of a lossily compressed file for some purpose usually produces a final result inferior to creation of the same . For example.300 exabytes of hardware digits in 2007. Compression ratios are around 50–60% of original size[7]. using methods such as coding.provides 1. The acceptable trade-off between loss of audio quality and transmission or storage size depends upon the application. thereby reducing the space required to store or transmit them. one 640MB compact disc (CD) holds approximately one hour of uncompressed high fidelity music. information redundancy is reduced. similar to those for generic lossless data compression. Lossy audio compression algorithms provide higher compression at the cost of fidelity. unlike playback from lossy compression techniques such as Vorbis and MP3. thereby allowing traditional lossless compression to work more efficiently. are used in numerous audio applications. Audio compression algorithmsare implemented in software as audio codecs. These algorithms almost all rely on psychoacoustics to eliminate less audible or meaningful sounds. Codecs like FLAC. A digital sound recorder can typically store around 200 hours of clearly intelligible speech in 640MB [6].[8] Lossless compression is unable to attain high compression ratios due to the complexity of wave forms and the rapid changes in sound forms. In both lossy and lossless compression. this only represents 295 exabytes of Shannon information. it is desirable to work from an unchanged original (uncompressed or losslessly compressed). less than 2 hours of music compressed losslessly. Shorten and TTA use linear prediction to estimate the spectrum of the signal. When audio files are to be processed. as distinguished from dynamic range compression. pattern recognition and linear prediction to reduce the amount of information used to represent the uncompressed data. Many of these algorithms use convolution with the filter [-1 1] to slightly whiten or flatten the spectrum.

compressed file from an uncompressed original. See list of lossless codecs for a complete list. and TTA. Other formats are associated with a distinct system. Newer ones include Free Lossless Audio Codec (FLAC). or as master copies. In addition to the direct applications (mp3 players or computers). used in Super Audio CD Meridian Lossless Packing. MPEG-4 ALS. digital television. Monkey's Audio. A number of lossless audio compression formats exist. and OptimFROG DualStream. Some audio formats feature a combination of a lossy format and a lossless correction. Shorten was an early lossless format. streaming media on the internet. and increasingly in terrestrial . WavPack. digitally compressed audio streams are used in most video DVDs. The fact that the lossy spectrograms are different from the uncompressed one indicates that they are in fact lossy. Dolby TrueHD. Blu-ray and HD DVD [edit]Lossy audio compression Comparison of acoustic spectrograms of a song in an uncompressed format and various lossy formats. satellite and cable radio. such as:   Direct Stream Transfer. this allows stripping the correction to easily obtain a lossy file. Such formats include MPEG-4 SLS(Scalable to Lossless). Microsoft's Windows Media Audio 9 Lossless (WMA Lossless). but nothing can be assumed about the effect of the changes onperceived quality. Lossy audio compression is used in a wide range of applications. used in DVD-Audio. lossless audio compression is often used for archival storage. Apple's Apple Lossless. In addition to sound editing or mixing.

such as sound editing and multitrack recording. are source-based coders. These coders use a model of the sound's generator (such as the human vocal tract with LPC) to whiten the audio signal (i. Most lossy compression reduces perceptual redundancy by first identifying sounds which are considered perceptually irrelevant. most lossy compression algorithms use transforms such as the modified discrete cosine transform (MDCT) to convert time domain sampled waveforms into a transform domain. by discarding less-critical data. rather than 50 percent to 60 percent). The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio stream can be perceived by the human auditory system. However. reconstruction of an audio signal using a linear predictor shapes the coder's quantization noise into the spectrum of the target signal. temporal masking—where a signal is masked by another signal separated by time. that is. flatten its spectrum) prior to broadcasts. as a megabyte can store about a minute's worth of music at adequate quality. Once transformed. in some cases. they are very popular with end users (particularly MP3). Those sounds are coded with decreased accuracy or not coded at all. Audibility of spectral components is determined by first calculating a masking threshold.e. This makes lossy compression unsuitable for storing the intermediate results in professional audio engineering applications. Due to the nature of lossy algorithms. [edit]Coding methods In order to determine what information in an audio signal is perceptually irrelevant. such as the linear predictive coding (LPC) used with speech. Models of the human ear-brain combination incorporating such effects are often called psychoacoustic models. Lossy compression typically achieves far greater compression than lossless compression (data of 5 percent to 20 percent of the original stream.. below which it is estimated that sounds will be beyond the limits of human perception. Typical examples include high frequencies. Equal-loudness contours may also be used to weight the perceptual importance of different components. and. typically into the frequency domain. component frequencies can be allocated bits according to how audible they are. Other types of lossy compressors. or sounds that occur at the same time as louder sounds. The masking threshold is calculated using the absolute threshold of hearing and the principles of simultaneous masking—the phenomenon wherein a signal is masked by another signal separated by frequency. LPC may also be thought of as a basic perceptual coding technique. partially masking it. sounds that are very hard to hear. . audio quality suffers when a file is decompressed and recompressed (digital generation loss).

hence their popularity in speech coding for telephony. or interactive applications (such as the coding of speech for digital transmission in cell phone networks). The range of frequencies needed to convey the sounds of a human voice are normally far narrower than that needed for music. As a result. and latency is on the order of 23 ms (46 ms for twoway communication). (Often codecs create segments called a "frame" to create discrete data segments for encoding and decoding. if the coder/decoder simply reduces the number of bits used to quantize the signal). In contrast to the speed of compression. speech can be encoded at high quality using a relatively low bit rate. for example. however. and then code it in a manner that requires a larger segment of data at one time in order to decode. significant delays may seriously degrade the perceived quality. rather than after the entire data stream has been transmitted. In the minimum case.. Latency results from the methods used to encode and decode the data. the data must be decompressed as the data flows. in general. by some combination of two approaches:   Only encoding sounds that could be made by a single human voice. In such applications. This is accomplished. Throwing away more of the data in the signal—keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human hearing. Perhaps the earliest algorithms used in speech encoding (and audio data compression in general) were the A-law algorithm and the µ-law algorithm. when there is two-way transmission of data. [edit]History . a large number of samples have to be analyzed in order to implement a psychoacoustic model in the frequency domain. The perceptual models used to estimate what a human ear can hear are generally somewhat different from those used for music. latency is 0 zero samples (e. Some codecs will analyze a longer segment of the data to optimize efficiency. In algorithms such as MP3. [edit]Speech encoding Speech encoding is an important category of audio data compression. such as with a telephone conversation. Not all audio codecs can be used for streaming applications. here latency refers to the number of samples which must be analysed before a block of audio is processed.g. and for such applications a codec designed to stream data effectively will usually be chosen. Time domain algorithms such as LPC also often have low latencies.) The inherent latency of the coding algorithm can be critical. and the sound is normally less complex.Lossy formats are often used for the distribution of streaming audio. which is proportional to the number of operations required by the algorithm.

cost of processing the compression and decompression.Solidyne 922: The world's first commercial audio bit compression card for PC. almost all the radio stations in the world were using similar technology. February 1988. there is a tradeoff between video quality. 20 years later. and system requirements. Most. using the psychoacoustic principle of the masking of critical bands first published in 1967. of the authors in the JSAC edition were also active in the MPEG-1 Audio committee. but combined data streams. clean digital audio for research purposes. The majority of video compression algorithms use lossy compression. this collection documented an entire variety of finished. The world's first commercial broadcast automation audio compression system was developed by Oscar Bonello. if not all.[9] Several of these papers remarked on the difficulty of obtaining good. In practice most video codecs also use audio compression techniques in parallel to compress the separate. Highly compressed video may present visible or distractingartifacts. Most video compression algorithms and codecs combine spatial image compression and temporal motion compensation. While there were some papers from before that time. and the broadcast automation system was launched in 1987 under the name Audicom. Large amounts of data may be eliminated while being perceptually indistinguishable. As in all lossy compression. 1990 A literature compendium for a large variety of audio coding systems was published in the IEEE Journal on Selected Areas in Communications (JSAC). an Engineering professor at the University of Buenos Aires. . Video compression is a practical implementation of source coding in information theory.[10] In 1983. [edit]Video See also: Video codec Video compression uses modern coding techniques to reduce redundancy in video data. masking) techniques and some kind of frequency analysis and backend noiseless coding. manufactured by a number of companies. working audio coders. nearly all of them using perceptual (i. [11] he started developing a practical application based on the recently developed IBM PC computer.e.

(March 2011) . video was stored as an analog signal on magnetic tape. Historically. random access. the complexity of the encoding and decoding algorithms. The compression usually employs lossy data compression. (December 2007) This article or section lacks a single coherent topic. Please help improve this article by adding citations to reliable sources. and a variety of such technologies began to emerge. (October 2009) This article needs additional citations for verification. Motion compensation From Wikipedia. often called macroblocks. These pixel groups or blocks of pixels are compared from one frame to the next and the video compression codec sends only the differences within those blocks. end-to-end delay. flames. flocks of animals. Commonly during explosions. and in some panning shots. Unsourced material may be challenged and removed. the state of the art of compression algorithm design. Unsourced material may be challenged and removed. and a number of other factors. the quantity of the data needed to represent it (also known as the bit rate). robustness to data losses and errors. Please help improve this article by rewording sentences or removing irrelevant information. the high-frequency detail leads to quality decreases or to increases in the variable bitrate. the free encyclopedia This article does not cite any references or sources. Specific concerns may appear on the talk page.Video compression typically operates on square-shaped groups of neighboring pixels. Please help improve the article with a good introductory style. Audio and video call for customized methods of compression. Around the time when the compact disc entered the market as a digital-format replacement for analog audio. There is a complex balance between the video quality. ease of editing. Please help improve this article by adding citations to reliable sources. (April 2011) A video codec is a device or software that enables video compression and/or decompression for digital video. it became feasible to also begin storing and using video in digital form. In areas of video with more motion. the compression must encode more data to keep up with the larger number of pixels that are changing. Engineers and mathematicians have tried a number of solutions for tackling this problem. Video codec From Wikipedia. the free encyclopedia This article provides insufficient context for those unfamiliar with the subject.

Another difference between intraframe and interframe compression is that with intraframe systems. If the frame contains areas where nothing has moved. When images can be accurately synthesised from previously transmitted/stored images. effectively being image compression.Motion compensation is an algorithmic technique employed in the encoding of video data for video compression. into the next one. [edit]Encoding theory Video data may be represented as a series of still image frames. Making 'cuts' in intraframe-compressed video is almost as easy as editing uncompressed video: one finds the beginning and ending of each frame. in a manner similar to those used in JPEG image compression. the system simply issues a short command that copies that part of the previous frame. or darken the copy: a longer command. uncompressed video. lighten. such as DV. the following frames cannot be reconstructed properly. Motion compensation describes a picture in terms of the transformation of a reference picture to the current picture. if the original frame is simply cut out (or lost in transmission). For example. Compression algorithms can average a color across these similar areas to reduce space. In most . The most commonly used method works by comparing each frame in the video with the previous one. rotate. but still much shorter than intraframe compression.[12] Some of these methods are inherently lossy while others may preserve all relevant information from the original. or by using perceptual features of human vision. the compression efficiency can be improved. The reference picture may be previous in time or even from the future. for example in the generation of MPEG-2 files. One of the most powerful techniques for compressing video is interframe compression. while intraframe compression uses only the current frame. the compressor emits a (slightly longer) command that tells the decompresser to shift. If sections of the frame move in a simple manner. Because interframe compression copies data from one frame to another. bit-for-bit. small differences in color are more difficult to perceive than are changes in brightness. The sequence of frames contains spatial and temporal redundancy that video compression algorithms attempt to eliminate or code in a smaller size. Similarities can be encoded by only storing differences between frames. and simply copies bit-for-bit each frame that one wants to keep. and discards the frames one doesn't want. Interframe compression works well for programs that will simply be played back by the viewer. compress each frame independently using intraframe compression. Some video formats. but can cause problems if the video sequence needs to be edited. each frame uses a similar amount of data. Interframe compression uses one or more earlier or later frames in a sequence to compress the current frame.

) . but are typically not used in practical products (except for the use of wavelet coding as still-image coders without motion compensation). nearly all commonly used video compression methods (e. Videotelephony 1993 MPEG-1 Part 2 ISO. Other methods.261 ITU-T Videoconferencing. Blu-ray. such as fractal compression. History of Video Compression Standards Year Standard Publisher Popular Implementations 1984 H. Today. IEC. Videotelephony. certain frames (such as "I frames" in MPEG-2) aren't allowed to copy data from other frames. this process demands a lot more computing power than editing intraframe compressed video with the same picture quality. Interest in fractal compression seems to be waning.120 ITU-T 1990 H.interframe systems. matching pursuit and the use of a discrete wavelet transform (DWT) have been the subject of some research. those in standards approved by the ITU-T or ISO) apply a discrete cosine transform (DCT) for spatial redundancy reduction. and so require much more data than other frames nearby. SVCD 1996 H.263 ITU-T Videoconferencing. IEC Video on Internet (DivX.. IEC Video-CD 1995 H. However. This has allowed newer formats like HDV to be used for editing..262/MPEG-2 Part ISO. Digital Video Broadcasting.g.[citation needed] [edit]Timeline The following table is a partial history of international video compression standards. It is possible to build a computer-based video editor that spots problems caused when I frames are edited out while other frames need them. Video on Mobile Phones (3GP) 1999 MPEG-4 Part 2 ISO. due to recent theoretical analysis showing a comparative lack of effectiveness of such methods. Xvid .. ITU-T 2 DVD Video.

g.  Speech enhancement: enhancing the intelligibility and/or perceptual quality of a speech signal. Speaker recognition. where the aim is to recognize the identity of the speaker. information extraction techniques.[clarification needed] It is also closely tied to natural language processing (NLP). is important in the telecommunication area. E. Speech coding. text-to-speech synthesis may use a syntactic parser on its input text and speech recognition's output may be used by e. iPod Video. ITU-T Blu-ray. UHDTV Speech processing From Wikipedia.264/MPEG-4 AVC ISO. Digital Video Broadcasting. a specialized form of data compression. as its input can come from / output can go to NLP applications.2003 H.g.  Speech synthesis: the artificial synthesis of speech. . HDTV broadcast. BBC Video on Internet. IEC. The signals are usually processed in a digital representation. which usually means computer-generated speech. so speech processing can be regarded as a special case of digital signal processing. like audio noise reduction for audio signals. HD DVD 2008 VC-2 (Dirac) ISO. Speech processing can be divided into the following categories:    Speech recognition. which deals with analysis of the linguistic content of a speech signal. applied to speech signal. such as analysis of vocal loading and dysfunction of the vocal cords.  Voice analysis for medical purposes. the free encyclopedia Speech processing is the study of speech signals and the processing methods of these signals.

The term "voice recognition" is sometimes used to refer to recognition systems that must be trained to a particular speaker—as is the case for most desktop recognition software. This article needs additional citations for verification. andaircraft (usually termed Direct Voice Input). Speech recognition applications include voice user interfaces such as voice dialing (e. "Hello. the free encyclopedia For the human linguistic concept.. computer speech recognition. (February 2011) This article may need to be rewritten entirely to comply with Wikipedia's quality standards. Speaker recognition From Wikipedia. You can help. Unsourced material may be challenged and removed. search (e..g. "I would like to make a collect call").g. simple data entry (e. Speech recognition is a broader solution that refers to technology that can recognize speech without being targeted at single speaker—such as acall system that can recognize arbitrary voices. a radiology report). Please help improve this article by adding citations to reliable sources. the free encyclopedia This article needs additional citations for verification. Recognizing the speaker can simplify the task of translating speech. e. (February 2011) The display of the Speech Recognition screensaver on a PC..g.g. preparation of structured documents (e.g. speech to text... see Speech perception. e. domotic appliance control. call routing (e. find a podcast where particular words were spoken).Speech recognition From Wikipedia. Unsourced material may .g. or just STT) converts spoken words to text. Please help improve this article by adding citations to reliable sources." Speech recognition (also known as automatic speech recognition..g. word processors or emails). The discussion page may contain suggestions. in which the character responds to questions. speech-to-text processing (e. "Where are you?" or statements. "Call home").g. entering a credit card number).

Please help improve it by rewriting it in an encyclopedic style. the free encyclopedia This article does not cite any references or sources. Voice recognition is combination of the two where it uses learned aspects of a speakers voice to determine what is being said. These two terms are frequently confused. Please help improve this article by adding citations to reliable sources.g. In addition. (November 2007) Voice recognition redirects here. Speech coding From Wikipedia. Speaker verification has earned speaker recognition its classification as a "behavioral biometric"... (November 2011) Speech coding is the application of data compression of digital audio signals containing speech. there is a difference between speaker recognition (recognizing who is speaking) and speaker diarisation (recognizing when the same speaker is speaking). see Speech recognition. voice pitch. The techniques used in speech coding are similar to that in audio data compression and audio coding where knowledge in psychoacoustics is used to transmit only data that is relevant to the human . Speaker recognition[1] is the computing task of validating a user's claimed identity using characteristics extracted from their voices . (December 2007) This article is written like a personal reflection or essay rather than an encyclopedic description of the subject. size and shape of the throat and mouth) and learned behavioral patterns (e. combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. the system cannot recognize speech from random speakers very accurately. Unsourced material may be challenged and removed. but it can reach high accuracy for individual voices for which it has been trained. There is a difference between speaker recognition (recognizing who is speaking) and speech recognition (recognizing what is being said). Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech challenged and removed. These acoustic patterns reflect both anatomy (e. as isvoice recognition.g. The two most important applications of speech coding are mobile telephony and Voice over IP. For software that converts speech to text. speaking style). there is a difference between the act of authentication (commonly referred to as speaker verification or speaker authentication) and identification. Speaker recognition has a history dating back some four decades and uses the acoustic features of speech that have been found to differ between individuals. Finally.

711. make these very simple instantaneous compression algorithms acceptable for speech. In addition. timbre etc. which has a scalable structure. Although this would generate unacceptable distortion in a music signal. Speech coding differs from other forms of audio coding in that speech is a much simpler signal than most other audio signals. since it is possible that degraded speech is completely intelligible. and a lot more statistical information is available about the properties of speech.711) used in traditional PCM digital telephony can be seen as a very early precursor of speech encoding. the A-law/μ-law algorithms were chosen by the designers of the early digital telephony systems. besides the actual literal content. In 2008. but after careful consideration. that are all important for perfect intelligibility. and there has been no need to replace them in the stationary phone network. also speaker identity. combined with the simple frequency structure of speech as a periodic waveform having a single fundamental frequency with occasional added noise bursts. (February 2011) . Please help improve this article by adding citations to reliable sources. with a constrained amount of transmitted data. emotions. the peaky nature of speech waveforms. The intelligibility of speech includes. the A-law and μ-law algorithms (G. was standardized by ITU-T.1 codec. The more abstract concept of pleasantness of degraded speech is a different property than intelligibility. only information in the frequency band 400 Hz to 3500 Hz is transmitted but the reconstructed signal is still adequate for intelligibility. Their audio performance remains acceptable. For example. As a result. but subjectively annoying to the listener. A wide variety of other algorithms were tried at the time. requiring only 8 bits per sample but giving effectively 12 bits of resolution. some auditory information which is relevant in audio coding can be unnecessary in the speech coding context. Unsourced material may be challenged and removed. In speech coding. G.auditory system. their 33% bandwidth reduction for a very low complexity made them an excellent engineering compromise. the free encyclopedia This article needs additional citations for verification. in voiceband speech coding. most speech applications require low coding delay. intonation. Voice analysis From Wikipedia. the most important criterion is preservation of intelligibility and "pleasantness" of speech. mostly variants on delta modulation. The input sampling rate is 16 kHz. [edit]Sample companding viewed as a form of speech coding From this viewpoint. At the time of their design. as long coding delays interfere with speech interaction.

but also speaker identification. such as in speech recognition. Less invasive imaging methods such as x-rays or ultrasounds do not work because the vocal cords are surrounded by cartilage which distort image quality. and the laryngeal musclature is intensely active during speech or singing and is subject to tiring. the process of speaking exerts a load on the vocal cords where the tissue will suffer from tiring. invasive measurement of movement. In order to objectively evaluate the improvement in voice quality there has to be some measure of voice quality. since the folds are subject to collision forces with each vibratory cycle and to drying from the air being forced through the small gap between them. The vocal cords of a person speaking for an extended period of time will suffer from tiring. Such studies include mostly medical analysis of the voice i. analysis of the voice of patients who have had a polyp removed from his or her vocal cords through an operation. Another active research topic in medical voice analysis is vocal loading evaluation. The location of the vocal folds effectively prohibits direct. Movements in the vocal cords are rapid. Contents [hide] • • • • 1 Typical voice problems 2 Analysis methods 3 External links 4 See also [edit]Typical voice problems A medical study of the voice can be. a fiberoptic probe leading to the camera . dynamic analysis of the vocal folds and their movement is physically difficult. Among professional voice users (i.e. and high-speed videos provide an option but in order to see the vocal folds. for instance. An experienced voice therapist can quite reliably evaluate the voice. thus preventing usage of ordinary video. that is. fundamental frequencies are usually between 80 and 300 Hz. [edit]Analysis methods Voice problems that require voice analysis most commonly originate from the vocal folds or the laryngeal musculature that controls them. sales people) this tiring can cause voice failures and sick leaves. Stroboscopic.phoniatrics.e. some believe that the truthfulness or emotional state of speakers can be determined using Voice Stress Analysis orLayered Voice Analysis. teachers. but this requires extensive training and is still always subjective. To evaluate these problems vocal loading needs to be objectively measured. However.Voice analysis is the study of speech sounds for purposes other than linguistic content. More controversially.

but can provide useful indirect evidence of that movement. The most important indirect methods are currently inverse filtering of either microphone or oral airflow recordings and electroglottography (EGG). In addition. In inverse filtering. It thus yields one-dimensional information of the contact area. The other kind of noninvasive indirect indication of vocal fold motion is the electroglottography. This method produces an estimate of the waveform of the glottal airflow pulses. the free encyclopedia See also: Speech generating device Stephen Hawking is one of the most famous people using speech synthesis to communicate . which makes speaking difficult. Neither inverse filtering nor EGG are sufficient to completely describe the complex 3-dimensional pattern of vocal fold movement. In addition. placing objects in the pharynx usually triggers a gag reflex that stops voicing and closes the larynx. stroboscopic imaging is only useful when the vocal fold vibratory pattern is closely periodic. Speech synthesis From Wikipedia. which in turn reflect the movements of the vocal folds. the speech sound (the radiated acoustic pressure waveform. in which electrodes placed on either side of the subject's throat at the level of the vocal folds record the changes in the conductivity of the throat according to how large a portion of the vocal folds are touching each other. as obtained from a microphone) or the oral airflow waveform from a circumferentially vented (CV) mask is recorded outside the mouth and then filtered by a mathematical method to remove the effects of the vocal tract.has to be positioned in the throat.

Many computer operating systems have included speech synthesizers since the early 1980s.4 HMM-based synthesis 3.1 Concatenative synthesis    o o o o • o o o o 3. A computer system used for this purpose is called a speech synthesizer.[2] The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood.1 Unit selection synthesis 3.3 Evaluation challenges 4.2 Text-to-phoneme challenges 4. but may lack clarity.Speech synthesis is the artificial production of human speech. Contents [hide] • • o • o 1 Overview of text processing 2 History 2. a synthesizer can incorporate a model of thevocal tract and other human voice characteristics to create a completely "synthetic" voice output. other systems render symbolic linguistic representations like phonetic transcriptions into speech.1 Text normalization challenges 4. the storage of entire words or sentences allows for high-quality output.2 Formant synthesis 3. and can be implemented in software or hardware.1 Electronic devices 3 Synthesizer technologies 3.3 Domain-specific synthesis 3.1. For specific usage domains.1.5 Sinewave synthesis 4 Challenges 4.1. Systems differ in the size of the stored speech units. Alternatively. A text-tospeech (TTS) system converts normal language text into speech. [1] Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. a system that stores phones or diphones provides the largest output range.4 Prosodics and emotional content .3 Articulatory synthesis 3. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer.2 Diphone synthesis 3.

• • o o o o o o o • • • • • 5 Dedicated hardware 6 Computer operating systems or outlets with speech synthesis 6.7 Others 7 Speech synthesis markup languages 8 Applications 9 See also 10 References 11 External links [edit]Overview of text processing Overview of a typical TTS system Sample of Microsoft Sam Microsoft Windows XP's default speech synthesizer voice saying "The quick brown fox jumps over the lazy dog 1.2 Apple 6.567.4 Microsoft Windows 6. .1 Atari 6.890 times.234.6 Internet 6.3 AmigaOS 6. soi" Problems listening to this file? See media help.5 Android 6.

teleconferencing systems . Advancing technology has allowed images and audio to . and used for many applications such as mobile phones. The back-end—often referred to as the synthesizer—then converts the symbolic linguistic representation into sound. pre-processing. The front-end has two major tasks. Enhancing of speech degraded by noise. The objective of enhancement is improvement in intelligibility and/or overall perceptual quality of degraded speech signal using audio signal processing techniques. this part includes the computation of the target prosody (pitch contour. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-tophoneme conversion.[4] which is then imposed on Speech enhancement From Wikipedia. This process is often called text normalization. or tokenization. In certain systems. to protect messages in text form. speech recognition. it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. or noise reduction. the primary use of encryption has been. and divides and marks the text into prosodic units.A text-to-speech system (or "engine") is composed of two parts:[3] a front-end and aback-end. The front-end then assigns phonetic transcriptions to each word. phoneme durations). First. like phrases. and hearing aids [1]. the free encyclopedia Speech enhancement aims to improve speech quality by using various algorithms. clauses. and sentences. Contents [hide] • • • • 1 Algorithms 2 See also 3 References 4 External links The Representation of Speech Historically. VoIP. is the most important field of speech enhancement. of course.

100 16-bit signed integers for every second of sound. For comparison purposes. and only frequencies of up to 3000 cycles per second (or 3000 Hertz) are required.00 11. Negative numbers are often indicated in floating-point notation by making the sign bit a 1 without changing any other part of the number. For many communications applications. which is used in the JPEG (Joint Photographic Experts Group) file format. and they are represented by a type of floating-point notation to allow one byte to represent an adequate range of levels. which requires (because of a mathematical law called the Nyquist theorem) 6000 samples of the level of the audio signal (after it has been bandlimited to the range of frequencies to be reproduced. although other conventions are used as well. Simple floating-point notation. A Compact Disc stores stereo musical recordings in the form of two digital audio stored and communicated in digital form. the resulting electrical signal has a value that changes over time.0000 The sign bit is always shown as 0. which indicates a positive number. Only a single audio channel is used. might look like this: S 0 0 0 0 0 0 0 0 EE 11 11 10 10 01 01 00 00 MMMMM 11111 10000 11111 10000 11111 10000 11111 10000 1111. .1111 1. otherwise aliasing may result) to be taken each second.400 bytes per second. the same level of fidelity is not required.11 100.1 1000. oscillating between positive and negative.000 1. When sound is converted to an analogue electrical signal by an appropriate transducer (a device for converting changing levels of one quantity to changing levels of another) such as a microphone. samples of audio waveforms are one byte in length. each one containing 44. For transmitting a telephone conversation digitally. This leads to a total data rate of 176. for an eight-bit byte. the floating-point notations shown have all been scaled so that 1 represents the smallest nonzero number that can be indicated.111 10.0 111. A particularly effective method of compressing images is the Discrete Cosine Transform.

S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EE 11 11 10 10 01 01 00 00 00 00 00 00 00 00 00 MMMMM 11111 10000 11111 10000 11111 10000 11111 10000 01111 01000 00111 00100 00011 00010 00001 11111000 10000000 1111100 1000000 111110 100000 11111 10000 1111 1000 111 100 11 10 1 Another way of making a floating-point representation more efficient involves noting that.aaaa Here. Instead of using gradual underflow. the variable bits of the mantissa are noted by aaaa.aa 1a. instead of being represented as all ones in one line. This could produce a result like this: S 0 0 0 0 0 0 0 0 EEE 111 110 101 100 011 010 001 000 MMMM aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa 1aaaa000 1aaaa00 1aaaa0 1aaaa 1aaa. With gradual underflow. that bit is only allowed to be zero for one exponent value. one could use the basic floating-point representation we started with.a 1aa. in the first case. but simply omit the bit that is always equal to 1. where an unnormalized mantissa is permitted for the smallest exponent value. for both compactness and clarity.One way the range of values that can be represented can be extended is by allowing gradual underflow. to behave . the first mantissa bit (the field of a floating-point number that represents the actual number directly is called the mantissa because it would correspond to the fractional part of the number's logarithm to the base used for the exponent) is always one. This is achieved by reserving a special exponent value. Today's personal computers use a standard floating-point format that combines gradual underflow with suppressing the first one bit in the mantissa. the lowest one. and all zeroes in a following line.

In this way. for that exponent value. but treats the degree of unnormalization of the mantissa as the most significant part of the exponent field. It works like this (the third column shows an alternate version of this format. This retains the first one bit in the mantissa. another possibility is to also invert all the other bits in the number. does not have its first one bit suppressed. Another method of representing floating point quantities efficiently is something I call extremely gradual underflow. for some of the simpler floating-point formats. and the mantissa. that format can be coded so as to allow . an integer comparison instruction can also be used to test if one floating-point number is larger than another. to be explained below): S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EE 11 10 01 00 11 10 01 00 11 10 01 00 11 10 01 00 11 10 01 00 MMMMM 1aaaa 1aaaa 1aaaa 1aaaa 01aaa 01aaa 01aaa 01aaa 001aa 001aa 001aa 001aa 0001a 0001a 0001a 0001a 00001 00001 00001 00001 1aaaa000000000000000 1aaaa00000000000000 1aaaa0000000000000 1aaaa000000000000 1aaa000000000000 1aaa00000000000 1aaa0000000000 1aaa000000000 1aa000000000 1aa00000000 1aa0000000 1aa000000 1a000000 1a00000 1a0000 1a000 1000 100 10 1 S 0 0 0 0 S 0 0 0 0 S 0 0 0 0 S 0 0 0 0 S 0 0 0 0 M 1 1 1 1 MM 01 01 01 01 MMM 001 001 001 001 MMMM 0001 0001 0001 0001 MMMMM 00001 00001 00001 00001 EE 11 10 01 00 EE 11 10 01 00 EE 11 10 01 00 EE 11 10 01 00 EE 11 10 01 00 MMMM aaaa aaaa aaaa aaaa MMM aaa aaa aaa aaa MM aa aa aa aa M a a a a Although usually a negative number is indicated simply by setting the sign bit to 1. However. That exponent value is required to multiply the mantissa by the same amount as the next higher exponent value (instead of a power of the radix that is one less).differently from the others. This definitely will not work for the complicated extremely gradual underflow format as it is shown here.

used in the United States and Japan (and. if the table above is continued.1 0. for very small numbers the idea of allowing the exponent field to shrink suggests itself. the convention for representing the sign of numbers is different. we obtain: S 0 0 0 0 EE 11 10 01 00 MMMMM 00001 00001 00001 00001 1000 100 10 1 0. The following table illustrates these formats. Thus. When this is done. This is the format shown in the third column above. and which is also sometimes used for satellite audio transmissions. Canada as well). suppressed.001 S 0 0 0 0 MMMMM 00001 00001 00001 00001 EE 11 10 01 00 N/A N/A N/A S MMMMMM E 0 000001 1 0 000001 0 S MMMMMMM 0 0000001 Something very similar is used to represent sound signals in 8-bit form using the A-law. Mu-law encoding. which is always a 1 when the exponent is a power of two. as follows: the exponent field can be made movable. and it can be placed after the first 1 bit in the mantissa field.this to work. which is the standard for European microwave telephone transmission. However.01 0. I would suspect. with the first bit of the mantissa. instead operates as a conventional floating-point format. with capital letters indicating bits that are complemented: Linear value Bit Mu-Law Point Gradual Underflow Extremely Gradual A-Law (1) Suppressed Bit A-Law (2) Underflow Floating-Point with Floating-Point 0 1 11 aaaa 1111aaaa 0 1 10 aaaa 1110aaaa 0 1 01 aaaa 1101aaaa 0 1 00 aaaa 1100aaaa 1111aaaa 1110aaaa 1101aaaa 1100aaaa Suppressed Floating- +1aaaa000000000000000000 1000AAAA 0 111 aaaa +01aaaa00000000000000000 1001AAAA 0 110 aaaa +001aaaa0000000000000000 1010AAAA 0 101 aaaa +0001aaaa000000000000000 1011AAAA 0 100 aaaa 0 111 aaaa 0 110 aaaa 0 101 aaaa 0 100 aaaa .

this seems likely. or I had misconstrued it. It may be that the floating-point encoding used with Mu-Law encoding is applied not to the input signal value. that Mu-Law encoding acts on 13-bit values. . even if there is a onebit discrepancy in both cases. and A-Law encoding acts on 24-bit values. which is at least one less bit than for Mu-Law encoding. to be bizarre. The third column indicates what other sources appear to give for A-Law encoding. it appears to me. in comparison to standards for high-quality digital audio. but to its logarithm. or it may be that my original source for information on A-Law encoding either was not accurate. most descriptions of A-Law encoding and Mu-Law encoding state that it is Mu-Law encoding that has the greater dynamic range. as using 24-bit digitization as the first step in digitizing a telephone conversation appears.+00001aaab00000000000000 0 1100AAAB 0 011 aaab +000001aaab0000000000000 0 1101AAAB 0 010 aaab +0000001aaab000000000000 0 1110AAAB 0 001 aaab +00000001aaab00000000000 0 1111AAAB 0 000 1aaa +000000001aa000000000000 0 0 000 01aa 100001aa +0000000001ab00000000000 0 0 000 001a 1000001a +00000000001aa0000000000 0 0 000 0001 10000001 +000000000001aa000000000 0 +0000000000001a000000000 0 +00000000000001a00000000 0 +000000000000001a0000000 0 +0000000000000001a000000 0 +00000000000000001000000 0 +00000000000000000100000 0 +00000000000000000010000 0 +00000000000000000001000 0 +00000000000000000000100 0 +00000000000000000000010 0 +00000000000000000000001 0 +0 10000000 -0 01111111 -00000001aaab00000000000 0111AAAB 1 000 1aaa -1aaaa000000000000000000 0000AAAA 1 111 aaaa 01 11 aaa 1011aaab 01 10 aaa 1010aaab 01 01 aaa 1001aaab 01 00 aaa 10001aaa 001 11 aa 001 10 ab 001 01 aa 001 00 aa 0001 11 a 0001 10 a 0001 01 a 0001 00 a 00001 11 00001 10 00001 01 00001 00 000001 1 000001 0 0000001 10111aaa 10110aaa 10101aaa 10100aaa 100111aa 100110ab 100101aa 100100aa 1000111a 1000110a 1000101a 1000100a 10000111 10000110 10000101 10000100 10000011 10000010 10000001 10000000 01111111 0 011 aaab 0 010 aaab 0 001 aaab 0 000 aaab 1 01 00 aaa 00001aaa 1 1 11 aaaa 0111aaaa 00100aaa 0111aaaa 1 000 aaab 1 111 aaaa Usually. acting on 14-bit values while A-Law encoding acts on 13-bit values. and this does cause it to act on 12-bit values (including the sign bit). as shown on the diagram.

In the latter part of World War II. the United States developed a highly secure speech scrambling system which used the vocoder principle to convert speech . refer to this kind of technique. which are to some extent processed separately. where the difference between consecutive samples. Other techniques of compressing audio waveforms include delta modulation. adaptive pulse code modulation. which means linear predictive coding. rather than the samples themselves. work by dividing the audio spectrum up into "critical bands". would be that of a 56-bit integer. but instead to a method that can very effectively reduce the amount of data required to transmit a speech signal. if this method. the difference between the softest and loudest signals Philips' DCC (Digital Compact Cassette). if the two sounds are in different parts of the frequency spectrum.tutorial. There was a good page about Linear Predictive Coding at the page http://asylum.Also. One problem with using floating-point representations of signals for digital high-fidelity audio . This is why some methods of music compression. Transmitting 6000 bytes per second is an improvement over 176. and assigning the available codes for levels for the current sample symmetrically around the extrapolated point. with a two-bit exponent. would have the same precision as a 14-bit signed integer.sf. the result. The term LPC. But the dynamic range. A technique called ADPCM. because it is based on the way the human vocal tract forms speech sounds. but it is still a fairly high data rate.html but that URL is no longer valid. 13 bits of mantissa. are transmitted. requiring a transmission rate of and today's popular MP3 audio format.400 bytes per second. works by such methods as extrapolating the previous two samples in a straight that the human ear can still hear relatively faint sounds while another sound is present.000 baud. Many early digital audio systems used 14 bits per sample rather than 16 bits. for the loudest signals. as it might seem.although this particular format seems precise enough to largely make that problem minor . were used for encoding audio signals with 16 bits per sample. such as those used with Sony's MiniDisc format. does not.

was determined for periods of one fiftieth of a second. or white a digital format.147. The fundamental frequency of the speaking voice was represented by 35 codes. Either a waveform with the frequency of the fundamental. The intensities of sound in the bands indicated both the loudness of the fundamental signal. [ .176. and an article by David Kahn in the September. 1984 issue of Spectrum described it. The records used by the two stations communicating were kept synchronized by the use of quartz crystal oscillators where the quartz crystals were kept at a controlled temperature. and the result was transmitted using the spread-spectrum technique.400-baud signal or to a 4. This uses only two-thirds of the capacity of a 2. was used as the source of the reconstructed sound in the reciever.400-baud modem.782. was mediocre. The sound quality this provided. This was also sampled fifty times a second. This loudness was represented by one of six levels. This format was then enciphered by means of a one-timepad. containing a signal which had six distinct levels.483. which is 2. 50 times a second. known as CELP. This involved the transmission of twelve base-6 digits. Speech was converted for transmission as follows: The loudness of the portion of the sound in each of ten frequency bands. and it was then filtered in the ten bands to match the observed intensities in these bands. and is quite a moderate data rate. A standard for linear predictive coding. and a full set of harmonics.648. The one-time-pad was in the form of a phonograph record. this roughly corresponds to transmitting 200 bytes a second.800-baud signal.336. which is just over 2^31. The system was called SIGSALY. comes in two versions which convert the human voice to a 2. on average 280 Hz in width (ranging from 150 Hz to 2950 Hz). a 36th code indicated that a white noise source should be used instead in reconstructing the voice. Since 6 to the 12th power is 2. however. and the resonance of the vocal tract with respect to those harmonics of the fundamental signal that fell within the band.

working in the audio frequency range. passes all frequencies. in its most basic form. Being a frequency dependent amplifier. an audio filter is designed to amplify. pass band attenuation. sound effects. which pass through frequencies below their cutoff frequencies. and stop band attenuation. the filters are designed according to a set of objective criteria such as pass band. pass or attenuate (negative amplification) some frequency ranges. and progressively attenuates frequencies above the cutoff frequency. Common types include low-pass. Please help improve this article by adding citations to reliable sources. A band-reject filter. synthesizers. 0 Hz to beyond 20 kHz. an audio filter can provide a feedback loop. such as in the design of graphic equalizers or CD players. stop band. passing high frequencies above the cutoff frequency. and the stop bands are the frequency ranges for which the audio must be attenuated by a specified minimum. CD players and virtual reality systems. attenuates frequencies between its two cutoff frequencies. the free encyclopedia Digital domain parametric equalisation This article does not cite any references or sources. and progressively attenuating frequencies below the cutoff frequency. Unsourced material may be challenged and removed. . A bandpass filter passes frequencies between its two cutoff frequencies. Many types of filters exist for applications including graphic equalizers. but affects the phase of any given sinusoidal component according to its frequency. Audio filters can also be designed to provide gain (boost) as well as attenuation. while passing those outside the 'reject' range.(December 2009) An audio filter is a frequency dependent amplifier circuit. while attenuating those outside the range. An all-pass filter. which introduces resonance (ringing) alongside attenuation. where the pass bands are the frequency ranges for which audio is attenuated less than a specified maximum. In some applications.Audio filter From Wikipedia. A high-pass filter does the opposite. In more complex cases.

[edit]See  also Audio crossover . such as with synthesizers or sound effects. or harmonic content of an audio signal. Generically.In other applications. [edit]Self oscillation Not to be confused with Self-exciting oscillation. Self oscillation occurs when the resonance or Q factor of the cutoff frequency of the filter is set high enough that the internal feedback causes the filter circuitry to become a sine tonesine wave oscillator. the aesthetic of the filter must be evaluated subjectively. the term 'audio filter' can be applied to mean anything which changes the timbre. Audio filters can be implemented in analog circuitry as analog filters or in DSP code or computer software as digital filters.