
# German University in Cairo
Advanced Media Lab, 9th Semester DMET904
Advanced Computer Lab, 9th Semester CSEN 903
Experiment 1 Preparation

1 Experiment 1: Audio Compression and Psychoacoustics
1.1 Objective
This experiment gives an introduction to sampling, audio compression techniques, and the psychoacoustic model.

1.2 Pre-requisites
 General programming and UNIX skills.

1.3 References
[1] An Introduction to Compressed Audio with Ogg Vorbis, Graham Mitchell: http://grahammitchell.com/writings/vorbis_intro.html
[2] Documentation on Ogg Vorbis: http://www.vorbis.com/faq/
[3] Documentation on Audacity: http://audacity.sourceforge.net/
[4] Other material about Digital Audio Compression, Mp3 and Lame
[5] Collection of audio test files: http://www.dogstar.dantimax.dk/testwavs/

1.4 Theoretical Background
Music and Waveforms
Music is made up of waves. Everything that can be heard is audible because something is vibrating and creating sound waves. When a violin player bows a string, the string vibrates at a certain frequency. In a trumpet, a column of air vibrates. With an electric guitar, the strings vibrate and send a signal through the amplifier. When a human speaks or sings, the vocal cords vibrate. All of these things generate sound waves. The wave travels through the air, hits the eardrum and causes it to vibrate. The brain interprets the signals coming from the eardrum, and that is how a human "hears" a sound.

Figure 1: Properties of waves


The properties of the wave affect how it sounds. The frequency of a wave refers to how many times per second the wave transitions from its highest point to its lowest point and back again, that is, the number of cycles per second. Frequency is measured in hertz (Hz). The frequency of a wave determines its pitch: high frequency waves have a high pitch, and low frequency waves have a low pitch. The average human can hear frequencies from 15 to 20 Hz up to roughly 16 to 20 kHz.

The amplitude of a wave refers to half the distance between a wave's highest point and its lowest. The amplitude (sound level) is expressed in decibels (dB). The larger the amplitude of a wave, the louder the volume. The decibel range for human hearing is quite large. Some examples:

30 dB(A): susurration (soft rustling or whispering)
80 dB(A): employers' liability insurance association recommends ear protection
85 dB(A): ear protection mandatory for employers
99 to 100 dB(A): maximum allowed volume in concerts (DIN 15905-5)
120 dB(A): permanent damage to the ears after longer exposure
140 dB(A): level of pain

Digital Audio Sampling
The goal of digital recording technology is to create a recording with very high fidelity (similarity between the original signal and the reproduced signal) and perfect reproduction. To accomplish these two goals, digital recording converts the analog wave into a stream of numbers and records the numbers instead of the wave. The conversion is done by sampling the analog wave and then quantizing the result. For example, suppose there is a sound wave to be sampled. Here is a typical wave:

Figure 2: Sound wave to be sampled

When the wave is sampled with an analog-to-digital converter (ADC), there are two variables to be controlled:
1. The sampling rate: how many samples are taken per second

2. The sampling precision: how many different gradations (quantization levels) are possible when taking the sample

In the following figure, let's assume that the sampling rate is 1,000 per second and the precision is 10:

Figure 3: Sampling Rate 1 kHz

The green rectangles represent samples. Every one-thousandth of a second, the ADC looks at the wave and picks the closest number between 0 and 9. The number chosen is shown along the bottom of the figure. These numbers are a digital representation of the original wave. When the digital-to-analog converter (DAC) recreates the wave from these numbers, the result is the blue line shown in the following figure:

Figure 4: Sampling output

It can be seen that the blue line has lost quite a bit of the detail originally found in the red line, which means the fidelity of the reproduced wave is not very good. This is called the sampling error. The sampling error can be reduced by increasing both the sampling rate and the precision. In the following figure, both the rate and the precision have been improved by a factor of 2 (20 gradations at a rate of 2,000 samples per second):

Figure 5: Sampling rate 2 kHz

In the following figure, the rate and the precision have been doubled again (40 gradations at 4,000 samples per second):

Figure 6: Sampling rate 4 kHz

As can be seen, as the rate and precision increase, the fidelity (the similarity between the original wave and the DAC's output) improves.

Bit rates vs. Sample rates
A sample rate measures how often the signal is sampled and stored. It is measured in kilohertz (kHz), or thousands of samples per second. In the 1970s, Philips and Sony began looking for a way to improve audio quality for recorded music. A sample rate of 44,100 samples per second (44.1 kHz), used to sample audio with CD quality (the highest quality that can be purchased), was chosen because it exceeded the target sample rate of 40 kHz (twice the highest frequency humans can hear, 20 kHz) and also because that is how much information could be stored on a video tape, the first storage medium of choice.

By the Nyquist Theorem, the sampling rate (number of samples per second) must be at least twice as high as the highest recorded frequency. The frequency range of the human voice is typically between 500 Hz and 4000 Hz. According to the Nyquist Theorem, what should be the sampling frequency for human speech?
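The effect of rate and precision shown in Figures 3 to 6 can be tried out numerically. The following is a minimal sketch, not part of the original experiment: the 50 Hz test wave, the function names and the staircase (hold) reconstruction are all assumptions chosen for illustration. It samples and quantizes the wave at the three settings used in the figures and measures how far the reconstruction is from the original. A small helper also expresses the Nyquist rule stated above.

```python
import math

def original_wave(t):
    """Hypothetical analog signal: a 50 Hz sine wave scaled into the range 0..1."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * 50 * t)

def sample_and_quantize(wave, rate, levels, duration):
    """ADC sketch: take `rate` samples per second and round each one
    to the closest of `levels` gradations (integers 0 .. levels - 1)."""
    count = round(rate * duration)
    return [round(wave(n / rate) * (levels - 1)) for n in range(count)]

def reconstruct(samples, rate, levels, t):
    """DAC sketch: hold each stored number until the next sample (a staircase)."""
    index = min(int(t * rate), len(samples) - 1)
    return samples[index] / (levels - 1)

def rms_error(wave, rate, levels, duration=0.04, grid=20000):
    """Root-mean-square difference between the wave and its reconstruction,
    evaluated on a fine time grid."""
    samples = sample_and_quantize(wave, rate, levels, duration)
    total = 0.0
    for k in range(grid):
        t = k * duration / grid
        diff = wave(t) - reconstruct(samples, rate, levels, t)
        total += diff * diff
    return math.sqrt(total / grid)

def nyquist_minimum_rate(highest_frequency_hz):
    """Nyquist rule: sample at least twice as fast as the highest frequency."""
    return 2 * highest_frequency_hz

# The three settings from the figures: (samples per second, gradations).
settings = [(1000, 10), (2000, 20), (4000, 40)]
errors = [rms_error(original_wave, rate, levels) for rate, levels in settings]
for (rate, levels), error in zip(settings, errors):
    print(f"{rate} samples/s, {levels} gradations: RMS error {error:.4f}")

# Highest audible frequency of 20 kHz gives the 40 kHz target mentioned above.
print(nyquist_minimum_rate(20_000))
```

The printed error should shrink each time the rate and precision are doubled, which is the numerical counterpart of the blue line approaching the red line. The same helper can be applied to the speech question above.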

Figure 7: CD-quality Audio sampling rate

Each "sample" is a 16-bit number, ranging from -32,768 to 32,767. This number indicates the amplitude of the wave at the instant of sampling. This range of values allows subtle volume differences to be accurately represented. Thus a sampled wave oscillating back and forth from -32,768 to 32,767 would be the loudest, a wave oscillating from -1 to 1 would be the quietest, and zeroes in a row would indicate complete silence. Sampling audio in this digital fashion is known as Pulse Code Modulation (PCM), and is the most popular method of digital sampling. PCM digital audio produces quite an accurate picture of the "live" sound, and only the keenest listeners with good equipment can distinguish between it and the original.

Bit rate is a measure of the amount of data stored for every second of audio. A little math reveals the space required to store sound information at CD quality. Each sample is 16 bits, or two bytes. There are 44,100 samples each second, and since modern music is recorded in stereo, there is both a left and a right channel. This results in (2 * 44100 * 2) = 176,400 bytes to store one second's worth of samples. Thus uncompressed CD-quality audio uses 176,400 bytes or 1,411,200 bits to store each second. This is roughly 1411 kilobits per second, or 1411 kbps.

The Size Problem
It is possible to "rip" the audio data from a music CD and store it in "WAV" files on a computer, so that these files can be played back on demand, in your car or on your portable music player. Why is this often not feasible in practice? The answer is size. The ultimate size of the file is driven by its bit rate. As shown above, uncompressed CD-quality audio uses 176,400 bytes to store one second of samples, or 10,584,000 bytes (approximately 10 megabytes) to store just one minute of CD-quality audio. This may not sound too alarming, but it adds up quickly. If someone's music collection consists of 1307 songs, and the total playing time of all songs combined is 5,423 minutes and 23 seconds (over three days and eighteen hours), it would require an estimated 53 gigabytes of hard drive space to store in perfect CD quality. A personal computer could have the storage space, given that many computers have hard drives holding tens and even hundreds of gigabytes, but most portable music players do not have this much space.
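The arithmetic above can be checked with a few lines of code. This is a small sketch only; the function and variable names are made up for illustration, and the final figure assumes that a "gigabyte" means 2^30 bytes, which is the reading under which the 53 gigabyte estimate comes out.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Storage needed for one second of uncompressed PCM audio, in bytes."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

# CD quality: 44,100 samples per second, 16 bits per sample, stereo.
cd_bytes_per_second = pcm_bytes_per_second(44_100, 16, 2)
cd_bits_per_second = cd_bytes_per_second * 8
cd_bytes_per_minute = cd_bytes_per_second * 60

# The example collection: 5,423 minutes and 23 seconds of music.
collection_seconds = 5_423 * 60 + 23
collection_bytes = collection_seconds * cd_bytes_per_second
collection_gib = collection_bytes / 2**30  # assumption: 1 gigabyte = 2**30 bytes

print(cd_bytes_per_second)        # bytes per second
print(cd_bits_per_second / 1000)  # kilobits per second
print(cd_bytes_per_minute)        # bytes per minute
print(f"{collection_gib:.1f} GB for the whole collection")
```

Changing the three arguments of `pcm_bytes_per_second` is enough to estimate other uncompressed formats, for example more channels or a higher sample rate.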

As video DVDs with "surround sound" audio become more popular, this will only become more of a problem: such audio typically contains 5.1 channels (left, center, right, left rear and right rear, plus a subwoofer covering approx. 20 to 120 Hz), nearly tripling the space requirements. With the most recent DVD audio/video discs it is even worse: up to 7.1 channels with 24-bit samples at 96 kHz, requiring more than ten times the space. Clearly, it is impossible to carry around the entire music collection in CD quality.

Fortunately, there is a solution: compression. Compression is the technique of making a file take up less space while still containing the same information. There are two categories of compression: lossless and lossy.

Lossless Compression
Lossless compression means that the compressed, smaller file can be expanded back into the original file without losing any information whatsoever. That is: take a file, compress it, and uncompress it again. If the result is bit-for-bit identical to the original file, for any given input file, 100% of the time, then the compression scheme is lossless. No information is lost.

Unfortunately, compressing audio losslessly is hard. General-purpose compression programs like WinZip and gzip only manage a reduction of about 5% on average. Even "next-generation" utilities like WinRAR and bzip2 only manage a few percent more. There are special-purpose compressors (like flac) which were designed solely for losslessly compressing audio, but even they only manage about a 50% reduction in file size on average. While this is enough for some, for music files to be truly portable they must be even smaller.

Lossy Compression
Lossy compression is compression in which some information is lost in the process, i.e., compressing and then uncompressing a file results in something similar, but not identical, to the original file. This is not good for things which must be interpreted by a computer, like executable programs or applications. But it is often just fine for things where the "interpretation" is being done by a human, such as photographs and sounds. The trick is to remove little bits of information in places where the loss cannot be perceived. By modeling how the ear (and the brain) hears sound, it is possible to find places to remove information that would not have been perceived anyway. Lossy audio compression works using a psychoacoustic model. This model is discussed in the next section.

Psychoacoustics
Psychoacoustics is the study of the principles of human sound perception. The compression algorithm uses psychoacoustics to remove data or sound which the ear cannot hear, especially when other sounds are present. Although the range of human hearing is about 20 Hz to about 20 kHz, most people can't hear anything above 15 kHz. However, most CD-quality audio contains information for reproducing these tones. Thus, by filtering out tones outside this range, the amount of information that has to be stored without affecting the

perceived sound quality can be reduced. Figure 8 shows the threshold of human hearing for pure tones. Since the tones below the threshold are inaudible, these tones can be filtered out.

Figure 8: The threshold of human hearing for pure tones.

A lower tone can effectively mask (make inaudible) a higher tone. This phenomenon is called frequency masking. Figure 9 plots the audible level for a single masking tone of 1 kHz. The greater the power of the masking tone, the broader the range of frequencies it can mask. As a consequence, if two tones are widely separated in frequency, then little frequency masking occurs.

Figure 9: The audible level for a single masking tone of 1 kHz
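The idea of discarding tones below the threshold of hearing (Figure 8) can be sketched in code. Note the assumptions: the handout gives the threshold only as a figure, so this sketch substitutes a commonly used approximation of the threshold in quiet (Terhardt's formula, in dB SPL), and the list of tones is invented for illustration. It is not the model used by any particular encoder, and it ignores frequency masking between tones.

```python
import math

def threshold_in_quiet_db(frequency_hz):
    """Approximate threshold of hearing in dB SPL for a pure tone.
    Assumption: Terhardt's approximation, not taken from this handout."""
    f = frequency_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible_tones(tones):
    """Keep only the (frequency, level) pairs at or above the threshold."""
    return [(f, level) for f, level in tones
            if level >= threshold_in_quiet_db(f)]

# Hypothetical tones: (frequency in Hz, sound level in dB SPL).
tones = [(100, 10.0), (1000, 20.0), (3300, 0.0), (16000, 10.0)]
kept = audible_tones(tones)

for f, level in tones:
    status = "kept" if (f, level) in kept else "filtered out"
    print(f"{f} Hz at {level} dB: threshold "
          f"{threshold_in_quiet_db(f):.1f} dB, {status}")
```

In this sketch the quiet 100 Hz and 16 kHz tones fall below the curve and are dropped, while the tones at 1 kHz and 3.3 kHz, where the ear is most sensitive, are kept. A real psychoacoustic model would additionally raise the threshold around loud tones to account for masking.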

Thus. then the encoder will use 128 kilobits to encode each second of the song. Figure 10: Temporal Masking Using sophisticated techniques such as these.German University in Cairo Advanced Media Lab. possibly without consideration of the drain on 8/11 . a file is encoded at 128 kbps. the encoder will still have to use 128 kilobits to encode that second. Thus that second will be represented rather poorly. On the other hand. where it could have used. halfway through the song. for example. If.OGG The name "Ogg" [After facing trademark rights problems the „Squish“ project was renamed to OGG. This usually results in files which are slightly smaller than CBR files even at the same target bit rate. This phenomenon called temporal masking. This gives the encoder the freedom to save bits on simple sections that don't need as many bits to represent them well and thus have some "extra" bits left over to use for sections that really need them. the softer tone which presence immediately before or after the occurrence of the louder tone can be filtered out. . before it can hear another tone. CD-quality sound but are a mere 10 to 20 percent of the size. where the lead guitar is ripping into a solo. the drummer is going crazy on the cymbals and the bass guitar is playing a funky groove. 9th Semester CSEN 903 Experiment 1 Preparation Similarly. any loud tone will cause the hearing receptors in the inner ear to become saturated and it requires some time to recover. no matter what. lossy audio compression formats such as Ogg Vorbis and mp3 can achieve results which are provably indistinguishable from the original. the more temporal masking occurs and the more time we need to start hearing soft tones. to ogg: „to do anything forcefully. CBR and VBR Early mp3 encoders (and most to this day) used what is called a "constant bit rate" (CBR). The louder is the masking tone. Newer mp3 encoders support what is called "variable bit rate" mode (VBR). 9th Semester DMET904 Advanced Computer Lab. 
but which sound much better in the busy sections. So the first measure (consisting of perhaps two drum clicks) will use 128 kilobits and will represent that second nearly exactly. say 300.

future resources“] refers to the file format which can multiplex a number of separate independent open source codecs for audio, video and text (such as subtitles). Files ending in the .ogg extension may be of any Ogg media filetype. The term "ogg" is often used to refer to the audio file format Ogg Vorbis, that is, Vorbis-encoded audio in the Ogg container. Other prominent Xiph [http://www.xiph.org/] codecs that are often encapsulated in Ogg are the video codec Theora [Theora is named for Theora Jones, Edison Carter's Controller on the Max Headroom television program.], and the human speech audio compression format Speex [http://www.speex.org/].

Because the format is free, Ogg's various codecs have been incorporated into a number of different free and commercial media players as well as portable media players from different manufacturers. The more popular Ogg Vorbis codec has built-in support in many software players, and extensions are available for nearly all the rest. Other codecs are less well supported, although extensions are often available. Although Ogg hasn't reached anywhere near the ubiquity of the MPEG standards (e.g. MP3), as of 2006 it is commonly used to encode free content (such as free music, multimedia on Wikimedia projects and Creative Commons files) and has started to be supported by a significant minority of digital audio players. Many popular game engines also support the Ogg format, not the least of which have been the Doom 3, Unreal Tournament 2004, and Serious Sam: The Second Encounter engines.

History
In May 2003, two Internet RFCs were published relating to the format. The Ogg bitstream was defined in RFC 3533 (which is classified as 'informative') and its Internet content type (application/ogg) in RFC 3534 (which is, as of 2006, a proposed standard protocol).

Vorbis
Vorbis is a free and open source, lossy audio codec project headed by the Xiph.Org Foundation and intended to serve as a replacement for MP3. It is most commonly used in conjunction with the Ogg container and is then called Ogg Vorbis.

Technical details
Given 44.1 kHz (standard CD audio sampling frequency) stereo input, the encoder will produce output from roughly 45 to 500 kbit/s depending on the specified quality setting. Quality settings run from -1 to 10 and are an arbitrary metric; files encoded at -q5, for example, should have the same quality of sound in all versions of the encoder, but newer versions should be able to achieve that quality with a lower bitrate. The bitrates mentioned above are only approximate: Vorbis is inherently variable-bitrate (VBR), so bitrate may vary considerably from sample to sample.

Vorbis uses the modified discrete cosine transform (MDCT) for converting sound data from the time domain to the frequency domain. The resulting frequency-domain data is broken into noise floor and residue components, and then quantized and entropy coded using a codebook-based vector quantization algorithm. The decompression algorithm reverses these stages. The noise floor approach gives Vorbis its characteristic analog noise-like failure mode (when the bitrate is too low to encode the audio without perceptible loss), which many people find more pleasant than the metallic warbling in the