Media And Storage

Session 7 AM Zeus-Brown

Learning outcomes

• What is digitisation • Some simple methods of digitisation • What's digital and what's analogue

Group Discussion

• In two groups, spend 5 minutes discussing the following
– What is analogue media? – What is digital media? – The difference between digital and analogue

• Two main areas we will look at
– Image digitisation – Audio digitisation

Areas of study

• The next few slides will be shown to you quickly; write down the first thing you see


Image 1



• Taking an image from the real world to the digital
– You can take a photograph using a conventional film camera, process the film chemically, print it onto photographic paper and then use a digital scanner to sample the print (record the pattern of light as a series of pixel values). – You can directly sample the original light that bounces off your subject, immediately breaking that light pattern down into a series of pixel values -- in other words, you can use a digital camera.


• Two types of capture
– A CCD transports the charge across the chip and reads it at one corner of the array. An analog-to-digital converter (ADC) then turns each pixel's value into a digital value by measuring the amount of charge at each photosite and converting that measurement to binary form. – CMOS devices use several transistors at each pixel to amplify and move the charge using more traditional wires. On a CMOS sensor the analog-to-digital conversion is normally done on the chip itself, so the output is already digital and no separate ADC is needed.


• pros and cons:
– CCD sensors create high-quality, low-noise images. CMOS sensors are generally more susceptible to noise.


• Pros and cons
– Because each pixel on a CMOS sensor has several transistors located next to it, the light sensitivity of a CMOS chip is lower. – Many of the photons hit the transistors instead of the photodiode. – CMOS sensors traditionally consume little power. – CCDs, on the other hand, use a process that consumes lots of power. CCDs consume as much as 100 times more power than an equivalent CMOS sensor. – CCD sensors have been mass produced for a longer period of time, so they are more mature. – They tend to have higher quality pixels, and more of them.

• Additive colour (starts Black)
– Adding colours in the RGB range goes to white
• Where might you see this – The world – Adding light frequencies


• Subtractive colour (starts white)
– Adding colours in the CMY range (cyan, magenta, yellow) goes to black
• Where might you see this – Printers – Filtering out light frequencies
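The two colour models can be compared with a small sketch. It is only an illustration: the function names are made up for this example, light is modelled as 8-bit RGB values, and the inks are assumed to be ideal cyan, magenta and yellow filters.

```python
def mix_additive(*lights):
    """Add RGB light sources channel by channel, clamping at 255 (full brightness)."""
    return tuple(min(255, sum(light[i] for light in lights)) for i in range(3))


def mix_subtractive(*inks):
    """Overlay CMY inks on white paper; each ink is (c, m, y) coverage in 0..1.

    Each ink filters out part of the light, so the remaining fractions multiply.
    Returns the reflected colour as an RGB tuple in 0..255.
    """
    remaining = [1.0, 1.0, 1.0]  # white paper reflects all of R, G and B
    for ink in inks:
        for i in range(3):
            # Cyan absorbs red, magenta absorbs green, yellow absorbs blue.
            remaining[i] *= 1.0 - ink[i]
    return tuple(round(255 * r) for r in remaining)


RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
CYAN, MAGENTA, YELLOW = (1, 0, 0), (0, 1, 0), (0, 0, 1)

print(mix_additive())                          # (0, 0, 0): no light, black
print(mix_additive(RED, GREEN, BLUE))          # (255, 255, 255): white
print(mix_subtractive())                       # (255, 255, 255): bare paper, white
print(mix_subtractive(CYAN, MAGENTA, YELLOW))  # (0, 0, 0): black
```

Additive mixing starts from black and adds light; subtractive mixing starts from white and removes it.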

• Splitting the colours
– each colour is then mapped – also see rotation filter




This method tends to be used in larger, more expensive cameras

• The Bayer filter
– overlay the filter over each individual photosite
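A rough sketch of what the filter does to the data: each photosite keeps only one colour channel. The RGGB tile layout and the function name `bayer_mosaic` are assumptions for illustration; real sensors may order the tile differently.

```python
def bayer_mosaic(rgb_image):
    """Keep one colour channel per photosite using an (assumed) RGGB Bayer layout.

    rgb_image is a list of rows, each row a list of (r, g, b) tuples.
    Returns a list of rows of (channel_name, value) pairs.
    """
    pattern = [["R", "G"], ["G", "B"]]  # 2x2 tile repeated across the sensor
    index = {"R": 0, "G": 1, "B": 2}
    mosaic = []
    for y, row in enumerate(rgb_image):
        out_row = []
        for x, pixel in enumerate(row):
            channel = pattern[y % 2][x % 2]
            out_row.append((channel, pixel[index[channel]]))
        mosaic.append(out_row)
    return mosaic


# A flat orange test scene, 4 x 4 photosites.
scene = [[(200, 120, 40)] * 4 for _ in range(4)]
mosaic = bayer_mosaic(scene)

counts = {"R": 0, "G": 0, "B": 0}
for row in mosaic:
    for channel, _ in row:
        counts[channel] += 1
print(counts)  # {'R': 4, 'G': 8, 'B': 4}
```

Half of the photosites record green, so the camera has to interpolate the two missing channels at every pixel afterwards.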


Figure: A CCD. Figure: A sample of the image zoomed in.

• Scanners work in much the same way
– the light bar sends out light – this is reflected off the page and picked up by the sensor, and from then on it's the same as a camera



• The DCT is a linear, invertible function $F : \mathbb{R}^N \to \mathbb{R}^N$ (where $\mathbb{R}$ denotes the set of real numbers), or equivalently an invertible N × N square matrix.

Discrete cosine transform

• There are several variants of the DCT with slightly modified definitions. • The N real numbers $x_0, \ldots, x_{N-1}$ are transformed into the N real numbers $X_0, \ldots, X_{N-1}$ according to one of the formulas:

• The DCT-I is exactly equivalent (up to an overall scale factor of 2) to a DFT of 2N − 2 real numbers with even symmetry. For example, a DCT-I of N = 5 real numbers abcde is exactly equivalent to a DFT of eight real numbers abcdedcb (even symmetry), divided by two. (In contrast, DCT types II-IV involve a half-sample shift in the equivalent DFT.) Note, however, that the DCT-I is not defined for N less than 2. (All other DCT types are defined for any positive N.) • Thus, the DCT-I corresponds to the boundary conditions: $x_n$ is even around n = 0 and even around n = N − 1; similarly for $X_k$.
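The abcde example can be checked numerically. The formula itself did not survive on the slide, so this sketch assumes the standard DCT-I definition, $X_k = \tfrac{1}{2}(x_0 + (-1)^k x_{N-1}) + \sum_{n=1}^{N-2} x_n \cos\left(\tfrac{\pi}{N-1} n k\right)$, and uses a plain, slow DFT written out by hand. The function names are illustrative only.

```python
import cmath
import math


def dct1(x):
    """DCT-I (assumed standard form):
    X_k = (x_0 + (-1)^k x_{N-1}) / 2 + sum_{n=1}^{N-2} x_n cos(pi n k / (N-1)).
    """
    n_points = len(x)
    if n_points < 2:
        raise ValueError("DCT-I is not defined for N less than 2")
    result = []
    for k in range(n_points):
        total = 0.5 * (x[0] + (-1) ** k * x[-1])
        for n in range(1, n_points - 1):
            total += x[n] * math.cos(math.pi * n * k / (n_points - 1))
        result.append(total)
    return result


def dft(y):
    """Plain O(M^2) discrete Fourier transform."""
    m = len(y)
    return [sum(y[n] * cmath.exp(-2j * math.pi * n * k / m) for n in range(m))
            for k in range(m)]


abcde = [1.0, 2.0, 3.0, 4.0, 5.0]
mirrored = abcde + abcde[-2:0:-1]  # abcdedcb, length 2N - 2 = 8
via_dft = [value.real / 2 for value in dft(mirrored)[:len(abcde)]]

print([round(v, 6) for v in dct1(abcde)])
print([round(v, 6) for v in via_dft])
```

Both lines print the same five numbers. Applying `dct1` twice returns the input scaled by (N − 1)/2, which is one way to see that the transform is invertible.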


There are many other versions of the DCT; if you wish to study these in your own time, please feel free to do so

• What did you notice in the lab?

Feedback from lab

• Digitally created images
– Working in layers – Vector and raster – Working out fill or no fill

Other image files

– Simple algorithm
• Count the lines • Fill on odd
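"Count the lines, fill on odd" reads like the even-odd rule used for filling vector shapes. A minimal sketch, assuming that is the rule meant; the function name `is_inside` and the square test shape are illustrative.

```python
def is_inside(point, polygon):
    """Even-odd rule: cast a ray to the right of `point` and count edge crossings."""
    px, py = point
    crossings = 0
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                crossings += 1
    return crossings % 2 == 1  # fill on odd


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(is_inside((2, 2), square))   # True: one crossing, so fill
print(is_inside((5, 2), square))   # False: no crossings
print(is_inside((-1, 2), square))  # False: two crossings
```

A rasteriser would run this test for every pixel centre (or, more efficiently, once per scanline) to decide fill or no fill.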

• The human eye is good at seeing small differences in brightness over a relatively large area, but not so good at distinguishing the exact strength of a high frequency brightness variation. • This fact allows one to get away with greatly reducing the amount of information in the high frequency components. • This is done by simply dividing each component in the frequency domain by a constant for that component, and then rounding to the nearest integer. • This is the main lossy operation in the whole process. • As a result of this, it is typically the case that many of the higher frequency components are rounded to zero, and many of the rest become small positive or negative numbers, which take many fewer bits to store.
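The divide-and-round step can be sketched in a few lines. The coefficient values are made up for illustration, and the divisors are only an example of constants that grow with frequency; neither comes from the slides.

```python
def quantise(coefficients, divisors):
    """Divide each frequency component by its own constant and round to the nearest integer."""
    return [round(c / q) for c, q in zip(coefficients, divisors)]


def dequantise(levels, divisors):
    """Approximate reconstruction: multiply back by the same constants."""
    return [level * q for level, q in zip(levels, divisors)]


# Hypothetical frequency components, ordered from low to high frequency.
coefficients = [-415.4, -30.2, 61.2, 27.2, 56.1, -10.1, -2.4, 0.5]
# Example divisors: larger constants for higher frequencies.
divisors = [16, 11, 10, 16, 24, 40, 51, 61]

levels = quantise(coefficients, divisors)
print(levels)                        # [-26, -3, 6, 2, 2, 0, 0, 0]
print(dequantise(levels, divisors))  # [-416, -33, 60, 32, 48, 0, 0, 0]
```

The high-frequency components have rounded to zero and the rest are small integers, which is where the saving in bits comes from. The reconstructed values are close to, but not the same as, the originals: this is the lossy part.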



• CDs
– 44,100 samples/second * 16 bits/sample * 2 channels = 1,411,200 bits per second

Digitised sound

• So what does that mean ?
– Let's break that down:
• 1.4 million bits per second equals about 176,400 bytes per second. • If an average song is three minutes long, then the average song on a CD consumes about 32 million bytes of space. • That's a lot of space for one song, and it's especially large when you consider that over a 56K modem it would take well over an hour to download that one song (about 76 minutes even at the full 56 kbit/s).
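The arithmetic can be reproduced directly. The modem figure assumes the nominal 56,000 bits per second, which real connections rarely reached, so the download time is a lower bound.

```python
SAMPLE_RATE = 44_100   # samples per second
BITS_PER_SAMPLE = 16
CHANNELS = 2

bits_per_second = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS
bytes_per_second = bits_per_second // 8
song_bytes = bytes_per_second * 3 * 60  # three-minute song

MODEM_BITS_PER_SECOND = 56_000  # nominal 56K modem rate (assumption)
download_minutes = song_bytes * 8 / MODEM_BITS_PER_SECOND / 60

print(bits_per_second)             # 1411200
print(bytes_per_second)            # 176400
print(song_bytes)                  # 31752000
print(round(download_minutes, 1))  # 75.6
```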

• To make a good compression algorithm for sound, a technique called perceptual noise shaping is used. • It is "perceptual" partly because the MP3 format uses characteristics of the human ear to design the compression algorithm. For example:
– There are certain sounds that the human ear cannot hear. – There are certain sounds that the human ear hears much better than others. – If there are two sounds playing simultaneously, we hear the louder one but cannot hear the softer one.

Think back to compression

• Sample rate and bit depth: what are they?
– Sample rate: how often you sample – Bit depth: how detailed each sample is

– Sample rate of 9 samples per second – Bit depth of 8 bits per sample
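A small sketch of digitising a signal with these two settings. It assumes the "8 per sample" on the slide means 8 bits (256 levels) and uses a 1 Hz sine wave as a stand-in for the analogue input; the function names are illustrative.

```python
import math


def digitise(signal, duration, sample_rate, bit_depth):
    """Sample signal(t), with values in -1..1, and quantise to 2**bit_depth levels."""
    levels = 2 ** bit_depth
    codes = []
    for i in range(int(duration * sample_rate)):
        t = i / sample_rate  # sample rate decides how often we look
        value = signal(t)
        # Bit depth decides how finely each value is recorded:
        # map -1..1 onto the integer codes 0..levels-1.
        codes.append(round((value + 1) / 2 * (levels - 1)))
    return codes


def one_hertz_sine(t):
    return math.sin(2 * math.pi * t)


fine = digitise(one_hertz_sine, duration=1, sample_rate=9, bit_depth=8)
coarse = digitise(one_hertz_sine, duration=1, sample_rate=9, bit_depth=2)
print(fine)    # 9 codes, each between 0 and 255
print(coarse)  # 9 codes, each between 0 and 3
```

Lowering the sample rate loses detail in time; lowering the bit depth loses detail in level.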

Digitised version

How much detail will be lost in digitisation ?

• The most effective lossy compression for audio data is to
– Identify data that doesn't matter – In the sense that removing it leaves the perceived audio no different from the original – Thus discarding sounds that the human ear can't hear

Perceptually Based Compression

Reasons you might not hear

• There are two particular reasons why the human ear may fail to hear a sound:
– A sound may be too quiet to hear – A sound may be masked by another sound

• Of course, neither of these reasons is as simple as it may first seem
– The threshold of hearing is the minimum audible level, but it varies non-linearly with frequency
• A very high or very low frequency sound must be much louder than a midrange tone – WHY do we hear better at mid range? • There is no point encoding audio that falls below this threshold – To do this we apply the psycho-acoustical model, a mathematical description of the way the ear and brain react to sound
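One way to get a feel for the shape of the threshold is a closed-form approximation. The sketch below assumes Terhardt's commonly quoted formula for the threshold in quiet; it is not taken from these slides, and a real codec's psycho-acoustical model is considerably more involved.

```python
import math


def threshold_in_quiet_db(frequency_hz):
    """Approximate threshold of hearing in quiet, in dB SPL (Terhardt's approximation)."""
    f = frequency_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(frequency_hz, level_db_spl):
    """True if a lone tone at this level lies above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(frequency_hz)


for frequency in (100, 1000, 3300, 16000):
    print(frequency, "Hz:", round(threshold_in_quiet_db(frequency), 1), "dB SPL")

print(is_audible(1000, 10))   # True: a 10 dB tone at 1 kHz is above the threshold
print(is_audible(100, 10))    # False: the same level at 100 Hz is too quiet
```

The curve is lowest in the midrange and rises steeply towards both ends, so a tone that is clearly audible at 1 kHz can be discarded at 100 Hz or 16 kHz.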

• Threshold of Hearing • Sound level measurements in decibels are generally referenced to a standard threshold of hearing at 1000 Hz for the human ear, which can be stated in terms of sound intensity: $I_0 = 10^{-12}$ watts/m$^2$ • or in terms of sound pressure: $P_0 = 2 \times 10^{-5}$ N/m$^2$ (20 µPa)

• This value has wide acceptance as a nominal standard threshold and corresponds to 0 decibels. • It represents a pressure change of less than one billionth of standard atmospheric pressure. • This is indicative of the incredible sensitivity of human hearing. • The actual average threshold of hearing at 1000 Hz is more like $2.5 \times 10^{-12}$ watts/m$^2$ or about 4 decibels, but zero decibels is a convenient reference. • The threshold of hearing varies with frequency, as illustrated by the measured hearing curves.
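The decibel figures can be checked with a short calculation. It assumes the standard reference values of $10^{-12}$ watts/m$^2$ for intensity and $2 \times 10^{-5}$ N/m$^2$ for pressure, and takes standard atmospheric pressure as 101,325 N/m$^2$.

```python
import math

I0 = 1e-12  # W/m^2, reference intensity at 1000 Hz (standard value)
P0 = 2e-5   # N/m^2, reference sound pressure (20 micropascals)
ATMOSPHERE = 101_325  # N/m^2, standard atmospheric pressure


def intensity_level_db(intensity):
    """Sound level in decibels relative to the reference intensity."""
    return 10 * math.log10(intensity / I0)


def pressure_level_db(pressure):
    """Sound level in decibels relative to the reference pressure."""
    return 20 * math.log10(pressure / P0)


print(intensity_level_db(I0))                 # 0.0 dB at the reference itself
print(round(intensity_level_db(2.5e-12), 2))  # 3.98, the "about 4 decibels" figure
print(P0 / ATMOSPHERE)                        # roughly 2e-10, under one billionth
```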

Figure: Threshold of hearing curve in a quiet environment. [Zwicker and Fastl, 1999]



simultaneous masking

• Threshold of detection curve in the presence of a masking noise with a bandwidth equal to the critical bandwidth, a centre frequency of 1 kHz and a level of 60 dB SPL [Zwicker and Fastl, 1999].

simultaneous masking

The threshold of detection caused by a masking noise with a bandwidth equal to the critical bandwidth with a centre frequency of 1 kHz and various levels [Zwicker and Fastl, 1999].


• What did you notice from the lab? • Audio task

Feedback from lab

1. Chapman, N. and Chapman, J. (2002). Digital Multimedia. Wiley.
2. Zwicker, E. and Fastl, H. (1999). Psychoacoustics: Facts and Models. Springer Series in Information Sciences, 22. Springer, Berlin; New York, 2nd updated edition.
