
THE EMU10K1 DIGITAL AUDIO PROCESSOR

THIS PC AUDIO SOLUTION FULFILLS ITS ORIGINAL DESIGN GOAL AS A MICROSOFT DIRECTSOUND ACCELERATOR AND, WITH ITS ENVIRONMENTAL SIMULATION CAPABILITIES, PROMPTED THE DEVELOPMENT OF THE ENVIRONMENTAL AUDIO EXTENSIONS (EAX) TO MICROSOFT DIRECTSOUND3D. ORIGINALLY DESIGNED FOR THE CONSUMER COMPUTER MARKET, THE EMU10K1 ALSO SUPPORTS PROFESSIONAL AUDIO CONTENT DEVELOPMENT.

Thomas C. Savell
Joint Emu/Creative Technology Center

Consumers and professional audio content developers continue to demand more powerful audio systems in the virtual worlds of games. These needs led our company to produce the EMU10K1 digital audio processor, which places both a high-quality music synthesizer and a powerful audio effects processor on the same die.

Implemented in a 0.35-micron, three-metal-layer CMOS process, the processor resides on a 6.7-mm × 6.5-mm die containing 2,439,711 transistors, a 33-MHz PCI clock, and 50-MHz and 100-MHz internal audio clocks. It has a 64-channel wavetable synthesizer with per-channel sample rate converter, digital filter, envelope generator, low-frequency oscillator, and routing/mixing logic. The EMU10K1 accesses digital audio data stored in system memory via a 64-channel PCI bus master system with virtual memory mapping identical to that in Intel CPUs. A powerful audio effects processor adds environmental simulation, 3D positioning, and special effects to audio from the wavetable synthesizer and numerous other digital audio sources such as S/PDIF (Sony/Philips Digital Interface), I2S (Philips Inter-IC Sound bus), and AC97 (Audio Codec 97). Since the processor operates at a fixed 48-kHz audio sampling rate, it contains dedicated, high-quality sample rate converters.

PC audio subsystem

This subsystem has evolved to be quite complex with numerous methods for software applications to interact with various audio sources and destinations. Applications can either call the operating system to play back audio stored in disk files or directly place the digital audio waveforms in system memory and make device driver calls for playback. Applications can produce music using an abstract language known as MIDI [1] that supports commands such as “play middle-C” and “use grand piano sound.” MIDI commands can even originate from an external synthesizer or keyboard controller connected via a cable to the computer. In addition to these sources, other devices such as CD-ROM, DVD, and microphones also produce audio that the user can hear and record to a disk file. Recent 3D computer games tend to include 3D audio as well as graphics, requiring additional processing to create a virtual 3D audio environment.

Architecture

The major architectural elements of the EMU10K1 are the PCI bus master and slave

0272-1732/99/$10.00 © 1999 IEEE


Figure 1. PC audio subsystem. All audio sources pass through the effects processor where they are mixed together to create a virtual 3D environment for rendering on a multiple-speaker system.

interfaces, 64-channel wavetable synthesizer, effects processor, high-quality sample rate converters, and digital audio receivers and transmitters. Figure 1 illustrates the connection between blocks and their interaction with the application environment.

Wavetable synthesizer

Each of the 64 channels in the wavetable synthesizer consists of an interpolating oscillator; a resonant two-pole, digital low-pass filter; three envelope generators; and a mixing amplifier with multiple selectable output buses. See Figure 2.

Figure 2. EMU10K1 wavetable synthesizer.

Background. Wavetable synthesis uses a digital recording of a natural sound such as a note played on a piano. The wavetable synthesizer replays this recording in response to a player striking keys on a keyboard. When synthesizing an instrument such as a piano, it is not necessary to record every note the instrument can play. To conserve memory, the sound designer samples a small number of notes from across the entire range of the instrument, and the synthesizer creates intermediate notes on the fly with pitch-shifting hardware. The equivalence of pitch shifting and sample rate conversion allows the user freedom in choosing the source sample rate. The wavetable synthesizer includes looping hardware to create notes of arbitrary duration and further conserve sample memory. It responds to the player’s touch with greater expressiveness, using the envelope generator to control the amplitude and filter cutoff.

Interpolating oscillator. The EMU10K1 wavetable synthesis oscillator contains an interpolating sample rate converter and an addressing unit capable of looping and proceeding at an arbitrary, noninteger rate. The addressing unit maintains a phase accumulator that contains the current memory address, stored in an integer.fraction format. The integer portion determines the memory read locations, and the fractional portion determines the interpolation phase shift. The addressing unit adds a phase increment, stored in the same integer.fraction format, to the current address every sample period.

The looping hardware uses two programmable addresses, the loop start address and the

MARCH–APRIL 1999 3
EMU10K1

loop end address. When the waveform address passes the loop end address, the addressing unit loads the current address with a value equal to the loop start address plus the amount by which the address would have passed the loop end address. This causes data to be fetched from the loop start address again, repeating the loop until the oscillator is stopped. Note that the current address register can accept any value to start with, and it proceeds without interruption until it reaches or passes the loop end address. This accommodates a different, nonrepeating waveform during the initial attack of a note, while providing an effective form of data compression and placing no predetermined limit on the length of time a sound can play.

The 8-point interpolating sample rate converter uses the fractional address to determine the position between input samples at which to output a new sample. This is performed using the well-known Smith-Gossett [2] algorithm, which requires 16 multiplies and 24 adds. An on-chip ROM stores the filter coefficients used in the interpolation algorithm. The EMU10K1 uses extended-precision arithmetic to tolerate intermediate overflow operations and to detect output overflow. In the case of output overflow, the sample rate converter produces a saturated output, avoiding severe two’s-complement wraparound distortion.

Resonant digital filter. The resonant digital filter implementation is a two-pole low-pass design. Figure 3 shows the filter topology, which requires two storage elements, three multiplies, and four adds. The EMU10K1 uses techniques presented by Rossum [3] in 1991 to reduce quantization noise, increase the pole angle resolution at low frequencies, and increase the pole radius resolution near the unit circle. The implementation of these techniques requires special encoding of the coefficients and an additional two’s-complement operation.

The filter is capable of more than 20 dB of resonance. The envelope generator can sweep the filter cutoff frequency in real time.

Figure 3. Digital filter topology.

Envelope generator. Traditional envelope generators for music synthesizers contain four phases: attack, decay, sustain, and release. The envelope generator of the EMU10K1 actually has six phases: delay, attack, hold, decay, sustain, and release, as illustrated in Figure 4. The delay and hold phases provide additional control to the sound designer.

Simple playback of recorded samples does not have the expressiveness of a musician playing a real instrument. This is because there are numerous ways the instrumental sound changes in response to a musician’s playing style. When the musician strikes an instrument harder, the sound is louder, has a sharper attack, and is brighter. A softer stroke results in a quieter, duller sound with a more gradual attack. The envelope generator of the EMU10K1 produces these effects by separately controlling the amplifier gain and filter cutoff frequency.

Virtual memory mapping

Within the PC environment, application memory is virtualized into 4-Kbyte pages that are mapped into a larger contiguous address space. An application program must request an allocation of memory, a buffer, from the operating system. Throughout the course of normal computer usage, programs allocate and deallocate memory buffers millions of times. As the process continues, it may become impossible to find a single block of physical memory to satisfy an allocation request. The operating system uses virtual memory mapping to solve this problem. Each buffer of memory, while addressed as a contiguous whole, may in fact be fragmented into many 4-Kbyte pieces randomly scattered throughout the physical memory. Since a DMA bus master must present physical addresses to the bus, there must be a way to resolve the virtual addresses used by the application program.

Double buffering. One solution is to allocate a single, physically contiguous buffer into which audio is copied from fragmented virtual memory prior to DMA transport. Due to operating system limitations, physically contiguous buffer allocation is not guaranteed, and it can only be requested during system start-up. Even if it is successful, the memory cannot be dynamically allocated and deallocated at runtime. This is a decided dis-

4 IEEE MICRO
advantage to the user, as a large segment of main memory must be permanently assigned as an audio buffer. It also incurs the CPU overhead of copying buffers from virtualized memory to the physically contiguous buffer.

Figure 4. The various phases of the EMU10K1 envelope generator (delay, attack, hold, decay, sustain, and release) overlaid on the digital audio waveform. The attack phase typically contains more high-frequency information as well as wideband noise. The high frequencies and noise content become attenuated during the decay phase, and a fundamental oscillation, or pitch, becomes prevalent. Once this occurs, a few points that are looped during the sustain and release phases can represent the remainder of the sound.

Scatter-gather. Another way is to pass a scatter-gather list to the stream transport engine. A scatter-gather list is an ordered set of physical addresses that must be parsed as the stream is being transported. This allows dynamic allocation and deallocation, and does not require the copying of buffers. However, the same sound is often triggered multiple times simultaneously, and for music and game applications each instance may require different sample rate conversion ratios. This requires redundant copies of the same scatter-gather list. Even more significantly, sounds that loop to allow them to play for long periods require excessively long lists.

Page table. The EMU10K1 uses a better method, translating the addresses in a similar fashion as the CPU by using a page table and a translation look-aside buffer. The page table is located in system memory and contains the physical addresses of each 4-Kbyte page within the logical sound memory, as shown in Figure 5 (next page).

The on-chip translation look-aside buffer contains the physical mapping of the current logical address. As the stream transport engine moves through logical memory, it detects when the logical to physical mapping is no longer valid, and issues a read from the page table to reload the translation look-aside buffer. This method is very efficient for both the CPU and audio stream transport engine. It permits dynamic allocation of audio buffers and has a minimum of redundancy.

As wavetable oscillators proceed, they submit PCI bus master requests for data. A two-level priority scheme based on the degree of data starvation arbitrates bus master access. This reduces the probability that an oscillator will run out of input samples. A single bad sample can be audible, especially when inputting the audio to a recursive delay line.

Effects processor

The EMU10K1 effects processor creates the final output that the listener will hear on a multiple-speaker system. It satisfies the mixing requirements of the PC environment with a total of 31 inputs and 32 outputs. The 24-bit audio I/O capability provides an ample 144-dB dynamic range, enough to span the entire human hearing range from perceived silence to past the threshold of eardrum damage. However, all instructions use 32-bit integer or fixed-point operands to support the additional precision required for complex operations such as recursive filtering.

ALU. At the core of most signal-processing algorithms is the multiply-accumulate operation. Naturally, the center of the EMU10K1 effects processor is a high-speed arithmetic logic unit that implements variants of this powerful operation. All instructions use four register addresses: result, accumulator, multiplier, and multiplicand. For maximum flexibility, the operand address space is symmetrical in that any address is valid for use in any operand. The effects processor forwards the result register to conceal pipeline delays and permits the result to be used as an input operand on the next instruction period. It processes 32-bit fixed-point and integer operands and has a 67-bit accumulator to allow for intermediate overflows during multistage operations such as finite-length impulse response filtering.
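
To make the role of the wide accumulator concrete, the following Python sketch models a multiply-accumulate loop of the kind used in a finite-length impulse response filter. The 32-bit operand width and the 67-bit accumulator width come from the text; the Q1.31 number format, the function names, and the choice to saturate only when the sum is reduced back to 32 bits are assumptions made for illustration, not a description of the actual EMU10K1 instruction set.

```python
ACC_BITS = 67       # accumulator width stated in the article
OPERAND_BITS = 32   # operand width stated in the article


def saturate(value, bits):
    """Clamp a signed integer to the two's-complement range of `bits` bits."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def fir_mac(samples, coeffs):
    """Sum sample*coefficient products, then reduce to a 32-bit result.

    Operands are treated as Q1.31 fixed-point integers (an assumption).
    """
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += x * c
        # The running sum may exceed 64 bits; check that it fits in 67 bits.
        assert -(1 << (ACC_BITS - 1)) <= acc < (1 << (ACC_BITS - 1))
    # Shift back to Q1.31 and clamp instead of letting the value wrap around.
    return saturate(acc >> 31, OPERAND_BITS)
```

With five full-scale taps whose signs are `+ + + - -`, the running sum briefly grows past what a 64-bit register could hold, yet the final result is still correct because nothing was lost along the way.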


[Diagram: the scatter-gather list holds start and end addresses for segments 1 through 5 of physical memory; the page table maps logical pages 0 through 9 to physical pages 12, 13, 2, 3, 4, 8, 17, 18, 19, and 10; the play list in logical memory holds Start, Loop Start, Loop End, and End.]

Figure 5. Scatter-gather (upper) versus page table (lower) virtual memory approaches. Both approaches render the same waveform. However, scatter-gather can require a very long list to support looping, while the page table requires no extra data.
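
The page table approach in Figure 5 can be sketched in a few lines of Python. This is only a model under stated assumptions: the `StreamTranslator` class, its single-entry look-aside buffer, and the `table_reads` counter are hypothetical names introduced here, and the page numbers are the ones that appear in the figure. The 4-Kbyte page size is from the text.

```python
PAGE_SIZE = 4096  # 4-Kbyte pages, as described in the article


class StreamTranslator:
    """Toy logical-to-physical translation for one audio stream."""

    def __init__(self, page_table):
        self.page_table = page_table  # index: logical page, value: physical page
        self.tlb_page = None          # logical page currently cached
        self.tlb_frame = None         # physical page it maps to
        self.table_reads = 0          # how often the page table was consulted

    def translate(self, logical_addr):
        page, offset = divmod(logical_addr, PAGE_SIZE)
        if page != self.tlb_page:
            # Cached mapping is no longer valid: reload it from the page table.
            self.tlb_frame = self.page_table[page]
            self.tlb_page = page
            self.table_reads += 1
        return self.tlb_frame * PAGE_SIZE + offset


# Logical pages 0..9 mapped to the physical pages shown in Figure 5.
FIGURE_5_TABLE = [12, 13, 2, 3, 4, 8, 17, 18, 19, 10]
```

A looping sound simply moves its logical address back to the loop start. The table itself does not grow, which is the advantage over a scatter-gather list that the caption points out.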

Delay lines. Delay line buffer registers provide zero-latency access to a small on-chip delay memory and a large delay memory located in the PC host memory. A dedicated delay line processor performs the tedious job of maintaining all the circular addresses. The microcode works with logical offset addresses that do not need updates on every sample period, except for special effects that require modulation of the delay length. Still, only the modulation requires microcode for calculation; the delay line processor automatically performs the circular address increment operation.

A dedicated data movement engine absorbs the long latency of a large external memory. The effects processor requires instant access to the delay line. That is not possible when the delay memory is physically located across an arbitrated bus such as the PCI. Access time can even present a challenge to a local memory, since delay memories must be many thousands of bytes deep to be useful. By dedicating a small buffer memory to store the “current” data for each delay pointer, access time can be instantaneous. The data movement engine then can spread out the accesses into the large delay memory over the course of a sample period.

Instruction sequencing. The effects processor has a novel architecture: it does not contain a program counter or support branch/loop constructs. Instead, a powerful conditional execution mechanism provisionally stores the results of operations. A special SKIP instruction that implements the following logic accomplishes conditional execution:

IF (cond) THEN SKIP instruction_count
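
One way to picture this logic is as a straight-line interpreter that visits every instruction once per sample period and uses a countdown to suppress write-back. The Python sketch below is a toy model, not EMU10K1 microcode: the tuple-based instruction format, the `run_sample` name, the two-flag condition register, and the rule that a skipped SKIP has no effect are all assumptions made for illustration.

```python
def run_sample(program, regs):
    """Run one sample period: every instruction is visited once, in order."""
    skip_remaining = 0  # instructions whose results must still be discarded
    flags = {"zero": False, "negative": False}
    visited = 0
    for op, a, b in program:
        visited += 1
        if op == "SKIP":
            cond, count = a, b
            if skip_remaining > 0:
                skip_remaining -= 1      # a skipped SKIP has no effect (assumption)
            elif cond(flags):
                skip_remaining = count
            continue
        dest, func = a, b
        result = func(regs)              # the operation is still carried out
        if skip_remaining > 0:
            skip_remaining -= 1          # but its result is not written back
        else:
            regs[dest] = result
            flags = {"zero": result == 0, "negative": result < 0}
    return visited


# Hypothetical four-instruction program computing y = |x| without a branch.
ABS_PROGRAM = [
    ("OP", "y", lambda r: r["x"]),             # y = x, updates the flags
    ("SKIP", lambda f: not f["negative"], 1),  # IF (not negative) THEN SKIP 1
    ("OP", "y", lambda r: -r["y"]),            # y = -y, discarded when x >= 0
    ("SKIP", lambda f: True, 0),               # ALWAYS SKIP ZERO behaves as a NOP
]
```

Whatever the value of `x`, `run_sample` visits the same four instructions, so the time spent per sample does not depend on the data.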

The conditional expression, cond, is a Boolean AND-OR-NOT combination of a mask operand with a hardware register that stores condition codes from previous instructions. This supports both conventional expressions such as less-than and more complex conditions such as negative-and-overflow-or-zero. The execution of a SKIP instruction does not alter the order in which instructions are fetched. Rather, the processor still fetches, decodes, and executes skipped instructions, but does not write back the results.

The processor determines the number of consecutive instructions to be skipped from the instruction_count operand. This permits skipping entire blocks of code and facilitates real-time loading and unloading of signal-processing programs without affecting audio output. The SKIP instruction even serves as the NOP instruction, using the construct, ALWAYS SKIP ZERO, which does not store results and continues executing the program on the next instruction cycle.

For every output sample period, the effects processor reads all microcode memory in sequential order and skips the writing of result registers based on the conditions specified in SKIP instructions. This provides for if-then-else execution, but not looping. A very important side effect of this architecture is that the total execution time of all concurrent programs is always exactly one sample period.

In a general-purpose DSP, one must take care to ensure that the entire program executes within a sample period. Otherwise, audible defects can result. In the case of infinite-length impulse response filters and recursive delay lines, a single sample defect can remain audible for a very long time. Strange bugs in audio-processing algorithms have been traced to an occasional interrupt service routine that caused program execution time to exceed the length of a sample period. These types of bugs are not possible in the EMU10K1 effects processor.

Asynchronous digital audio receivers

We designed the EMU10K1 to receive digital audio directly from devices such as CD-ROM and DVD drives. However, the sample rate of compact disc audio is 44.1 kHz, and the EMU10K1 output sample rate is 48 kHz. Due to manufacturing tolerances and drift, the clock frequency of each compact disc player differs slightly. Even if the output sample rate were also 44.1 kHz, the slight differences in clock frequency would cause the relative phases of the input and output sample rates to drift over time, eventually resulting in repeated or dropped samples.

It is possible to force the clock frequencies to be exactly synchronous by using a tracking phase-locked loop, or PLL, rather than a fixed-frequency oscillator. Professional recording studios distribute a master clock to all interconnected digital audio devices, which derive local clocks from the master. This guarantees synchronicity of all digital audio streams. This approach is expensive and difficult, requiring PLL-based synchronization capabilities in all digital audio devices. Devices that cannot synchronize to an external clock source must become the master clock source. This is a distinct disadvantage as there can only be one master clock at a time.

A better solution is to use sample rate conversion to resolve the incoming sample rate to the output rate. This requires a sample rate detector that continuously updates an estimate of the asynchronous digital input rate. The sample rate estimate maintains a phase accumulator that controls a 16-point Smith-Gossett [2] sample rate converter. Such an asynchronous sample rate converter avoids the cost of a tracking PLL and provides support for multiple, simultaneous asynchronous audio streams. The EMU10K1 can support three simultaneous asynchronous stereo streams using high-quality asynchronous sample rate conversion.

Digital audio recording

Speech recognition, Internet telephony, music recording, and content authoring applications need digital audio recording capabilities. Noncritical recordings can use reduced sample rates to decrease data storage requirements. The standard sample rates needed within the PC are 48, 44.1, 32, 24, 22.05, 16, 11.025, and 8 kHz. The conventional method is to reduce the clock rate of an analog-to-digital converter, or ADC. However, this requires the use of separate ADC and DAC chips rather than an inexpensive monolithic codec. To ensure adequate alias rejection at all sample rates, more expensive tracking analog filters must be used as well. The EMU10K1 supports the various recording sample rates with very


Digital audio mixing

Audio mixing is a weighted summation of the inputs with saturation to avoid the severe distortion caused by two’s-complement overflow. Mixing multiple digital audio streams requires that the streams have exactly the same sample rate. However, CD audio sampled at 44.1 kHz will inevitably need to be mixed with voice or sound effects sampled at some other rate. Lower sampling rates provide an effective data compression method for sounds that do not have substantial high-frequency content. In addition, the native sample rate of the digital audio output may be different, typically 48 kHz. The solution is to convert all sound sources to the output sample rate before mixing, as shown in Figure A.
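
The weighted summation with saturation can be sketched as follows. This is a minimal Python illustration that assumes the streams have already been converted to a common sample rate; the function names, the 16-bit default width, and the `wrap` helper (included only to show what unsaturated arithmetic would do) are choices made for this example.

```python
def mix(streams, gains, bits=16):
    """Weighted sum of equal-rate streams, clamped to a signed `bits`-bit range."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    out = []
    for frame in zip(*streams):  # one sample from each stream
        total = sum(g * s for g, s in zip(gains, frame))
        out.append(max(lo, min(hi, int(round(total)))))
    return out


def wrap(value, bits=16):
    """Result of the same sum if two's-complement overflow were allowed."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value
```

Summing 30000 and 10000 in 16 bits gives 32767 with saturation, a mild clipping. Wrapping would instead give -25536, a full-scale jump of the wrong sign, which is the severe distortion the text refers to.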
Figure A. Digital audio mixing. [Diagram: inputs IN1 through INn each pass through a sample rate converter and a gain multiplier before being summed into a single output.]

Digital audio is simply a numeric representation of an analog waveform. There are infinitely many digital representations of the same analog waveform. Sample rate conversion is a way to transform one representation into another, which can be done with various degrees of quality and corresponding complexity. To create a new sample stream at a different rate, one must interpolate between the original samples. Figure B illustrates the three commonly used forms of interpolation: nearest-neighbor, linear, and multipoint.
The simplest form, nearest-neighbor interpolation, requires no arithmetic and has the worst quality. A better method is linear interpolation, which requires a small amount of arithmetic and has reasonable quality. Multipoint interpolation requires significant arithmetic, a coefficients table, and multiple input samples to create a single output sample [1], but produces the best quality.
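
The two cheaper methods are easy to compare in code. The Python sketch below steps a fractional read position through the input and produces one output sample per step. The `resample` function, its `ratio` parameter (input rate divided by output rate), and the tie-breaking rule for nearest-neighbor are assumptions for illustration. Multipoint interpolation is left out because it needs a coefficient table.

```python
def resample(samples, ratio, method="linear"):
    """Resample by stepping a fractional position `ratio` inputs per output."""
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos <= last:
        i = int(pos)      # integer part: which input sample to read
        frac = pos - i    # fractional part: position between samples
        if method == "nearest":
            # No arithmetic on sample values: pick the closer neighbor.
            j = i + 1 if frac >= 0.5 and i < last else i
            out.append(samples[j])
        else:
            # Linear: one multiply and two adds per output sample.
            nxt = samples[min(i + 1, last)]
            out.append(samples[i] + frac * (nxt - samples[i]))
        pos += ratio
    return out
```

Doubling the rate of the ramp `[0, 10, 20]` shows the difference: nearest-neighbor repeats input values and produces a staircase, while linear interpolation fills in the midpoints.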
We can view sample rate conversion as a three-stage process. First, the conversion algorithm inserts a number of intermediate zero-valued samples in between the original samples, thus creating a new sample stream at a higher sample rate. The new rate is an integer multiple of the original rate and is typically thousands of times higher when converting arbitrary sample rates. Then, the algorithm filters the new high-rate stream to the lower of the input and output Nyquist frequencies. The stopband of the low-pass filter must sufficiently reject aliases to achieve the desired audio quality. Finally, the low-pass filtered stream is decimated to generate output samples at the desired rate.
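
The three stages map directly onto code. The Python sketch below uses a small integer ratio so it stays readable, whereas the text notes that real converters work with intermediate rates thousands of times higher. The Hann-windowed sinc filter, the 63-tap length, and the function names are assumptions chosen for this illustration, and the brute-force convolution is for clarity rather than efficiency.

```python
import math


def lowpass_taps(cutoff, num_taps):
    """Hann-windowed sinc low-pass; `cutoff` is a fraction of the sample rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    return taps


def convert(samples, up, down, num_taps=63):
    """Change the sample rate by the factor up/down in three stages."""
    # Stage 1: insert up-1 zero-valued samples after each input sample.
    stuffed = []
    for s in samples:
        stuffed.append(s)
        stuffed.extend([0.0] * (up - 1))
    # Stage 2: low-pass at the lower of the input and output Nyquist frequencies.
    cutoff = 0.5 / max(up, down)
    taps = [up * h for h in lowpass_taps(cutoff, num_taps)]  # gain restores level
    filtered = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, h in enumerate(taps):
            if 0 <= n - k < len(stuffed):
                acc += h * stuffed[n - k]
        filtered.append(acc)
    # Stage 3: decimate by keeping every `down`-th sample.
    return filtered[::down]
```

For example, `convert(x, 2, 3)` turns 60 input samples into 40 output samples. Most of the multiplies in stage 2 are by zero or produce samples that stage 3 throws away, which is why practical designs compute only the outputs they need.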
High-order sample rate conversion is an extremely computationally intensive operation. To maintain equivalent quality from input to output, noise due to aliasing must be below the magnitude of the least significant bit on the input signal. For example, a 20-bit digital audio waveform has a dynamic range of about 120 decibels. To get equivalent 20-bit output, noise introduced in the sample rate conversion process must be less than −120 dB. For a 20-bit stereo signal, this requires more than 320 MIPS of processing power.
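
The 120-decibel figure follows from the ratio between full scale and one least significant bit, roughly 6.02 dB per bit. The short Python check below reproduces it; the function name is an assumption, and the formula is the standard amplitude-ratio definition of decibels rather than anything specific to the EMU10K1.

```python
import math


def dynamic_range_db(bits):
    """Ratio of full scale to one least significant bit, in decibels."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 20, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
```

The same arithmetic gives about 144 dB for 24-bit audio, which matches the dynamic range quoted for the effects processor I/O.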

Figure B. Interpolation methods: nearest-neighbor (1), linear (2), and ideal multipoint (3).

References
1. R.E. Crochiere and L.R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall, Upper Saddle River, N.J., 1983, pp. 127-190.

high-quality, 64-point sample rate converters, thus permitting the use of low-cost, monolithic codec chips and simple analog antialiasing filters. Recording incurs very low CPU overhead by using bus master DMA directly to system RAM. At the half-buffer and full-buffer points, an interrupt signals that software should flush the audio to disk. All 32 output channels of the effects processor may be selected for multichannel recording, enabling the use of the EMU10K1 in a home studio environment.

Sample rate conversion

The EMU10K1 achieves relatively high-order multipoint conversion using Smith and Gossett’s particularly efficient algorithm of linearly interpolating the filter coefficients for convolution with the sample data stream. Rather than incurring the high cost of ideal conversion, the EMU10K1 uses a perceptual optimization technique so that most distortion components are inaudible. The key discovery was that most of the energy in real musical sounds is in the

low-frequency band of human hearing. The images of these low-frequency components are located near multiples of the sample rate. Designing the antialiasing filters to have deep notches at multiples of the sample rate at the expense of some additional high-frequency aliasing significantly improves the perceived sound quality. Rossum [4, 5] received a patent in 1992 for interpolation filters designed in this manner. The high-frequency alias components do not have a great deal of energy because the source material is music, which has very little information in the upper band. In addition, Fletcher and Munson [6] discovered that human hearing is strongest in the middle frequency range and very weak in the extreme low- and high-frequency range. Thus, the human hearing system itself attenuates the small amount of high-frequency aliasing. These notches provide a dramatic improvement in the sound quality with no increase in computational complexity. Consequently, the EMU10K1 wavetable synthesizer has higher sound quality than would otherwise be expected when using only 8 points for interpolation. All EMU10K1 sample rate converters use the same technique.

Physical design characteristics

As stated earlier, the EMU10K1 is implemented in a 0.35-micron CMOS process with three metal layers. The 6.7 mm × 6.5 mm die with 2,439,711 transistors uses a 33-MHz PCI clock and internal audio clocks running at 50 and 100 MHz. The chip has 108 signal pins in a 144-pin PQFP. The device dissipates about one watt in normal operation, resulting in a worst-case junction temperature of 105°C, assuming ambient temperature of 70°C. The core operates at an internal voltage of 3.3 V, but the I/O is 5-V tolerant and supports both 3.3-V and 5-V PCI buses.

Design methodology

We used a VHDL-synthesis method to design the EMU10K1, although we first developed portions of the design using a C-language model and then translated them into VHDL.

Logic verification

Initially, we verified the individual blocks using RTL simulation. As chip-level integration progressed, we ran more simulations to verify the connections between blocks. As we got closer to tape-out, the importance of simulation faded and gave way to emulation. The design complexity coupled with the numerous asynchronous clock boundaries called for the use of emulation for final verification. Using the emulator, we could operate the design in an actual PC and run millions of times more cycles than would be possible using simulation alone. We could also sniff out a few nasty asynchronous boundary bugs that would have otherwise made it into silicon. Emulation does have its pitfalls, however; it is an immature technology that often left us feeling frustrated.

Because of the emulator’s physical size and its FPGA use, the design could not be run at full speed. To accommodate the low clock rate of the emulated design, we modified the target PC motherboard to reduce its PCI clock speed. We found that a factor of about 1:50 was necessary for reliable operation, so we operated the PCI bus at 0.6 MHz. We then ran the audio codec at the reduced rate, so we could observe analog audio output on an oscilloscope and even perform spectral analysis to detect distortion. These benefits far outweighed the inconveniences we suffered in getting the emulation environment up and running.

To make efficient use of the emulator, we required early software support to write a chip debugger that gave us direct access to the chip’s features. The debugging software needed to be a DOS program, since Windows required far too much time to boot: between 10 and 20 minutes. The chip debugger supported macro scripts, so we could build higher-level operations from sequences of commands. That made it easy to perform first-silicon evaluation by reusing the macro scripts written during development.

Timing analysis

We used static timing analysis for the majority of sign-offs. We could not properly analyze certain design sections, so we relied on a combination of SDF-annotated (Standard Delay Format) gate-level simulation and “correct-by-design” practices supported with billions of emulation cycles.

One of the more difficult challenges overcome during timing analysis was fixing setup and hold violations on the internal RAM inhibit pins. We chose to inhibit the RAMs on all unused clock cycles to conserve power,


thereby reducing the worst-case junction temperature and increasing our chance of successful silicon. The vendor’s RAM inhibit pins worked by gating the clock. This required setup time before the clock’s falling edge and hold time after the clock’s rising edge, reducing our timing window to much less than 5 ns in some cases. We solved the problem with careful placement and routing, and by tweaking the clock arrival times in layout. Also, satisfying critical PCI timing required careful placement of the PCI core and, in some cases, flip-flop duplication so the outputs could be placed close to the pad cell.

Testability

The chip has several test modes including scan, parametric NAND tree, and RAM test. To support extremely high production volumes, we used functional test vectors to supplement the scan test and achieve the highest possible fault coverage. We developed the test vectors by running an RTL simulation to record the pin states.

Interestingly, we ran a gate-level simulation to verify vectors. We used emulation as our primary logic verification tool, so the primary reason to run the gate-level simulation was to verify that the vectors would not fail because of RTL/gate differences in the modeling of unknowns (X’s). The vectors also provided handoff verification at the vendor. We encountered a number of vector mismatches during the handoff that were traced to inconsistencies in the handling of X’s by the internal SRAM models between the various simulators involved.

The EMU10K1 is an evolutionary step forward in the development of digital audio for the PC. The choice of placing both a high-quality music synthesizer and a powerful audio effects processor on the same die has helped to push the expectations for 3D audio in the PC.

The role of the 3D audio accelerator has become analogous to that of the 3D graphics accelerators. The demand for more power will continue as game developers create more elaborate virtual worlds that use up processing bandwidth faster than CPU manufacturers can create it.

The 33-MHz, 32-bit PCI bus appears to provide adequate audio bandwidth for now, but it could become a bottleneck in the near future. The EMU10K1 internal arbitration and priority scheme is highly optimized and leaves very little room for increased memory access performance. Newer 0.25-micron and smaller processes present some challenges in terms of I/O voltage, especially considering that most PC motherboards have 5-V-only PCI buses. These system limitations will eventually need to be eliminated to allow further growth and innovation in the PC audio subsystem.

References
1. The Complete MIDI 1.0 Detailed Specification, MIDI Manufacturers Assoc., 1984-1996.
2. J.O. Smith and P. Gossett, “A Flexible Sampling-Rate Conversion Method,” Proc. IEEE Int’l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, Piscataway, N.J., Mar. 1984, pp. 19.4.1-19.4.4.
3. D. Rossum, “The Armadillo Coefficient Encoding Scheme for Digital Audio Filters,” Proc. IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1991.
4. D. Rossum, U.S. patent number 5,111,727, issued May 12, 1992.
5. D. Rossum, “Constraint Based Audio Interpolators,” Proc. IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1993.
6. H. Fletcher and W.A. Munson, “Loudness, Definition, Measurement and Calculation,” J. Acoustic Soc. of America, 1933, Vol. 6, p. 59.

Thomas C. Savell is a staff ASIC engineer at the Joint E-mu/Creative Technology Center. His responsibilities include digital and audio VLSI specification, architecture, design, implementation, and verification. He holds a BA in music technology from the University of California, San Diego, with a double minor in computer science and cognitive science. He is a member of the IEEE Computer Society, Signal Processing Society, and MIDI Manufacturers Association Technical Standards Board for 1997 and 1998.

Direct questions concerning this article to the author at the Joint Emu/Creative Technology Center, 1600 Green Hills Road, Suite 101, PO Box 660015, Scotts Valley, CA 95067-0015; tcs@emu.com.

