DAVID CLARK
330 © 1982 Audio Engineering Society, Inc. 0004-7554/82/050330-09$00.75 J. Audio Eng. Soc., Vol. 30, No. 5, 1982 May
ENGINEERING REPORTS HIGH-RESOLUTION SUBJECTIVE TESTING
[...] subjective tests, in order to be scientific, must be performed double blind [1]. Double blind means that no one in a position to influence the outcome knows how the test factor is being varied.
An important part of scientific testing is the control experiment. This is a test of the test itself in which the factor under study is eliminated. This can establish:
1) Random variations in results due to experimental technique, the "noise floor."
2) A reference point for judging the magnitude of the results. For example, articulation loss is measured before and after a sound-reinforcement system is improved to find the amount of improvement.
The noise floor in subjective testing is frequently so high that achieving meaningful resolution is an exercise in extracting signal (useful data) from noise (test uncertainty). This can be accomplished using statistical analysis. Positive results are inherently deterministic or signal because they are the result of a variation in the factor being studied. The test uncertainty is random or at least unrelated to the studied factor. Statistics can separate the two: the larger the number of experiments, the greater the ability to extract signal from noise.
The question of why one bothers searching for such minuscule audibility differences when such comparisons are not made in real-world usage is often raised. One reason is that the record-reproduction chain typically involves a long series of devices through which the audio must pass. If a similar fault appears in ten devices in series, the compound fault may be clearly audible. Also a number of different "less than audible" cleanups may result in an audible improvement.
A less obvious reason arises from the theory of signal detectability applied to the human observer [2]. The theory describes a human as a mathematically perfect detector, except for a constant efficiency factor. The implication is that a human observer would be able to detect any difference between two signals when given enough time. Since the total time spent in real-world listening to a device is likely to be much greater than the total test time, an enhanced real-world sensitivity is possible.
The most common type of listening test incorporating a control is the A/B test. Two components, A and B, are switched into the audio chain in turn so that a comparison can be made. One component can be considered the factor under study and the other the control. One component is preferably a wire bypass so that any differences heard can be presumed to be distortion added by the real component [3].
In a casual audio salon type of A/B test, a difference between components A and B is almost always heard. When the test is made more scientific, by eliminating extraneous factors such as level mismatch and the subject's knowledge of which is playing, the audible differences begin to disappear. When the test is rigorous and double blind, audible differences between components become scarce [3]-[5].
A question becomes obvious. Does casual testing invent audible differences or does scientific rigor inhibit audibility? Casual and scientific testing are not mutually exclusive. The approach used to produce an answer to the question is to make the scientific rigor as transparent to the listener as possible.
Level and response matching, polarity consistency, elimination of extraneous pops, hums, etc., can be performed beforehand and need be of no concern to the listener. Frequently, however, there are time limits, restricted program material, and other pressures in the usual double-blind comparison test. This is because the experiment is of fixed duration by design, or the administrators create a time pressure by their presence in running the experiment. There is, however, no inherent reason why a scientifically rigorous double-blind comparison cannot be carried out over a period of weeks or more in the listener's home or preferred listening environment. In short, all of the appearances of a casual test may be maintained.
The remainder of this report describes an electronic double-blind A/B comparator which will enable an individual or group to perform rigorous tests in a casual manner. Operating controls and data analysis are designed to maximize the chance of detecting small audible differences. Suggestions are made for the degree of elimination of extraneous factors to preserve validity.
2 REFINEMENTS TO THE A/B TEST
The author's first experience with double-blind audibility testing was as a member of the SMWTMS Audio Club in early 1977. A button was provided which would select at random component A or B. Identifying one of these, the X component, was greatly hampered by not having the known A and B available for reference. This was corrected by using three interlocked pushbuttons, A, B, and X. Once an X was selected, it would remain that particular A or B until it was decided to move on to another random selection.
However, another problem quickly became obvious. There was always an audible relay transition time delay when switching from A to B. When switching from A to X, however, the time delay would be missing if X was really A and present if X was really B. This extraneous cue was removed by inserting a fixed-length dropout time when any change was made. The dropout time was selected to be 50 ms, which produces a slight consistent click while allowing subjectively instant comparison.
When differences are small, a large number of responses is necessary to achieve a high statistical probability that differences are audible. One way to accomplish this is to use few trials but a large number of listeners. If only one listener is used, however, a large number of trials is necessary. Small numbers greatly penalize the listener for a single mistake, but also increase the pure chance level of a perfect score.
A good minimum number of trials is 12 to 16 for an individual. The present comparator provides up to 100 trials, but this has never been approached in practice. If an attempt is made to accumulate responses by using
multiple test sequences, listeners are likely to start over if they see that they have made errors in the first few trials. This effectively throws out certain data and makes subsequent statistical analysis invalid.
In the present design, obtaining the answer ends the test sequence by disconnecting both components and disabling the A, B, and X buttons. Going back to the test mode enters new random data into the memory for a new test.
The design philosophy was not to attempt to enforce a valid test, but to encourage it. For instance, the listener uses a hand-held control module for switching trials and A, B, and X, but the answer and reprogramming controls are located at the display. Typically, to see the answers (and end the test) the listener must get up and go to the display. This does not prevent short tests or cheating, but it makes the action more obvious. Likewise, the control buttons are easy to operate, yet electrically interlocked to prevent tricking the logic or timing circuitry.
It would have been possible to include a memory for the listener's responses. Automatic readout of score, and even statistical analysis, could then be performed by the unit. However, it was decided to provide only those functions that the operator cannot do alone. This decision minimizes cost and prevents obsolescence if a slightly different procedure is used. The comparator can easily be interfaced to a general-purpose microcomputer for data storage or analysis.
3 MAXIMIZING RESOLUTION
The following specific operational and procedural considerations maximize resolution of the A/B/X test:
1) Instant juxtaposition of compared signals
2) Test asks for differences only, not qualitative judgments
3) No time limits
4) Forced decision
5) "Sameness" or "differentness" comparison possible
6) Test is listener controlled
7) Training signals or enhanced difference can be used
8) Exact repetition of signals possible
9) Statistical methods used to analyze group data.
To explain in more detail:
1) Many aspects of sound quality are short lived in memory. A delay of only 50 ms between compared signals allows these qualities to be compared.
2) The simplest possible judgment is: they are the same or they are different. If the listener wishes to make a quality judgment also, this should be done in writing. A good score in picking A or B qualifies the listener's opinion.
3) It can be argued that the stressful conditions of a test do not allow the listener to be sensitive to certain sonic parameters. There is no inherent time limit to either the on time of A and B or the entire test. The listener can simply wait for the proper sensitive attitude to make the choice.
4) The listener knows that A and B are different and that X is either A or B, so there is a correct answer. Lack of a "no difference" option encourages the listener to muster all available auditory powers.
5) Random access A/B/X switching permits comparing A to X for sameness or differentness. Also the transition direction can be reversed (A to X and X to A). The listener may temporarily "wear out" one detection mechanism, but can listen in a different way, hearing another manifestation of the difference.
6) If desired, the test can be performed individually. Switch points and listening times then are customized to maximize the individual's sensitivity.
7) Great improvements in resolution can be achieved if the listener knows what to listen for. Sensitizing tests can use pink noise, sine waves, or pulses as appropriate to hear a difference. Sometimes an artificially enhanced distortion can be produced by reducing feedback or connecting multiple devices in series for distortion buildup. The listener is then more able to hear the difference on music.
8) A tape loop or other means can be used to listen to exactly the same passage of music on A as on B.
9) Statistical analysis is used to detect any shift from pure chance responses. The probability that a particular score from an A/B/X test is due to chance is given exactly by the binomial distribution [6]. Instead of evaluating the formula each time, a simple look-up table is provided with the test form. An expanded version is provided in the owner's manual.
In pair comparison tests, the threshold of hearing is traditionally considered to be at the point of 75% correct responses. This may indeed correspond to a subjective transition from not hearing to hearing, but the sounds must be affecting the listener differently when the correct responses are between 50 and 75%. Our method utilizes this prethreshold information to indicate audibility if it is statistically significant.
As an example of how this increases sensitivity, consider the studies of optimal bandwidth by Plenge et al. [7]. Seven antialiasing filters of differing order, alignment, and frequency were auditioned by a pair-comparison test. The totaled results for all subjects were less than the 75% threshold for every filter. Applying the binomial distribution formula, however, it is extremely probable that four of the seven filters were audible.
Various temporal methods of presenting the sound with the factor under study and the control are in use. They each have advantages in simplicity, repeatability, resolution, and test time. The common audio salon A/B test suffers most from the extraneous variable of criterion dependence. Instructions like "amplifier A has high-speed circuitry. Listen for the faster sound of the highs," is an example of the worst kind of criterial biasing of a test. Generally the less instruction and the more time-symmetrical the presentations, the less the criterial influence.
The presentation of a pair of sounds is used extensively in clinical and scientific studies of the human subject
and has also been used to study the sound stimulus. Each sound in the pair is usually presented for a fixed interval and separated by a short fixed interval. Simple instructions and short test times are advantages. Table 1 summarizes the qualities for various pair presentation tests.
4 MAINTAINING VALIDITY
Writing down one's answers while performing the test is mandatory with the A/B/X method. Memory just does not serve well enough when scoring time arrives. A form is available for this (Fig. 1). Usually 16 trials is a good maximum for one sitting. Program material used should be noted. Source material which makes certain distortions apparent may be of value for other tests. A brief table of the scores necessary to achieve 95% confidence level is provided. When filled out fully and signed, this form documents a scientific test. If it demonstrates unusual findings, others will attempt to duplicate the experiment.
It is generally agreed that levels should be matched in comparison testing. However, assume that a small high-frequency response rise exists in a particular phono pickup cartridge which is to be compared to one known to be flat. A comparison test would likely reveal a difference, and perhaps the flatter one would be preferred. The value of this result may be trivial, however, because it is really a test of frequency response. A minor tone-control high-frequency reduction is now applied to the [...]
[Fig. 1. Answer form with fields for name, auxiliary equipment, number tried, number correct, and minimum correct for 95% confidence (for example, 5 out of 5).]
[...] bandwidths and center frequencies necessary to eliminate audible frequency response effects for most music sources. The curves are compiled from fairly limited double-blind testing of a limited number of individuals. The level used was approximately 85 dB unweighted. These curves are in general agreement with the findings of others [2], [4]. In a double-blind test, response differences greater than those allowed by the curves are likely to be responsible for audible differences.
The audibility of absolute phase or polarity is still in contention, so to be on the conservative side, polarity should be maintained [8].
There are an endless number of hums, pops, clicks, mechanical noises, and other extraneous factors which may influence a test. The comparison testing equipment, no matter how well designed, cannot guarantee a valid test. Someone involved must assume the responsibility of eliminating these influences. One check is to switch between components in a normal manner but with no signal present. Many times X can be identified 100% with no sound. External decoupling capacitors or drain resistors sometimes have to be added.
5 HARDWARE
The ABX comparator (Fig. 3) consists of three parts,
Table 1. Comparison of common pair-comparison tests. The A/B/X method achieves the best results but takes more time.
                                                Criterion     Validity
Test              Selection criteria            independence  Repeatability  Resolution  Simplicity  Time
ABX               Is X really A or B?           Excellent     Excellent      Excellent   Fair        Long
A/B (yes-no)      Is signal clean or dirty?     Poor          Fair           Poor        Excellent   Varies
Random pair       Are they same or different?   Fair          Good           Poor        Fair        Short
Random A/B pair*  Which one has distortion?     Excellent     Excellent      Good        Fair        Short
* This widely used test is known as "two-alternative temporal forced choice" (2ATFC).
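The binomial calculation that Section 3 (item 9) relies on can be sketched in a few lines. This is only an illustration, not the look-up table supplied with the test form: the function names chance_probability and minimum_correct are hypothetical, and each trial is assumed to be an independent 50/50 guess when no difference is heard.

```python
from math import comb
from typing import Optional


def chance_probability(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by guessing.

    Each A/B/X trial is modeled as a fair coin flip (assumption), so this is
    the upper tail of a binomial distribution with p = 0.5.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials


def minimum_correct(trials: int, confidence: float = 0.95) -> Optional[int]:
    """Smallest score whose chance probability is at most 1 - confidence."""
    for correct in range(trials + 1):
        if chance_probability(correct, trials) <= 1.0 - confidence:
            return correct
    return None  # too few trials to reach the requested confidence at all
```

Under these assumptions, minimum_correct(16) returns 12 and minimum_correct(5) returns 5. The same function shows why a pooled score below the traditional 75% threshold can still be significant: for a purely hypothetical 120 correct out of 200 trials (60%), chance_probability(120, 200) is under 1%.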
the hand-held control module, the logic/display module, and the relay module. Typically the logic/display module would be placed between or on one of the loudspeakers. The control module is operated from the listening position. The relay module would be placed near the components to be switched to enable the shortest audio cable runs.
Fig. 4 is a block diagram of the logic/display module. For about 1 s after power is turned on, a pseudorandom-noise generator provides a data input for the memory which is held in the write mode. The noise clock advances the decade counter through all steps, thus filling the memory with random 1s and 0s.
The logic/display module has two outputs for controlling external A and B relays. Interlocked buttons on the control module operate these two relays. The X button assigns the memory's current 1 or 0 to operate relay A for 0 and B for 1. Up and down control module buttons select different trial numbers, each of which accesses a different memory location for a random value of X.
When the listener has determined the A or B identity for all Xs that are to make up the test, the answer button is pushed. A flip-flop now holds the circuit in the answer mode, allowing the memory's contents to be read out at each trial number. Both A and B relays are dropped out at this time.
This completes the test sequence. To begin another test, the power is turned off and back on to reload the memory.
It is conceivable that a listener might spend days of listening to determine the memory contents, only to have a momentary power failure cause the answers to vanish. Battery backup to keep the memory alive for a few hours is included to prevent this.
The schematic in Fig. 5 shows that the circuit is primarily TTL with discrete transistors performing a few odd jobs.
It is important that the relays or other means of accomplishing the switching do not produce extraneous hums or switching noise. The hum or switching noise can be distracting but the major problem is that the character of interference with the two components may be different, thus allowing identification on an irrelevant basis. A simple relay system, Fig. 6, consisting of a pair of double-pole reed relays can accomplish a great deal of common comparison testing, however. The simple relay module can assign an output to two inputs for testing amplifier/loudspeaker systems. It can also select from two inputs for comparing microphone/preamplifiers, cartridge/preamplifiers, tape machines and other sources. The requirements are that common grounding be acceptable, and that the level is reasonably high with
[Figure: curves of level versus frequency (Hz, 20 to 20k), labeled by bandwidth in octaves.]
[Fig. 4. Block diagram of the logic/display module: controls panel, power-up timer, pseudorandom generator, decade counter, memory, display, button interlock logic, answer latch, and A and B outputs to relays.]
Fig. 5. Schematic diagram of logic/display and control modules. Output will operate 5-V or 12-V relays directly.
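The behavior of the logic/display module described in Section 5 can be paraphrased as a small software model. This is a behavioral sketch only, not a description of the TTL circuit: the class name ABXComparator, its method names, the seed parameter, and the clamping of the trial number at the ends of the memory are all assumptions made for illustration.

```python
import random


class ABXComparator:
    """Behavioral sketch of the logic/display module (hypothetical model)."""

    DROPOUT_MS = 50  # fixed dropout inserted on every button press

    def __init__(self, trials=100, seed=None):
        rng = random.Random(seed)  # stands in for the pseudorandom-noise generator
        # Power-up: fill the memory with random 0s and 1s.
        self._memory = [rng.randint(0, 1) for _ in range(trials)]
        self.trial = 0            # current trial number (memory address)
        self.selected = None      # last button pressed: "A", "B", "X", or None
        self.answer_mode = False  # set by the answer latch
        self.dropout_total_ms = 0

    @property
    def relay(self):
        """Relay currently operated: "A", "B", or None if both are dropped out."""
        if self.answer_mode or self.selected is None:
            return None
        if self.selected == "X":
            return "AB"[self._memory[self.trial]]  # 0 operates A, 1 operates B
        return self.selected

    def press(self, button):
        """Interlocked A, B, and X buttons on the hand-held control module."""
        if button not in ("A", "B", "X"):
            raise ValueError("button must be 'A', 'B', or 'X'")
        if self.answer_mode:
            return  # buttons are disabled once the answers are displayed
        # The same dropout occurs whether or not the underlying relay changes,
        # so switching time gives no clue to the identity of X.
        self.dropout_total_ms += self.DROPOUT_MS
        self.selected = button

    def step(self, delta):
        """Up/down buttons: select another trial (clamping is an assumption)."""
        last = len(self._memory) - 1
        self.trial = max(0, min(last, self.trial + delta))

    def show_answers(self):
        """Answer button: latch answer mode and drop out both relays."""
        self.answer_mode = True
        self.selected = None

    def read_out(self):
        """Identity of X for the current trial, available only in answer mode."""
        if not self.answer_mode:
            raise RuntimeError("answers are hidden until the answer button is pushed")
        return "AB"[self._memory[self.trial]]
```

A new test corresponds to constructing a new object, which mirrors turning the power off and back on to reload the memory.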
moderately low input and output impedances.
Perhaps the most difficult switching problem is silently switching preamplifier/amplifier combinations. The more complex relay module in Fig. 7 accomplishes this task. The low level section is fully balanced and shielded with a minimum of stray capacitance. The relays are sealed and rated down to dry circuit switching as approximated by a phonograph cartridge output. Coil voltages are ramped up and down to avoid feedthrough into the audio. The loudspeaker level section is rated at 30 amperes and introduces less than 10 mΩ of additional resistance. Both high and low sides are switched. All relays are controlled by an eight-phase timing sequence that disconnects and reconnects outputs, inputs, high sides, and low sides in the proper order for a silent exchange of components within 50 ms. The relay assembly is elastically mounted to minimize acoustical noise.
Since audio must be routed through additional connectors and relay contacts, there is a possibility that the sound will be audibly affected. This is a frequent criticism of switching comparisons of audio equipment. Surely there must be some way of handling signals in a manner that will not degrade them audibly. The relays and connectors used here are of a very high quality. They have passed stringent listening tests as well as measuring up to the standards of conventional chassis wiring. There is no reason, however, why any type of switching mechanism cannot be controlled and used in the A/B/X scheme.
Fig. 7. Block diagram of the balanced relay module. Coil voltages are ramped and sequenced to eliminate clicks.
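One way to picture an eight-phase exchange is as the eight combinations of disconnect or reconnect, output or input, and high or low side. The sketch below builds such a schedule inside the 50 ms window. The actual phase order and timing of the relay module are not given in the text, so the order used here (open everything, outputs first and high side first, then close in reverse) and the even spacing are assumptions, and exchange_schedule is a hypothetical name.

```python
from itertools import product

WINDOW_MS = 50.0  # the exchange of components must complete within this time


def exchange_schedule(window_ms=WINDOW_MS):
    """Return eight (time_ms, action, port, side) phases, evenly spaced.

    Assumed order, for illustration only: disconnect outputs before inputs
    and high sides before low sides, then reconnect in the reverse order.
    """
    ports = ("output", "input")
    sides = ("high", "low")
    opens = [("disconnect", port, side) for port, side in product(ports, sides)]
    closes = [("reconnect", port, side) for _, port, side in reversed(opens)]
    phases = opens + closes
    step = window_ms / len(phases)
    return [(index * step, *phase) for index, phase in enumerate(phases)]


if __name__ == "__main__":
    for time_ms, action, port, side in exchange_schedule():
        print(f"{time_ms:6.2f} ms  {action:10s} {port:6s} {side}")
```

With this ordering every contact is opened before any contact is closed again, so the two components are never connected at the same time during the exchange.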
for example, equalization must be used. Failure to do so will undoubtedly result in a preference based on the dominant difference: frequency response.
Most of the author's testing has been done in conjunction with the SMWTMS group [10]. We have not found any two preamplifiers or amplifiers that have sounded different from each other when responses were matched. All units were of medium or higher quality and not operated in clipping. Other precautions were taken, as listed earlier. This result is in agreement with Lipshitz et al. [11], Baxandall [12], and others.
To find out what amount of distortion was audible, a distortion generator was developed (Fig. 8). Nicknamed the "Grunge box," it generates even-order harmonic components that are independent of level. The rms output also remains nearly constant as percent total harmonic distortion is varied. Distortion can be heard on high- or low-level music passages, and there are no level-set problems. No two real-world nonlinearities are the same, and the "Grunge box" is yet another one, but its sound is somehow typical and useful.
The best done so far is 3%, but with carefully selected material (such as a flute solo) 2% or 1% might be possible. 1% is easy with sine waves.
The worst of medium-quality electronics only approaches 0.5% total harmonic distortion midband when driven near clipping. It is not surprising, then, that no differences were heard.
At this time SMWTMS has not been able to detect differences in pickup cartridges. This testing has been limited by its difficulty. It requires careful equalization, identical stamper number pressings in perfect condition, and synchronized turntables.
Differences between loudspeakers are almost always audible. Equalization, however, allows one to concentrate on imaging and other fine points. Frequently accurate equalization is attainable at only one listening location. A 12-bit companded digital delay line was just audible. A 16-bit linear system was not. Audibility of absolute phase (polarity) was statistically confirmed but difficult to hear. More tests should be done on this subject.
The following are some areas where additional subjective research would be useful:
1) Filters: Phase, bandwidth, and cutoff characteristics
"Grunge box"
Distortion is independent of level.
Distortion is independent of frequency.
Constant rms out as percent total harmonic distortion is varied.
All even-order harmonic components.
0 Ω       37
1 kΩ      32
2 kΩ      28
5 kΩ      21
10 kΩ     14
20 kΩ     9
50 kΩ     4
100 kΩ    2.2
200 kΩ    1.1
500 kΩ    0.44
1 MΩ      0.22
2 MΩ      0.11
5 MΩ      0.04
10 MΩ     0.02
Open      <0.02
Fig. 8. One channel of calibrated distortion generator. Three percent THD is difficult to detect in music.
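The properties listed for the "Grunge box" (even-order harmonics only, distortion independent of level, constant rms output) can be reproduced numerically with a simple stand-in nonlinearity. The circuit's actual transfer function is not described in the text; the absolute-value term below is an assumption chosen because it scales with the signal and contains only even harmonics, and the function names are hypothetical.

```python
import cmath
import math


def grunge(samples, amount):
    """Stand-in distortion: y = x + amount * |x|, DC removed, rms matched to input."""
    raw = [x + amount * abs(x) for x in samples]
    mean = sum(raw) / len(raw)
    raw = [y - mean for y in raw]
    rms_in = math.sqrt(sum(x * x for x in samples) / len(samples))
    rms_out = math.sqrt(sum(y * y for y in raw) / len(raw))
    gain = rms_in / rms_out if rms_out else 0.0
    return [gain * y for y in raw]


def harmonic_amplitudes(samples, count):
    """Amplitudes of harmonics 1..count, given exactly one period of signal."""
    n = len(samples)
    amplitudes = []
    for h in range(1, count + 1):
        acc = sum(y * cmath.exp(-2j * math.pi * h * i / n)
                  for i, y in enumerate(samples))
        amplitudes.append(2 * abs(acc) / n)
    return amplitudes


def thd_percent(samples, count=20):
    """Total harmonic distortion over harmonics 2..count, in percent."""
    amplitudes = harmonic_amplitudes(samples, count)
    harmonics = math.sqrt(sum(a * a for a in amplitudes[1:]))
    return 100.0 * harmonics / amplitudes[0]


def sine_period(level, n=512):
    """One period of a sine wave with the given peak level."""
    return [level * math.sin(2 * math.pi * i / n) for i in range(n)]
```

Calling thd_percent(grunge(sine_period(1.0), 0.1)) and thd_percent(grunge(sine_period(0.1), 0.1)) gives the same figure, which is the level independence claimed for the generator; the odd harmonics returned by harmonic_amplitudes are zero to within rounding error.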
THE AUTHOR
David Clark has operated his own company, DLC Design, since 1977. He performs a wide range of professional audio services including the design of analog and digital circuits, building acoustics, high output loudspeakers, sound-reinforcement systems, film theaters and recording studios.
Mr. Clark's involvement with the ABX Company began in 1979 when he founded it with five other members of the SMWTMS, a Detroit area audio club. The club's adoption of double-blind listening tests was found to be a dramatic aid to component evaluation. The ABX Company was established to refine the apparatus and procedures and make them available to the audio community. His first work in the audio field was for the University of Michigan where he was a student. He provided technical services to a number of sponsored research projects in areas of language teaching and hearing research. He spent more than eight years working for recording studios: first at Motown Record Corporation as a project engineer, then at HDH Sound Studios as chief engineer.
In 1974, Mr. Clark returned to school full-time and in 1977 received a B.S. degree in electrical engineering from Lawrence Institute of Technology. He was one of the founders of the Detroit Section of the Audio Engineering Society and served as its chairman in 1981. Still active in the section, he now holds the office of secretary.