Clap Tracking for a Robot Drummer

Neil MacMillan
For G. Tzanetakis, CSC 475
University of Victoria
nrqm@uvic.ca

Abstract
This document describes a project involving a
modification to the existing robot drummer, allowing it to
respond in real time to audio stimulus (hand clapping)
without needing external equipment. The stimulus
detection, timing considerations, clap tracking and
prediction, and results are all discussed.

1. Introduction
To satisfy the requirements for a past course at UVic, two
other engineering students, Matthew Loisel and Daniel
Partridge, and I designed and built a robot drummer that
could strike a drum in response to Musical Instrument
Digital Interface (MIDI) input. [1] That project was mostly
successful, but once it was completed there were still many
ways that the robot could be made more useful or
interesting. One problem with the robot is that it is limited
to MIDI input. It can accept commands from any MIDI
source, but it requires such a source (for example, a
synthesizer or computer) to be present. In other words,
the robot is not a standalone machine.

The robot can be improved by adding the capability to
drum in direct response to live audio input such as hand
clapping, instead of requiring some complex external piece
of equipment to translate analog signals to MIDI
commands. It would be even more interesting if the robot
could, when a regular rhythm is being clapped, drum along
with the rhythm: not in response to the claps, but by
anticipating each clap and striking the drum at the precise
moment of the clap instead of shortly after it.

This document describes the three core tasks that I
undertook for this project:
- Designed and built a sensor for detecting a hand
  clap and outputting a digital signal. This turned
  out to be the main task.
- Measured the delays between a stimulus to the
  robot drummer and the drum strike.
- Modified the robot’s firmware to accept the clap
  sensor’s digital signal, and to predict when the
  next clap in a simple beat will occur.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page.
© 2009 University of Victoria

2. Clap Detection
The robot must be able to gather input before it can
respond to claps. I looked for an off-the-shelf clap sensor,
but could not find one that was cheap and produced
appropriate output. I decided to build a clap sensor from
scratch. This involved designing a circuit for the sensor
and laying it out as a circuit board, purchasing parts,
etching and assembling the circuit board, installing the
components, and writing the firmware.

Without the clap sensor, there would be two choices for
clap detection. The first is for the robot drummer to do the
clap detection itself, but then the signal processing would
add a significant processing and memory load to the
drummer’s firmware. The sensor offloads this work onto a
coprocessor that transforms the analog input into a digital
output that is very easy for the drummer to read. The
second is to do clap detection with a computer. This would
undermine one of the points of this project, which is to
make the robot drummer more independent. In addition,
my experiments show that sending MIDI commands over
USB incurs a transmission delay between 10 and 30 ms.
The main problem is the delay variance, which is
impossible to predict or to correct.

2.1 Clap Sensor Circuit Board

[Figure 1 labels: Mic TRS jack, 2.2 kΩ, ATtiny84, 10 kΩ, ISP,
5 V supply, 10 μF, LED, 1.5 kΩ, 100 Ω, Output]
Figure 1. Clap sensor circuit diagram.
The clap sensor is a simple circuit to interface an
electret microphone with a microcontroller. [2] Figure 1
shows the circuit I used. The symbol at the top represents
a tip-ring-sleeve (TRS) jack, which is what the microphone
plugs into. The sleeve is ground, the ring is V+, and the tip
is the signal line. The V+ line is connected to power
through a resistor, which is what causes the output voltage
to vary when the electret is stimulated. In the middle of
Figure 1, the six-pin connector is for the in-system
programmer (ISP) plug, which is how code is loaded onto
the microcontroller. The ISP plug includes a 10 kΩ pull-up
resistor connected to the ISP’s RESET pin. The block
to the right represents the ATtiny84 microcontroller that I
chose to use (the logical connections to the ATtiny84 in
the diagram do not match the physical layout of the
chip). To the left is the DC power input, which can range
in voltage from 2.7 V to 5.5 V. The robot drummer
outputs 5 V. Beside the power source is a 10 μF capacitor,
which protects the circuitry from transients in the power
source. The LED is simply a visual output. The LED’s
current-limiting resistor value depends on the LED; the
LED that I used needed a 1.5 kΩ resistor to provide its
rated current at 3.3 V. It should be 2.5 kΩ for 5 V, but
going above the rated current is safe as long as the LED
does not spend much time turned on. At the bottom of
Figure 1 is the digital output that the robot drummer reads,
protected by a small current-limiting resistance.

Figure 2. Clap sensor circuit board layout.

In order to route traces properly, I had to put part of the
circuit on the bottom of the board. Figure 2 shows the final
circuit board layout for the top layer (left) and the bottom
layer (right).

I used a toner transfer method to print the layout to a
copper-clad PCB, and etched it with sodium persulfate
(one can find many printing and etching instructions on the
Internet). The top layer in Figure 2 is mirrored
horizontally from the actual circuit to compensate for the
toner transfer. The wide 1 mm traces are power lines (V+
and ground), and the narrow 0.5 mm traces are signal lines.

The circuit board’s through-holes were difficult to
handle. I drilled them out with two different drill bits, a
#60 (1.016 mm) bit for the large holes and a #69 (0.742
mm) bit for the small holes. I carved out the holes for the
TRS jack pins, which are very wide and thin, with the
small drill bit.

It was impossible to solder the top-layer traces for holes
located underneath the TRS jack and the ISP header. I
solved this problem by drilling extra via holes next to the
through holes, connecting the top copper to metal posts in
the vias, and then connecting the component leads to the
posts on the bottom of the board, after the components
were installed. This problem could also be avoided by
purchasing the boards from a professional manufacturer.
The manufacturer would be able to tin inside the through
holes, which would make it very easy to solder all of the
components from the bottom of the board.

The small holes in Figure 2, on the right side of the top
layer, correspond from top to bottom to the capacitor, the
LED, and the three resistors (1.5 kΩ, 100 Ω, and 2.2 kΩ).
The small holes on the microcontroller interface are vias to
connect the bottom layer traces to the top layer. The three
large holes on the outer edge of the top layer correspond
from top to bottom to V+, ground, and the output. The
large holes in the middle of the top and bottom layers are
for the six ISP header pins.

Figure 3 shows the finished clap sensor. As of this
writing, there is still a bug in the microphone jack that
causes the mic signal pin to stay at around 0.3 V.

One problem with the circuit board layout is the pull-up
resistor that drives the ISP RESET pin high. In Figure 3
one can see that I added the resistor after creating the
circuit board. I have not found a way to integrate a
full-sized resistor naturally into the board. A surface-mount
resistor, which is much smaller than the ½ Watt axial
resistor that I used, would fit easily next to the ISP port,
but it would make the sensor more difficult to assemble.

Figure 3. Finished clap sensor.

2.2 Board Cost
If the board went into large-scale production, the following
costs would be involved (prices are as of December 11,
2009, per unit at the 1000-unit price break, from
http://www.digikey.ca):
Table 1. Sensor component prices.

Component     Part Number       Price (CAD$)
CPU           ATTINY84-20SSU    2.03730
TRS jack      SJ1-3533N         0.34158
10 kΩ Res.    CFR-50JB-10K      0.01532
2.2 kΩ Res.   CFR-50JB-2K2      0.01532
1.5 kΩ Res.   CFR-50JB-1K5      0.01532
100 Ω Res.    CFR-50JB-100R     0.01532
Capacitor     ECE-A1CKS100      0.03739
LED*          LTW-420D7         0.23747
1-Row Hdr     4-102972-0        0.09853**
2-Row Hdr     4-103783-0        0.14689**

* This LED is different from the one I used; it will use a 100 Ω
current-limiting resistor, and it will be cheaper than the
low-current LED I used.
** The headers are packaged in lengths of 40 rows. The price
listed is the price of the three positions used for the circuit
board.

I used a free software package from ExpressPCB to design
the circuit board. The company that makes ExpressPCB
will manufacture boards designed with their software. The
software’s price estimator gives an estimate of
US$1638.39 to manufacture 1000 circuit boards, which as
of this writing works out to CAD$1.74 per board. That
brings the total material cost to CAD$4.70 for a single
sensor in a batch of 1000 units, not including the cost of
the microphone. [3]

2.3 Clap Detection
2.3.1 Microphone Output
For my experiments I used a PC electret microphone.
Such microphones have amplifiers built in, so that they
produce a usable signal without having external
amplification.1 [2] Small, unamplified microphones will
not work with the clap sensor. Microphones other than the
one I used might amplify differently, which would change
the values that I give in this document.

1
Mics with amplifiers typically have three conductors (power,
ground, and signal). Mics without amplifiers have two
conductors (ground and signal) or four conductors when
integrated with earphones (ground, signal, left earphone, right
earphone). Mics made for the iPod do not have amplifiers.

The microphone output signal is idle at about 95% of
the input voltage. If the microphone is being powered by 5
V, the output will be idle at around 4.75 V. When the
microphone reads sound, the output will oscillate around
the idle level, peaking at the 5 V maximum and dipping to
about 3.5 V. The sensor’s CPU can read this signal using
its analog-digital converter (ADC) module.

Figure 4 shows the output of the microphone when it
detects a low pitch clap (e.g. a deep, palm-to-palm clap).
The horizontal scale is time, and the vertical scale is signal
voltage. By visual inspection, one can see that the signal
has a period of a little under 2 ms, which corresponds to a
frequency of a little more than 500 Hz. A spectrum
analysis performed using Audacity confirms that most of
the information in a low pitch clap is contained between
500 and 600 Hz.

Figure 4. Low pitch clap waveform.

The waveform for a high pitch clap (e.g. a sharp,
finger-to-palm clap) is shown in Figure 5. It has a shorter
spike in energy than the low pitch clap, and is at a higher
frequency, but otherwise it is similar to the low pitch clap.
A spectrum analysis shows that most of the clap’s
information is stored between 3 kHz and 6 kHz.

Figure 5. High pitch clap waveform.

2.3.2 Signal Analysis
As shown by Repp [4], the frequency spectra of hand claps
are not very useful for clap detection; the spectrum
features are not distinct. Fortunately, as Figure 4 and
Figure 5 show, the time-domain characteristics of clap
sounds are quite distinct. Furthermore, it is much easier
for a general microcontroller like the ATtiny84 to analyze
the time-domain characteristics than for it to analyze the
frequency-domain characteristics. The frequency domain
would have to be calculated with a Fourier transform,
which is impractical to do in firmware because it takes a
considerable amount of calculation time and memory.

The clap waveforms suggest a heuristic for detecting a
clap: look for a “strong enough” spike in signal energy that
lasts for between 1 and 4 ms. If the signal is too weak,
then it is not a clap. Likewise, if the spike in the signal
takes too long to settle or settles too quickly, it’s not a clap.
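As a rough illustration of this heuristic, the following Python sketch models it on a host computer; it is not the sensor’s firmware. It follows the windowed procedure that this section goes on to describe (1 ms windows of mean absolute value, scanned eight at a time, sampled on the 50 µs tick given in Section 2.3.3). The function names, the 10-bit ADC scaling against a 5 V reference, and the threshold of 50 counts are assumptions made for illustration. Treating only the first loud window as a candidate is one reading of the text, not a confirmed detail of the implementation.

```python
from typing import List, Optional

TICK_US = 50                            # sampling tick given in the paper (20 kHz)
SAMPLES_PER_WINDOW = 1000 // TICK_US    # 20 samples per 1 ms window
QUEUE_LEN = 8                           # number of window averages scanned at once
SETTLE_GAP = 3                          # quiet window sits 3 positions after the loud one


def volts_to_counts(volts: float, vref: float = 5.0, bits: int = 10) -> int:
    """Convert a voltage to an ADC count (assumed: 10-bit ADC, 5 V reference)."""
    return round(volts / vref * (2 ** bits - 1))


def window_averages(samples: List[int], dc_offset: int) -> List[float]:
    """Mean absolute deviation from the DC offset for each full 1 ms window."""
    averages = []
    last_start = len(samples) - SAMPLES_PER_WINDOW
    for start in range(0, last_start + 1, SAMPLES_PER_WINDOW):
        window = samples[start:start + SAMPLES_PER_WINDOW]
        averages.append(sum(abs(s - dc_offset) for s in window) / SAMPLES_PER_WINDOW)
    return averages


def find_clap(queue: List[float], threshold: float) -> Optional[int]:
    """Return the index of the window that starts a clap, or None.

    Only the first window above the threshold is considered (an assumption).
    It counts as a clap if the window SETTLE_GAP positions later exists and
    has already dropped below the threshold.
    """
    for i, avg in enumerate(queue):
        if avg > threshold:
            settle = i + SETTLE_GAP
            if settle < len(queue) and queue[settle] < threshold:
                return i
            return None
    return None


# Usage: 2 ms of silence, a 2 ms burst swinging between 5 V and 3.5 V,
# then 4 ms of silence, all around the 4.7 V idle level from the paper.
DC = volts_to_counts(4.7)
quiet = [DC] * SAMPLES_PER_WINDOW
burst = [volts_to_counts(5.0), volts_to_counts(3.5)] * (SAMPLES_PER_WINDOW // 2)
samples = quiet * 2 + burst * 2 + quiet * 4
queue = window_averages(samples, DC)[:QUEUE_LEN]
print(find_clap(queue, threshold=50.0))  # -> 2
```

A burst that stays loud across the whole queue, or one that starts too close to the end of the queue for the settle check, returns None in this sketch.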
The obvious flaw with this approach is that it is impossible
to distinguish between claps and sounds that have similar
time-domain properties. I think it’s an acceptable risk
though. The wave might also get subsumed by a stronger
signal such as a drumbeat, and that’s a greater risk. It
doesn’t affect this project though, since there won’t be a
drum striking at the same time as a clap.

The usual method for calculating a signal’s energy is to
remove the signal’s direct current (DC) offset and take
root-mean-square (RMS) values over sample windows, but
that technique is not feasible for this project because the
various approaches to the square root function take too
much memory, or too much calculation time, or both. The
square root function is an interesting topic, but a discussion
of it is beyond the scope of this report.2 [5]

The sensor takes the same approach to the energy
problem as the RMS method uses, but it takes the absolute
average of the sample windows instead of the RMS value.
The absolute average runs quickly and does not need much
memory. The firmware program removes the 4.7 V DC
offset so that the sample values are centred about 0, and
averages the absolute values of all the samples over 1 ms
windows.

The absolute averages of the windows are stored in a
queue, and when the sensor has enqueued 8 averages it
scans through them to find one that is above a certain
threshold. If there is such a window, and if there are at
least four windows remaining in the queue, and if the
window three positions later is below the threshold, then
that is a clap. In other words, the signal must have a high
average followed by anything, followed by a near-zero
average in order to decide that it has detected a clap. This
has proven quite reliable, although very long and strong
signals (such as blowing into the microphone) will not be
masked properly.

2.3.3 Output
When the firmware detects a clap, it outputs a 50 to 100 µs
high pulse on its output pin. It also turns on the LED for
100 ms to provide a visual output. The sensor does not
process samples during the output phases of the control
loop, which means that if two claps are less than 100.1 ms
apart the second clap will be missed. This will be
desirable in most situations; the delay will mask extra
stimuli (perhaps including the drum strike). Even my
fastest double clap has a gap longer than 100 ms, and the
robot drummer cannot strike twice with very much force
that quickly, so there is little risk.

The output pulse has an arbitrary width that is
guaranteed to be at least 50 µs and at most 100 µs. This is
because the system clock that the sensor uses ticks at 50 µs
intervals. I chose this interval because sampling once per
tick produces a sampling rate of 20 kHz, which is fast
enough to read any clap signal (recall that a high pitch clap
has most of its information between 3 kHz and 6 kHz).
The pulse should span at least one tick, to guarantee that it
will have a minimum width; otherwise there is a danger
that the pulse will go high immediately before a tick, and
will be brought low a few cycles later when the system
clock ticks.

In the future, it might be useful to add some more
sophisticated analysis to the sensor, to allow it to detect the
strength of claps. If that is desired, I propose an output in
which the pulse width is proportional to the clap strength.
For example, a weak clap might output a 5 µs pulse, a
medium clap might correspond to 50 µs, and a strong clap
might correspond to 100 µs. In that case, it would be
preferable to use a busy wait delay instead of the system
timer, to avoid timer overhead (a 5 µs tick requires a timer
interrupt every 40 CPU cycles on an 8 MHz processor).

3. Latency Compensation
One part of this project was to measure the robot
drummer’s mechanical delays; that is, the time it takes for
the drummer to move the drum stick from its rest position
to the drum head. This is useful because in order to strike
in time with a predicted clap, the drummer needs to start
its motion before the clap occurs. With information about
how long it takes for the drum stick to move, the drummer
can know when to start its strikes.

I measured the timing using a drum pad that actuates a
push-to-make (PTM) switch when it is struck. When the
test program on the robot drummer starts, it records a
timestamp and begins a drum strike. When the drum stick
closes the PTM switch, the drummer takes another
timestamp. It calculates the difference between the two
timestamps (in milliseconds) and prints it out to the serial
port.
Figure 6. Drum strike times at different velocities.
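
To show how a measured mechanical delay could feed into strike timing, here is a small Python sketch, written as a host-side model rather than the robot’s firmware. It computes a delay from two timestamps, as in the measurement procedure for this section, and then schedules a strike by averaging the intervals between recent claps, adding that period to the last clap, and subtracting the delay. The interval averaging mirrors the tracking scheme described in Section 4, and the 40 ms default is the paper’s estimate for a 25 mm displacement. The function names and the example clap times are hypothetical.

```python
from typing import List


def measured_delay_ms(start_ms: int, contact_ms: int) -> int:
    """Mechanical delay: time from starting a strike to the switch closing."""
    return contact_ms - start_ms


def beat_period_ms(clap_times_ms: List[int]) -> float:
    """Average interval between consecutive claps."""
    if len(clap_times_ms) < 2:
        raise ValueError("need at least two claps to estimate a period")
    intervals = [b - a for a, b in zip(clap_times_ms, clap_times_ms[1:])]
    return sum(intervals) / len(intervals)


def strike_start_ms(clap_times_ms: List[int],
                    mechanical_delay_ms: float = 40.0) -> float:
    """Time at which to begin the strike so the stick lands on the next clap."""
    predicted_clap = clap_times_ms[-1] + beat_period_ms(clap_times_ms)
    return predicted_clap - mechanical_delay_ms


# Usage: four claps roughly 500 ms apart (three intervals: 500, 510, 490).
claps = [0, 500, 1010, 1500]
print(beat_period_ms(claps))    # -> 500.0
print(strike_start_ms(claps))   # -> 1960.0
```

In this model the strike begins at 1960 ms so that the stick reaches the drum head at the predicted clap time of 2000 ms.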

2
I found that the fastest and most memory-efficient option for
integer square root is by far a nearest-neighbour binary search
on a constant lookup table stored in Flash ROM.

The drummer’s mechanical delay depends on two
factors: the strike velocity and the strike displacement. For
my purposes, the strike velocity is always set to the
maximum of 0x7F (the strike velocity value corresponds to
the velocity argument in a MIDI “note on” command).
The displacement is the physical distance that the tip of the
drum stick travels to go from its rest position to the drum
head. A larger displacement causes a larger mechanical
delay.

It’s difficult to predict the displacement accurately,
because it depends on where the user positions the
drummer and the drum. It might be possible to put a
distance sensor on the drum stick’s tip, but that would not
be durable or extensible. I chose to make a reasonable
guess at the displacement and came up with 25 mm, which
corresponds to a delay of around 40 ms.

4. Clap Tracking
The final part of the project was to program the robot
drummer to strike in time with a clapped rhythm. The
scenario that I finished in time for this report was to have
the drummer respond to a simple, regular rhythm. The
drummer calculates the time between the first four claps,
stores the three times, and averages them to produce a final
beat period. It adds the beat period to the time of the last
clap to predict the time of the next clap. Then it subtracts
40 ms to compensate for the drummer’s mechanical delay.
That way, the drum stick will strike the drum within a few
milliseconds of the next predicted strike time (as long as
the drum stick is positioned 25 mm above the drum head).

The drummer could handle more complex beat patterns
with a more intelligent algorithm. One simple option is to
allow the user to pre-configure the length of the pattern.
This is an extension to my implementation. The drummer
would count N time differences between claps, and then
start to output at the recorded times when it reaches clap
N+1. Perhaps it could average three patterns of length N
before striking the drum.3

3
The algorithm I implemented is a special case of this approach,
with N=1.

If the clap sensor microphone is positioned so that it
can ignore the drum strike and detect only claps, then the
robot could continually average the clap times it detects.
This would allow the user to change the pattern
dynamically, but it would be confusing to the user. During
my tests, I found it difficult to carry on my own clapping
beat when the drummer was playing something slightly
different.

Ideally, a beat tracking or beat histogram algorithm
would be used. I did not have time to investigate the
options deeply. I think the beat histogram especially could
be a feasible approach, because it requires less
computation than artificial intelligence mechanisms.
There is around 1 KB of writable memory available on the
robot’s microcontroller, which places a significant limit on
the effectiveness of any beat tracking algorithms. The
histogram method would need to strike a balance between
histogram size and the number of samples being tracked
for calculating the histogram; further investigation is
required to determine how the available memory limits the
histogram’s effectiveness.

Incorporating beat strength complicates the problem
even more. For example, two sections of a beat pattern
that have the same period but different strengths should be
considered different, and the histogram approach cannot
handle that elegantly. Another way to look at the data is to
interleave strength measurements with times. For
example, weak and strong alternating beats at a frequency
of 2 Hz would look like this:

127 500 110 500 127 500 110 …

In the above sequence, the strong (127) and weak (110)
beats are interleaved with 500 ms beat periods. This does
not lend itself to a beat histogram, but a more general
numerical pattern matching approach would be able to
predict, for example, a 500 ms gap preceded by a strong
beat and succeeded by a weak beat.

5. Conclusion
Augmenting the robot drummer with the ability to track
claps predictively makes the system more interesting and
useful. A helpful component is a clap-sensing coprocessor
that extracts beat information from hand claps. By
analysing a simple beat and predicting its future, the robot
can now strike the drum in time with the beat while
compensating for mechanical latency. I did not get a
chance to investigate this problem fully, and further work
can be done in the areas of clap detection and beat
tracking.

6. References
[1] N. MacMillan, M. Loisel, D. Partridge. “Final Design Report
    for a Robot Drummer,” [Web site] August 2009, Available:
    http://www.scribd.com/doc/19161969
[2] T. Engdahl. “Powering Microphones,” [Web site] 2000,
    Available:
    http://www.epanorama.net/circuits/microphone_powering.html
[3] ExpressPCB. “Free PCB layout software – Low cost circuit
    boards – Top quality PCB manufacturing,” [Web site] 2009,
    Available: http://expresspcb.com/
[4] B.H. Repp. “The sound of two hands clapping: An
    exploratory study,” J. Acoust. Soc. Am. Volume 81, Issue 4,
    pp. 1100-1109, April 1987.
[5] J.W. Crenshaw. “Integer Square Roots,” [Web site] February
    1998, Available: http://www.embedded.com/98/9802fe2.htm