Clap Tracking for a Robot Drummer

Neil MacMillan
For G. Tzanetakis, CSC 475 University of Victoria nrqm@uvic.ca

Abstract
This document describes a project involving a modification to the existing robot drummer, allowing it to respond in real time to audio stimulus (hand clapping) without needing external equipment. The stimulus detection, timing considerations, clap tracking and prediction, and results are all discussed.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2009 University of Victoria

1. Introduction
To satisfy the requirements for a past course at UVic, I and two other engineering students, Matthew Loisel and Daniel Partridge, designed and built a robot drummer that could strike a drum in response to Musical Instrument Digital Interface (MIDI) input. [1] That project was mostly successful, but once it was completed there were still many ways that the robot could be made more useful or interesting.

One problem with the robot is that it is limited to MIDI input. It can accept commands from any MIDI source, but it requires such a source, for example a synthesizer or computer, to be present. In other words, the robot is not a standalone machine.

The robot can be improved by adding capability to drum in direct response to live audio input such as hand clapping, instead of requiring some complex external piece of equipment to translate analog signals to MIDI commands. It would be even more interesting if the robot could, when a regular rhythm is being clapped, drum along with the rhythm: not in response to the claps, but by anticipating each clap and striking the drum at the precise moment of the clap instead of shortly after it.

This document will describe the three core tasks that I undertook for this project:
- Designed and built a sensor for detecting a hand clap and outputting a digital signal. This turned out to be the main task.
- Measured the delays between a stimulus to the robot drummer and the drum strike.
- Modified the robot’s firmware to accept the clap sensor’s digital signal, and to predict when the next clap in a simple beat will occur.

2. Clap Detection
The robot must be able to gather input before it can respond to claps. I looked for an off-the-shelf clap sensor, but could not find one that was cheap and produced appropriate output. I decided to build a clap sensor from scratch. This involved designing a circuit for the sensor and laying it out as a circuit board, purchasing parts, etching and assembling the circuit board, installing the components, and writing the firmware.

Without the clap sensor, there would be two choices for clap detection. If the robot drummer did clap detection, then the signal processing component would add a significant processing and memory load to the drummer’s firmware. The sensor offloads this work onto a coprocessor that transforms the analog input into a digital output that is very easy for the drummer to read.

Another way to do clap detection would be with a computer. Clearly this would undermine one of the points of this project, which is to make the robot drummer more independent. In addition to that, my experiments show that sending MIDI commands over USB incurs a transmission delay between 10 and 30 ms. The main problem is the delay variance, which is impossible to predict or to correct.

2.1 Clap Sensor Circuit Board

[Circuit diagram. Labels: Mic TRS jack, 2.2 kΩ, 10 kΩ, ATtiny84, 5 V, 10 μF, ISP, LED, 1.5 kΩ, Output, 100 Ω.]

Figure 1. Clap sensor circuit diagram.

The clap sensor is a simple circuit to interface an electret microphone with a microcontroller. [2] Figure 1 shows the circuit I used. The symbol at the top represents a tip-ring-sleeve (TRS) jack, which is what the microphone plugs into. The sleeve is ground, the ring is V+, and the tip is the signal line. The V+ line is connected to power through a resistor, which is what causes the output voltage to vary when the electret is stimulated.

In the middle of Figure 1, the six-pin connector is for the in-system programmer (ISP) plug, which is how code is loaded onto the microcontroller. The ISP plug includes a 10 kΩ pull-up resistor connected to the ISP’s RESET pin. The block to the right represents the ATtiny84 microcontroller that I chose to use (the logical connections to the ATtiny84 in the diagram do not match the physical layout of the chip). To the left is the DC power input, which can range in voltage from 2.7 V to 5.5 V. The robot drummer outputs 5 V. Beside the power source is a 10 μF capacitor, which protects the circuitry from transients in the power source.

The LED is simply a visual output. The LED’s current-limiting resistor value depends on the LED; the LED that I used needed a 1.5 kΩ resistor to provide its rated current at 3.3 V. It should be 2.5 kΩ for 5 V, but going above the rated current is safe as long as the LED does not spend much time turned on. At the bottom of Figure 1 is the digital output that the robot drummer reads, protected by a small current-limiting resistance.

In order to route traces properly, I had to put part of the circuit on the bottom of the board. Figure 2 shows the final circuit board layout for the top layer (left) and the bottom layer (right). I used a toner transfer method to print the layout to a copper-clad PCB, and etched it with sodium persulfate (one can find many printing and etching instructions on the Internet). The top layer in Figure 2 is mirrored horizontally from the actual circuit to compensate for the toner transfer. The wide 1 mm traces are power lines (V+ and ground), and the narrow 0.5 mm traces are signal lines.

The circuit board’s through-holes were difficult to handle. I drilled them out with two different drill bits, a #60 (1.016 mm) bit for the large holes and a #69 (0.742 mm) bit for the small holes. I carved out the TRS jack pins (which are very wide and thin) with the small drill bit. It was impossible to solder the top-layer traces for holes located underneath the TRS jack and the ISP header. I solved this problem by drilling extra via holes next to the through holes, connecting the top copper to metal posts in the vias, and then connecting the component leads to the posts on the bottom of the board, after the components were installed. This problem could be solved by purchasing the boards from a professional manufacturer. The manufacturer would be able to tin inside the through holes, which would make it very easy to solder all of the components from the bottom of the board.

The small holes in Figure 2, on the right side of the top layer, correspond from top to bottom to the capacitor, the LED, and the three resistors (1.5 kΩ, 100 Ω, and 2.2 kΩ). The small holes on the microcontroller interface are vias to connect the bottom layer traces to the top layer. The three large holes on the outer edge of the top layer correspond from top to bottom to V+, ground, and the output. The large holes in the middle of the top and bottom layers are for the six ISP header pins.

Figure 3 shows the finished clap sensor. As of this writing, there is still a bug in the microphone jack that causes the mic signal pin to stay at around 0.3 V. One problem with the circuit board layout is the pull-up resistor that drives the ISP RESET pin high. In Figure 3 one can see that I added the resistor after creating the circuit board. I have not found a way to integrate a full-sized resistor naturally into the board. A surface-mount resistor, which is much smaller than the ½ watt axial resistor that I used, would fit easily next to the ISP port but it would make the sensor more difficult to assemble.

Figure 2. Clap sensor circuit board layout.

Figure 3. Finished clap sensor.

2.2 Board Cost
If the board went into large-scale production, the following costs would be involved (prices are as of December 11, 2009, per unit at the 1000-unit price break, from http://www.digikey.ca):

Table 1. Sensor component prices.

Component     Part Number       Price (CAD$)
CPU           ATTINY84-20SSU    2.03730
TRS jack      SJ1-3533N         0.34158
10 kΩ Res.    CFR-50JB-10K      0.01532
2.2 kΩ Res.   CFR-50JB-2K2      0.01532
1.5 kΩ Res.   CFR-50JB-1K5      0.01532
100 Ω Res.    CFR-50JB-100R     0.01532
Capacitor     ECE-A1CKS100      0.03739
LED*          LTW-420D7         0.23747
1-Row Hdr     4-102972-0        0.09853**
2-Row Hdr     4-103783-0        0.14689**

* This LED is different from the one I used; it will use a 100 Ω current-limiting resistor, and it will be cheaper than the low-current LED I used.
** The headers are packaged in lengths of 40 rows. The price listed is the price of the three positions used for the circuit board.
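As a check on the arithmetic, the per-unit cost can be recomputed from the prices in Table 1 and the CAD$1.74 per-board manufacturing estimate quoted in this section. The following Python sketch is only an illustration; the variable names are my own and it assumes one of each listed part per sensor.

```python
# Per-unit component prices in CAD$, copied from Table 1
# (assumption: exactly one of each part is used per sensor).
component_prices = [
    ("CPU", 2.03730),
    ("TRS jack", 0.34158),
    ("10 kOhm resistor", 0.01532),
    ("2.2 kOhm resistor", 0.01532),
    ("1.5 kOhm resistor", 0.01532),
    ("100 Ohm resistor", 0.01532),
    ("Capacitor", 0.03739),
    ("LED", 0.23747),
    ("1-row header", 0.09853),
    ("2-row header", 0.14689),
]

# Per-board manufacturing estimate in CAD$, as quoted in this section.
BOARD_PRICE_CAD = 1.74

parts_total = sum(price for _, price in component_prices)
unit_total = parts_total + BOARD_PRICE_CAD

print(f"Parts: {parts_total:.5f}")      # prints "Parts: 2.96044"
print(f"Per sensor: {unit_total:.2f}")  # prints "Per sensor: 4.70"
```

The microcontroller and the circuit board together account for most of the cost; the passive components contribute only a few cents.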

I used a free software package from ExpressPCB to design the circuit board. The company that makes ExpressPCB will manufacture boards designed with their software. The software’s price estimator gives an estimate of US$1638.39 to manufacture 1000 circuit boards, which as of this writing works out to CAD$1.74 per board. That brings the total material cost to $4.70 for a single sensor in a batch of 1000 units, not including the cost of the microphone. [3]

2.3 Clap Detection
2.3.1 Microphone Output
For my experiments I used a PC electret microphone. Such microphones have amplifiers built in, so that they produce a usable signal without external amplification.¹ [2] Small, unamplified microphones will not work with the clap sensor. Microphones other than the one I used might amplify differently, which would change the values that I give in this document.

The microphone output signal is idle at about 95% of the input voltage. If the microphone is being powered by 5 V, the output will be idle at around 4.75 V. When the microphone reads sound, the output will oscillate around the idle level, peaking at the 5 V maximum and dipping to about 3.5 V. The sensor’s CPU can read this signal using its analog-digital converter (ADC) module. Figure 4 shows the output of the microphone when it detects a low pitch clap (e.g. a deep, palm-to-palm clap). The horizontal scale is time, and the vertical scale is signal voltage. By visual inspection, one can see that the signal has a period of a little under 2 ms, which corresponds to a frequency of a little more than 500 Hz. A spectrum analysis performed using Audacity confirms that most of the information in a low pitch clap is contained between 500 and 600 Hz. The waveform for a high pitch clap (e.g. a sharp, finger-to-palm clap) is shown in Figure 5. It has a shorter spike in energy than the low pitch clap, and is at a higher frequency, but otherwise it is similar to the low pitch clap. A spectrum analysis shows that most of the clap’s information is stored between 3 kHz and 6 kHz.

Figure 4. Low pitch clap waveform.

Figure 5. High pitch clap waveform.

¹ Mics with amplifiers typically have three conductors (power, ground, and signal). Mics without amplifiers have two conductors (ground and signal) or four conductors when integrated with earphones (ground, signal, left earphone, right earphone). Mics made for the iPod do not have amplifiers.

2.3.2 Signal Analysis
As shown by Repp [4], the frequency spectra of hand claps are not very useful for clap detection; the spectrum features are not distinct. Fortunately, as Figure 4 and Figure 5 show, the time-domain characteristics of clap sounds are quite distinct. Furthermore, it is much easier for a general microcontroller like the ATtiny84 to analyze the time domain characteristics than for it to analyze the frequency domain characteristics. The frequency domain would have to be calculated with a Fourier transform, which is impractical to do in firmware because it takes a considerable amount of calculation time and memory.

The clap waveforms suggest a heuristic for detecting a clap: look for a "strong enough" spike in signal energy that lasts for between 1 and 4 ms. If the signal is too weak, then it is not a clap. Likewise, if the spike in the signal takes too long to settle or settles too quickly, it’s not a clap.

The obvious flaw with this approach is that it is impossible to distinguish between claps and sounds that have similar time-domain properties. I think it’s an acceptable risk though. The wave might also get subsumed by a stronger signal such as a drumbeat, and that’s a greater risk. It doesn’t affect this project though, since there won’t be a drum striking at the same time as a clap.

The usual method for calculating a signal’s energy is to remove the signal’s direct current (DC) offset and take root-mean-square (RMS) values over sample windows, but that technique is not feasible for this project because the different approaches to the square root function take too much memory, or too much calculation time, or both. The square root function is an interesting topic, but a discussion of it is beyond the scope of this report.² [5]

The sensor takes the same approach to the energy problem as the RMS method, but it takes the absolute average of the sample windows instead of the RMS value. The absolute average runs quickly and does not need much memory. The firmware program removes the 4.7 V DC offset so that the sample values are centred about 0, and averages the absolute values of all the samples over 1 ms windows. The absolute averages of the windows are stored in a queue, and when the sensor has enqueued 8 averages it scans through them to find one that is above a certain threshold. If there is such a window, and if there are at least four windows remaining in the queue, and if the window three entries later is below the threshold, then that is a clap. In other words, the signal must have a high average followed by anything, followed by a near-zero average, for the sensor to decide that it has detected a clap. This has proven quite reliable, although very long and strong signals (such as blowing into the microphone) will not be masked properly.

2.3.3 Output
When the firmware detects a clap, it outputs a 50 to 100 µs high pulse on its output pin. It also turns on the LED for 100 ms to provide a visual output. The sensor does not process samples during the output phases of the control loop, which means that if two claps are less than 100.1 ms apart the second clap will be missed. This will be desirable in most situations; the delay will mask extra stimuli (perhaps including the drum strike). I cannot clap twice in less than 100 ms, and the robot drummer cannot strike twice with very much force that quickly, so there is little risk.

The output pulse has an arbitrary width that is guaranteed to be at least 50 µs and at most 100 µs. This is because the system clock that the sensor uses ticks at 50 µs intervals. I chose this interval because sampling once per tick produces a sampling rate of 20 kHz, which is fast enough to read any clap signal (recall that a high pitch clap has most of its information between 3 kHz and 6 kHz). The pulse should span at least one tick, to guarantee that it will have a minimum width; otherwise there is a danger that the pulse will go high immediately before a tick, and will be brought low a few cycles later when the system clock ticks.

In the future, it might be useful to add some more sophisticated analysis to the sensor, to allow it to detect the strength of claps. If that is desired, I propose an output in which the pulse width is proportional to the clap strength. For example, a weak clap might output a 5 µs pulse, a medium clap might correspond to 50 µs, and a strong clap might correspond to 100 µs. In that case, it would be preferable to use a busy wait delay instead of the system timer, to avoid timer overhead (a 5 µs tick requires a timer interrupt every 40 CPU cycles on an 8 MHz processor).
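The window-based detection rule from Section 2.3.2 can be modelled on a PC as in the sketch below. This is not the sensor's firmware: it is a Python illustration in which the function names, the use of volts rather than raw ADC counts, and the threshold value are all my own assumptions. It also assumes one reading of the rule, namely that the first window above the threshold is the candidate, that it needs at least four windows after it in the queue, and that the window three entries later must be below the same threshold.

```python
SAMPLE_PERIOD_US = 50                                   # one sample per 50 us tick (20 kHz)
SAMPLES_PER_WINDOW = 1000 // SAMPLE_PERIOD_US           # 1 ms window = 20 samples
QUEUE_LENGTH = 8                                        # window averages scanned per batch
SETTLE_OFFSET = 3                                       # entry that must have settled
MIN_REMAINING = 4                                       # windows required after the spike


def window_averages(samples, dc_offset):
    """Absolute average of each complete 1 ms window, after removing the DC offset."""
    averages = []
    last_start = len(samples) - SAMPLES_PER_WINDOW
    for start in range(0, last_start + 1, SAMPLES_PER_WINDOW):
        window = samples[start:start + SAMPLES_PER_WINDOW]
        total = sum(abs(s - dc_offset) for s in window)
        averages.append(total / SAMPLES_PER_WINDOW)
    return averages


def is_clap(queue, threshold):
    """Apply the spike-then-settle rule to one queue of window averages."""
    for i, average in enumerate(queue):
        if average > threshold:
            remaining = len(queue) - 1 - i
            # Short-circuit keeps the index in range when too few windows remain.
            return remaining >= MIN_REMAINING and queue[i + SETTLE_OFFSET] < threshold
    return False


def detect(samples, dc_offset, threshold):
    """Return one clap decision per full batch of QUEUE_LENGTH windows."""
    averages = window_averages(samples, dc_offset)
    last_start = len(averages) - QUEUE_LENGTH
    return [is_clap(averages[i:i + QUEUE_LENGTH], threshold)
            for i in range(0, last_start + 1, QUEUE_LENGTH)]
```

For example, a batch with one loud 1 ms window followed by silence is reported as a clap, while a signal that stays loud for the whole 8 ms batch (such as blowing into the microphone for a short time) is rejected. A real implementation on the ATtiny84 would work in integer ADC counts rather than floating-point volts.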

3. Latency Compensation
One part of this project was to measure the robot drummer’s mechanical delays; that is, the time it takes for the drummer to move the drum stick from its rest position to the drum head. This is useful because in order to predict a clap, the drummer needs to start its motion before the clap occurs. With information about how long it takes for the drum stick to move, the drummer can know when to start its strikes. I measured the timing using a drum pad that actuates a push-to-make (PTM) switch when it is struck. When the test program on the robot drummer starts, it records a timestamp and begins a drum strike. When the drum stick closes the PTM switch, the drummer takes another timestamp. It calculates the difference between the two timestamps (in milliseconds) and prints it out to the serial port.

Figure 6. Drum strike times at different velocities.

² I found that the fastest and most memory-efficient option for integer square root is by far a nearest-neighbour binary search on a constant lookup table stored in Flash ROM.

The drummer’s mechanical delay depends on two factors: the strike velocity and the strike displacement. For my purposes, the strike velocity is always set to the maximum of 0x7F (the strike velocity value corresponds to the velocity argument in a MIDI "note on" command). The displacement is the physical distance that the tip of the drum stick travels to go from its rest position to the drum head. A larger displacement causes a larger mechanical delay. It’s difficult to predict the displacement accurately, because it depends on where the user positions the drummer and the drum. It might be possible to put a distance sensor on the drum stick’s tip, but that would not be durable or extensible. I chose to make a reasonable guess at the displacement and came up with 25 mm, which corresponds to a delay of around 40 ms.

4. Clap Tracking
The final part of the project was to program the robot drummer to strike in time with a clapped rhythm. The scenario that I finished in time for this report was to have the drummer respond to a simple, regular rhythm. The drummer calculates the time between the first four claps, stores the three times, and averages them to produce a final beat period. It adds the beat period to the time of the last clap to predict the time of the next clap. Then it subtracts 40 ms to compensate for the drummer’s mechanical delay. That way, the drum stick will strike the drum within a few milliseconds of the next predicted strike time (as long as the drum stick is positioned 25 mm above the drum head).

The drummer could handle more complex beat patterns with a more intelligent algorithm. One simple option is to allow the user to pre-configure the length of the pattern. This is an extension to my implementation. The drummer would count N time differences between claps, and then start to output at the recorded times when it reaches clap N+1. Perhaps it could average three patterns of length N before striking the drum.³

If the clap sensor microphone is positioned so that it can ignore the drum strike and detect only claps, then the robot could continually average the clap times it detects. This would allow the user to change the pattern dynamically, but it would be confusing to the user. During my tests, I found it difficult to carry on my own clapping beat when the drummer was playing something slightly different.

Ideally, a beat tracking or beat histogram algorithm would be used. I did not have time to investigate the options deeply. I think especially the beat histogram could be a feasible approach, because it requires less computation than artificial intelligence mechanisms. There is around 1 KB of writable memory available on the robot’s microcontroller, which places a significant limit on the effectiveness of any beat tracking algorithms. The histogram method would need to strike a balance between histogram size and the number of samples being tracked for calculating the histogram; further investigation is required to determine how the available memory limits the histogram’s effectiveness.

Incorporating beat strength complicates the problem even more. For example, two sections of a beat pattern that have the same period but different strengths should be considered different, and the histogram approach cannot handle that elegantly. Another way to look at the data is to interleave strength measurements with times. For example, weak and strong alternating beats at a frequency of 2 Hz would look like this:

127 500 110 500 127 500 110 …

In the above sequence, the strong (127) and weak (110) beats are interleaved with 500 ms beat periods. This does not lend itself to a beat histogram, but a more general numerical pattern matching approach would be able to predict, for example, a 500 ms gap preceded by a strong beat and succeeded by a weak beat.
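The simple predictor described at the start of this section, together with the proposed extension to patterns of length N, can be sketched as follows. This is a Python illustration rather than the drummer's firmware; the function names and the use of millisecond timestamps in a list are assumptions made for the example. With a pattern length of 1 it reduces to averaging three intervals between four claps, and the 40 ms mechanical delay is the figure from Section 3 for a 25 mm displacement.

```python
MECHANICAL_DELAY_MS = 40  # stick travel time for a 25 mm displacement (Section 3)


def average_pattern(clap_times_ms, pattern_length=1, repeats=3):
    """Average the last `repeats` repetitions of a pattern of clap intervals.

    Returns a list of `pattern_length` averaged intervals in milliseconds.
    With pattern_length=1 this is the plain beat period.
    """
    intervals = [later - earlier
                 for earlier, later in zip(clap_times_ms, clap_times_ms[1:])]
    needed = pattern_length * repeats
    if len(intervals) < needed:
        raise ValueError("not enough claps to fill the requested patterns")
    recent = intervals[-needed:]
    return [sum(recent[k + r * pattern_length] for r in range(repeats)) / repeats
            for k in range(pattern_length)]


def next_strike_start(last_clap_ms, period_ms, delay_ms=MECHANICAL_DELAY_MS):
    """Time at which the strike must begin so the stick lands on the predicted clap."""
    predicted_clap_ms = last_clap_ms + period_ms
    return predicted_clap_ms - delay_ms


# Four claps roughly 500 ms apart: intervals are 500, 510 and 490 ms.
claps = [0, 500, 1010, 1500]
period = average_pattern(claps)[0]
start = next_strike_start(claps[-1], period)
```

In this example the averaged period is 500 ms, so the next clap is predicted at 2000 ms and the strike must begin at 1960 ms. For a two-interval pattern, `average_pattern(times, pattern_length=2)` would return the averaged short and long gaps in order.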

5. Conclusion
Augmenting the robot drummer with the ability to track claps predictively makes the system more interesting and useful. A helpful component is a clap-sensing coprocessor that extracts beat information from hand claps. By analysing a simple beat and predicting its future, the robot can now strike the drum in time with the beat while compensating for mechanical latency. I did not get a chance to investigate this problem fully, and further work can be done in the areas of clap detection and beat tracking.

6. References
[1] N. MacMillan, M. Loisel, D. Partridge. "Final Design Report for a Robot Drummer," [Web site] August 2009, Available: http://www.scribd.com/doc/19161969
[2] T. Engdahl. "Powering Microphones," [Web site] 2000, Available: http://www.epanorama.net/circuits/microphone_powering.html
[3] ExpressPCB. "Free PCB layout software – Low cost circuit boards – Top quality PCB manufacturing," [Web site] 2009, Available: http://expresspcb.com/
[4] B.H. Repp. "The sound of two hands clapping: An exploratory study," J. Acoust. Soc. Am. Volume 81, Issue 4, pp. 1100-1109, April 1987.
[5] J.W. Crenshaw. "Integer Square Roots," [Web site] February 1998, Available: http://www.embedded.com/98/9802fe2.htm

³ The algorithm I implemented is a special case of this approach, with N=1.
