Ken Laberteaux
ken@laberteaux.org
Original: January 2020
Current: 9 February 2020
This is an article describing Bluetooth's low complexity subband codec. To be clear, this is the same codec that is required for all Bluetooth Advanced Audio Distribution Profile (A2DP) systems, and is usually written simply as SBC. However, in this article I will call this codec LC-SBC to distinguish it, as it is just one example of a subband codec.
Also, in what follows it is convenient to first consider a single (mono) audio channel. We will treat the two-channel stereo case later.
Figure 1 - Passband Analysis Filterbank
Next is a decimation process. Decimation is shown in Fig 2 as a downward arrow followed by a value. In this case, the value is nrof_subbands. This means keeping only one of every nrof_subbands samples and throwing the rest away.
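As a minimal sketch of decimation on its own (no filtering), in Python: the helper name is hypothetical, and keeping the samples at indices 0, nrof_subbands, 2*nrof_subbands, … rather than some other phase is an assumption of the sketch.

```python
def decimate(samples, nrof_subbands):
    """Keep one of every nrof_subbands samples and throw the rest away.

    Hypothetical helper; it keeps indices 0, nrof_subbands, 2*nrof_subbands, ...
    """
    if nrof_subbands < 1:
        raise ValueError("nrof_subbands must be a positive integer")
    return samples[::nrof_subbands]


print(decimate(list(range(12)), 4))  # [0, 4, 8]
```

With nrof_subbands=4, twelve input samples become three.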
Figure 2 - Bandpass filterbank plus decimation, resulting in the sequences D_i(n)
[If the rest of this paragraph is confusing, skip it and meet us at the start of the next paragraph.]
Each bandpass filter is implemented in a computationally efficient way, using one prototype digital low-pass filter (with a relatively small number of coefficients) and modulating it to the center of the passband by multiplying the prototype low-pass filter coefficients by a cosine term. Such a low-complexity passband filter will not have sharp frequency transitions, which allows some aliasing (frequencies leaking in from the adjacent subbands). But, as a feature of this family of codecs, the filters are designed to cancel (in theory) any aliasing at the synthesis/reconstruction filter. The outputs of these bandpass filters are then decimated by nrof_subbands, i.e. if nrof_subbands=4, what is sent to the codec receiver is only every fourth output of the bandpass filter; the other three samples are thrown away (or, as in this case, not even calculated). This process is done nrof_blocks times, resulting in the sequences D_i(n). Here nrof_blocks is the number of samples per subband that will be processed and coded together, and i denotes the subband, i=0, 1, …, (nrof_subbands-1). This is shown in Figure 2. There are plenty of technical details for this polyphase filterbank and decimation process. If interested, please see Fig 12.5 of ADVANCED AUDIO DISTRIBUTION PROFILE SPECIFICATION (Version 13),
https://www.bluetooth.org/docman/handlers/DownloadDoc.ashx?doc_id=260859&vId=290074.
Other helpful references for this filterbank plus decimation topic are F. de Bont, M. Groenewegen and W. Oomen, "A High Quality Audio-Coding System at 128 kb/s", 98th AES Convention, Feb. 25-28, 1995 (note that this is not exactly LC-SBC, but its immediate predecessor), and Chapter 2 of "Considering Bluetooth’s Subband Codec (SBC) for Wideband Speech and Audio on the Internet", by Christian Hoene and Mansoor Hyder, October 2009, along with the references therein.
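To make the "prototype low-pass plus cosine modulation" idea concrete, here is a minimal Python sketch of a generic cosine-modulated analysis filterbank with decimation. It is written in the spirit of the description above, not from the specification: the Hann-windowed sinc prototype, the filter length of 10*nrof_subbands taps, and the absence of the extra phase terms a real pseudo-QMF design uses are all assumptions, so it should not be expected to achieve the alias cancellation described above.

```python
import math


def prototype_lowpass(num_taps, cutoff):
    """Hann-windowed sinc low-pass filter (cutoff in cycles/sample, < 0.5).

    Hypothetical stand-in: NOT the prototype coefficients from the A2DP spec.
    """
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - centre
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def analysis_filterbank(x, nrof_subbands, num_taps=None):
    """Cosine-modulated bandpass filters, each followed by decimation.

    Returns nrof_subbands lists; list i plays the role of D_i(n).
    """
    m = nrof_subbands
    length = num_taps or 10 * m                     # filter length: an assumption
    proto = prototype_lowpass(length, 1 / (4 * m))  # half a subband wide
    centre = (length - 1) / 2
    outputs = []
    for i in range(m):
        # Shift the prototype to the centre of subband i, (i + 0.5) / (2m)
        # cycles/sample, by multiplying its coefficients by a cosine.
        taps = [2 * c * math.cos(math.pi * (i + 0.5) * (k - centre) / m)
                for k, c in enumerate(proto)]
        d_i = []
        # Only every m-th filter output is computed; the others are the
        # samples that decimation would throw away anyway.
        for n in range(0, len(x), m):
            acc = 0.0
            for k, c in enumerate(taps):
                if n - k >= 0:
                    acc += c * x[n - k]
            d_i.append(acc)
        outputs.append(d_i)
    return outputs


# A tone at the centre of subband 2 (of 4) should land mostly in D_2(n).
tone = [math.cos(2 * math.pi * 0.3125 * n) for n in range(400)]
D = analysis_filterbank(tone, 4)
energy = [sum(v * v for v in d[10:]) for d in D]  # skip the filter warm-up
print([round(e, 4) for e in energy])
```

Computing only every nrof_subbands-th output is what makes the "not even calculated" remark above pay off: the inner loop runs len(x)/nrof_subbands times per subband instead of len(x) times.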
Following the subband filtering and decimation described above (see Figure 2), we have D_i(n), which are nrof_subbands time-domain sequences, each nrof_blocks long, each (mostly) containing only the frequencies of its decimated (stretched) passband. For any given time-slice n, let us call the set D_i(n), i=0, 1, …, (nrof_subbands-1), a block. Because of the decimation, each block represents a time-slice of nrof_subbands original audio samples. At this point, no lossy data compression has occurred: D_i(n) contains all of the information in the original audio samples. If we were to reverse the process, i.e. interpolate each subband signal, put it through the synthesis filterbank, and sum the subband outputs, we should get (theoretically) the original audio signal of length (nrof_subbands * nrof_blocks) samples. This is the unit of data that will be coded (ideally reducing its bandwidth), called a frame.
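The block and frame bookkeeping in this paragraph can be sketched in a few lines of Python. The helper names are hypothetical; the sketch only arranges subband sequences D_i(n) into blocks and computes how much audio one frame covers.

```python
def blocks_from_subbands(d):
    """Turn subband sequences d[i][n] into blocks: block n is [d[0][n], ..., d[M-1][n]]."""
    return [list(block) for block in zip(*d)]


def frame_samples(nrof_subbands, nrof_blocks):
    """Number of original audio samples represented by one frame."""
    return nrof_subbands * nrof_blocks


def frame_duration_ms(nrof_subbands, nrof_blocks, sample_rate_hz):
    """Length of one frame in milliseconds."""
    return 1000.0 * frame_samples(nrof_subbands, nrof_blocks) / sample_rate_hz


print(blocks_from_subbands([[1, 2], [3, 4]]))     # [[1, 3], [2, 4]]
print(frame_samples(8, 16))                       # 128
print(round(frame_duration_ms(8, 16, 44100), 2))  # 2.9
```

With nrof_subbands=8 and nrof_blocks=16 (the settings used in the audio examples later), one frame covers 128 samples, about 2.9 ms at 44.1 kHz.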
sizes trying to match the sensitivity of the human ear to different frequencies. Frequency masking thresholds are then computed from the signal and used to reduce the precision of parts of the audio signal according to how strongly they are masked. Subbands where the signal is more masked are encoded with less accuracy, the accuracy eventually going down to zero (no signal encoded at all) in extreme cases. With less audio information to encode, mp3 can achieve higher compression rates. However, no such per-subband psychoacoustic masking is done for LC-SBC, as it would work against the low-complexity (and battery-preserving) and low-delay design goals. Even if someone wanted to compute masking on subbands within LC-SBC, it would be pointless, as there are simply not enough subbands to even roughly match the frequency selectivity of the human ear. As such, there is no intuitive problem with using so few subbands.
To reduce the bandwidth, we scale and quantize these time-domain samples D_i(n) at fewer quantization levels than would be required to perfectly capture the original signal. How this is done is what makes LC-SBC both lower-complexity and lower-delay compared to more complex codecs. Each subband is scaled to a number in [-1,1] by dividing each of its samples by a scale_factor(i). The scale_factor(i) is chosen by finding, for each subband i, the largest absolute value of the subband signal D_i(n), and rounding it up to the next power of two.
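A minimal Python sketch of this scaling step, following the description above: round the peak magnitude up to a power of two, then divide. The function names and the choice to return 1.0 for an all-zero subband are assumptions; the specification's exact encoding of scale factors in the frame is not reproduced here.

```python
import math


def scale_factor(subband_samples):
    """Smallest power of two >= the largest |sample| in the subband."""
    peak = max(abs(x) for x in subband_samples)
    if peak == 0:
        return 1.0  # assumption: any value works for a silent subband
    return 2.0 ** math.ceil(math.log2(peak))


def scale_subband(subband_samples):
    """Divide by scale_factor(i) so every sample lands in [-1, 1]."""
    sf = scale_factor(subband_samples)
    return sf, [x / sf for x in subband_samples]


sf, scaled = scale_subband([3.0, -5.0, 1.5])
print(sf, scaled)  # 8.0 [0.375, -0.625, 0.1875]
```

Because the scale factor is a power of two, the division is exact in binary floating point (and would be a simple shift in fixed-point hardware), which fits the low-complexity goal.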
LC-SBC does not change the number of samples. Each scaled subband signal D_i(n)/scale_factor(i) is nrof_blocks long, and will remain so. It is just that each sample will be represented with less precision. Specifically, each subband is assigned a number of bits, such that the total number of bits across the subbands remains constant. This constant is called the bitpool.
Intuitively, the bitpool is allocated to the subbands such that the largest number of bits goes to the "most important" subband. The next "most important" subband is assigned the same, or fewer, bits, and so on. While it is interesting to consider how one might evaluate "most important", and thus the bit assignment, the reference LC-SBC encoder defines two fairly simplistic methods. The first, called SNR, simply assigns the bits in the order of each subband's scale_factor(i); specifically, each subband i gets log_2(scale_factor(i))-1 bits. The second, called LOUDNESS, is similar to SNR, but is biased to give more bits to the lower-frequency subbands, at the expense of fewer bits for the higher-frequency subbands. Some call this LOUDNESS method a "simple psychoacoustic model", and I guess it is true that the bottom subband (up to 2756.25 Hz for nrof_subbands=8 at a 44.1 kHz sampling rate, and double that for nrof_subbands=4) is often more important to listening than the highest-frequency subband. It is also true that the noise from the quantization process is (theoretically) confined to its own subband, so pushing it to harder-to-hear frequencies is probably less objectionable to the listener.
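To illustrate the general idea of sharing a fixed bitpool among subbands, here is a deliberately simplified, hypothetical greedy allocator in Python. It is not the reference encoder's SNR or LOUDNESS algorithm: it starts from each subband's "need" (taken here as log_2(scale_factor(i))), optionally minus a per-subband bias that mimics LOUDNESS favoring low frequencies, and hands out one bit at a time to the least-satisfied subband until the bitpool is used up. The 16-bit cap per subband is also an assumption of the sketch.

```python
def allocate_bits(need, bitpool, max_bits=16, bias=None):
    """Hypothetical greedy bit allocation (not the reference SBC algorithm).

    need:    per-subband bit need, e.g. log2(scale_factor(i)).
    bitpool: total number of bits to hand out per block.
    bias:    optional per-subband penalty; larger values for higher
             subbands push bits toward low frequencies (LOUDNESS-like).
    """
    n = len(need)
    bias = bias if bias is not None else [0] * n
    adjusted = [nd - b for nd, b in zip(need, bias)]
    bits = [0] * n
    for _ in range(bitpool):
        open_subbands = [i for i in range(n) if bits[i] < max_bits]
        if not open_subbands:
            break  # every subband is already at the cap
        # Give the next bit to the subband whose need is least satisfied;
        # ties go to the lower-frequency subband.
        best = max(open_subbands, key=lambda i: adjusted[i] - bits[i])
        bits[best] += 1
    return bits


print(allocate_bits([3, 2, 1, 0], bitpool=10))        # [4, 3, 2, 1]
print(allocate_bits([2, 2], bitpool=4))               # [2, 2]
print(allocate_bits([2, 2], bitpool=4, bias=[0, 1]))  # [3, 1]
```

The last two calls show the LOUDNESS-like effect: with equal scale factors, a bias against the higher subband shifts a bit toward the lower one while the total stays at the bitpool.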
Once each subband gets its bit allocation, D_i(n)/scale_factor(i), a value in [-1,1], is quantized with that number of bits, e.g. if one subband is allocated 4 bits, all nrof_blocks samples of that subband are quantized into one of 2^4=16 values representing a value in [-1,1]. This is done for all subbands, which results in (nrof_subbands*nrof_blocks) values, with each block represented by bitpool bits. Then those quantized values, along with the scale_factor(i) for each subband and additional header information (and potentially some zero-padding bits to make the frame "the right size"), are put into an A2DP Bluetooth packet and sent.
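A minimal sketch of the quantization step in Python: a plain uniform quantizer that maps a scaled sample in [-1, 1] to one of 2^bits integer codes, plus the matching reconstruction. The exact rounding and offset conventions of the specification are not reproduced; the mid-cell reconstruction used here is an assumption.

```python
def quantize(x, bits):
    """Map x in [-1, 1] to an integer code 0 .. 2**bits - 1."""
    levels = 2 ** bits
    code = int((x + 1.0) / 2.0 * levels)
    return min(code, levels - 1)  # x == 1.0 would otherwise overflow


def dequantize(code, bits):
    """Reconstruct the centre of the quantization cell for this code."""
    levels = 2 ** bits
    return (code + 0.5) * 2.0 / levels - 1.0


# With 4 bits there are 2**4 = 16 codes, and the error is at most 1/16.
codes = [quantize(x, 4) for x in (-1.0, 0.0, 0.3, 1.0)]
print(codes)  # [0, 8, 10, 15]
```

With 0 bits every sample reconstructs as 0.0, which matches the "no signal encoded at all" extreme mentioned earlier.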
On the receiver side, the values are unpacked, and each subband's values are multiplied by its scale_factor(i). These values are then interpolated (the reversal of decimation, i.e. spacing the received samples out in time by inserting (nrof_subbands-1) zeros between consecutive values). These interpolated time signals are put into another filterbank with nrof_subbands parallel filters. The synthesis filterbank has specific values to undo what the analysis filters did on the encoder side. The filter outputs are summed, and the output audio is created. Details can be found in Fig 12.3 of ADVANCED AUDIO DISTRIBUTION PROFILE SPECIFICATION, plus the previously mentioned references and the references therein. This decoder is shown in Figure 4.
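The zero-insertion step can be sketched in Python as below. The helper name is hypothetical, and placing the zeros after each received sample (so the output is exactly nrof_subbands times longer) is an assumption about alignment; the synthesis filters that follow it are not shown.

```python
def interpolate_zeros(samples, nrof_subbands):
    """Upsample by inserting nrof_subbands - 1 zeros after each sample."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0.0] * (nrof_subbands - 1))
    return out


up = interpolate_zeros([1.0, -0.5], 4)
print(up)       # [1.0, 0.0, 0.0, 0.0, -0.5, 0.0, 0.0, 0.0]
print(up[::4])  # [1.0, -0.5]
```

Taking every fourth sample of the result gives back the input, which is the sense in which this step reverses the decimation.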
Stereo Modes
So far we have only considered LC-SBC for a mono signal. There are three additional modes for stereo signals. The first, Dual_Channel, encodes the L and R channels completely independently. Thus, each channel independently determines its scale_factors and how its bitpool is allocated. In Dual_Channel mode, the Bluetooth frame essentially doubles in size, as there are two separate sets of scale_factors and quantized values for each block and subband. As a result, compared to Mono mode, Dual_Channel essentially doubles the bitrate of the codec.
For Stereo mode, the bits allocated to a subband now have to represent two (L and R) values. If the bitpool is not increased, the intuition is that you now have roughly half the number of bits to quantize each channel's value compared to Mono mode. Each channel still has its own scale_factor for each subband, but the bit allocation is computed jointly, so the subbands of both channels compete for the same bitpool. For Joint_stereo, things are similar to the Stereo case, but each subband can decide whether the two channels are represented as L and R, or alternatively as middle (M) and side (S). The middle and side representation is found by adding (M) and subtracting (S) the two channel values. An M and S representation should require fewer bits to encode if the side channel is small, i.e. the L and R channels are highly correlated. Because of this flexibility (LR vs MS), an extra bit for each subband is added to the final frame to indicate which convention is used. The entire LC-SBC block diagram is shown in Fig 5.
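The LR-to-MS conversion can be sketched in Python. Halving the sum and difference, M = (L + R)/2 and S = (L - R)/2, is an assumption here (it keeps M and S in the same range as L and R and makes the inverse simply L = M + S, R = M - S); the per-subband decision logic of a real encoder is not shown.

```python
def to_mid_side(left, right):
    """M = (L + R) / 2, S = (L - R) / 2, sample by sample (halving assumed)."""
    mid = [(a + b) / 2 for a, b in zip(left, right)]
    side = [(a - b) / 2 for a, b in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Invert the transform above: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def peak(x):
    return max(abs(v) for v in x)


# Highly correlated channels: the side signal is tiny compared with L or R,
# so it would get a much smaller scale_factor and need fewer bits.
left = [0.50, 0.40, -0.30, 0.20]
right = [0.48, 0.41, -0.29, 0.22]
mid, side = to_mid_side(left, right)
print(peak(left), round(peak(side), 3))  # 0.5 0.01
```

For unrelated L and R the side signal is not small, so there is nothing to gain; that is why the choice is made per subband rather than once for the whole stream.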
These equations, as well as other details, are also nicely implemented in an LC-SBC bitrate calculator created by a person with the handle ValdikSS. That calculator is available at
https://btcodecs.valdikss.org.ru/sbc-bitrate-calculator/
long. The size of these Bluetooth A2DP frames is primarily determined by the size of the quantized subband time signals, which I will call the audio payload. This is (nrof_blocks * bitpool) bits for Stereo, Joint_stereo, and Mono, and (nrof_blocks * 2 * bitpool) bits for Dual_Channel. The other terms express the header information, the scale_factor(i) for each subband, and, for Joint_stereo only, the choice of MS or LR representation for each subband.
Importantly, the size of an A2DP frame is dominated by the audio payload. If we make the simplifying assumption that frame_length is dominated by the audio payload of (nrof_blocks * bitpool) bits (2x for Dual_Channel), then, since each frame carries (nrof_blocks * nrof_subbands) audio samples, the bit_rate (bits/sec) is approximately (F_s * nrof_blocks * bitpool)/(nrof_blocks * nrof_subbands) (2x for Dual_Channel), or
(approximately) for Stereo, Joint_stereo, and Mono:
bit_rate (bits/sec) ≈ (F_s * bitpool)/nrof_subbands [Eq 3a]
(approximately) for Dual_Channel:
bit_rate (bits/sec) ≈ 2 * (F_s * bitpool)/nrof_subbands [Eq 3b]
(The actual bit_rate is higher because, in addition to the audio payload, the actual frame_length must also include the bits representing the header information, as well as the scale_factors.)
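The exact frame_length and bit_rate referred to above can be sketched in Python. The per-mode formulas below are a reconstruction of the A2DP specification's SBC frame layout (a 4-byte header, 4 bits of scale factor per subband per channel, one join bit per subband in Joint_stereo, and the quantized payload rounded up to whole bytes); treat them as an assumption to be checked against the specification or the calculator mentioned earlier.

```python
import math


def frame_length_bytes(channel_mode, nrof_subbands, nrof_blocks, bitpool):
    """SBC frame length in bytes (formulas to be checked against the A2DP spec)."""
    nrof_channels = 1 if channel_mode == "MONO" else 2
    header = 4  # sync word, configuration fields, bitpool, CRC
    scale_factors = (4 * nrof_subbands * nrof_channels) // 8  # 4 bits each
    if channel_mode in ("MONO", "DUAL_CHANNEL"):
        payload_bits = nrof_blocks * nrof_channels * bitpool
    elif channel_mode == "STEREO":
        payload_bits = nrof_blocks * bitpool
    elif channel_mode == "JOINT_STEREO":
        # one extra "join" bit per subband flags LR vs MS
        payload_bits = nrof_subbands + nrof_blocks * bitpool
    else:
        raise ValueError(f"unknown channel mode: {channel_mode}")
    return header + scale_factors + math.ceil(payload_bits / 8)


def bit_rate(channel_mode, sample_rate_hz, nrof_subbands, nrof_blocks, bitpool):
    """Exact bit rate: bits per frame times frames per second."""
    frame_bits = 8 * frame_length_bytes(
        channel_mode, nrof_subbands, nrof_blocks, bitpool)
    frames_per_second = sample_rate_hz / (nrof_subbands * nrof_blocks)
    return frame_bits * frames_per_second


for bp in (35, 53):
    print(bp, frame_length_bytes("JOINT_STEREO", 8, 16, bp),
          bit_rate("JOINT_STEREO", 44100, 8, 16, bp))
# 35 83 228768.75
# 53 119 327993.75
```

These two exact results line up with the “Medium Quality” (-r228800) and “High Quality” (-r328000) encoder settings listed under Audio Examples, while Eq 3a alone gives 192,937.5 and 292,162.5 bits/sec for the same bitpools; the difference is the header, scale factors, and join bits. With nrof_subbands=4 and bitpool 53, the same function gives 633,937.5 bits/sec, roughly double.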
Note that any time you choose to increase the bandwidth of a signal, especially on a wireless channel, you need to consider its potential impact on channel congestion and packet loss. This matters all the more because Bluetooth shares the unlicensed 2.4 GHz ISM band with many Wi-Fi devices. This applies not just to LC-SBC, but to any Bluetooth codec that improves fidelity by increasing bandwidth, e.g. LDAC. If you are operating in a place with lots of uncoordinated Wi-Fi transmissions, and you experience packet loss of the Bluetooth A2DP packets, causing stuttering, buffering, etc., it may be better to use other, lower-bitrate codec options, including default LC-SBC. Increasing the bitrate of any Bluetooth A2DP audio stream should make the codec more transparent, unless it triggers channel congestion and dropped packets. To provide a personal anecdote, when I connect my mobile phone to an LDAC Bluetooth receiver about a meter away, I can often sustain bitrates of 660 Kbits/sec, but almost never the current maximum of 990 Kbits/sec. YMMV. It seems that I could reasonably expect to sustain higher LC-SBC bitrates than what I am currently getting, e.g. 328 Kbits/sec.
As most Bluetooth audio devices default to Joint_stereo, and many listeners are unhappy with their Bluetooth audio, let us consider how we might improve things without abandoning LC-SBC. As a general rule of thumb, codecs perform better (more transparently) as their bitrate increases. Intuitively, by increasing the bitrate, the codec has a larger container into which it fits the original audio, requiring less reduction of resolution (or compression). Generally, however, you do not want to make the container (bit_rate) too big. Certainly, if the bitrate is the same as (or larger than) that of the original audio stream, you defeat the main purpose of a codec. Generally, we would like to get the best bang for the buck, i.e. reduce the bitrate as far as you can without making the results on the far end unusable (however you define that). Of course, trade-offs are required. This is true of LC-SBC as well, as we discuss next. However, as there is a large number of existing Bluetooth audio devices in the world that all support LC-SBC, ideally we want to make changes that are least likely to “break” those legacy devices.
While this is perhaps the most intuitive way to increase the bitrate, there are some cautions. First, each of those legacy Bluetooth audio devices advertises its maximum bitpool during the connection process. ValdikSS keeps a database of Bluetooth audio devices along with their advertised maximum bitpool at https://btcodecs.valdikss.org.ru/codec-compatibility/ Viewing that database, many devices advertise a maximum bitpool of 53, and others less, e.g. 35. ValdikSS has also found that some devices are unable to function even at their advertised maximum bitpool, e.g. see the comment section of an article on soundexpert.org titled “Audio quality of SBC XQ Bluetooth audio codec”. So, while increasing bitpool seems like an obvious way to increase fidelity to the original (uncoded) signal, most Bluetooth receivers are already operating at, or near, their maximum bitpool.
Also, Section 12.5.1 of the ADVANCED AUDIO DISTRIBUTION PROFILE SPECIFICATION (Version 13) states:
“The value of the bitpool field shall not exceed 16 * nrof_subbands for the MONO and DUAL_CHANNEL channel modes and 32 * nrof_subbands for the STEREO and JOINT_STEREO channel modes.”
This also bounds how much we can increase the bitpool.
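A small Python helper makes the quoted limit, and what it implies through Eq 3a and Eq 3b, easy to check. The function names are hypothetical, and the bit-rate figure ignores header and scale-factor overhead, as Eq 3a and Eq 3b do.

```python
def max_bitpool(channel_mode, nrof_subbands):
    """Upper bound on bitpool from Section 12.5.1 as quoted above."""
    if channel_mode in ("MONO", "DUAL_CHANNEL"):
        return 16 * nrof_subbands
    if channel_mode in ("STEREO", "JOINT_STEREO"):
        return 32 * nrof_subbands
    raise ValueError(f"unknown channel mode: {channel_mode}")


def approx_bit_rate(channel_mode, sample_rate_hz, bitpool, nrof_subbands):
    """Eq 3a (Eq 3b for DUAL_CHANNEL): audio payload only."""
    rate = sample_rate_hz * bitpool / nrof_subbands
    return 2 * rate if channel_mode == "DUAL_CHANNEL" else rate


for mode in ("MONO", "DUAL_CHANNEL", "STEREO", "JOINT_STEREO"):
    limit = max_bitpool(mode, 8)
    print(mode, limit, approx_bit_rate(mode, 44100, limit, 8))
# MONO 128 705600.0
# DUAL_CHANNEL 128 1411200.0
# STEREO 256 1411200.0
# JOINT_STEREO 256 1411200.0
```

At 44.1 kHz the limit works out to 16 bits of audio payload per sample per channel in every mode, the same rate as uncompressed 16-bit PCM (1,411.2 kbit/s for stereo), so the cap keeps the payload from exceeding the uncoded audio it represents.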
That said, if we could convince the industry to start producing Bluetooth receivers with LC-SBC bitpools larger than 53, but smaller than the limits mentioned in the previous paragraph, we would likely get higher-fidelity results at the receiver, perhaps high enough to render the new (closed-source, royalty-generating) codecs moot.
The second idea was created by ValdikSS, who suggests somehow forcing a Dual_Channel connection with legacy Bluetooth audio devices: https://habr.com/en/post/456476/
From Eq 3a and Eq 3b, it is clear that such a configuration would roughly double the bitrate. ValdikSS has also written Android patches to accomplish this. These have been adopted by the LineageOS Android ROM and a few other AOSP-based ROMs:
https://www.lineageos.org/engineering/Bluetooth-SBC-XQ/ ValdikSS (or someone else) has named this mode SBC XQ, or Dual Channel HD.
However, I, and a few others, have noted that this second approach forces the extra bits to be evenly distributed between the two channels, thereby not taking advantage of Joint_stereo's ability to more efficiently encode two highly correlated channels, as is the case in many music files. Still, this second idea is very clever, is likely to work with most legacy devices, and is the inspiration for what I propose next.
Now for the third idea, which I believe is novel and which I call SBC-High Bit Rate, or SBC-HBR. There is at least one other way to increase the bitrate using options that legacy Bluetooth audio receivers (speakers and headphones) should already implement, since the specification mandates them. Whether this third suggestion will work will require some future work. Consider these caveats:
1. Like ValdikSS found with Dual_Channel and larger bitpools, it seems that most Bluetooth audio devices were only tested with the suggested parameter combinations in the specs. While these were only offered as suggestions, it seems that many implementers did not fully understand the available tradeoffs described above, and stuck to “known good choices”. As such, they likely tested their implementations with the known, recommended choices, without confirming full compliance with the specification. This could also be the case with reducing nrof_subbands from its default value of 8. Only by testing with a large collection of legacy devices will we find out whether these choices reveal bugs and other problems.
2. The main “kludge” of LC-SBC is the way in which scale_factor is determined and bit allocations are done. These processes are very simplistic, but deemed “good enough” for this Low Complexity codec. Changing nrof_subbands from its default choice may have undesirable (or at least suboptimal) interactions with these simplistic processes. This will require additional consideration and listening tests.
I would propose that these three modifications to LC-SBC, i.e. 1. increasing bitpools, 2. forcing Dual_Channel, and 3. halving nrof_subbands, be collectively described as SBC Bitrate Doubling Techniques. Further testing should address the two future-work items mentioned immediately above, as well as any other unforeseen issues for SBC-HBR.
These goals can be managed (and traded off) via smart choices of the (relatively) large number of freely chosen parameters, i.e. bitpool, nrof_blocks, nrof_subbands, and the two bit allocation schemes (SNR and Loudness). It is not clear how many implementers fully understood the flexibility of this codec. Because parameters suggested by the original specification were chosen (suggestions that I believe have proven to be a bit too conservative in terms of bitrate), LC-SBC has been widely viewed as a poor choice for discerning listeners in terms of overall sound quality. On the other hand, these conservative choices for LC-SBC “just worked”, and showed that wireless audio was not only available to most users, but would be widely adopted.
As a result, there has been a proliferation of “high”, or at least “higher”, quality codecs, such as APT-X (and its variations), AAC, Samsung’s Scalable Codec, Sony’s LDAC, and most recently, LC3. While many of these are indeed well-engineered and dialed-in alternatives, none are open source (so far), and most are not royalty-free. Further, many, especially the higher-bitrate options such as APT-X HD and LDAC, additionally increase fidelity by raising the bitrate beyond typical LC-SBC settings.
The successful implementations of these higher-bitrate options raise the question: “How good can LC-SBC sound if allowed to use bitrates as high as those of these ‘advertised higher sound quality’ codecs?” This article is one exploration of that question. It explores the available settings for LC-SBC by modifying its parameters. It further shows three potential solutions for increasing the bitrate, one of which I believe is novel.
One should take care not to increase the bitrate “too much”. Bluetooth works in an unlicensed band often crowded by Wi-Fi, other Bluetooth, and other devices. This can lead to a Bluetooth audio experience with lost packets, stutters, and buffering, not to mention likely sub-optimal performance of those other devices.
If we could convince the makers of Bluetooth audio equipment to implement LC-SBC with higher bitpools, this would certainly improve fidelity to the original audio. Some models might even be able to achieve this with a firmware update, as the original limit was likely chosen out of convention rather than because of an actual hardware limitation. Unfortunately, such a change would require the manufacturer to do additional work (mostly testing), and the manufacturer has little incentive to do so for units already sold. Given that, this direction seems unlikely to help those who use legacy Bluetooth headphones and speakers.
Asking the makers of Bluetooth audio equipment to support SBC-HBR, i.e. to negotiate an LC-SBC configuration with nrof_subbands=4 (or hacking our source devices, e.g. smartphones, to do so), would roughly double the bitrate compared to the default nrof_subbands=8. If this bitrate is too high (causing congestion), it can be fine-tuned by reducing the bitpool as needed. Since this keeps the added compression that Joint_stereo makes possible for correlated L and R channels, SBC-HBR would be my first choice.
Either of these last two suggestions should be fully compatible with Bluetooth receivers that follow the specifications. That means they should work with most of the legacy devices we already have. Further, they would promote open source projects and the use of royalty-free standards.
Fig 5 - LC-SBC Block Diagram
Audio Examples
Three sets of 30-second audio test files are available. In each case, three different bitpools are chosen, i.e. bitpool={20, 35, 53}, and nrof_subbands is chosen from {4, 8}. The other parameters are fixed for all versions: Joint_stereo, nrof_blocks=16, and Loudness bit allocation.
Each of the three sets of audio files begins with the original .wav file extracted from a music CD, which was then edited in Audacity to shorten it and to apply its standard fade-in and fade-out effects. The shortened file is then re-exported as a .wav file, with the string “test” appended to the file name, e.g. OFortuna_test.wav. This file is then used for a series of encodings and decodings by the Windows executables sbc_encoder.exe and sbc_decoder.exe, each time with a different combination of encoding parameters. All operations were performed on a Windows 10 PC. Details can be found in the batch.txt file that accompanies each set. In addition, here are the usage messages for both the encoder and decoder.
>sbc_encoder -help
SBC Encoder LIB Version 1.5
Copyright (c) 2002 Philips Consumer Electronics, ASA Labs
Usage:
sbc_encoder [-jsv] [-lblk_len] [-nsubbands] [-p] [-rrate] [-ooutputfile] inputfile
[-s] use the stereo mode for stereo signals
[-v] verbose mode
[-j] allow the use of joint coding for stereo signals
[-lblk_len] blk_len specifies the APCM block length, out of [4,8,12,16]
[-nsubbands] subbands specifies the number of subbands, out of [4,8]
[-p] a simple psycho acoustic model is used
[-rrate] specifies the bit rate in bps
[-ooutputfile] specifies the name of the bitstream output file
inputfile specifies the audio input file, the major audio formats are supported
>sbc_decoder -help
SBC Decoder LIB Version 1.5
Copyright (c) 2002 Philips Consumer Electronics, ASA Labs
Usage:
sbc_decoder [-v] [-ooutputfile] [-pstartpos] inputfile
[-v] verbose mode
[-pstartpos] startpos specifies the byte offset to start with decoding
[-ooutputfile] specifies the name of the audio output file
inputfile specifies the name of the bitstream input file
Note that each set includes a version with the suggested “Medium Quality” [-j -p -n8 -l16 -r228800] and “High Quality” [-j -p -n8 -l16 -r328000] parameter choices. One of these is most likely what legacy audio equipment produces. These parameter choices are consistent with the suggested settings in the original specification, as shown here:
[Considering Bluetooth’s Subband Codec (SBC) for Wideband Speech and Audio on the Internet, by Christian Hoene and Mansoor Hyder, October 2009]
Test Set 1: Classical Music, Full Orchestra and SATB Chorus, Loud Volume
https://drive.google.com/open?id=1bnXtniYsnSlMbMbD16ZdzjjPgHPdJ_nF
Test Set 2: Classical Music, Female Soloist with Orchestra, Soft Volume
https://drive.google.com/open?id=1QVW-GbcoKP4BStgu1DOkmcMN87ebKRpf
Test Set 3: Jazzy Pop, Female Soloist with Guitar, Piano, Bass, and Drums, Medium Volume
https://drive.google.com/open?id=113wjTJ2mDZfIj3_NgL4IHd94hNxrCXn8
Acknowledgements
I want to thank ValdikSS for his Dual_channel idea, as well as his extensive public writing on
A2DP. He has also answered many questions of mine via private correspondence. Gabriel
Bouvigne has also patiently answered questions of mine, and his suggestions greatly improved
this article.
Versions
● January 2020 Original sent to a small group of reviewers, iterative edits
● February 9, 2020 Original sent to broad audience
● February 13, 2020 Original SBC-HBR (the group of modifications) now called SBC
Bitrate Doubling Techniques; SBC-HBR is used exclusively to describe the
nrof_subbands=4 technique
END