Mid/Side Stereo Coding

MUSIC 422 Final Project
March 11, 2005
Rui Wang, Harold Nyikal, James Yu

02/09/08


Introduction
 Why Stereo Coding?
   Exploit correlations
   Flexibility for better quality or compression
   Widely used
   Artifact free


Topics
 Mid/Side Coding
 Bit Stream


Mid/Side Overview
 Take FFT of block
 Decide to code M/S or L/R based on channel energy differences
 Stereo perceptual model for L/R or M/S
 Bit allocate
 Quantize
 Pack to bit stream

Encoder Block Diagram
[Figure: encoder block diagram. Inputs: 16-bit PCM WAV, block size, desired bit rate. Blocks: read audio file; MDCT (L/R); M/S convert; FFT of L/R and FFT of M/S; find peak and masking curve for each, giving SMR L/R and SMR M/S; decide MS/LR; select LR/MS (coefficients and SMR); allocate bits from a common bit pool; block floating point quantize; packing; write file.]


Decoder Block Diagram
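[Figure: decoder block diagram, with the data passed between blocks in parentheses.]
encoded file
 -> read file (array of bytes)
 -> unpack (n-bit code)
 -> dequantize
 -> convert M/S sub-band to L/R, if necessary (MDCT coefficients)
 -> IMDCT (number, double)
 -> write audio file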


Mid/Side Coding
   Mid = (L + R) / 2
   Side = (L - R) / 2
 We can losslessly recover L and R by
   L = Mid + Side
   R = Mid - Side
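The round trip above can be checked numerically. The following is a minimal sketch, assuming the channels are held as NumPy arrays; the function names `lr_to_ms` and `ms_to_lr` are hypothetical and not taken from the project code.

```python
import numpy as np

def lr_to_ms(left, right):
    """Convert left/right arrays to mid/side (hypothetical helper)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return mid, side

def ms_to_lr(mid, side):
    """Recover left/right from mid/side: L = M + S, R = M - S."""
    mid = np.asarray(mid, dtype=float)
    side = np.asarray(side, dtype=float)
    return mid + side, mid - side

# Round trip on a small made-up block of coefficients.
L = np.array([1.0, 2.0, 3.0])
R = np.array([3.0, 2.0, -1.0])
M, S = lr_to_ms(L, R)
L2, R2 = ms_to_lr(M, S)
```

Here `L2` and `R2` should match `L` and `R`. Any loss in the codec comes from quantizing M and S, not from the transform itself.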


Choose L/R or M/S
 The decision to transmit L/R or M/S is based on the following threshold test:

$$\sum_{k=f_{lower}}^{f_{upper}} \left| l_k^2 - r_k^2 \right| < 0.8 \sum_{k=f_{lower}}^{f_{upper}} \left( l_k^2 + r_k^2 \right)$$

 If TRUE, transmit M/S, otherwise L/R
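The threshold test can be sketched in a few lines. This sketch assumes that l_k and r_k are the left and right FFT values for the lines k of one band, that complex values are reduced to energies by squared magnitude, and that the energy difference is compared by its absolute value. The function name `use_mid_side` is hypothetical.

```python
import numpy as np

def use_mid_side(l, r, ratio=0.8):
    """Return True if the band should be sent as M/S, False for L/R.

    l, r: spectral values of the left and right channels for the lines
    k = f_lower .. f_upper of one band (assumed layout).
    """
    l2 = np.abs(np.asarray(l)) ** 2   # left-channel energy per line
    r2 = np.abs(np.asarray(r)) ** 2   # right-channel energy per line
    diff = np.sum(np.abs(l2 - r2))    # summed energy difference
    total = np.sum(l2 + r2)           # summed total energy
    return bool(diff < ratio * total)

# Nearly identical channels: small energy difference, so M/S is chosen.
similar = use_mid_side([1.0, 0.5, 0.25], [0.9, 0.5, 0.3])
# One silent channel: difference equals total energy, so L/R is chosen.
one_sided = use_mid_side([1.0, 0.5, 0.25], [0.0, 0.0, 0.0])
```

With these made-up inputs, `similar` is True and `one_sided` is False, matching the intuition that M/S pays off when the two channels carry similar energy.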


Generate Perceptual Model for L/R and M/S
 To calculate the M/S masking thresholds, we first apply the same two-slope spreading function used for L/R (from the text), which gives the base thresholds:
   BTHRm: base threshold for the M channel
   BTHRs: base threshold for the S channel

 Additionally, we must consider the stereo masking contributions in the M and S channels. These depend on the masking level difference (MLD) between the M and S channels.

Cont.
 The masking level difference is determined by the following factor

$$\mathrm{MLDfactor} = 10^{\,1.25\,\left(1 - \cos\left(\pi \cdot \min(z,\,15.5) / 15.5\right)\right) - 2.5}$$

 This is multiplied by BTHRm or BTHRs to obtain the respective MLD values
   MLDm = MLDfactor * BTHRm
   MLDs = MLDfactor * BTHRs
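To get a feel for the size of this factor, the following sketch evaluates it numerically. It assumes z is the frequency in Bark, and the function name `mld_factor` is hypothetical.

```python
import numpy as np

def mld_factor(z):
    """MLD factor 10 ** (1.25 * (1 - cos(pi * min(z, 15.5) / 15.5)) - 2.5)."""
    z = np.minimum(np.asarray(z, dtype=float), 15.5)   # clamp at 15.5
    return 10.0 ** (1.25 * (1.0 - np.cos(np.pi * z / 15.5)) - 2.5)

low = mld_factor(0.0)     # exponent is -2.5 at z = 0
high = mld_factor(20.0)   # clamped to z = 15.5, exponent is 0
```

The factor rises from 10^-2.5 (about 0.003) at z = 0 to 1 at z = 15.5 and stays at 1 above that, so the MLD values fall far below the base thresholds at low frequencies and equal them at high frequencies.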

Perceptual Analysis
 Stereo Perceptual Analysis

[Figure: masking and MLD factor]


The Masking Thresholds
 The masking thresholds are thus calculated as follows
   THRm = max(BTHRm, min(BTHRs, MLDs))
   THRs = max(BTHRs, min(BTHRm, MLDm))

 The SMR of the M/S channels is determined from these thresholds
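The threshold rule can be sketched as element-wise operations. This assumes the base thresholds and the MLD factor are given per band (or per line) as arrays in linear, non-dB units. The function name `ms_thresholds` and the example numbers are hypothetical.

```python
import numpy as np

def ms_thresholds(bthr_m, bthr_s, mld_factor):
    """Combine base thresholds and MLD values into THRm and THRs."""
    bthr_m = np.asarray(bthr_m, dtype=float)
    bthr_s = np.asarray(bthr_s, dtype=float)
    mld_factor = np.asarray(mld_factor, dtype=float)
    mld_m = mld_factor * bthr_m            # MLDm = MLDfactor * BTHRm
    mld_s = mld_factor * bthr_s            # MLDs = MLDfactor * BTHRs
    thr_m = np.maximum(bthr_m, np.minimum(bthr_s, mld_s))
    thr_s = np.maximum(bthr_s, np.minimum(bthr_m, mld_m))
    return thr_m, thr_s

# Made-up single-band example: BTHRm = 1, BTHRs = 4, MLD factor = 0.5.
thr_m, thr_s = ms_thresholds([1.0], [4.0], [0.5])
```

In this example MLDs = 2 and MLDm = 0.5, so THRm = max(1, min(4, 2)) = 2 and THRs = max(4, min(1, 0.5)) = 4. The M threshold is raised by masking from the louder S channel, while the S threshold stays at its base value.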

Bit Allocation
 Waterfilling Algorithm

   All bands in the stereo signal (either M/S or L/R) are ranked in one pool according to SMR
   Bits are allocated to each band from a common pool of bits
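A greedy version of this idea can be sketched as follows. It is not the project's allocator. It assumes each extra mantissa bit lowers a band's remaining SMR by about 6.02 dB, that one extra bit in a band costs one bit per spectral line in that band, and that mantissas are capped at `max_bits`. All names are hypothetical.

```python
import numpy as np

def waterfill(smr_db, lines_per_band, bit_pool, max_bits=16):
    """Greedy water-filling over the bands of both channels at once.

    smr_db: SMR per band in dB, both channels concatenated.
    lines_per_band: number of spectral lines in each band.
    bit_pool: total mantissa bits available for the block.
    Returns the number of mantissa bits assigned to each band.
    """
    smr = np.asarray(smr_db, dtype=float).copy()
    lines = np.asarray(lines_per_band, dtype=int)
    bits = np.zeros(len(smr), dtype=int)
    active = np.ones(len(smr), dtype=bool)
    while active.any():
        # Pick the active band with the highest remaining SMR.
        b = int(np.argmax(np.where(active, smr, -np.inf)))
        if lines[b] > bit_pool or bits[b] >= max_bits:
            active[b] = False          # band cannot take another bit
            continue
        bits[b] += 1
        bit_pool -= lines[b]           # one more bit for every line in the band
        smr[b] -= 6.02                 # assumed noise reduction per bit
    return bits

alloc = waterfill([12.0, 0.0], [1, 1], bit_pool=2)
```

Because both channels share one ranking, a channel with low SMR (often the S channel when the stereo image is narrow) gives up its bits to the channel that needs them. In the two-band example, both bits go to the first band.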


Bit Stream

Encoded file: Header | Block 1 | Block 2 | Block 3

Header:
   File ID (7 bytes)
   Wave info (32 bytes)
   n Blocks (8 bytes)
   Blocksize (4 bytes)
   # of scale bits (2 bytes)
   # of mantissa bits (m) (2 bytes)
   # of bytes in a block (4 bytes)

Block:
   Switch Info (25 bits)
   Scale factors (4*25 bits)
   # of mantissa bits per band (m * 25 bits)
   mantissas
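The field sizes above add up to a 59-byte header. The following sketch packs such a header with Python's `struct` module. The byte order (little-endian), the use of unsigned integers, the file ID value, and the contents of the wave info field are all assumptions, since the slides give only the field sizes. The helper names are hypothetical.

```python
import struct

# 7-byte ID, 32-byte wave info, then 8-, 4-, 2-, 2- and 4-byte unsigned
# integers. "<" selects little-endian with no padding (assumed).
HEADER_FORMAT = "<7s32sQIHHI"

def pack_header(file_id, wave_info, n_blocks, block_size,
                scale_bits, mantissa_bits, bytes_per_block):
    """Pack the header fields in the order listed on the slide."""
    return struct.pack(HEADER_FORMAT, file_id, wave_info, n_blocks,
                       block_size, scale_bits, mantissa_bits,
                       bytes_per_block)

def unpack_header(data):
    """Unpack a header from the first bytes of an encoded file."""
    size = struct.calcsize(HEADER_FORMAT)
    return struct.unpack(HEADER_FORMAT, data[:size])

def side_info_bits(m, n_bands=25, scale_bits=4):
    """Bits per block before the mantissas: switch info, scale factors,
    and the per-band mantissa bit counts."""
    return n_bands + scale_bits * n_bands + m * n_bands

# Hypothetical values for illustration only.
header = pack_header(b"MSCODEC", bytes(32), 1000, 1024, 4, 4, 512)
fields = unpack_header(header)
```

With m = 4, `side_info_bits(4)` gives 25 + 100 + 100 = 225 bits of side information per block before any mantissas are written.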


Example: Time-domain block


FFT of the block


M/S Channels


Sum and Difference of FFT energies


Listening Test Results

 1) Rock music
 2) Bass singer
 3) Castanets
 4) Glockenspiel
 5) Harpsichord
 6) Quartet
 7) Speech
 8) Violin

Listening Test: Stereo Improvement


Conclusions
 Stereo signals usually have strong correlation between the two channels
 M/S encoding is used more often than L/R based on our decision model
 Stereo coding improves most stereo signals
