
Multi-Rate Broadcast, Part I: An Investigation of Broadcast Channels

Department of Electrical Engineering, Madras Institute of Technology,

Abstract – An expository investigation of broadcast channels, their capacity constraints, and the basic information theory related to progressive source coding for multi-rate broadcast.

Keywords – Broadcast, Channel Capacity, Degraded Channel, Multi-rate Broadcast.

I. INTRODUCTION

The purpose of this paper is to discuss broadcast channels, channel capacity, efficient coding, and the issues related to them, so as to lay a foundation for the further development of efficient communication schemes. The paper also discusses, though more informally, the basics of superposition coding as a predecessor to other multi-rate broadcast methods.

Broadcasting is the act of simultaneous transmission to multiple receivers. A communication channel, when defined as the physical channel that connects transmitter and receiver, is often misconceived to be a static, unchanging medium. For proper analysis, however, communications engineers need to be aware of the various manifestations a channel can have, and of how each parameter of the same channel can vary for different stimuli.

A Note on Capacity:

A broadcast channel can be visualized as having one transmitter and many receivers, and can be characterized by the function P(Y1=y1, Y2=y2, ..., Yn=yn | X=x). This conditional probability function gives the probability that the different receivers observe the various values of Y for a given value of X, where X and Y are random variables. With knowledge of the channel-induced transitions we also gain knowledge of the conditional entropy H(X|Y), which denotes how uncertain we remain about X after being given Y. If H(X|Y) = 0 the channel is error-free, because X is then known exactly once Y is observed. H(X|Y) is therefore one of the best indicators of the information loss in a channel.

Every source has an entropy H(X), which denotes the average information per source symbol. Even though the rate of input bits may be R bits/sec, the rate of information is only $D_{in} = H(X)\,R$ bits/sec. More importantly, the rate of information actually transmitted is $D_t = [H(X) - H(X \mid Y)]\,R$ bits/sec.

In the case of a channel with H(X) = H(X|Y), we would do just as well flipping a coin to decide the values of the output Y ourselves.

I(X;Y) is the mutual information provided by the output Y about the input X. The value of I(X;Y) maximized over the set of input symbol probabilities p(x_i) is a quantity that depends only on the characteristics of the channel (provided it is discrete and memoryless), through the conditional probabilities p(y|x). This quantity is called the capacity of the channel:

$$C = \max_{p(x_i)} I(X;Y) = \max_{p(x_i)} \left[ H(X) - H(X \mid Y) \right] \text{ bits}$$

A Note on Broadcast Channels:

A broadcast channel is in many cases the practical variant of the unicast channel. A single transmitter must decide upon a suitable rate at which to transmit information efficiently to many receivers. This rate will certainly depend on the capacity of the channel, but the crux of the argument is that a broadcast channel is essentially made up of a number of individual channels, each with its own capacity C1, C2, ..., CK, and the transmitter does not know the true characteristics of all K channels. Therein lies the challenge.

There are two basic approaches to broadcasting information over a channel:
1) Send at a rate Cmin = min{C1, C2, ..., CK}.
2) Send at a rate Cmax = max{C1, C2, ..., CK}, which leads to transmission over the best channel alone (i.e. all rates except that of the best channel are zero).

A third, more efficient approach, known as time sharing, allocates proportions of time λ1, λ2, ..., λK to each of C1, C2, ..., CK in such a way that the actual rate of information transmission over channel j is

$$R_j = \sum_{i \le j} \lambda_i C_i$$

where the channels are indexed in order of increasing capacity, so that receiver j can decode every transmission sent at a rate C_i ≤ C_j.

However, the ultimate aim of our discussion is to develop a scheme that exceeds these limits of performance by distributing precision and nesting data, so as to make multi-rate broadcast resemble K unicast operations as closely as possible. Hence we do not settle for any of these three schemes.

Consider a broadcast channel with 2 receivers. The formal definition, as introduced by Cover [2], is as follows:

Definition: A broadcast channel consists of an input alphabet $\mathcal{X}$ and two output alphabets $\mathcal{Y}_1$ and $\mathcal{Y}_2$, and a probability transition function that characterizes the channel, given by $P(y_1^n, y_2^n \mid x^n)$. The broadcast channel is said to be memoryless if
$$P(y_1^n, y_2^n \mid x^n) = \prod_{i=1}^{n} p(y_{1i}, y_{2i} \mid x_i)$$

In a memoryless broadcast channel, the occurrence of an error in one interval or symbol period does not affect other symbol periods. Examples of channels with memory include fading links where switching transients cause burst errors. A more general representation of broadcast channels is given in [1] as three finite sets $(\mathcal{X},\, p(y_1, y_2 \mid x),\, \mathcal{Y}_1 \times \mathcal{Y}_2)$, the interpretation being that x is the input to a broadcast channel with conditional probability p(y1,y2|x), resulting in the outputs y1 and y2. Because the channel is memoryless, the successful reception of an n-bit codeword transmitted as X depends only on the successful reception of each of the bits in the word by each of the receivers.

Definition: The rate pair (R1, R2) is said to be achievable for the broadcast channel if there exists a sequence of $((2^{nR_1}, 2^{nR_2}), n)$ codes with $P_e^{(n)} \to 0$.

Apart from the fact that both codes, of rates R1 and R2, must have a probability of error that tends to zero, this definition is very similar to the channel coding argument that postulates the existence of a set of codes for which $P_e \to 0$ as n, the code block length, tends to infinity (provided Rc < R0).

Why does the code block length affect the probability of error? Simply put, there are $2^n$ available vertices of a hypercube, out of which we select only some M. Different selections of M vertices give different communication systems, and hence an ensemble of $2^{nM}$ possible choices.

[Figure: Visualizing Distance Properties with Code Cube]

The probability of error of a communication system that chooses M coded waveforms out of the $2^{nM}$ possible sets of codes is upper bounded by the probability of error of a system that uses only 2 equally likely words. This means that

$$P_e(X_k) \le P_2(x_i, x_j)$$

where $P_2(x_i, x_j)$ is the ensemble average over all systems for a scheme that uses only the code words x_i, x_j.

By defining the cutoff rate

$$R_0 = \log_2 \frac{2}{1 + e^{-E_c/N_0}}$$

and realizing that the Q function (which arises from the probability of error of a binary system) is bounded by

$$Q\left(\sqrt{2 d E_c / N_0}\right) \le e^{-d E_c / N_0}$$

we get

$$P_e < 2^{-n(R_0 - R_c)}$$

This means that whenever Rc < R0 the average probability of error $P_e \to 0$ as $n \to \infty$.

Definition: The capacity region of a broadcast channel is the closure of the set of achievable rates.

These three definitions will suffice for a basic analysis.

TABLE I
Capacities of some frequently encountered channels

No. | Channel | Capacity
1 | Binary Symmetric Channel | $1 - H(p)$
2 | AWGN, memoryless | $\max_{P(x_i)} \sum_{i=0}^{q-1} \int_{-\infty}^{\infty} p(y \mid x_i)\, P(x_i) \log_2 \frac{p(y \mid x_i)}{p(y)}\, dy$
3 | Waveform channel (bandwidth limitation, power constraint, Gaussian noise) | A very celebrated formula: $W \log_2\left(1 + \frac{P_{av}}{W N_0}\right)$

Understanding a Symmetric Discrete Memoryless Channel (DMC), i.e. a channel characterized by a set of conditional probabilities [p_ij] which may or may not be equal, is vital for the design of multi-rate codes, and so a closer inspection is warranted. The Binary Symmetric Channel (BSC) is a special case of a Discrete Memoryless Channel in which there is only one common value, p, for the probability of erroneous reception. To clarify this statement, the channel is characterized by the conditional probabilities:

Pr( Y = 0 | X = 0 ) = Pr( Y = 1 | X = 1 ) = 1 - p
Pr( Y = 0 | X = 1 ) = Pr( Y = 1 | X = 0 ) = p

[Figure: Discrete Memoryless Channel, with branch labels 1, q, q, p]
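As a minimal numerical sketch of the closed-form entries of Table I and of the cutoff-rate expression, the following Python fragment evaluates the BSC capacity 1 - H(p), the waveform-channel capacity W log2(1 + Pav/(W N0)), and R0. The function names and the sample values are illustrative assumptions and are not part of the original analysis.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(p: float) -> float:
    """Table I, row 1: capacity of a BSC with cross-over probability p."""
    return 1.0 - binary_entropy(p)


def waveform_capacity(w_hz: float, p_av: float, n0: float) -> float:
    """Table I, row 3: W log2(1 + Pav / (W N0)), in bits per second."""
    return w_hz * math.log2(1.0 + p_av / (w_hz * n0))


def cutoff_rate(ec_over_n0: float) -> float:
    """Cutoff rate R0 = log2(2 / (1 + exp(-Ec/N0))), in bits per symbol."""
    return math.log2(2.0 / (1.0 + math.exp(-ec_over_n0)))


if __name__ == "__main__":
    # Sample values chosen only for illustration.
    for p in (0.0, 0.11, 0.5):
        print(f"BSC p={p}: C = {bsc_capacity(p):.4f} bits/transmission")
    print(f"R0 at Ec/N0 = 1: {cutoff_rate(1.0):.4f}")
```

A BSC with p = 0.5 has zero capacity, which is the coin-flipping case described earlier, and R0 stays below 1 bit for any finite Ec/N0.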
The channel model represented above actually consists of 2 BSC channels. The first BSC, between X and Y1, has capacity C1 = 1; the second BSC, between X and Y2, has capacity C2 = 1 - H(p). This is because the transition probability of the second channel is p (with q = 1 - p), so that the maximum information is bounded by 1 - H(p):

$$I(X;Y) = H(Y) - H(Y \mid X) = H(Y) - \sum_x p(x)\, H(Y \mid X = x) = H(Y) - \sum_x p(x)\, H(p)$$

which, since H(Y) ≤ 1 for a binary output, means that

$$I(X;Y) \le 1 - H(p) = C$$

But why can't the rate exceed this value? On correct decoding the receiver acquires 2 pieces of information: (i) the transmitted word, worth $\log_2 M$ bits, and (ii) an error word statistically independent of (i). In a block of n bits the number of possible error patterns is $2^n$; however, the probable error patterns are limited to those of Hamming weight n(p ± δ). This results in a typical set of error sequences in which, to borrow from [3], almost each one is almost equally probable. Each of these error words has probability $2^{-nH(p)}$ and so conveys information equal to nH(p). The net number of bits in the receiver's possession is therefore $\log_2 M + nH(p)$, but only n bits were actually transmitted, and so

$$\log_2 M + nH(p) \le n \quad\Rightarrow\quad \frac{\log_2 M}{n} \le 1 - H(p)$$

A Note on the Input Distribution of a Symmetric Discrete Memoryless Channel:

Very often the choice of equally probable input symbols maximizes the average mutual information, thereby achieving the capacity of the channel. It must be noted, however, that this is not always the case: an equi-probable distribution yields maximum information only when the channel transition probabilities exhibit a certain symmetry. More specifically, when each row of the probability transition matrix P is a permutation of any other row, and each column is a permutation of any other column, the probability transition matrix is symmetric and equi-probable inputs maximize I(X;Y). Consider, for example, the following two channels, with probability transition matrices:

[Transition matrices of Channel (i) and Channel (ii); the matrix layout is not recoverable. Surviving entries, in source order: Channel (i): 0.5, 0.1, 0.3, 0.3, 0.1; Channel (ii): 0.6, 0.3, 0.3, 0.1, 0.1]

We expect the first channel to achieve its capacity with equi-probable symbols, and the second channel to achieve its capacity with a set of symbol probabilities chosen so as to maximize I(X;Y) (expressed as a function of p(x1) = p and p(x2) = 1 - p).

This leads to a further conclusion regarding the input probabilities {P(x_i)} that maximize I(X;Y), which is best illustrated through a cost function C(X), defined as

$$C(X) = I(X;Y) - \lambda \left( \sum_{j=0}^{q-1} P(x_j) - 1 \right)$$

The cost function expresses that I(X;Y) attains its maximum only for a certain probability distribution (which may or may not be the equi-probable distribution), subject to the constraint that the probabilities sum to one. Maximizing C(X):

$$\frac{\partial C(X)}{\partial P(x_k)} = \frac{\partial}{\partial P(x_k)} \left[ \sum_{j=0}^{q-1} P(x_j)\, I(x_j; Y) \right] - \lambda \frac{\partial}{\partial P(x_k)} \left[ \sum_{j=0}^{q-1} P(x_j) \right] = 0$$

where P(x_k) is the optimum distribution. On simplification we find that $I(x_k; Y) = \lambda + \log e$. This is an expected, yet sometimes overlooked, result: for the optimal distribution P(x_k), I(x_k;Y) is constant over all x_k. Moreover, the capacity of the symmetric Discrete Memoryless Channel is

$$C = \max_{P(x)} I(x_k; Y)$$

Going back to the two-channel example presented earlier, the first channel achieves I(x1;Y) = I(x2;Y) = C for an equi-probable distribution, while the second channel achieves it only for an input distribution that is not equi-probable.
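The statement that the optimal input distribution makes I(x_k;Y) constant over all inputs can be checked numerically. The sketch below is a brute-force grid search over binary input distributions. The two transition matrices (a BSC with p = 0.1 and a Z-channel with cross-over probability 0.5) are hypothetical stand-ins, not the Channel (i) and Channel (ii) values of [4], and all function names are assumptions made for illustration.

```python
import math


def output_distribution(px, P):
    """p(y) = sum_x p(x) P(y|x) for a transition matrix P[x][y]."""
    return [sum(px[x] * P[x][y] for x in range(len(px))) for y in range(len(P[0]))]


def per_input_information(px, P, k):
    """I(x_k;Y) = sum_y P(y|x_k) log2(P(y|x_k) / p(y)); assumes px[k] > 0."""
    py = output_distribution(px, P)
    return sum(P[k][y] * math.log2(P[k][y] / py[y])
               for y in range(len(py)) if P[k][y] > 0)


def mutual_information(px, P):
    """I(X;Y) = sum_k p(x_k) I(x_k;Y), skipping inputs of zero probability."""
    return sum(px[k] * per_input_information(px, P, k)
               for k in range(len(px)) if px[k] > 0)


def best_binary_input(P, steps=1000):
    """Grid search over p = P(X = x_0); returns (best p, maximum I(X;Y))."""
    best_p, best_i = 0.0, -1.0
    for s in range(steps + 1):
        p = s / steps
        i = mutual_information([p, 1.0 - p], P)
        if i > best_i:
            best_p, best_i = p, i
    return best_p, best_i


# Hypothetical example channels (illustrative values only).
BSC = [[0.9, 0.1], [0.1, 0.9]]        # symmetric: uniform input is optimal
Z_CHANNEL = [[1.0, 0.0], [0.5, 0.5]]  # not symmetric: optimum is non-uniform

if __name__ == "__main__":
    for name, P in (("BSC", BSC), ("Z-channel", Z_CHANNEL)):
        p, c = best_binary_input(P)
        px = [p, 1.0 - p]
        infos = [per_input_information(px, P, k) for k in range(2)]
        print(name, "optimal P(x0) =", p, "C =", round(c, 4), "I(x_k;Y) =", infos)
```

For the Z-channel the search settles away from the equi-probable point, and at the optimum the two per-input quantities I(x_k;Y) agree with each other and with the capacity, as the cost-function argument predicts.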

[Figure: Channel (i) and Channel (ii). Transition probability values borrowed from [4].]

Orthogonal Channels:

The best possible scenario for someone who wishes to broadcast information is to be presented with a channel in which communication to one receiver in no way interferes with communication to another. Further, if the channel matrices of all the channels involved contain only 1's and 0's, i.e. they are perfectly noiseless channels, then all their capacities will be 1 bit/transmission.

Consider a source X that broadcasts information to two receivers Y1 and Y2. For such a broadcast channel (I(X;Y1), I(X;Y2)) = (1,1) can be achieved by:
● Choosing the input probabilities in such a way that the information conveyed is maximized, as described under the previous sub-heading.
● Efficiently symbolizing the input alphabet in such a way that C1 and C2 are jointly achievable, which then results in maximum capacity.

For example, if C1 = C2 = 1 bit/transmission (the perfectly noiseless channel discussed earlier), then an input probability distribution of P{x=i} = 1/4 achieves (I(X;Y1), I(X;Y2)) = (1,1). Assume next that there are n possible inputs u ∈ {1,2,3,...,n} that we wish to transmit to Y1 and n possible inputs v ∈ {1,2,3,...,n} that we wish to transmit to Y2. There will then be a total of n² combinations (assuming that we transmit one member of u and one of v together). To represent these n² possibilities uniquely we need the same number of symbols s ∈ {1,2,...,n²}; every time receiver Y1 receives a symbol s1 it associates it with the corresponding member of u, while every time Y2 receives the same symbol s1 it associates it with a member of v. Hence the achievable rate region for orthogonal channels is:

[Figure: Achievable Rates for an Orthogonal Channel]

The noiselessness of the channel, however, is not crucial. An orthogonal broadcast channel still maintains its position of superiority over other types of broadcast channels, in the sense that (R1,R2) = (C1,C2) can still be achieved even if the broadcast channel is made up of 2 BSCs with parameters P1 and P2. In this case each of the individual channels has capacity CN = 1 - H(PN). The point here is that any R12 (a common rate of broadcast) such that 0 ≤ R12 ≤ min(C1, C2) can be achieved, and if C1 = C2 then R12 = R1 = R2 achieves maximum capacity.

Orthogonal broadcast channels are idealized paradigms for the analysis of the more practical broadcast channels discussed in the following sections. The capacity regions of those channels always lie within the region of the orthogonal channel, as there will always be some destructive interference between receivers.

III. GENERAL BROADCAST CHANNELS

In this section we discuss incompatible, Gaussian and degraded broadcast channel models, and briefly outline the inner bound on the capacity region established by Marton. An understanding of these channels is essential for the analysis and development of multi-rate broadcast.

Marton's Inner Bound:

The capacity region of a general broadcast channel is still to be established (save for certain special cases), but Marton [5] has established an inner bound. The inner bound states that any (R1,R2) ∈ R0 is achievable for a Discrete Memoryless Broadcast Channel (DMBC) provided R1 and R2 are individually bounded by the information we wish to convey to receivers Y1 and Y2 respectively, and their sum is bounded by the net information we wish to convey across the channel. Assume we wish to send the auxiliary variable u to receiver Y1 and the auxiliary variable v to receiver Y2. Then, according to Marton's bound, any rate pair in

$$R_0 = \{ (R_1, R_2) : R_1, R_2 \ge 0 \}$$

is achievable provided

$$R_1 \le I(U; Y_1), \quad R_2 \le I(V; Y_2), \quad R_1 + R_2 \le I(U; Y_1) + I(V; Y_2) - I(U; V)$$

for the DMBC $(\mathcal{X},\, p(y_1, y_2 \mid x),\, \mathcal{Y}_1 \times \mathcal{Y}_2)$. This trivial bound can be achieved by binning, as depicted in [ ] and [ ].

Incompatible Broadcast Channels:

The worst cases of incompatibility for simultaneous ...

Switch-to-Talk: The switch-to-talk broadcast channel, as described in Cover [1], consists of a single transmitter and 2 or more receivers (for the sake of analysis, at least, this is the model chosen). The transmitter does not transmit the same message to both receivers, nor does it make use of the same alphabet. The idea is that when the sender wishes to communicate with Y1 he uses x ∈ X1, and when he wishes to communicate with Y2 he uses x ∈ X2.

[Figure: Naive Time Sharing]

A common analogy used to describe this type of channel is a situation in which you, a speaker fluent in 2 languages, need to communicate with 2 people, each of whom can understand only 1 language. A simple benchmark for this channel is naive time sharing, as described in the introduction of this paper and also
in [2] and [3]. Since both (R1,R2) = (C1,0) and (R1,R2) = (0,C2) are achievable, by devoting a fraction $\lambda$ of the transmission time to A and the remaining fraction $\bar{\lambda} = 1 - \lambda$ to B one can easily achieve $(R_0, R_A, R_B) = (0,\, \lambda C_1,\, \bar{\lambda} C_2)$.

We can do much better than this, however. Consider the earlier example of speaking 2 languages. You talk in English and Hindi, while each listener understands only one of the two. If naive time sharing is used, you would use periods of time $\lambda$ to separate the messages of receiver (1) and receiver (2). Now assume instead that each receiver can tell whether or not a particular word is spoken in his language. Then, if at a particular instant a Hindi word is spoken, the Indian listener gets some information in the form of a message, while both the Indian and the American listener get information about the source of the message (they both realize that the source is the Hindi script).

In the switch-to-talk channel, if channel 1 is used a proportion $\alpha$ of the time and channel 2 is used a proportion $1 - \alpha$ of the time, $H(\alpha)$ additional bits/transmission may be achieved. This differs from the naive time sharing bound in that $\alpha$ is usually chosen based on source probabilities, so as to facilitate the perfect transmission of one of $2^{nH(\alpha)}$ additional messages to Y1 and Y2. The $H(\alpha)$ additional bits transmitted in this way give knowledge of the source. Thus all (R1,R2) of the form

$$(R_1, R_2) = \left( \alpha C_1 + H(\alpha),\ \bar{\alpha} C_2 + H(\alpha) \right)$$

can be achieved, resulting in a capacity region that dominates that of naive time sharing.

[Figure: Switch-to-talk capacity region]

The Worst Case of Incompatibility: 2 channels so incompatible that one can do no better than time sharing. This type of channel has very detrimental interference coupled from one channel to the other. If X wishes to communicate with Y1, he must send pure noise to Y2. Borrowing the description of such a channel from [1,2], we have

$$P_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}, \quad P_2 = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$$

as the transition matrices of the two channels C1 and C2 of a system with X = {1,2,3,4}, Y1 = {1,2}, Y2 = {1,2}. To find the capacity, we need to maximize I(X;Y1) and I(X;Y2) over the input probability distribution. To see exactly how this sort of channel functions, consider the following scenario:

$$\alpha = p_1 + p_2, \quad \bar{\alpha} = p_3 + p_4$$

where P(X=i) = p_i. Then

$$H(Y_1) = H(p_1 + \bar{\alpha}/2)$$

(or p2 in place of p1, since in any case to find the maximum information we maximize over $0 \le p_1 \le \alpha$), and $H(Y_1 \mid X) = \bar{\alpha}$ by definition, so that the average information transmitted to Y1 by X is at most $\alpha$. This results in the capacities

$$C_1 = \alpha, \quad C_2 = \bar{\alpha}$$

When $\alpha = 1$, C1 = 1 but C2 = 0 (Y2 receives pure noise and no information).

Gaussian Channels: The unpredictability involved in broadcasting often makes its analysis more challenging than that of unicasting. One transmitter has to code and transmit data efficiently to n receivers in such a way that a rate as close as possible to the capacity of each of those n channels is achieved. The Gaussian channel is a time-discrete channel that draws noise from an independent and identically distributed Gaussian distribution, adds it to the input, and generates an output that does not depend on the input alone. The easiest way to represent this channel is:

Yi = Xi + Zi,  Zi ~ N(0, N)

If the noise variance is zero, the receiver receives the transmitted symbol perfectly. To transmit data efficiently over this channel we establish a power limitation on any codeword (x1, x2, ..., xn):

$$\frac{1}{n} \sum_{i=1}^{n} x_i^2 \le P$$

One of the simplest ways to transmit data would be to send one of 2 levels, $+\sqrt{P}$ or $-\sqrt{P}$. In such a scheme, when the noise in the channel overwhelms the signal power, the data are misinterpreted. Exploiting this fact one can convert a continuous Gaussian channel into a discrete channel with cross-over probability Pe, where Pe is the probability of error defined according to the input signal levels used. For a $(+\sqrt{P}, -\sqrt{P})$ system it would be

$$P_e = 0.5\, P(Y < 0 \mid X = +\sqrt{P}) + 0.5\, P(Y > 0 \mid X = -\sqrt{P})$$
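For the two-level scheme, the symmetry of the Gaussian noise makes both conditional error terms equal to the Gaussian tail probability Q(√(P/N)), so that Pe = Q(√(P/N)). The sketch below evaluates this cross-over probability and the capacity 1 - H(Pe) of the resulting BSC. The function names and the sample values are assumptions made for illustration, and the noise variance is assumed to be strictly positive.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def crossover_probability(power: float, noise_var: float) -> float:
    """Pe of a hard-decision (+sqrt(P), -sqrt(P)) scheme; needs noise_var > 0."""
    return q_function(math.sqrt(power / noise_var))


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def hard_decision_capacity(power: float, noise_var: float) -> float:
    """Capacity 1 - H(Pe) of the BSC obtained by two-level signalling."""
    return 1.0 - binary_entropy(crossover_probability(power, noise_var))


if __name__ == "__main__":
    # Sample signal-to-noise ratios chosen only for illustration.
    for snr in (0.5, 1.0, 4.0, 10.0):
        pe = crossover_probability(snr, 1.0)
        print(f"P/N = {snr}: Pe = {pe:.5f}, C_BSC = {hard_decision_capacity(snr, 1.0):.4f}")
```

As the ratio P/N grows the cross-over probability falls towards zero and the derived BSC approaches the noiseless capacity of 1 bit/transmission.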
The power constraint P, however, calls for a refinement of the classical formula for channel capacity, C = max I(X;Y). Even though this formula still holds good, the typical influences of signal power and noise are not reflected in it. One need only realize that the noise distribution, and hence the output distribution (since Y = X + Z), are both normal, so that the entropy of each can be found easily in terms of the normal density function Φ(x). In short, the entropies of Z and Y are each of the form $\frac{1}{2} \log_2 2\pi e \sigma^2$ bits, because

$$\Phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2 / 2\sigma^2} \quad \text{and} \quad h(\Phi) = -\int \Phi \ln \Phi$$

as shown in [5]. Since the variance of Z is N, and the variance of Y is (P + N), where P is $E(X^2)$, the mutual information is bounded by

$$I(X;Y) \le \frac{1}{2} \log_2 2\pi e (P + N) - \frac{1}{2} \log_2 2\pi e N$$

Since $C = \max_{E(X^2) \le P} I(X;Y)$, the capacity of a Gaussian channel with power constraint P and noise variance N is

$$C = \frac{1}{2} \log_2 \left( 1 + \frac{P}{N} \right) \text{ bits per transmission} \quad (1)$$

and for a channel bandlimited to W, since there are 2W samples per second,

$$C = W \log_2 \left( 1 + \frac{P}{N_0 W} \right) \text{ bits per second}$$

where N0/2 watts/Hz is the noise spectral density, so that the noise power in the band is N = N0 W.

Now consider the time-discrete Gaussian broadcast channel: 2 channels leading to 2 receivers, each with its own noise distribution Z1, Z2, of mean zero and noise variance N1, N2. It is well known that each of these channels has a capacity given by (1), but for efficient transmission a scheme that merely time-shares these capacities is inadequate. Instead, consider a scheme that superimposes s2, the data stream meant for Y2, onto the sequence s1 meant for Y1. If the transmitted sequence consists of the 2 streams of information s1 and s2, then the received sequences are y1 = s1 + s2 + z1 and y2 = s1 + s2 + z2. Hence s1 and z2 are considered noise by Y2, and s2 and z1 contribute to the loss of information with respect to Y1. If the net signal power is S, let $\alpha S$ and $\bar{\alpha} S$ be the portions of signal power devoted to transmitting information to Y1 and Y2 respectively. Then the noise power felt by Y2 is $\alpha S + N_2$, resulting in

$$C_2 = \frac{1}{2} \log_2 \left( 1 + \frac{\bar{\alpha} S}{\alpha S + N_2} \right)$$

If we proceed along the same lines and analyze receiver Y1 we may conclude that

$$C_1 = \frac{1}{2} \log_2 \left( 1 + \frac{\alpha S}{\bar{\alpha} S + N_1} \right)$$

However, a better rate is definitely possible if we realize that, more often than not, one receiver experiences more noise than the other, i.e. N1 < N2.

Here we introduce superposition coding in one of its most fundamental manifestations, subtractive decoding. Based on the assumption that any data stream decodable by receiver Y2 is also decodable by Y1 (since channel 2 is more hostile to data transmission), we can decode s2 at Y1 and then find the data stream intended for receiver Y1 by subtracting s2 from y1. The receiver Y1 therefore correctly receives both s1 and s2, by first decoding s2, which has been embedded in the overall data stream, and then using this knowledge to isolate s1 + z1. Consequently, the rate pair

$$R_1 = \frac{1}{2} \log_2 \left( 1 + \frac{\bar{\alpha} S}{\alpha S + N_2} \right) + \frac{1}{2} \log_2 \left( 1 + \frac{\alpha S}{N_1} \right)$$

$$R_2 = \frac{1}{2} \log_2 \left( 1 + \frac{\bar{\alpha} S}{\alpha S + N_2} \right)$$

is simultaneously possible, and it is clear that our previous conclusion is overruled, simply because a much higher rate is possible for Y1.

An engineer who wishes to broadcast music at the best possible quality to all listeners faces a predicament. He can either:
● Prepare for the worst and transmit monaural quality music to everyone.
● Hope for the best and transmit stereophonic quality music to everyone.
The results depicted in [curve ineq] imply that he need not resort to either option (or equally, that he can use both). Rather, he can create a data stream that contains both monaural and stereophonic music by superimposing the latter on the former, and transmit this data to all receivers in such a way that the quality of music heard by a listener is dictated by the noise power present in the channel between him and the transmitter. When the noise in the channel N exceeds a noise threshold Nt, a receiver can do no better than recover the monaural data stream alone; when N < Nt, however, a receiver can first recover the monaural data stream and then use it to fine-tune its reception so that stereophonic music is heard.

A Note on Codebook Sizes:

The transmission of an n-bit codeword results in the reception of a vector of power n(P + N). The space of received vectors can be enclosed in a sphere of radius $\sqrt{n(P+N)}$, and it is within this sphere that any possible codeword will be mapped. The received vectors are normally distributed, with mean equal to the true codeword and variance equal to the noise variance, which means that they will very rarely coincide with the exact transmitted word but have a high probability of falling inside a sphere of radius $\sqrt{nN}$ around the true codeword. These decoding spheres denote the limits of error the decoder can tolerate; a transmitted vector results in an error only if it falls outside its decoding sphere. The volume of an n-dimensional sphere, as given in [bok], is $C_n r^n$, where r is the radius of the sphere. Hence the number of decoding spheres of radius $\sqrt{nN}$ that can be packed in a sphere of radius $\sqrt{n(P+N)}$ is:

$$\frac{C_n \left( n(P+N) \right)^{n/2}}{C_n \left( nN \right)^{n/2}} = \left( 1 + \frac{P}{N} \right)^{n/2} = 2^{\, n \cdot \frac{1}{2} \log_2 \left( 1 + \frac{P}{N} \right)}$$
Efficient transmission over a channel of capacity C can be achieved using a $(2^{nC}, n)$ codebook, because we need a codeword to represent each 'decoding sphere', and there are $2^{nC}$ decoding spheres in the larger sphere of plausible received vectors.

Outline of Achievability:

Consider a noiseless channel along with a BSC of parameter p. The noisy channel has the lower capacity C(p), and the number of codewords in its codebook is limited to $2^{n(C(p) - \epsilon)}$. Fewer codewords lead to a higher noise tolerance, however, because the constellation points are more spaced out. This noise tolerance is exploited to pack in extra bits of information that are not decodable by Y2 but carry meaningful information for Y1.

A $(2^{nC(p)}, n)$ codebook can be constructed for channel 2, with low probability of error. Normally one would choose one word out of the possible $2^n$, transmit it, and decode it based on its Hamming distance from each of the $2^{nC(p)}$ words in the codebook.

In superposition coding we construct a code of this type for a channel X (which is noisier than channel 2), and pack in extra information for Y1 that still keeps the codeword within the Hamming distance of the intended word in the codebook of channel X.

Assume we have a broadcast channel that consists of 2 BSCs, one perfect and the other with parameter p. Even though the worse of the two channels has parameter p only, we design our code for a channel X with parameter $\alpha \bar{p} + \bar{\alpha} p$ (i.e. the 'p channel' cascaded with an additional BSC of parameter $\alpha$). Thus our basic codebook has only $2^{n(C(\alpha \bar{p} + \bar{\alpha} p) - \epsilon)}$ codewords, but a noise tolerance of n bits × (probability that each bit is wrong) = $n(\alpha \bar{p} + \bar{\alpha} p)$.
● These codewords form the cloud centers. Each 'cloud' has radius equal to the noise tolerance of channel X, $n(\alpha \bar{p} + \bar{\alpha} p)$, and within each cloud is a set of points distinguishable only by channel 1.
● The number of such points is $2^{nH(\alpha)}$. This is because the number of probable errors for each codeword/cloud center is $2^{nH(\alpha)}$, as explained in part II of this paper. In our case, however, they are not really errors but information bits meant for Y1. Hence the extra information we can transmit to Y1 with each codeword sent to Y2 is $H(\alpha)$, resulting in the rates:
● $R_1 = C(\alpha \bar{p} + \bar{\alpha} p) + H(\alpha)$, $R_2 = C(\alpha \bar{p} + \bar{\alpha} p)$

A Note on Codebook Generation:

First generate $2^{nR_2}$ codewords of length n to give the cloud centers $u^n(w_2)$ [2]. Then for each of these $2^{nR_2}$ codewords generate an additional $2^{nR_1}$ codewords $x^n(w_1, w_2)$.

[Figure: Cloud centers]

To transmit the pair (w1, w2), send the codeword $x^n(w_1, w_2)$. The cloud center $u^n(w_2)$ is never actually sent.

This code structure allows the transmission of a codeword (r,s), where r is received and decoded by both Y1 and Y2 while s is decoded only by Y1, the receiver on the better channel. It must be noted that the parameter $\alpha$ controls the ratio of power allocated to the two data streams. If $\alpha$ is high, the number of cloud centers is low and there are more 'error' points within each cloud. The power allocated to data stream r depends on the number of bits used to locate each cloud center, and is proportional to $(1 - \alpha)$; the power allocated to data stream s depends on the number of bits used to identify each point within the clouds, and is proportional to $\alpha$. To elucidate this point we make a brief venture into the actual coding schemes used for multi-rate broadcast:

[Figure: Multi-rate Signal Constellation]

The first two bits of each word denote the 'cloud centers'. Receivers in noisy channels can decode only the empty dots, while receivers in better channels can decode the black dots. For a fixed-rate code, in which the number of points cannot change, $\alpha$ determines the Euclidean distance between constellation points.

“Superposition coding dominates frequency multiplexing which in turn dominates time multiplexing”, as shown in [curve ineq].

We are trying to develop a superposition coding technique that will function efficiently in wireless fading
channels, and this paper lays the groundwork for further analysis and design of such variable-rate codes. Our coding scheme includes layering of data and the use of auxiliary random variables, or virtual signals, that participate only in the construction of the code; one useful idea is that of achieving multi-rate broadcast using Superposition Turbo TCM.

REFERENCES

[1] T. M. Cover, “Comments on broadcast channels,” IEEE Transactions on Information Theory, vol. 44, Oct. 1998.
[2] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press, 2003.
[3] T. M. Cover, “Broadcast channels,” IEEE Transactions on Information Theory, IT-18(1):2–14, January 1972. Reprinted in Record of COMSAT, seminar on Multiple User Communications, UPO43CL, Clarksburg, Maryland, May 1975. Reprinted in Key Papers in the Development of Information Theory, IEEE Press, 1974, ed. by D. Slepian.
[4] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley & Sons, 1991. Second edition, 2006.
[5] T. M. Cover, “An achievable rate region for the broadcast channel,” IEEE Transactions on Information Theory, IT-21(4):399–404, July 1975.
[6] S. Diggavi and T. Cover, “Is maximum entropy noise the worst?” Proceedings of IEEE International Symposium on Information Theory, June 1997, Ulm, Germany, p. 278.
[7] C. J. Kaufman, Rocky Mountain Research Lab., Boulder, CO, private communication, May 1995.
[8] T. M. Cover, “Open problems in information theory,” IEEE USSR Joint Workshop on Information Theory, IEEE Press, pp. 35–36, December 1975.
[9] K. Marton, “A coding theorem for the discrete memoryless broadcast channel,” IEEE Transactions on Information Theory, vol. 25, no. 3, pp. 306–311, May 1979.
[10] A. El Gamal and E. van der Meulen, “A proof of Marton's coding theorem for the discrete memoryless broadcast channel,” IEEE Transactions on Information Theory, vol. 27, Jan. 1981.
[11] S. Chen, B. Mulgrew, and P. M. Grant, “An outer bound to the capacity region of broadcast channels,” IEEE Transactions on Information Theory, vol. 4, May 1978.
[12] S. Verdu, “Fifty years of Shannon theory,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2057–2078, Oct. 1998.
[13] P. P. Bergmans and T. Cover, “Cooperative broadcasting,” IEEE Transactions on Information Theory, vol. IT-20, May.
[14] G. R. Faulhaber, “Design of service systems with priority reservation,” in Conf. Rec. 1995 IEEE Int. Conf. Communications, pp. 3–8.
[15] W. D. Doyle, “Magnetization reversal in films with biaxial anisotropy,” in 1987 Proc. INTERMAG Conf., pp. 2.2-1–2.2-6.
[16] G. W. Juette and L. E. Zeffanella, “Radio noise currents n short sections on bundle conductors (Presented Conference Paper style),” presented at the IEEE Summer power Meeting, Dallas, TX, June 22–27, 1990, Paper 90 SM 690-0 PWRS.
[17] J. G. Kreifeldt, “An analysis of surface-detected EMG as an amplitude-modulated noise,” presented at the 1989 Int. Conf. Medicine and Biological Engineering, Chicago, IL.
[18] J. Williams, “Narrow-band analyzer (Thesis or Dissertation style),” Ph.D. dissertation, Dept. Elect. Eng., Harvard Univ., Cambridge, MA, 1993.
[19] N. Kawasaki, “Parametric study of thermal and chemical nonequilibrium nozzle flow,” M.S. thesis, Dept. Electron. Eng., Osaka Univ., Osaka, Japan, 1993.
[20] J. P. Wilkinson, “Nonlinear resonant circuit devices (Patent style),” U.S. Patent 3 624 12, July 16, 1990.
[21] IEEE Criteria for Class IE Electric Systems (Standards style), IEEE Standard 308, 1969.
[22] Letter Symbols for Quantities, ANSI Standard Y10.5-1968.
[23] R. E. Haskell and C. T. Case, “Transient signal propagation in lossless isotropic plasmas (Report style),” USAF Cambridge Res. Lab., Cambridge, MA Rep. ARCRL-6-24 (II), 1994, vol. 2.
[24] E. E. Reber, R. L. Michell, and C. J. Carter, “Oxygen absorption in the Earth’s atmosphere,” Aerospace Corp., Los Angeles, CA, Tech. Rep. TR-0200 (420-46)-3, Nov. 1988.
[25] (Handbook style) Transmission Systems for Communications, 3rd ed., Western Electric Co., Winston-Salem, NC, 1985, pp. 44–60.
[26] Motorola Semiconductor Data Manual, Motorola Semiconductor Products Inc., Phoenix, AZ, 1989.
[27] (Basic Book/Monograph Online Sources) J. K. Author. (year, month, day). Title (edition) [Type of medium]. Volume(issue).
[28] J. Jones. (1991, May 10). Networks (2nd ed.) [Online].
[29] (Journal Online Sources style) K. Author. (year, month). Title. Journal [Type of medium]. Volume(issue), paging if given.