IETE Technical Review: Draft for submission

Abstract – An expository investigation of broadcast channels, their capacity constraints, and the basic information theory related to progressive source coding for multi-rate broadcast.
I. INTRODUCTION
The purpose of this paper is to discuss broadcast channels, channel capacity, efficient coding, and the various issues related to them, so as to lay out a foundation for the further development of efficient communication schemes. We also cover the basics of superposition coding as a predecessor to the development of other multi-rate broadcast methods.
Broadcasting is the act of simultaneous transmission to multiple receivers. A communication channel, when defined as the physical channel that provides a connection between transmitter and receiver, is often misconceived to be a static, unchanging medium. However, for proper analysis from the perspective of communications engineers, we need to be aware of the various manifestations a channel can have, and of how each parameter of the same channel can vary under different stimuli.
A Note on Capacity:
A broadcast channel can be visualized as having one transmitter and many receivers, and can be characterized by the function P(Y1 = y1, Y2 = y2, ..., Yn = yn | X = x). This conditional probability function denotes the probability of reception of the various values of Y by the different receivers for a given value of X, where X and the Yi are random variables. With knowledge of the channel-induced transitions that occur, we also gain knowledge of the entropy involved: H(X|Y). This quantity denotes how uncertain we are of X after being given Y. If H(X|Y) = 0 then the channel is errorless, because X is then completely determined by Y. H(X|Y) is therefore one of the best indicators of the information loss in a channel.
Every source has an entropy H(X), which denotes the average information per source symbol. Even though the rate of input bits may be R bits/sec, the rate of information is only Din = H(X)·R bits/sec. More importantly, the rate of information actually transferred across the channel is Dt = [H(X) − H(X|Y)]·R bits/sec. In the case of a channel with H(X) = H(X|Y), we are better off flipping a coin to decide values for the output Y ourselves.
I(X;Y) is the mutual information provided by the output Y about the input X. The value of I(X;Y), maximized over the set of input symbol probabilities p(xi), is a quantity that depends only on the characteristics of the channel (provided it is discrete and memoryless), and is influenced through the conditional probabilities p(y|x). This quantity is called the capacity of the channel:
C = max{I(X;Y)} = max{H(X) − H(X|Y)} bits
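As a minimal numerical sketch (not part of the original development), the capacity can be found by sweeping the input distribution and taking the maximum of I(X;Y); the BSC and the crossover probability p = 0.1 below are illustrative choices:

```python
import math

def h2(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(px, pyx):
    """I(X;Y) in bits for input distribution px and channel matrix pyx[x][y]."""
    py = [sum(px[x] * pyx[x][y] for x in range(len(px)))
          for y in range(len(pyx[0]))]
    hy = -sum(q * math.log2(q) for q in py if q > 0)
    hyx = sum(px[x] * -sum(v * math.log2(v) for v in row if v > 0)
              for x, row in enumerate(pyx))
    return hy - hyx

# BSC with crossover probability p = 0.1: sweep the input distribution
# and locate the maximum of I(X;Y), i.e. the capacity C = max I(X;Y).
p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
best = max((mutual_information([a, 1 - a], bsc), a)
           for a in [i / 1000 for i in range(1, 1000)])
print(best, 1 - h2(p))  # maximum attained at a = 0.5, equal to 1 - H(p)
```

The sweep recovers the closed-form BSC capacity 1 − H(p) discussed later in this paper.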
A broadcast channel is in many cases the practical generalization of the unicast channel. A single transmitter must decide upon a rate at which to transmit information efficiently to many receivers. This rate will most definitely depend on the capacity of the channel, but the crux of the argument is that a broadcast channel is essentially made up of a number of individual channels, each with its respective capacity C1, C2, ..., CK, and the transmitter does not know the true channel characteristics of all these K channels. Therein lies the challenge.
There are two basic approaches to broadcasting information over such a channel:
1) Send at a rate R = Cmin = min{C1, C2, ..., CK}.
2) Send at a rate R = Cmax = max{C1, C2, ..., CK}, which would lead to transmission over the best channel alone (i.e., all rates except that of the best channel are zero).
A third, more efficient approach, known as time sharing, allocates proportions of time λ1, λ2, ..., λk to each of C1, C2, ..., CK, with Σj≤k λj = 1, in such a way that the actual rate of information transmission on channel j is
Rj = λj·Cj
However, the ultimate aim of our discussion is to develop a scheme that exceeds these limits of performance by distributing precision and nesting data, so as to make multi-rate broadcast resemble K unicast operations as closely as possible. Hence we decline to settle for any of these three schemes.
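For concreteness, the three baseline strategies can be compared numerically; the capacities and time-sharing fractions below are hypothetical values chosen only for illustration:

```python
# Hypothetical individual capacities (bits/transmission) for K = 3 receivers.
caps = [0.9, 0.5, 0.2]

# 1) Broadcast at the worst channel's capacity: everyone decodes C_min.
r_min = [min(caps)] * len(caps)

# 2) Broadcast at the best channel's capacity: only the best receiver decodes.
r_max = [c if c == max(caps) else 0.0 for c in caps]

# 3) Time sharing: channel j is served a fraction lam[j] of the time,
#    with the fractions summing to one.
lam = [0.5, 0.3, 0.2]
r_ts = [l * c for l, c in zip(lam, caps)]

print(r_min, r_max, r_ts)
```

Time sharing serves every receiver at a nonzero rate, which neither of the two single-rate strategies can do simultaneously.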
II. DEFINITION OF A BROADCAST CHANNEL
The formal definition, as initiated by Cover [3], is as follows:
Definition: A broadcast channel consists of an input alphabet X, two output alphabets Y1 and Y2, and a probability transition function which characterizes the channel, given by:
P(y1^n, y2^n | x^n)

Dr. V. Vaidehi, Prashanth B.
Department of Electrical Engineering, Madras Institute of Technology
vaidehi@annauniv.edu, prashanthseven@gmail.com

The broadcast channel is said to be memoryless if
P(y1^n, y2^n | x^n) = ∏ (i = 1 to n) p(y1i, y2i | xi)
In a memoryless broadcast channel, the occurrence of an error in one interval or symbol period does not affect other symbol periods. Examples of channels with memory include channels with burst errors. A more general representation of broadcast channels is given in [1] as three finite sets, (X, p(y1, y2|x), Y1 × Y2), the interpretation being that x is the input to a broadcast channel with conditional probability p(y1, y2|x), resulting in the outputs y1 and y2. The successful reception of an n-bit codeword transmitted as X over such a channel will, owing to the amnesia of the channel, depend only upon the successful reception of each of the bits in the word by each of the receivers.
Definition: The rate pair (R1, R2) is said to be achievable for the broadcast channel if there exists a sequence of ((2^(nR1), 2^(nR2)), n) codes with Pe^(n) → 0. Apart from the fact that both codes, of rates R1 and R2, must have a probability of error that tends to zero, this definition is very similar to the channel coding argument that postulates the existence of a set of codes for which Pe → 0 as n, the code block length, tends to infinity (provided Rc < R0).
Why does the code block length affect the probability of error? Simply put, there are 2^n available vertices of a hypercube, out of which we select only some M. For different selections of M there exist different communication systems, and hence an ensemble of 2^(nM) possible choices.
Visualizing Distance Properties with the Code Cube:
The probability of error of a communication system that chooses M coded waveforms out of a possible 2^(nM) sets of codes is upper bounded by the probability of error of a system that uses only 2 equally likely words. This means that:
Pe(Xk) ≤ Σ (j ≠ k) P2(xk, xj)
where P2(xi, xj) is the ensemble average, over all 2^(nM) systems, of the error probability of a scheme that uses only the codewords xi and xj.
By defining the cutoff rate
R0 = log2( 2 / (1 + e^(−Ec/N0)) )
and realizing that the Q-function (born out of the probability of error of a binary system) is bounded by
Q(√(2dEc/N0)) ≤ e^(−dEc/N0)
we get:

Pe ≤ 2^(−n(R0 − Rc))
This means that whenever Rc < R0, the average probability of error Pe^(n) → 0 as n → ∞.
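A quick numerical sketch of this bound (with an assumed Ec/N0 of 0 dB and an assumed code rate Rc = 0.5) shows the exponential decay with block length:

```python
import math

def cutoff_rate(ec_over_n0):
    """R0 = log2(2 / (1 + e^(-Ec/N0))) bits per dimension."""
    return math.log2(2.0 / (1.0 + math.exp(-ec_over_n0)))

def error_bound(n, rc, ec_over_n0):
    """Ensemble-average bound Pe <= 2^(-n(R0 - Rc))."""
    return 2.0 ** (-n * (cutoff_rate(ec_over_n0) - rc))

r0 = cutoff_rate(1.0)          # Ec/N0 = 1 (0 dB); R0 ~ 0.548 > Rc = 0.5
for n in (10, 100, 1000):      # bound decays exponentially while Rc < R0
    print(n, error_bound(n, 0.5, 1.0))
```

Any fixed rate Rc below R0 is driven to arbitrarily small error probability simply by lengthening the block.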
Definition: The capacity region of a broadcast channel is the closure of the set of achievable rates.
These three definitions will suffice for a basic analysis.
III. SYMMETRIC CHANNELS
Understanding a symmetric Discrete Memoryless Channel (DMC), i.e., a channel characterized by a set of conditional probabilities [pij] which may or may not be equal, is vital for the design of multi-rate codes, and so a deeper inspection is not uncalled for. The Binary Symmetric Channel (BSC) is a special case of a Discrete Memoryless Channel, wherein there is only one common value for the probability of erroneous reception, p. To further clarify this statement, the channel is characterized by the conditional probabilities:
Pr(Y = 0 | X = 0) = Pr(Y = 1 | X = 1) = 1 − p
Pr(Y = 0 | X = 1) = Pr(Y = 1 | X = 0) = p

Discrete Memoryless Channel

TABLE I
Capacities of some frequently encountered channels

1. Binary Symmetric Channel: C = 1 − H(p)
2. Memoryless AWGN channel: C = max over p(x) of I(X;Y), an integral expression in p(y|x)·p(x)
3. Waveform channel (bandwidth limitation, power constraints, Gaussian noise), a very celebrated formula: C = W·log(1 + Pav/(W·N0))

Two-receiver broadcast channel: X to Y1 (noiseless) and X to Y2 (BSC with crossover probability p, q = 1 − p)
The channel model represented above actually comprises 2 BSCs. The first BSC, between X and Y1, has capacity C1 = 1; the second BSC, between X and Y2, has capacity C2 = 1 − H(p). This is because the conditional probability of transition for the second channel is p (and q = 1 − p), because of which the maximum information is bounded by the value 1 − H(p):
I(X;Y) = H(Y) − H(Y|X)
= H(Y) − Σx p(x)·H(Y | X = x)
= H(Y) − Σx p(x)·H(p)
which means that
I(X;Y) ≤ 1 − H(p) = C
But why can't the rate exceed this value? Because, on correct decoding, the receiver acquires 2 pieces of information: (i) the transmitted word of length log2(M) bits, and (ii) an error word statistically independent of (i). In a collection of n bits, the number of possible errors that can occur is 2^n; however, the probable errors are limited to a range defined by the Hamming weight n(p ± δ). This results in a typical set of error sequences where, to borrow from [15], almost each one is almost equally probable. Each of these error words has a probability of 2^(−nH(p)), and so each conveys information equal to nH(p). The net bits in the receiver's possession are therefore log2(M) + nH(p); but only n bits were actually transmitted, and so
log2(M) + nH(p) ≤ n
log2(M)/n ≤ 1 − H(p)
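The size of this typical error set can be checked numerically; the block length and δ below are arbitrary choices, and the count matches 2^(nH(p)) only to first order in the exponent:

```python
import math
from math import comb

def h2(p):
    """Binary entropy H(p) in bits."""
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Count binary error words of length n whose Hamming weight lies in n(p +/- delta);
# log2(count)/n should approach H(p) as n grows and delta shrinks.
n, p, delta = 2000, 0.1, 0.02
lo, hi = int(n * (p - delta)), int(n * (p + delta))
count = sum(comb(n, w) for w in range(lo, hi + 1))
rate = math.log2(count) / n
print(rate, h2(p))
```

The per-bit exponent of the count lands between H(p) and H(p + δ), consistent with the typical-set argument above.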
A Note on the Input Distribution of a Symmetric Discrete Memoryless Channel:
Very often the choice of equally probable input symbols maximizes the average mutual information, thereby achieving the capacity of the channel. However, it must be noted that this is not always the case, and that such an equiprobable distribution will only result in maximum information when the channel transition probabilities exhibit a certain symmetry. More specifically, when each row of the probability transition matrix P is a permutation of any other row, and each column is a permutation of any other column, the probability transition matrix is symmetric and equiprobable inputs will maximize I(X;Y). Consider, for example, the following two channels:
Channel (i):
P = ( 0.5  0.3  0.2 )
    ( 0.2  0.3  0.5 )
Channel (ii):
P = ( 0.6  0.3  0.1 )
    ( 0.3  0.1  0.6 )
Transition probability values borrowed from [4].
We expect the first channel to achieve its capacity through equiprobable symbols, and the second channel to achieve its capacity through a set of symbols distributed so as to maximize I(X;Y) (when expressed as a function of p(x1) = p and p(x2) = 1 − p).
This leads to a further conclusion regarding the input probabilities {P(xj)} that maximize I(X;Y), which can be better illustrated through the analysis of a cost function C(X), defined as:
C(X) = I(X;Y) − λ( Σ (j = 0 to q−1) P(xj) − 1 )
The cost function denotes that I(X;Y) need not always equal C(X), and will only do so for a certain probability distribution (which may or may not be the equiprobable distribution). We maximize C(X) as follows:

∂C(X)/∂P(xk) = ∂/∂P(xk) ( Σ (j = 0 to q−1) P(xj)·I(xj; Y) ) − λ·∂/∂P(xk) ( Σ (j = 0 to q−1) P(xj) ) = 0
where {P(xk)} is the optimum distribution. On simplification we find that
I(xk; Y) = λ + log(e)
This is an expected and yet sometimes slighted result, which denotes that, for the optimal distribution P(xk), I(xk; Y) is constant over all xk, and moreover that the capacity of the symmetric Discrete Memoryless Channel is:
C = max over p(xj) of I(xk; Y)
Going back to the two-channel example presented earlier, the first channel achieves
I(x1; Y) = I(x2; Y) = C
for an equiprobable distribution, while the second channel does not.
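A grid search over the input distribution illustrates the point; channel (i) is taken here with rows (0.5, 0.3, 0.2) and (0.2, 0.3, 0.5) so that each row sums to one (the exact entries are an assumption, since they are garbled in our draft):

```python
import math

def mutual_information(px, pyx):
    """I(X;Y) in bits for input distribution px and channel matrix pyx[x][y]."""
    py = [sum(px[x] * pyx[x][y] for x in range(len(px)))
          for y in range(len(pyx[0]))]
    hy = -sum(q * math.log2(q) for q in py if q > 0)
    hyx = sum(px[x] * -sum(v * math.log2(v) for v in row if v > 0)
              for x, row in enumerate(pyx))
    return hy - hyx

ch1 = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]  # rows are permutations of each other
ch2 = [[0.6, 0.3, 0.1], [0.3, 0.1, 0.6]]  # the asymmetric counterpart

def argmax_p(ch):
    """Input probability p(x1) that maximizes I(X;Y) on a fine grid."""
    grid = [i / 1000 for i in range(1, 1000)]
    return max(grid, key=lambda a: mutual_information([a, 1 - a], ch))

p1, p2 = argmax_p(ch1), argmax_p(ch2)
print(p1, p2)  # channel (i) peaks at 0.5; channel (ii) peaks away from 0.5
```

The first channel attains its maximum at the equiprobable input, while the second channel's optimum input is visibly biased.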
Orthogonal Channels:
The best possible scenario for someone who wishes to broadcast information is to be presented with a channel in which communication to one receiver in no way interferes with communication to another. Further, if the channel matrices of all the channels involved contain only 1's and 0's, i.e., they are perfectly noiseless channels, then all their capacities will be 1 bit/transmission.
Consider a source X that broadcasts information to two such receivers, Y1 and Y2, over orthogonal channels. (I(X;Y1), I(X;Y2)) = (1, 1) can be achieved by:
 Choosing the input probabilities in such a way that the information conveyed is maximized.
 Efficiently symbolizing the input alphabet in such a way that C1 and C2 are jointly achievable, which will then result in maximum capacity.
For example, if C1 = C2 = 1 bit/transmission (the perfectly noiseless channels discussed earlier), then an input probability distribution of P{X = i} = 1/4 will achieve (I(X;Y1), I(X;Y2)) = (1, 1). Then, assuming there are n possible input messages u ∈ {1, 2, 3, ..., n} that we wish to transmit to Y1 and n possible input messages v ∈ {1, 2, 3, ..., n} that we wish to transmit to Y2, there will be a total of n² combinations (assuming that we transmit one member of u and one of v together). To uniquely represent these n² possibilities we need to make use of the same number of symbols s ∈ {1, 2, ..., n²}. Every time Y1 receives a symbol s1 it associates it with the corresponding member of u, while every time Y2 receives the same symbol s1 it associates it with a member of v. Hence the achievable rate region for orthogonal channels is:
Achievable Rates for an Orthogonal Channel
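The product-alphabet construction above can be sketched as follows; n = 4 and the index mapping s = u·n + v are illustrative choices, not part of the formal argument:

```python
# Two orthogonal noiseless channels: the pair (u, v), with u, v in {0..n-1},
# is carried by a single symbol s drawn from an alphabet of size n^2.
n = 4

def encode(u, v):
    return u * n + v          # s in {0, 1, ..., n^2 - 1}

def decode_y1(s):
    return s // n             # Y1 recovers its message u

def decode_y2(s):
    return s % n              # Y2 recovers its message v

ok = all(decode_y1(encode(u, v)) == u and decode_y2(encode(u, v)) == v
         for u in range(n) for v in range(n))
print(ok)  # True: both receivers get log2(n) bits per use, so (R1, R2) = (C1, C2)
```

Each receiver reads its own coordinate of the product symbol, so neither rate is sacrificed for the other.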
The noiselessness of the channel, however, is not crucial. An orthogonal broadcast channel will still maintain its position of superiority over other types of broadcast channels, in the sense that (R1, R2) = (C1, C2) can still be achieved: consider, for example, two orthogonal BSC channels with parameters P1 and P2. In this case each of the individual channels will have capacity CN = 1 − H(PN). The point here is that any common rate R12 with
0 ≤ R12 ≤ min(C1, C2)
can be achieved, and if C1 = C2 then R12 = R1 = R2 achieves maximum capacity.
Such channels serve as paradigms for the analysis of the further, more practical broadcast channels discussed in the following sections. The capacity regions of these channels will always lie within the region of the orthogonal channel, as there will always be some destructive interference between receivers. In this section we discuss incompatible channels, Gaussian channels, and the capacity region established by Marton. An understanding of these channels is essential for the analysis and development of multi-rate broadcast systems.
Marton's Inner Bound:
The capacity region of a general broadcast channel is still to be established (save for certain special cases), but Marton [9] has established an inner bound. The inner bound states that any (R1, R2) ∈ R0 is achievable provided R1 and R2, individually, are both bounded by the information we wish to convey to receivers Y1 and Y2 respectively, and (R1, R2) together are bounded by the net information we wish to convey across the channel. Assume we wish to send auxiliary variables U and V, intended for Y1 and Y2 respectively; then, according to Marton's bound:
any R0 = {(R1, R2) : R1, R2 ≥ 0} is achievable provided
R1 ≤ I(U; Y1), R2 ≤ I(V; Y2)
and
R1 + R2 ≤ I(U; Y1) + I(V; Y2) − I(U; V)
for the discrete memoryless broadcast channel (X, p(y1, y2|x), Y1 × Y2).
This bound can be achieved by binning, as depicted in [10] and [11].
We now examine the worst cases of incompatibility for simultaneous communication.
Switch-to-Talk: The switch-to-talk broadcast channel, as described in Cover [3], consists of a single transmitter and 2 or more receivers (at least for the sake of analysis, this is the model chosen). The transmitter does not transmit the same message to both receivers, nor does it make use of the same alphabet. The idea is that when the sender wishes to communicate with Y1 he uses x ∈ X1, and when he wishes to communicate with Y2 he uses x ∈ X2.
Naive Time Sharing
A common analogy used to describe this type of channel is a situation in which you, a speaker fluent in 2 languages, need to communicate with 2 people, each of whom can understand only 1 language. A simple benchmark for this channel is naive time sharing, as described in the introduction of this paper and also in [1] and [3]. Since both (R1, R2) = (C1, 0) and (R1, R2) = (0, C2) are achievable, by devoting a fraction λ of the transmission time to A and (1 − λ) to B one can easily achieve (R0, RA, RB) = (0, λC1, (1 − λ)C2).
We can do much better than this,however.
Consider the earlier stated example of speaking 2 languages. You talk in English and Hindi, while each of the listeners can understand only one of the two. If naive time sharing is used, you would be using periods of time λ to differentiate the messages of receiver (1) and receiver (2); but each listener can also distinguish whether a particular word is spoken in his language or not. Then, if at a particular instant of time a Hindi word is spoken, the Indian would get some information in the form of a message, while both the Indian and the American would get information about the source of the message (they would both realize that the source is the Hindi script).
In the switch-to-talk channel, if channel 1 is used a proportion α of the time and channel 2 is used a proportion (1 − α) of the time, an additional H(α) bits per transmission can be achieved. This differs from the naive time sharing bound in that α is usually chosen based on source probabilities, so as to facilitate the perfect transmission of one of 2^(nH(α)) additional messages to Y1 and Y2. The H(α) additional bits transmitted in this way give knowledge of the source. Thus all (R1, R2) of the form
( αC1 + H(α), (1 − α)C2 + H(α) )
can be achieved, resulting in a capacity region that dominates that of naive time sharing.
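A sketch comparing the two regions, with hypothetical capacities C1 = C2 = 1 bit/transmission:

```python
import math

def h2(a):
    """Binary entropy H(a) in bits."""
    return 0.0 if a in (0, 1) else -a * math.log2(a) - (1 - a) * math.log2(1 - a)

# Hypothetical switch-to-talk channel capacities.
c1, c2 = 1.0, 1.0

def naive(a):
    """Naive time sharing: (a*C1, (1-a)*C2)."""
    return a * c1, (1 - a) * c2

def with_timing(a):
    """The timing pattern itself carries H(a) extra bits to each receiver."""
    return a * c1 + h2(a), (1 - a) * c2 + h2(a)

for a in (0.25, 0.5, 0.75):
    print(a, naive(a), with_timing(a))
```

For every 0 < α < 1 the H(α) term strictly dominates the corresponding naive time-sharing point.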
Switch-to-talk capacity region
The Worst Case of Incompatibility: 2 channels so incompatible that one can do no better than time sharing. This type of channel has very detrimental interference coupled from one channel to the other. If X wishes to communicate with Y1, he must send pure noise to Y2. Borrowing the description of such a channel from [1,2], we take
P1 =
( 1     0   )
( 0     1   )
( 0.5   0.5 )
( 0.5   0.5 )
P2 =
( 0.5   0.5 )
( 0.5   0.5 )
( 1     0   )
( 0     1   )
as the transition matrices of the two channels C1 and C2 of a system with X = {1, 2, 3, 4}, Y1 = {1, 2}, Y2 = {1, 2}.
We see that to find the capacity we need to maximize I(X; Y1) and I(X; Y2) over the input probability distribution. To see how exactly this sort of channel functions, consider the following scenario. Let
α = p1 + p2, 1 − α = p3 + p4, where P(X = i) = pi.
Then
H(Y1) = H(p1 + (1 − α)/2)
(or p2, since in any case, to find the maximum information, we will maximize over 0 ≤ p1 ≤ α), and
H(Y1 | X) = 1 − α
by definition, so the average information transmitted to Y1 by X is at most 1 − (1 − α) = α. This results in the capacities:
C1 = α; C2 = 1 − α
When α = 1, C1 = 1 but C2 = 0 (Y2 receives pure noise and no information).
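The claimed capacities C1 = α and C2 = 1 − α can be verified numerically from the transition matrices; α = 0.7 is an arbitrary test value:

```python
import math

def mutual_information(px, pyx):
    """I(X;Y) in bits for input distribution px and channel matrix pyx[x][y]."""
    py = [sum(px[x] * pyx[x][y] for x in range(len(px)))
          for y in range(len(pyx[0]))]
    hy = -sum(q * math.log2(q) for q in py if q > 0)
    hyx = sum(px[x] * -sum(v * math.log2(v) for v in row if v > 0)
              for x, row in enumerate(pyx))
    return hy - hyx

P1 = [[1, 0], [0, 1], [0.5, 0.5], [0.5, 0.5]]  # channel seen by Y1
P2 = [[0.5, 0.5], [0.5, 0.5], [1, 0], [0, 1]]  # channel seen by Y2

alpha = 0.7                                    # mass placed on inputs {1, 2}
px = [alpha / 2, alpha / 2, (1 - alpha) / 2, (1 - alpha) / 2]
i1, i2 = mutual_information(px, P1), mutual_information(px, P2)
print(i1, i2)  # ≈ (0.7, 0.3): C1 = alpha, C2 = 1 - alpha
```

Splitting the input mass α evenly over the two clean letters of each channel attains the time-sharing line, and no input distribution does better.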
Gaussian Channels: The unpredictability involved in broadcasting often makes its analysis more challenging than that of unicasting. One transmitter has to efficiently code and transmit data to n receivers in such a way that a rate as close as possible to the capacity of each of those n channels is achieved. The Gaussian channel is a time-discrete channel that draws noise from an independent and identically distributed Gaussian distribution, adds it to the input, and generates an output that does not depend on the input alone. The easiest way to represent this channel is:
Yi = Xi + Zi, Zi ~ N(0, N)
so that even in the absence of any other impairment the receiver cannot recover the transmitted symbol perfectly.
To transmit data efficiently over this channel we establish a power limitation on any codeword, denoted as:
(1/n)·Σ (i = 1 to n) xi² ≤ P
for a codeword (x1, x2, ..., xn). One of the simplest ways to transmit data would be to send one of 2 levels, +√P or −√P. In such a scheme, when the noise in the channel overwhelms the signal power, a misrepresentation of data occurs. Exploiting this fact, one can convert a continuous Gaussian channel into a discrete channel with crossover probability Pe, where Pe is the probability of error defined according to the input signal levels used (for a ±√P system it would be:
Pe = 0.5·P(Y < 0 | X = +√P) + 0.5·P(Y > 0 | X = −√P) ).
The power constraint P, however, leads to a subversion of the classical formula for channel capacity, C = max I(X;Y). Even though this formula still holds good, the typical influences of signal power and noise are not reflected in it. One needs only to realize that the noise distribution, and hence the output distribution (since Y = X + Z), are both normal, due to which the entropy of each can be found easily in terms of the normal density φ(x). In short, the entropy of both Z and Y is
(1/2)·log(2πeσ²)
bits (because φ(x) = (1/√(2πσ²))·e^(−x²/2σ²) and h(f) = −∫ f·ln(f), as shown in [6]). Since the variance of Z is N, and the variance of Y is (P + N), where P is E(X²), the mutual information is bounded by:
I(X;Y) ≤ (1/2)·log(2πe(P + N)) − (1/2)·log(2πeN)
Since
C = max over E(X²) ≤ P of I(X;Y)
the capacity of a Gaussian channel with power constraint P and noise variance N is:
C = (1/2)·log(1 + P/N) bits per transmission ... (1)
and for a channel bandlimited to W, since there are 2W samples per second:
C = W·log(1 + P/(N0·W)) bits per second
where N0/2 watts/Hz is the noise power spectral density, so that the in-band noise power is N0·W.
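As a numerical sketch of (1) and its bandlimited form (the power, noise, and bandwidth values below are arbitrary):

```python
import math

def c_per_transmission(P, N):
    """C = 0.5 * log2(1 + P/N) bits per transmission."""
    return 0.5 * math.log2(1 + P / N)

def c_per_second(P, N0, W):
    """Bandlimited to W Hz: 2W samples/s, in-band noise power N0*W."""
    return W * math.log2(1 + P / (N0 * W))

print(c_per_transmission(15, 1))      # SNR of 15 gives exactly 2.0 bits/use
print(c_per_second(1e-3, 1e-9, 1e6))  # unit SNR over 1 MHz gives ~1e6 bits/s
```

The per-second form is just the per-transmission form applied to each of the 2W independent samples.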
Now consider the time-discrete Gaussian broadcast channel with two receivers, each of which sees its own noise process, Z1 or Z2, of mean zero and noise variance N1 or N2. It is well known that each of these channels has a capacity given by (1); but for efficient transmission, rather than a scheme that time-shares these capacities, we superimpose s2, the data stream meant for Y2, onto the sequence s1 meant for Y1. If the transmitted sequence actually consists of 2 streams of information s1, s2, in such a way that s2 is intended for Y2 and s1 is intended for Y1, then the received sequences are y1 = s1 + s2 + z1 and y2 = s1 + s2 + z2. Hence s1 and z2 are considered noise by Y2, and s2 and z1 contribute to the loss of information with respect to Y1. If the net signal power is S, let αS and (1 − α)S be the signal power proportions devoted to transmitting information to Y1 and Y2 respectively. Then the noise power felt by Y2 will be αS + N2, resulting in:

C2(α) = (1/2)·log( 1 + (1 − α)S / (αS + N2) )
If we proceed along the same lines and analyze receiver
Y1 we may conclude that:

C1(α) = (1/2)·log( 1 + αS / ((1 − α)S + N1) )
However, a better rate is definitely possible if we only realize that, more often than not, one receiver experiences more noise than the other, i.e., N1 < N2. Here we introduce superposition coding in one of its most fundamental manifestations: subtractive decoding. Based on the assumption that any data stream decodable by receiver Y2 is also decodable by Y1 (since channel 2 is more hostile to data transmission), we can decode s2 at Y1 as well, and then find the data stream intended for receiver Y1 by subtracting s2 from y1. The receiver Y1 therefore correctly receives both s1 and s2 by first decoding s2, which has been embedded in the overall data stream, and then using this knowledge to isolate s1 + z1. Consequently, the rate pair:

R1 = (1/2)·log( 1 + (1 − α)S / (αS + N2) ) + (1/2)·log( 1 + αS / N1 )
R2 = (1/2)·log( 1 + (1 − α)S / (αS + N2) )
is simultaneously possible, and it is clear that our previous conclusion is overruled, simply because a much higher rate is possible for Y1.
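A sketch with hypothetical powers S, N1, N2 (and an arbitrary split α) makes the comparison concrete:

```python
import math

def rate(s, n):
    """0.5 * log2(1 + SNR) for signal power s and noise power n."""
    return 0.5 * math.log2(1 + s / n)

S, N1, N2, a = 10.0, 0.1, 1.0, 0.3   # hypothetical values; note N1 < N2

# Treating the other stream purely as noise (no subtraction):
c1 = rate(a * S, (1 - a) * S + N1)
c2 = rate((1 - a) * S, a * S + N2)

# Subtractive decoding at the better receiver Y1: decode s2 first,
# subtract it from y1, then decode s1 against the noise N1 alone.
r1 = rate((1 - a) * S, a * S + N2) + rate(a * S, N1)
r2 = rate((1 - a) * S, a * S + N2)

print((c1, c2), (r1, r2))
```

The better receiver's rate r1 far exceeds c1 because its own-stream term sees only N1 rather than the full interference-plus-noise, while Y2's rate is unchanged.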
IV. SUPERPOSITION CODING
An engineer who wishes to broadcast music at the best possible quality to all listeners is posed with a predicament. He can either:
 Prepare for the worst and transmit monaural-quality music to everyone.
 Hope for the best and transmit stereophonic-quality music to everyone.
The results depicted in [13] imply that he needn't resort to either option (or, equally, that he can use both). Rather, he can create a data stream that contains both monaural and stereophonic music by superimposing the latter on the former, and transmit this data to all receivers in such a way that the quality of music heard by a listener is dictated by the noise power present in the channel between him and the transmitter. When the noise in the channel N exceeds a noise threshold Nt, a receiver can do no better than recover the monaural data stream alone; however, when N < Nt, a receiver can first recover the monaural data stream and then use it to fine-tune its reception so that stereophonic music is heard.
A Note on Codebook Sizes:
The transmission of an n-bit codeword will result in the reception of a vector of power n(P + N). The space of received vectors is thus a sphere of radius √(n(P + N)), and it is within this sphere that any possible codeword will be mapped. The received vectors are normally distributed, with mean equal to the true codeword and variance equal to the noise variance, which means that they will very rarely be mapped to the exact transmitted word, but have a high probability of being mapped inside a sphere of radius √(n(N + ε)) around the true codeword. These decoding spheres denote the limits of error the decoder can tolerate; any transmitted vector will result in an error only if it is plotted outside of its decoding sphere. The volume of an n-dimensional sphere, as given in [6], is Cn·r^n, where r is the radius of the sphere. Hence the number of decoding spheres of radius √(nN) that can be packed in a sphere of radius √(n(P + N)) is:
Cn·(n(P + N))^(n/2) / ( Cn·(nN)^(n/2) ) = (1 + P/N)^(n/2) = 2^( (n/2)·log(1 + P/N) )
Efficient transmission over a channel of capacity C can therefore be achieved using a (2^(nC), n) codebook, because we need a codeword to represent each 'decoding sphere', and there are 2^(nC) decoding spheres in the larger sphere of radius √(n(P + N)).
Outline of Achievability:
Consider a noiseless channel along with a BSC of parameter p. The noisy channel has the lower capacity C(p), and the number of codes in its codebook is limited to 2^(n(C(p) − ε)). Fewer codewords, however, lead to a higher noise tolerance, because the constellation points are more spaced out. This noise tolerance is exploited to pack in extra bits of information that are not decodable by Y2 but carry meaningful information for Y1.
A (2^(nC2(p)), n) codebook can be constructed for channel 2 with a low probability of error. Normally one would choose one word out of a possible 2^n, transmit it, and decode it based on its Hamming distance from any one of the 2^(nC2(p)) words in the codebook. In superposition coding we construct a code of this type for a channel X (which is noisier than channel 2), and pack in extra information for Y1 that will still keep the codeword within the Hamming distance of the intended word in the codebook of channel X.
Assume we have a broadcast channel that consists of 2 BSCs, one perfect and the other with parameter p. Even though the worse of the two channels only has parameter p, we design our code for a channel X with parameter α(1 − p) + (1 − α)p (the given channel cascaded with a BSC of parameter α). Thus our basic codebook will only have 2^(n(C(α(1−p) + (1−α)p) − ε)) codewords, but a noise tolerance of n(α(1−p) + (1−α)p) bits (n times the probability that each bit is wrong).
 These codewords form the cloud centers.
Each 'cloud' will have a radius equal to the noise tolerance of channel X, n(α(1−p) + (1−α)p), and within each cloud will be a set of points distinguishable only by channel 1.
 The number of such points will be 2^(nH(α)).
This is because the number of probable errors for each codeword/cloud center is 2^(nH(α)), as explained in the section on symmetric channels. Only, in our case, they are not really errors but information bits meant for Y1. Hence the extra information we can transmit to Y1, with each codeword sent to Y2, is H(α), resulting in the rates:
 R1 = C(α(1−p) + (1−α)p) + H(α)
 R2 = C(α(1−p) + (1−α)p)
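For illustration, with assumed values p = 0.1 and α = 0.05:

```python
import math

def h2(x):
    """Binary entropy H(x) in bits."""
    return 0.0 if x in (0, 1) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def c_bsc(x):
    """Capacity of a BSC with crossover probability x."""
    return 1 - h2(x)

p, a = 0.1, 0.05                 # channel parameter p, design parameter alpha
q = a * (1 - p) + (1 - a) * p    # effective crossover of the cascaded channel X

r2 = c_bsc(q)                    # coarse stream, decodable by both receivers
r1 = c_bsc(q) + h2(a)            # Y1 additionally recovers H(alpha) bits/use
print(q, r1, r2)
```

The cost of the superposition is that the coarse stream is designed for the degraded effective crossover q rather than p; the payoff is the extra H(α) bits per use delivered to the better receiver.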
A Note on Codebook Generation:
First generate 2^(nR2) codewords of length n to give the cloud centers, u^n(w2) [1]. Then, for each of these 2^(nR2) cloud centers, generate 2^(nR1) codewords x^n(w1, w2).
Cloud center
To transmit the pair (w1, w2), send the codeword x^n(w1, w2). The cloud center u^n(w2) is never actually sent.
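A toy sketch of this two-stage generation (simplified for illustration: satellite words are formed here by flipping each cloud-center bit with probability α, rather than by the conditional construction of [1]):

```python
import random

random.seed(1)
n, R1, R2 = 12, 0.25, 0.25                 # toy block length and rates
M1, M2 = int(2 ** (n * R1)), int(2 ** (n * R2))
alpha = 0.1                                # flip probability for satellites

# Stage 1: draw the cloud centers u^n(w2) uniformly at random.
clouds = [[random.randint(0, 1) for _ in range(n)] for _ in range(M2)]

# Stage 2: around each center, draw satellite codewords x^n(w1, w2)
# by flipping each bit independently with probability alpha.
def satellite(center):
    return [b ^ (random.random() < alpha) for b in center]

codebook = {(w1, w2): satellite(clouds[w2])
            for w2 in range(M2) for w1 in range(M1)}

# To send (w1, w2), transmit codebook[(w1, w2)]; the center itself never is.
print(len(codebook), len(clouds))  # 64 codewords around 8 cloud centers
```

The coarse receiver only needs to identify which cloud the received word came from; the fine receiver resolves the individual satellite within it.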
This code structure allows the transmission of a codeword pair (r, s), where r is received and decoded by both Y1 and Y2, while s is decoded only by Y1. It must be noted that the parameter α controls the ratio of power allocated to the two data streams. If α is high, the number of cloud centers is low and there are more 'error' points within each cloud. The power allocated to data stream r depends on the number of bits used to locate each cloud center, and is proportional to (1 − α); the power allocated to data stream s depends on the number of bits used to identify each point within the clouds, and is proportional to α. To better elucidate this point, we make a brief venture into the actual coding schemes used for multi-rate broadcast:
Multi-rate Signal Constellation
The first two bits of each word denote the 'cloud centers'. Receivers on noisy channels can decode only the empty dots, while receivers on better channels can also decode the black dots. For a fixed-rate code, in which the number of points cannot change, α determines the Euclidean distance between constellation points. "Superposition coding dominates frequency multiplexing, which in turn dominates time multiplexing," as shown in [13].
V. CONCLUSIONS
We are trying to develop a superposition coding technique that will function efficiently in wireless fading channels, and this paper lays the groundwork for the further analysis and design of such variable-rate codes. Our coding scheme includes the layering of data and the use of auxiliary random variables, or virtual signals, that participate only in the construction of the code; one useful idea is that of achieving multi-rate broadcast using Superposition Turbo TCM.
REFERENCES
[1] T. M. Cover, "Comments on broadcast channels," Information Theory, IEEE Transactions on, Volume 44, Oct 1998.
[2] David J. C. MacKay, Information Theory, Inference, and Learning
Algorithms Cambridge: Cambridge University Press, 2003.
[3] Cover, T.M ,“Broadcast channels,”IEEE Transactions on
Information Theory, IT-18(1):2--14, January 1972. Reprinted in
Record of COMSAT, seminar on Multiple User Communications,
UPO43CL, Clarksburg, Maryland, May 1975. Reprinted in Key
Papers in the Development of Information Theory. IEEE Press,
1974. ed. by D. Slepian.
[4] T. Cover and J. Thomas, Elements of Information Theory, Wiley &
Sons, New York, 1991. Second edition, 2006.
[5] Thomas M. Cover “An Achievable Rate Region for the Broadcast
Channel”IEEE Transactions on Information Theory,
IT-21(4):399--404, July 1975.
[6] S. Diggavi and T. Cover. Is maximum Entropy noise the worst?
Proceedings of IEEE International Symposium on Information
Theory, June 1997, Ulm, Germany, p. 278. n.
[7] C. J. Kaufman, Rocky Mountain Research Lab., Boulder, CO,
private communication, May 1995.
[8] Thomas M. Cover , “Open Problems in information Theory,” IEEE
IEEE USSR Joint Workshop on Information Theory, IEEE Press,
35 - 36, December 1975.
[9] Marton, K.“A coding theorem for the discrete memoryless
broadcast channel” Information Theory, IEEE Transactions on
,Volume 25, Issue 3, May 1979 Page(s): 306 - 311
[10] El Gamal, A.van der Meulen, E. , “A proof of Marton's coding
theorem for the discrete memoryless broadcast channel,”
Information Theory, IEEE Transactions on, Publication Date: Jan
1981 Volume: 27
[11] H. Sato, "An outer bound to the capacity region of broadcast channels," Information Theory, IEEE Transactions on, vol. 24, May 1978.
[12] Verdu, S., “Fifty years of Shannon theory ” Information Theory,
IEEE Transactions on ,Volume 44, Issue 6, Oct 1998 Page(s):2057
- 2078 .