Professional Documents
Culture Documents
formation theory. Many new properties of the coherent information and entanglement fidelity are
proved. Two non-classical features of the coherent information are demonstrated: the failure of
subadditivity, and the failure of the pipelining inequality. Both properties arise as a consequence
of quantum entanglement, and give quantum information new features not found in classical in-
formation theory. The problem of a noisy quantum channel with a classical observer measuring
the environment is introduced, and bounds on the corresponding channel capacity proved. These
bounds are always greater than for the unobserved channel. We conclude with a summary of open
problems.
∗
hbarnum@tangelo.phys.unm.edu
†
mnielsen@tangelo.phys.unm.edu
‡
schumacb@kenyon.edu
1
cludes with a summary of our results, a discussion of bound on the rate at which information can be reliably
the new features which quantum mechanics adds to the transmitted through the channel. The other part demon-
problem of the noisy channel, and suggestions for further strates that there are coding and decoding schemes which
research. attain this bound, which is therefore the channel capac-
ity. We do not prove such a channel capacity theorem in
this paper. We do, however, derive bounds on the rate at
II. NOISY CHANNEL CODING which information can be sent through a noisy quantum
channel.
The problem of noisy channel coding will be outlined in
this section. Precise definitions of the concepts used will
be given in later sections. The procedure is illustrated in III. QUANTUM OPERATIONS
figure 1.
What is a quantum noisy channel, and how can it be
Channel Channel
Source Input Output Receiver described mathematically? This section reviews the for-
N D
Encoding Channel Decoding malism of quantum operations, which is used to describe
ρs C ρc ρo ρr noisy channels. Previous papers on the noisy channel
problem [4–8] have used apparently different formalisms
FIG. 1. to describe the noisy channel. In fact, all the formalisms
The noisy quantum channel, together with encodings can be shown to be equivalent, as we shall see in this sec-
and decodings. tion. Historically, quantum operations have also some-
times been known as completely positive maps or super-
There is a quantum source emitting unknown quantum scattering operators. The motivation in all cases has been
states, which we wish to transmit through the channel to to describe general state changes in quantum mechanics.
some receiver. Unfortunately, the channel is usually sub- A simple example of a state change in quantum me-
ject to noise, which prevents it from transmitting states chanics is the unitary evolution experienced by a closed
with high fidelity. For example, an optical fiber suffers quantum system. The final state of the system is related
losses during transmission. Another important example to the initial state by a unitary transformation U ,
of a noisy quantum channel is the memory of a quan-
ρ → E(ρ) = U ρU † . (3.1)
tum computer. There the idea is to transmit quantum
states in time. The effect of transmitting a state from Although all closed quantum systems are described by
time t1 to t2 can be described as a noisy quantum chan- unitary evolutions, in accordance with Schrödinger’s
nel. Quantum teleportation [9] can also be described as a equation, more general state changes are possible for
noisy quantum channel whenever there are imperfections open quantum systems, such as noisy quantum channels.
in the teleportation process [6,10]. How does one describe a general state change in quan-
The idea of noisy channel coding is to encode the quan- tum mechanics? The answer to this question is provided
tum state emitted by the source, ρs , which one wishes to by the quantum operations formalism. This formalism is
transmit, using some encoding operation, which we de- described in detail by Kraus [11] (see also Hellwig and
note C. The encoded state is then sent through the chan- Kraus [12]) and is given short but detailed reviews in
nel, whose operation we denote by N . The output state Choi [13] and in the Appendix to [4]. In this formalism
of the channel is then decoded using some decoding opera- there is an input state and an output state, which are
tion, D. The objective is for the decoded state to match connected by a map
with high fidelity the state emitted by the source. As
in the classical theory, we consider the fidelity of large E(ρ)
ρ→ . (3.2)
blocks of material produced by repeated emission from tr E(ρ)
the source, and allow the encoding and decoding to oper-
This map is a quantum operation, E, a linear, trace-
ate on these blocks. A channel is said to transmit a source
decreasing map that preserves positivity. The trace in
reliably if a sequence of block-coding and block-decoding
the denominator is included in order to preserve the trace
procedures can be found that approaches perfect fidelity
condition, tr(ρ) = 1.
in the limit of large block size.
The most general form for E that is physically reason-
What then is the capacity of such a channel - the high-
able (in addition to being linear and trace-decreasing and
est rate at which information can be reliably transmitted
preserving positivity, a physically reasonable E must sat-
through the channel? The goal of a channel capacity the-
isfy an additional property called complete positivity),
orem is to provide a procedure to answer this question.
can be shown to be [11]
This procedure must be an effective procedure, that is, an
explicit algorithm to evaluate the channel capacity. Such Ai ρA†i .
X
E(ρ) = (3.3)
a theorem comes in two parts. One part proves an upper i
2
The
P † system operators Ai , which must satisfy can interact unitarily. Conversely, it tells us that any
i Ai Ai ≤ I, completely specify the quantum opera- such unitary interaction with an initially uncorrelated
tion. In the particular case of a unitary transformation, environment gives rise to a trace-preserving quantum op-
there is only one term in the sum, A1 = U , leaving us eration. Both these facts are useful in what follows. The
with the transformation (3.1). picture we have of a quantum operation is neatly sum-
A class of operations that is of particular interest are marized by the following diagram.
the trace-preserving or non-selective operations. Physi-
cally, these arise in situations where the system is coupled
to some environment which is not under observation; the
effect of the evolution is averaged over all possible out-
comes of the interaction with the environment. Trace- ρQ Q - Q′
preserving operations are defined by the requirement that 6
X †
Ai Ai = I. (3.4) U QE
i
?
This is equivalent to requiring that for all density opera-
tors ρ, E
tr(E(ρ)) = 1, (3.5)
explaining the nomenclature “trace-preserving”. Notice FIG. 2.
that this means the evolution equation (3.2) reduces to
the simpler form
ρ → E(ρ), (3.6) Here, Q denotes the state of the system before the in-
teraction with the environment, and Q′ the state of the
when E is trace-preserving. system after the interaction. Unless stated otherwise we
The following representation theorem is proved in [11], follow the convention that Q and Q′ are d-dimensional.
[13], and [4]. It shows the connection between trace- The environment system E and the operator U QE might
preserving quantum operations and systems interacting be chosen to be the actual physical environment and its
unitarily with an environment, and thus provides part of interaction with Q, but this is not necessary. The only
the justification for the physical interpretation of trace- thing that matters for the description of noisy channels
preserving quantum operations described above. is the dynamics of Q. For any given quantum operation
Theorem (representation theorem for trace-preserving E there are many possible representations of E in terms
quantum operations): of environments E and interactions U QE . We always
Suppose E is a trace-preserving quantum operation on assume that the initial state of E is a pure state, and re-
a system with a d-dimensional state space. Then it is gard E as a mathematical artifice. Of course, the actual
possible to construct an “environment” E of at most d2 physical environment, EA , may be initially impure, but
dimensions, such that the system plus environment are the above representation theorem shows that for the pur-
initially uncorrelated, the environment is initially in a poses of describing the dynamics of Q, it can be replaced
pure state σ = |sihs| and there exists a unitary evolution by an “environment”, E, which is initially pure and gives
U on system and environment such that rise to exactly the same dynamics. In what follows it is
this latter E that is most useful.
E(ρ) = trE (U (ρ ⊗ σ)U † ). (3.7)
Shannon’s classical noisy coding theorem is proved for
Here and elsewhere in the paper a subscript on a trace discrete memoryless channels. Discrete means that the
indicates a partial trace over the corresponding system channel only has a finite number of input and output
(E in this case). states. By analogy we define a discrete quantum channel
Conversely, given any initially uncorrelated environ- to be one which has a finite number of Hilbert space di-
ment σ (possibly of more than d2 dimensions, and ini- mensions. In the classical case, memoryless means that
tially impure), a unitary interaction U between the sys- the output of the channel is independent of the past, con-
tem and the environment gives rise to a trace-preserving ditioned on knowing the state of the source. Quantum
quantum operation, mechanically we take this to mean that the output of the
channel is completely determined by the encoded state
E(ρ) = trE (U (ρ ⊗ σ)U † ). (3.8)
of the source, and is not affected by the previous history
This theorem tells us that any trace-preserving quan- of the source.
tum operation can always be mocked up as a unitary evo- Phrased in the language of quantum operations, we as-
lution by adding an environment with which the system sume that there is a quantum operation, N , describing
3
the dynamics of the channel. The input ρi of the channel The picture we have described thus far applies only to
is related to the output ρo by the equation trace-preserving quantum operations. Later in the paper
we will also be interested in quantum operations which
ρi → ρo = N (ρi ). (3.9) are not trace-preserving. That is, they do not satisfy the
relation i A†i Ai = I, and thus tr(E(ρ)) 6= 1 in general.
P
For the majority of this paper we assume, as in the previ- Such quantum operations arise in the theory of general-
ous equation, that the operation describing the action of ized measurements. To each outcome, m, of a measure-
the channel is trace-preserving. This corresponds to the ment there is an associated quantum operation, Em , with
physical assumption that no classical information about an operator-sum representation,
the state of the system or its environment is obtained
Ami ρA†mi .
X
by an external classical observer. All previous work on Em (ρ) = (3.12)
noisy channel coding with the exception of [14] has as- i
sumed that this is the case, and we do so for the major- The probability of obtaining outcome m is postulated to
ity of the paper. In section XI we consider the case of a be
noisy channel which is being observed by some classical X †
observer. Pr(m) = tr(Em (ρ)) = tr( Ami Ami ρ). (3.13)
In addition to the environment, E, it is also extremely i
useful to introduce a reference system, R, in the following
way. One might imagine that the system Q is initially The
P completeness relation for probabilities
part of a larger system, RQ, and that the total is in a m Pr(m) = 1 is equivalent to the completeness re-
pure state, |ψ RQ i, satisfying lation for the operators appearing in the operator-sum
representations
ρQ = trR (|ψ RQ ihψ RQ |). (3.10) X †
Ami Ami = I. (3.14)
mi
Such a state |ψ RQ i is called a purification of ρQ , and it
can be shown [15] that such a system R and purifica- Thus for each m,
tions |ψ RQ i always exist. From our point of view R is
A†mi Ami ≤ I.
X
introduced simply as a mathematical device to purify the (3.15)
initial state. The joint system RQ evolves according to i
the dynamics IR ⊗ E given by
As an aside, it is interesting to note that the formulation
R′ Q′ RQ of quantum measurement based on the projection pos-
ρ = (IR ⊗ E)(ρ ). (3.11)
tulate [16–18], taught in most classes on quantum me-
chanics, is a special case of the quantum operations for-
The overall picture we have of a trace-preserving quan- malism, obtainable by using a single projector Am = Pm
tum operation thus looks like: in the operator-sum representation for Em . The formal-
ism of positive operator valued measures (POVMs) [15]
is also related
P to the generalized measurements formal-
ism: Em ≡ i A†mi Ami are the elements of the POVM
which is measured.
R A result analogous to the earlier representation the-
orem for trace-preserving quantum operations can be
proved for general operations.
RQ
Ψ
Theorem (general representation theorem for opera-
tions)
Q - Q′
Suppose E is a general quantum operation. Then it
6 is possible to find an environment, E, initially in a pure
U QE state σ = |sihs| uncorrelated with the system, a unitary
U QE , a projector P E onto the environment alone, and a
? constant c > 0, such that
E †
E(ρ) = c trE (P E U QE (ρ ⊗ σ)U QE P E ). (3.16)
4
E
and the projectors Pm form a complete orthogonal set, notation E1 , E2 , . . . for the different operations, and
E E E E
P
m mP = I, P P
m m ′ = δ m,m′ Pm . the notation E2 ◦ E1 to denote composition of oper-
Conversely, any map of the form (3.16) is a quantum ations,
operation.
Once again, introducing a reference system R which (E2 ◦ E1 )(ρ) ≡ E2 (E1 (ρ)). (3.18)
purifies ρQ we are left with a picture of the dynamics
that looks like this:
Furthermore it is sometimes useful to use the RQE
picture of quantum operations to discuss composi-
tions. We denote the environment corresponding to
operation Ei by Ei , and assume environments cor-
R responding to different values of i are independent
and initially pure. So, for example, the initial state
RQ
Ψ for a two-stage composition would be
FIG. 4.
IV. ENTROPY EXCHANGE
gives very useful relations amongst Von Neumann Se (ρ, E) ≡ S(ρE ), (4.1)
′ ′
entropies S(ρ) ≡ −tr(ρ log2 ρ), such as S(ρR Q ) = ′
′
S(ρE ) and all other permutations amongst R, Q where ρE is the state of an initially pure environment
and E. These are used frequently in what follows. (the “mock” environment of the previous section) after
the operation, and S(ρ) ≡ −tr(ρ log2 ρ) is the Von Neu-
3. Generically we denote quantum operations by E P †
mann entropy. If E(ρ) = i Ai ρAi then a convenient
and the dimension of the quantum system Q by
form for the entropy exchange is found by defining a ma-
d.
trix W with elements
4. Trace-preserving quantum operations arise when a
system interacts with an environment, and no mea- tr(Ai ρA†j )
Wij ≡ . (4.2)
surement is performed on the system plus environ- tr(E(ρ))
ment. Non trace-preserving operations arise when
classical information about the state of the sys- It can be shown [4,14] that
tem is made available by such a measurement. For
most of this paper the noisy quantum channel is de- Se (ρ, E) = S(W ) ≡ −tr(W log2 W ). (4.3)
scribed by a trace-preserving quantum operation.
5. Sometimes we consider the composition of two (or The last equation is frequently useful when performing
more) quantum operations. Generically we use the calculations.
5
V. CLASSICAL NOISY CHANNELS IN A Exy ≡ |yihx|, (5.5)
QUANTUM SETTING
we find that the channel operation defined by
In this section we show how classical noisy channels X
†
can be formulated in terms of quantum mechanics. We N (ρ) ≡ py|x Exy ρExy . (5.6)
xy
begin by reviewing the formulation in terms of classical
information theory. is a trace-preserving quantum operation, and that
A classical noisy channel is described in terms of dis-
tinguishable channel states, which we label by x. If the
X
N (ρX ) = ρY = p(y)|yihy|, (5.7)
input to the channel is symbol x then the output is sym- y
bol y with probability py|x . The channel is assumed to
act independently where ρY is the density operator corresponding to the
P on each input. For each x, the proba-
bility sum rule y py|x = 1 is satisfied. These conditional random variable Y that would have been obtained from
probabilities py|x completely describe the classical noisy X given a classical channel with probabilities py|x . This
channel. gives a quantum mechanical formalism for describing
Suppose the input to the channel, x, is represented by classical sources and channels. It is interesting to see
some classical random variable, X, and the output by a what form the mutual information and channel capacity
random variable Y . The mutual information between X take in the quantum formalism.
and Y is defined by Notice that
where the maximum is taken over all possible distribu- Se (ρX , N ) = H(X, Y ). (5.11)
tions p(x) for the channel input, X. Notice that although
this is not an explicit expression for the channel capacity It follows that
in terms of the conditional probabilities px|y , the maxi-
mization can easily be performed using well known tech- H(X : Y ) = S(ρX ) + S(N (ρX )) − Se (ρX , N ), (5.12)
niques from numerical mathematics. That is, Shannon’s
and thus the Shannon capacity CS of the classical chan-
result provides an effective procedure for computing the
nel is given in the quantum formalism by
capacity of a noisy classical channel.
All these results may be re-expressed in terms of quan- CS = max [S(ρX ) + S(N (ρX )) − Se (ρX , N )] , (5.13)
tum mechanics. We suppose the channel has some pre- ρX
6
entanglement of states. This cannot be done by consid- particular purification |ψ RQ i of ρ that is used [4]. If E has
ering the transmission of a set of orthogonal pure states operation elements {Ai } then the entanglement fidelity
alone. has the expression [4,14]
It is not difficult to see that CS is not correct as a mea-
|tr(Ai ρ)|2
P
sure of how many subspace dimensions may be reliably Fe (ρ, E) = i . (6.2)
transmitted through a quantum channel. For example tr(E(ρ))
consider the classical noiseless channel,
This expression simplifies for trace-preserving quantum
X operations since the denominator is one. The entangle-
N (ρ) = |xihx|ρ|xihx|, (5.14)
x
ment fidelity has the following properties [4,5,14].
7
′ ′
An appropriate measure of how well a subspace of hψi |ρR Q |ψi i, then it is possible to show (see for example
quantum states is transmitted is the subspace fidelity [22], page 240)
′ ′
Fs (P, E) ≡ minhψ|E(|ψihψ|)|ψi, (6.7) S(ρR Q ) ≤ H(p1 , . . . , pd2 ), (6.11)
|ψi
where H(pi ) is the Shannon information of the set pi .
where the minimization is over all pure states |ψi in the But by easily verified grouping properties of the Shan-
subspace whose projector is P. Item 5 above implies that non entropy,
if the subspace fidelity is close to one, the entanglement
p2 p 2
fidelity is also close to one. The converse is not in general H(p1 , . . . , pd2 ) = h(p1 ) + (1 − p1 )H( , . . . , d ),
true. That is, reliable transmission of subspaces is a more 1 − p1 1 − p1
stringent requirement than transmission of entanglement. (6.12)
Therefore using entanglement fidelity as our criterion for p2 p
and it is easy to show that H( 1−p d2
, . . . , 1−p ) ≤ log(d2 −
reliable transmission yields capacities at least as great 1 1
as those obtained when subspace fidelity is used as the 1). Combining these results and noting that p1 =
criterion. We conjecture that these two capacities are Fe (ρ, E) by definition of the entanglement fidelity,
identical. Se (ρ, E) ≤ h(Fe (ρ, E)) + (1 − Fe (ρ, E)) log(d2 − 1),
As an alternative measure of subspace fidelity, one
(6.13)
might consider the average pure state fidelity
Z which is the quantum Fano inequality.
d|ψihψ|E(|ψihψ|)|ψi, (6.8) For applications it is useful to understand the continu-
ity properties of the entanglement fidelity. To that end
where the integration is done using the unitarily invariant we prove the following lemma:
measure on the subspace of interest. By item 4 above, Lemma (continuity lemma for entanglement fidelity)
the capacity resulting from this measure of reliability is Suppose E is a trace-preserving quantum operation, ρ
at least as great as that which results when entanglement is a density operator, and ∆ is a Hermitian operator with
fidelity is used as the measure of reliability. We do not trace zero. Then
know whether these two capacities are equal.
|Fe (ρ + ∆, E) − Fe (ρ, E)| ≤ 2tr(|∆|) + tr(|∆|)2 .
The lesson to be learnt from this discussion is that
there are many different measures which may be used (6.14)
to quantify how reliably quantum states are transmitted,
To prove the lemma we apply (6.2) to obtain
and different measures may result in different capacities.
Which measure is used depends on what resource is most |tr(Ai ρ)| |tr(A†i ∆)| +
X
|Fe (ρ + ∆, E) − Fe (ρ, E)| ≤ 2
important for the application of interest. For the remain- i
der of this paper, we use the entanglement fidelity as our X
|tr(Ai ∆)|2 . (6.15)
measure of reliability.
i
There is a very useful inequality, the quantum Fano
inequality, which relates the entropy exchange and the Applying a Cauchy-Schwarz inequality to each sum,
P the
entanglement fidelity. It is [4]: first with respect to the complex inner product i x∗i yi ,
the second with respect to the Hilbert-Schmidt inner
Se (ρ, E) ≤ h(Fe (ρ, E)) + (1 − Fe (ρ, E)) log2 (d2 − 1), product tr(X † Y ), we obtain
(6.9)
|Fe (ρ + ∆, E) − Fe (ρ, E)| ≤
where h(p) ≡ −p log p − (1 − p) log(1 − p) is the dyadic 12
Shannon information associated with p. It is useful |tr(A†j ∆)|2 +
X X
2 |tr(Ai ρ)|2
to note for our later work that 0 ≤ h(p) ≤ 1 and i j
log(d2 − 1) ≤ 2 log d, so from the quantum Fano inequal-
|tr(Ai |∆|A†i )| |tr(|∆|)|,
X
ity, (6.16)
i
Se (ρ, E) ≤ 1 + 2(1 − Fe (ρ, E)) log d. √
where |∆| ≡ ∆† ∆. Applying (6.2) and Fe (ρ, E) ≤ 1 to
(6.10) the first sum and the trace-preserving property of E to
The proof of the quantum Fano inequality, (6.9), the final sum gives
is simple enough that for convenience we repeat it
sX
here. Consider an orthonormal set of d2 basis states, |Fe (ρ + ∆, E) − Fe (ρ, E)| ≤ 2 |tr(A†j ∆)|2 + tr(|∆|)2 .
j
|ψi i, for the system RQ. This basis set is chosen so
|ψ1 i = |ψ RQ i. If we form the quantities pi ≡ (6.17)
8
One final application of the Cauchy-Schwarz inequality true motivation for all definitions in information theory,
and the trace-preserving property of E gives whether classical or quantum: their success at quanti-
fying the resources needed to perform some interesting
|Fe (ρ + ∆, E) − Fe (ρ, E)| ≤ 2tr(|∆|) + tr(|∆|)2 , (6.18) physical task, not some abstract mathematical motiva-
tion.
as required. In subsection VII A we review the data processing in-
This result gives bounds on the change in the entan- equality which provides motivation for regarding the co-
glement fidelity when the input state is perturbed.
p Note, herent information as a quantum analogue of the mutual
incidentally, that during the proof a coefficient Fe (ρ, E) information, and whose application is crucial to later rea-
was dropped from the first term on the right hand side soning. Subsection VII B studies in detail the properties
of the inequality. For some applications it may be useful of the coherent information. In particular, we prove sev-
to apply the inequality with this coefficient in place. eral results related to convexity that are useful both as
calculational aids, and also for proving later results. Sub-
section VII C proves a lemma about the entanglement
VII. COHERENT INFORMATION
fidelity that glues together many of our later proofs of
upper bounds on the channel capacity. Finally, subsec-
In this section we investigate the coherent information. tions VII D and VII E describe two important ways the
The coherent information was defined in [5], where it was behaviour of the coherent information differs from the
suggested that the coherent information plays a role in behaviour of the mutual information when quantum en-
quantum information theory analogous to the role played tanglement is allowed.
by mutual information in classical information theory in
the following sense. Consider a classical random process,
M
A. Quantum Data Processing Inequality
X → Y, (7.1)
The role of coherent information in quantum informa-
in which the random variable X is used as the input to
tion theory is intended to be similar to that of mutual
some process which produces as output the random vari-
information in classical information theory. This is not
able Y . The distributions of X and Y are related by a
obvious from the definition, but can be given an opera-
linear map, M, determined by the conditional probabil-
tional motivation in terms of a procedure known as data
ities of the process. An example of such a process is a
processing. The classical data processing inequality [3]
noisy classical channel with input X and output Y . As
states that any three variable Markov process,
discussed earlier, an important quantity in information
theory is the mutual information, H(X : Y ), between
X → Y → Z, (7.4)
the input X and the output Y of the process. Note that
H(X : Y ) can be regarded as a function of the input X satisfies a data processing inequality,
and the map M only, since the joint distribution of X
and Y is determined by these. H(X) ≥ H(X : Y ) ≥ H(X : Z). (7.5)
Quantum mechanically we can consider a process de-
fined by an input ρ, and output ρ′ , with the process de- The idea is that the operation Y → Z represents some
scribed by a quantum operation, E, kind of “data processing” of Y to obtain Z, and the mu-
tual information after processing, H(X : Z), can be no
E
ρ → ρ′ = E(ρ). (7.2) higher than the mutual information before processing,
H(X : Y ). Furthermore, suppose we have a Markov pro-
We assert that the coherent information, defined by cess,
E(ρ) X → Y, (7.6)
I(ρ, E) ≡ S − Se (ρ, E), (7.3)
tr(E(ρ))
such that H(X) = H(X : Y ). Intuitively, one might ex-
plays a role in quantum information theory analogous pect that it be possible to do data processing on Y to
to that played by the mutual information H(X : Y ) in recover X. It is not difficult to show that it is possible,
classical information theory. This is not obvious from using Y alone, to construct a third variable, Z, forming
the definition, and one goal of this section is to make a third stage in the Markov process,
it appear plausible that this is the case. Of course, the
true justification for regarding the coherent information X → Y → Z, (7.7)
as the quantum analogue of the mutual information is
its success as the quantity appearing in results on chan- such that X = Z with probability one, if and only if
nel capacity, as discussed in later sections. This is the H(X) = H(X : Y ).
9
An analogous quantum result has been proved by where we are now using an RQE1 E2 picture of the op-
Schumacher and Nielsen [5]. It states that given trace- erations. From purity of the total state of RQE1 E2 it
preserving quantum operations E1 and E2 defining a follows that
quantum process, ′′
E1′′ E2′′ ′′
S(ρR ) = S(ρQ ). (7.17)
ρ → E1 (ρ) → (E2 ◦ E1 )(ρ), (7.8)
Neither of the systems R or E1 are involved in the second
then stage of the dynamics in which Q and E2 interact unitar-
ily. Thus, their state does not change during this stage:
′′ ′′ ′ ′
S(ρ) ≥ I(ρ, E1 ) ≥ I(ρ, E2 ◦ E1 ). (7.9) ρR E1 = ρR E1 . But from the purity of RQE1 after the
first stage of the dynamics,
Furthermore, it was shown in [5] that given a process ′′
E1′′ ′ ′ ′
S(ρR ) = S(ρR E1 ) = S(ρQ ). (7.18)
ρ → E1 (ρ), (7.10)
The remaining two terms in the subadditivity inequality
it is possible to find an operation E2 which reverses E1 if are now recognized as entropy exchanges,
and only if ′′ ′
S(ρE1 ) = S(ρE1 ) = Se (ρ, E1 ), (7.19)
S(ρ) = I(ρ, E1 ). (7.11) S(ρ E1′′ E2′′
) = Se (ρ, E2 ◦ E1 ). (7.20)
The close analogy between the classical and quantum Making these substitutions into the inequality obtained
data processing inequalities provides a strong operational from strong subadditivity (7.16) yields
motivation for considering the coherent information to
be the quantum analogue of the classical mutual infor- ′′
S(ρQ ) + Se (ρ, E1 ) ≤ S(ρQ ) + Se (ρ, E2 ◦ E1 ),
′
(7.21)
mation.
The proof of the quantum data processing inequality which can be rewritten as the second stage of the data
is repeated here in order to address the issue of what processing inequality,
happens when E1 and E2 are not trace-preserving. The
proof of the first inequality is to apply the subadditivity I(ρ, E1 ) ≥ I(ρ, E2 ◦ E1 ). (7.22)
′ ′ ′ ′
inequality [22] S(ρR E ) ≤ S(ρR ) + S(ρE ) in the RQE
picture of operations to obtain Notice that this inequality holds provided E2 is trace-
preserving, and does not require any assumption that E1
I(ρ, E1 ) = S(E1 (ρ)) − Se (ρ, E1 ) (7.12) is trace-preserving. This is very useful in our later work.
Q′ E′
= S(ρ ) − S(ρ ) (7.13)
R′ E ′ E′
= S(ρ ) − S(ρ ) (7.14) B. Properties of Coherent Information
R′ R
≤ S(ρ ) = S(ρ ) = S(ρ). (7.15)
The set of completely positive maps forms a positive
It is clear that this part of the inequality need not hold cone, that is, if Ei is a collection of completely posi-
when E1 is not trace-preserving. The reason for this is tive maps and λi is a set of non-negative numbers then
P
i λi Ei is also a completely positive map. In this sec-
′
that it is no longer necessarily the case that ρR = ρR ,
and thus it may not be possible to make the identi- tion we prove two very useful properties of the coherent
′
fication S(ρR ) = S(ρR ). For example, suppose we information. First, it is easy to see that for any quantum
have a three dimensional state space with orthonormal operation E and non-negative λ,
states |1i, |2i and |3i. Let P12 be the projector onto
the two dimensional subspace spanned by |1i and |2i, I(ρ, λE) = I(ρ, E). (7.23)
and P3 the projector onto the subspace spanned by |3i.
This follows immediately from the definition of the co-
Let ρ = p2 P12 + (1 − p)P3 , where 0 < p < 1, and
herent information. A slightly more difficult property to
E(ρ) = P12 ρP12 . Then by choosing p small enough we
prove is the following.
can make S(ρ) ≈ 0, but I(ρ, E) = 1, so we have an exam-
ple of a non trace-preserving operation which does not
obey the data processing inequality.
The proof of the second part of the data processing
inequality is to apply the strong subadditivity inequality
[22],
′′
E1′′ E2′′ ′′ ′′
E1′′ ′′ ′′
S(ρR ) + S(ρE1 ) ≤ S(ρR ) + S(ρE1 E2 ), (7.16)
10
′ ′ ′
Theorem (generalized convexity theorem for coherent and the relationship S(ρR ) = S(ρQ E ) which follows
information). from purity we obtain
Suppose Ei are quantum operations. Then
P S(ρ) = S(ρR ) (7.29)
X tr(Ei (ρ))I(ρ, Ei ) R′
I(ρ, Ei ) ≤ i P . (7.24) = S(ρ ) (7.30)
tr( i Ei (ρ))
i Q′ E ′
= S(ρ ) (7.31)
This result is extremely useful in our later work. An Q′
≤ S(ρ ) + S(ρ ). E′
(7.32)
important and immediate corollary is the following:
Corollary (convexity theorem for coherent informa- Rewriting this slightly gives
tion) P
If a trace-preserving
P operation, E = i pi Ei is a convex S(ρ) − S(E(ρ)) ≤ Se (ρ, E), (7.33)
sum (pi ≥ 0, i pi = 1) of trace-preserving operations Ei ,
then the coherent information is convex, for any trace-preserving quantum operation E.
X X Lemma (entanglement fidelity lemma for operations)
I(ρ, p i Ei ) ≤ pi I(ρ, Ei ). (7.25) Suppose E is a trace-preserving quantum operation,
i i and ρ is some quantum state. Then for all trace-
The proof of the corollary is immediate from the the- preserving quantum operations D,
orem. The theorem follows from the concavity of the
S(ρ) ≤ I(ρ, E) + 2 + 4(1 − Fe (ρ, D ◦ E)) log d. (7.34)
conditional entropy (see references cited in [22], pages
249–250), which for two systems, 1 and 2, is defined by
This lemma is extremely useful in obtaining proofs of
S(2|1) ≡ S(ρ12 ) − S(tr2 (ρ12 )). (7.26) bounds on the channel capacity. In order for the entan-
glement fidelity to be close to one, the quantity appearing
This expression is concave in ρ12 . Now notice that on the right hand side must be close to zero. This shows
′ ′ ′
that the entropy of ρ cannot greatly exceed the coherent
I(ρ, E) = S(ρQ ) − S(ρR Q ) = −S(R′ |Q′ ). (7.27) information I(ρ, E) if the entanglement fidelity is to be
close to one.
The theorem now follows from the concavity of the con- To prove the lemma, notice that by the second part of
ditional entropy. the data processing inequality, (7.9),
A further useful result concerns the additivity of co-
herent information, S(ρ) − I(ρ, E) ≤ S(ρ) − S((D ◦ E)(ρ)) + Se (ρ, D ◦ E).
Theorem (additivity for independent channels)
(7.35)
Suppose E1 , . . . , En are quantum operations and
ρ1 , . . . , ρn are density operators. Then Applying inequality (7.33) gives
X
I(ρ1 ⊗ . . . ⊗ ρn , E1 ⊗ . . . En ) = I(ρi , Ei ). (7.28) S(ρ) − S((D ◦ E)(ρ)) ≤ Se (ρ, D ◦ E), (7.36)
i
The proof is immediate from the additivity property of and combining the last two inequalities gives
entropies for product states.
S(ρ) − I(ρ, E) ≤ 2Se (ρ, D ◦ E) (7.37)
≤ 2h(Fe (ρ, D ◦ E)) +
C. A Lemma About Entanglement Fidelity 2(1 − Fe (ρ, D ◦ E)) log(d2 − 1), (7.38)
The following lemma is the glue which holds together where the second step follows from the quantum Fano
much of our later work on proving upper bounds to chan- inequality (6.9). But the dyadic Shannon entropy h is
nel capacities. In this section we prove the lemma only for bounded above by 1 and log(d2 − 1) ≤ 2 log d, so
the special case of trace-preserving operations. A similar
but more complicated result is true for general opera- S(ρ) ≤ I(ρ, E) + 2 + 4(1 − Fe (ρ, D ◦ E)) log d. (7.39)
tions, and is given in section XI.
We begin by repeating the proof of a simple inequality This completes the proof.
that was first proved in [4], which states that the decrease This inequality is strong enough to prove the asymp-
(if any) in system entropy must be bounded above by the totic bounds which are of most interest for our later work.
increase in the entropy of a pure environment. This ap- The somewhat stronger inequality (7.38) is also useful
plies only for trace-preserving operations E. Applying the when proving one-shot results, that is, when no block
′ ′ ′ ′
subadditivity inequality [22] S(ρQ E ) ≤ S(ρQ ) + S(ρE ) coding is being used.
11
D. Quantum Characteristics of the Coherent An example of (7.43) is as follows. For convenience
Information I we use the language of coding and channel operations,
since that language is most convenient later. E1 is to be
There are at least two important respects in which the identified with the coding operation, C, and E2 is to be
coherent information behaves differently from the classi- identified with the channel operation, N .
cal mutual information. In this subsection and the next Suppose we have a four dimensional state space.
we explain what these differences are. We suppose that we have an orthonormal basis
Classically, suppose we have a Markov process, |1i, |2i, |3i, |4i, and that P12 is the projector onto the
space spanned by |1i and |2i, and P34 is the projector
X → Y → Z. (7.40) onto the space spanned by |3i and |4i. Let U be a uni-
Intuitively we expect that tary operator defined by
12
A straightforward though slightly tedious calculation
Process 1 shows that with this channel setup
X1 Y1 I(ρ1 , E) = I(ρ2 , E) = 0, (7.58)
and
I(ρ12 , E ⊗ E) = 2. (7.59)
FIG. 5.
In this section we return to noisy channel coding. Re-
Dual classical channels operating on inputs X1 and X2
call the basic procedure for noisy channel coding, as il-
produce outputs Y1 and Y2 .
lustrated in figure 6.
These channels are numbered 1, . . . , n and take as in- Channel Channel
puts random variables X1 , . . . , Xn . The channels might Source Input Output Receiver
N D
Encoding Channel Decoding
be separated spatially, as shown in the figure, or in time. ρs C ρc ρo ρr
The channels are assumed to act independently on their
respective inputs, and produce outputs Y1 , . . . , Yn . It is
FIG. 6.
not difficult to show that
X The noisy quantum channel, together with encodings
H(X1 , . . . , Xn : Y1 , . . . , Yn ) ≤ H(Xi : Yi ). (7.53) and decodings.
i
Suppose a quantum source has output ρs . A quantum
This property is known as the subadditivity of mutual operation, which we shall denote C, is used to encode
information. It is used, for example, in proofs of the the source source, giving the input state to the chan-
weak converse to Shannon’s noisy channel coding the- nel, ρc ≡ C(ρs ). The encoded state is used as input to
orem. We now show that the corresponding quantum the noisy channel, giving a channel output ρo ≡ N (ρc ).
statement about coherent information fails to hold. Finally, a decoding quantum operation, D, is used to de-
Example 2: There exists a quantum operation E and code the output of the channel, giving a received state,
a density operator ρ12 such that ρr ≡ D(ρo ). The goal of noisy channel coding is to find
I(ρ12 , E ⊗ E) 6≤ I(ρ1 , E) + I(ρ2 , E), (7.54) out what source states can be sent with high entangle-
ment fidelity. That is, we want to know for what states
where ρ1 ≡ tr2 (ρ12 ) and ρ2 ≡ tr1 (ρ12 ) are the usual re- ρs can encoding and decoding operations be found such
duced density operators for systems 1 and 2. that
An example of (7.54) is the following. Suppose system
1 consists of two qubits, A and B. System 2 consists of Fe (ρs , D ◦ N ◦ C) ≈ 1. (8.1)
two more qubits, C and D. As the initial state we choose
If large blocks of source states with entropy R per use of
IA IC the channel can be sent through the channel with high
ρ12 = ⊗ |ψ BD ihψ BD | ⊗ , (7.55)
2 2 fidelity, we say the channel is transmitting at the rate R.
where |ψ BD i is a Bell state shared between systems B Shannon’s noisy channel coding theorem is an example
and D. of a channel capacity theorem. Such theorems come in
The action of the channel on A and B is as follows: two parts:
it sets bit B to some standard state, |0i, and allows A 1. An upper bound is placed on the rate at which in-
through unchanged. This is achieved by swapping the formation can be sent reliably through the channel.
state of B out into the environment. Formally, There should be an effective procedure for calculat-
E(ρAB ) = ρA ⊗ |0ih0|. (7.56) ing this upper bound.
The same channel is now set to act on systems C and D: 2. It is proved that a reliable scheme for encoding and
decoding exists which comes arbitrarily close to at-
E(ρCD ) = ρC ⊗ |0ih0|. (7.57) taining the upper bound found in 1.
13
This maximum rate at which information can be reliably of that channel is given by the pair (Hs⊗n , N ⊗n ), where
sent through the channel is known as the channel capac- ⊗n is used to denote n-fold tensor products. The mem-
ity. oryless nature of the channel is reflected in the fact that
In this paper we consider only the first of these tasks, the operation performed on the n copies of the channel
the placing of upper bounds on the rate at which quan- system is a tensor product of independent single-system
tum information can be reliably sent through a noisy operations.
quantum channel. The results we prove are analogous to Define an n-code (C, D) from Hs into Hc to consist of
the weak converse of the classical noisy coding theorem, a trace-preserving quantum operation, C, from Hs⊗n to
but cannot be considered true converses until attainabil- Hc⊗n , and a trace-preserving quantum operation D from
ity of our bounds is demonstrated. We do consider it Hc⊗n to Hs⊗n . We refer to C as the encoding and D as
likely that our bounds are equal to the true quantum the decoding.
channel capacity. The total coding operation T is given by
T ≡ D ◦ N ⊗n ◦ C. (8.4)
A. Mathematical Formulation of Noisy Channel
Coding The measure of success we use for the total procedure is
the total entanglement fidelity,
Up to this point the procedure for doing noisy chan- Fe (ρns , T ). (8.5)
nel coding has been discussed in broad outline but we
have not made all of our definitions mathematically pre- In practice we frequently abuse notation, usually by
cise. This subsection gives a precise formulation for the omitting explicit mention of the Hilbert spaces Hs and
most important concepts appearing in our work on noisy Hc . Note also that in principle the channel could have
channel coding. different input and output Hilbert spaces. To ease nota-
Define a quantum source Σ = (Hs , Υ) to consist of a tional clutter we do not consider that case here, but all
Hilbert space Hs and a sequence Υ = {ρ1s , ρ2s , ..., ρns , ...} the results we prove go through without change.
where ρ1s is a density operator on Hs , ρ2s a density op- Given a source state ρs and a channel N , the goal
erator on Hs ⊗ Hs , and ρns a density operator on Hs⊗n , of noisy channel coding is to find an encoding C and a
etc... Using, for example, “tr34 ” to denote the partial decoding D such that Fe (ρs , T ) is close to one; that is,
trace over the third and fourth copies of Hs , we require ρs and its entanglement is transmitted almost perfectly.
as part of our definition of a quantum source that for all In general this is not possible to do. However, Shannon
j and all n > j, showed in the classical context that by considering blocks
of output from the source, and performing block encod-
trj+1,...,n (ρns ) = ρjs , (8.2) ing and decoding it is possible to considerably expand
the class of source states ρs for which this is possible.
i.e. that density operators in the sequence be consistent
The quantum mechanical version of this procedure is to
with each other in the sense that earlier ones be deriv-
find a sequence of n-codes, (C n , Dn ) such that as n → ∞,
able from later ones by an appropriate partial trace. The
the measure of success Fe (ρns , T n ) approaches one, where
n-th density operator is meant to represent the state of n
T n = Dn ◦ N ⊗n ◦ C n . (We will sometimes refer to such
emissions from the source, normally thought of as taking
a sequence as a coding scheme.)
n units of time. (We could have used a single density
Suppose such a sequence of codes exists for a given
operator on a countably infinite tensor product of spaces
source Σ. In this case the channel is said to transmit Σ
Hs , but we wish to avoid the technical issues associated
reliably. We also say that the channel can transmit reli-
with such products.) We define the entropy rate of a
ably at a rate R = S(Σ). (Note that this definition does
general source Σ as
not require that the channel be able to transmit reliably
S(ρns ) any source with entropy rate less than or equal to R; that
S(Σ) ≡ lim sup . (8.3) is a different potential definition of what it means for a
n→∞ n
channel to transmit reliably at rate R. We conjecture
A special case of this general definition of a quantum that the two definitions are equivalent in the contexts
source is the i.i.d. source (Hs , {ρs , ρs ⊗ ρs , ..., ρ⊗n
s , ...}), considered in this paper.)
for some fixed ρs . Such a source corresponds to the classi- A noisy channel coding theorem enables one to de-
cal notion of an independent, identically distributed clas- termine, for any source and channel, whether or not the
sical source, thus the term i.i.d. The entropy rate of this source can be transmitted reliably on that channel. Clas-
source is simply S(ρs ). sically, this is determined by comparing the entropy rate
A discrete memoryless channel, (Hc , N ) consists of of the source to the capacity of the channel. If the en-
a finite-dimensional Hilbert space, Hc , and a trace- tropy rate of the source is greater than the capacity, the
preserving quantum operation N . The n-th extension source cannot be transmitted reliably. If the entropy rate
14
is less than the capacity, it can. The conjunction of these lim Fe (ρns , Dn ◦ N ⊗n ◦ U n ) = 1. (9.3)
n→∞
two statements is precisely the noisy channel coding theo-
rem. (The case when the entropy rate of the source equals Then
the capacity requires separate consideration; sometimes
reliable transmission is achievable, and sometimes not.) S(ρns )
S(Σ) ≡ lim sup ≤ C(N ). (9.4)
We expect that in quantum mechanics, the entropy rate n→∞ n
S(Σ) of the source will play the role of the classical en-
tropy rate. A channel will be able to transmit reliably This theorem tells us that we cannot reliably trans-
any source with entropy rate less than the capacity; fur- mit more than C(N ) qubits of information per use of the
thermore, no source with entropy rate greater than the channel.
capacity will be reliably transmissible (i.e., the channel For unitary U n we have
will be unable to transmit reliably at a rate greater than
the capacity.) The first part of this would constitute a I(ρs , N ⊗n ◦ U n ) = I(U n (ρs ), N ⊗n ), (9.5)
quantum noisy channel coding theorem; the second, a
“weak converse” of the theorem. (A “strong converse” and thus
would require not just that no source with entropy rate
greater than the capacity can be reliably transmitted, i.e. I(ρs , N ⊗n ◦ U n ) ≤ Cn . (9.6)
transmitted with asymptotic fidelity approaching unity,
but would require that all such sources have asymptotic By (7.34) with E ≡ N ⊗n ◦ U n , and the fact that
fidelity of transmission approaching zero.) I(U n (ρs ), N ⊗n ) ≤ maxρ I(ρ, N ⊗n ) ≡ Cn , it now follows
that
S(ρns ) Cn 2
IX. UPPER BOUNDS ON THE CHANNEL ≤ + +
CAPACITY n n n
4(1 − Fe (ρns , Dn ◦ N ⊗n ◦ U n )) log d. (9.7)
In this section we investigate a variety of upper bounds
(Note that d here is the dimension of a single copy of the
on the capacity of a noisy quantum channel.
source Hilbert space, so that we have inserted dn for the
overall dimension d of (7.34)). Taking lim sups on both
A. Unitary Encodings sides of the equation completes the proof of the theorem.
It is extremely useful to study this result at length,
since the basic techniques employed to prove the bound
This subsection is concerned with the case where the
encoding, C, is unitary. are the same as those that appear in a more elaborate
For this subsection only we define guise later in the paper. In particular, what features
of quantum mechanics necessitate a change in the proof
Cn ≡ max I(ρ, N ⊗n ), (9.1) methods used to obtain the classical bound?
ρ
Suppose the quantum analogue of the classical subad-
where the maximization is over all inputs ρ to n copies of ditivity of mutual information were true, namely
the channel. The bound on the channel capacity proved n
in this section is defined by
X
I(ρn , N ⊗n ) ≤ I(ρni , N ), (9.8)
i=1
Cn
C(N ) ≡ lim . (9.2)
n→∞ n where ρn is any density operator that can be used as
It is not immediately obvious that this limit exists. To input to n copies of the channel, and ρni is the density
see that it does, notice that Cn ≤ n log2 d and Cm +Cn ≤ operator obtained by tracing out all but the i-th chan-
Cm+n and apply the lemma proved in Appendix A. No- nel. Then it would follow easily from the definition that
tice that C(N ) is a function of the channel operation Cn = C1 for all n, and thus
only.
We begin with a theorem that places a limit on the C(N ) = C1 = max I(ρ, N ). (9.9)
ρ
entropy rate of a source which can be sent through a
quantum channel. This expression is exactly analogous to the classical ex-
Theorem pression for channel capacity as a maximum over input
Suppose we consider a source Σ = (Hs , {..ρns ...}) and distributions of the mutual information between chan-
a sequence of unitary encodings U n for the source. Sup- nel input and output. If this were truly a bound on the
pose further that there exists a sequence of decodings, quantum channel capacity then it would allow easy nu-
Dn , such that merical evaluations of bounds on the channel capacity, as
15
the maximization involved is easy to do numerically, and Again, this result places an upper bound on the rate
the coherent information is not difficult to evaluate. at which information can be reliably transmitted through
Unfortunately, it is not possible to assume that the a noisy quantum channel. The proof is very similar to
quantum mechanical coherent information is subadditive, the earlier proof of a bound for unitary encodings. One
as shown by example (7.54), and thus in general it is pos- simply applies (7.34) with E = N ⊗n ◦ C n and D = Dn ,
sible that to give:
B. General Encodings This possibility stems from the failure of the quantum
pipelining inequality, (7.43). It is clear that the existence
of such a scheme would cause the line of proof suggested
We now consider the case where something more gen- above to fail. Classically the pipelining inequality holds,
eral than a unitary encoding is allowed. In principle, it is and therefore the complication of having to maximize
always possible to perform a non-unitary encoding, C, by over encodings does not arise.
introducing an extra ancilla system, performing a joint Having proved that C(N ) is an upper bound on the
unitary on the source plus ancilla, and then discarding channel capacity, let us now investigate some of the prop-
the ancilla. erties of this bound. First we examine the range over
We define which C(N ) can vary. Note that
Cn ≡ max I(ρ, N ⊗n ◦ C), (9.11) 0 ≤ Cn ≤ n log2 d, (9.17)
ρ,C
where the maximization is over all inputs ρ to the encod- since if ρ is pure then I(ρ, N ⊗n ◦ C) = 0 for any encoding
ing operation, C, which in turn maps to n copies of the C, and for all ρ and C, I(ρ, N ⊗n ◦ C) ≤ log2 dn = n log2 d,
channel. The bound on the channel capacity proved in since the channel output has dn dimensions. It follows
this section is defined by that
S(ρns )
X
S(Σ) ≡ lim sup ≤ C(N ). (9.14) N (ρ) = Pi ρPi , (9.20)
n→∞ n i
16
with Pi ≡ |iihi| a complete orthonormal set of one- classes of encodings. In this section we describe several
dimensional projectors. This channel is classically noise- such classes.
less, yet a straightforward application of (7.24) yields We begin by considering the class involving local uni-
C(N ) = 0, and therefore this channel has zero capac- tary operations only. We refer to this class as U -L. It
ity for the transmission of entanglement. consists of the set of operations C which can be written
P †
More generally, if N (ρ) = i Ai ρAi , where Ai = in the form
λi |ai ihbi |, then C(N ) = 0 by the same argument, and
thus the channel capacity for such a channel is zero. As C(ρ) = (U1 ⊗ · · · ⊗ Un )ρ(U1† ⊗ · · · ⊗ Un† ), (9.23)
a special case of this result, it follows that the capacity of
any classical channel as defined in section V to transmit where U1 , . . . , Un are local unitary operations on systems
entanglement is zero. 1 through n. Another possibility is the class L of encod-
Provided the input and output dimensions of the chan- ings involving local operations only, i.e. operations of the
nel are the same, it is not difficult to show that C(N ) = form:
log2 d if and only if N is unitary. X
(Ai1 ⊗ Bi2 ⊗ · · · ⊗ Zin )ρ
It is also of interest to consider what happens when i1 ,...,in
channels N1 and N2 are composed, forming a concate-
nated channel, N = N2 ◦ N1 . From the data processing (A†i1 ⊗ Bi†2 ⊗ · · · ⊗ Zi†n ). (9.24)
inequality it follows that
In other words, the overall operation has a tensor product
C(N1 ) ≥ C(N ). (9.21) form A ⊗ B ⊗ · · · ⊗ Z.
A more realistic class is 1-L – encoding by local opera-
It is clear by repeated application of the data-processing tions with one way classical communication. The idea is
inequality that this result also holds if we compose more that the encoder is allowed to do encoding by perform-
than two channels together, and even holds if we allow ing arbitrary quantum operations on individual mem-
intermediate decoding and re-encoding stages. Classical bers (typically, a single qubit) of the strings of quan-
channel capacities also behave in this way: the capacity tum systems emitted by a source. This is not unrealis-
of a channel made by composing two (or more) channels tic with present day technology for manipulating single
together is no greater than the capacity of the first part qubits. Such operations could include arbitrary unitary
of the channel alone. rotations, and also generalized measurements. After the
Although (7.43) might seem to suggest otherwise, in qubit is encoded, the results of any measurements done
fact during the encoding may be used to assist in the encoding
of later qubits. This is what we mean by one way com-
C(N2 ) ≥ C(N ). (9.22) munication - the results of the measurement can only be
used to assist in the encoding of later qubits, not earlier
For let us suppose that C is the encoding which achieves qubits.
C(N ), so that the total operation is D ◦ N ◦ C ≡ D ◦ N2 ◦ Another possible class is 2-L - encoding by local oper-
N1 ◦ C. As our encoding for the channel N2 , we may use ations with two-way classical communication. This may
N1 ◦ C and decode with D, hence achieving precisely the arise in a situation where there are many identical chan-
same total operation. nels operating side by side in space. Once again it is
Inequalities analogous to (9.21) and (9.22) may also assumed that the encoder can perform arbitrary local op-
be stated for the actual channel capacity. Clearly these erations, only this time two-way classical communication
statements are true as well. is allowed when performing the encoding.
For any class of encodings Λ arguments analogous to
those used above for general and for unitary block coding,
C. Other Encoding Protocols ensure that the expression
CΛ,n
So far we have considered two allowed classes of encod- CΛ (N ) ≡ lim , (9.25)
n→∞ n
ings: encodings where a general unitary operation can be
performed on a block of quantum systems, and encod- where
ings where a general trace-preserving quantum operation
can be performed on a block of quantum systems. If CΛ,n ≡ max I(ρ, N ⊗n ◦ C), (9.26)
ρ,C∈Λ
large-scale quantum computation ever becomes feasible
it may be realistic to consider encoding protocols of this is an upper bound to the rate at which quantum infor-
sort. However, for present-day applications of quantum mation can be reliably transmitted using encodings in
communication such as quantum cryptography and tele- Λ. Thus, in addition to the bounds for general and uni-
portation, it is realistic to consider much more restricted tary encodings, there are bounds CU−L , CL , C1−L , and
17
C2−L , which provide upper bounds on the rate of quan- possible to perform an arbitrary encoding and decoding
tum information transmission for these types of encod- operation using a look-up table. However, quantum me-
ings. A priori it is not clear what the exact relationships chanically this is far from being the case, so there is corre-
are amongst these bounds, although various inequalities spondingly more interest in studying the channel capac-
may easily be proved, ities that may result from considering different classes of
encodings and decodings.
CU−L ≤ CL ≤ C1−L ≤ C2−L ≤ Cgeneral (9.27)
CU−L ≤ Cunitary (9.28)
X. DISCUSSION
Cunitary ≤ Cgeneral . (9.29)
What then can be said about the status of the quantum
Furthermore, note that these bounds allow general de-
noisy channel coding theorem in the light of comments
coding schemes. It is possible that much tighter bounds
made in the preceding sections? While we have estab-
may result if we restrict the decoding schemes in the same
lished upper bounds, we have not proved achievability of
way we have restricted the encoding schemes.
these bounds. How might one prove that these bounds
An interesting and important question is whether there
are achievable?
are closed-form characterizations of the sets of quantum
Lloyd [7] has also proposed an expression involving a
operations corresponding to particular types of encoding
maximum of the coherent information as the channel ca-
schemes such as 1-L and 2-L. For example, in the cases
pacity,
of U -L and L there are explicit forms (9.23,9.24) for the
classes of encodings allowed. For 1-L the operations take max I(ρ, N ), (10.1)
the form: ρ
18
purposes of proving upper bounds it is not sufficient to were so, it would greatly aid in establishing that one has
consider a restricted class of encodings, since it is possible reached a maximum, since in this case a local maximum
that other coding schemes may do better, and therefore is guaranteed to be a global maximum (Appendix B).
that the capacity is somewhat larger than (10.2). We
suspect that this is not the case, but have been unable
to provide a rigorous proof. A heuristic argument is pro- A. Unitary versus Nonunitary Encoding
vided in subsection X A.
In the light of these remarks it is interesting that the
For the purposes of obtaining a capacity theorem for
coding scheme of Shor and Smolin [23] provides an ex-
general encodings and decodings, a restriction on the
ample where rates of transmission exceeding (10.1) are
class of encodings is clearly unacceptable. For example,
obtained. Nevertheless, the general bound (9.12) must
given a source density operator whose eigenvalues are
still be obeyed by a coding scheme of their type.
not all equal, we may not even be able to send it reliably
However, one can still make progress towards a proof
through a noiseless channel whose capacity is just greater
that the expression, (9.12), which bounds the channel
than the source entropy rate without doing non-unitary
capacity, is the correct capacity. If we accept that it is
compression as described in [24–26]. This compression,
possible to attain rates up to (10.2), then the following
which is essentially projection onto the typical subspace
four-stage construction shows that (9.12) is a correct ex-
[24] of the source, is not a unitary operation, and thus
pression for the capacity; i.e. that in addition to being an
we expect that nonunitary operations will be essential to
upper bound as shown in section IX, it is also achievable.
showing achievability of the noisy channel capacity.
Channel Channel We conjecture that once the projection onto the typical
Pre-source Source Input Output Receiver
Pre-
source subspace is accomplished, nonunitary operations
N D
encoding
19
is asymptotically necessary and sufficient that the com- that channel, by allowing the decoder to choose different
posite operation Dn ◦ N ⊗n ◦ C n have high entanglement decodings Dm depending on the measurement result m.
fidelity when the source is restricted to the typical sub-
Observer
space, τ . Hence, if an encoding C n is nonunitary on τ, it
Environment Result: m
must be “close to reversible” on τ, and Dn ◦N ⊗n must be
close to reversing it. In [14] it is shown that perfect re- Source Receiver
versibility of an operation on a subspace M is equivalent
to the statement that the operation, restricted to that ρs C ρc N ρo Dm ρr
√
subspace, may be represented by operators { pi Ui PM },
where PM Uj† Ui PM = δij PM and PM is the projector onto FIG. 8.
M . That is, the operation randomly (with probabilities Noisy quantum channel with a classical observer.
pi ) chooses a unitary which moves the state into one of
a mutually orthogonal set of subspaces. Hence C n , in its We now give a simple example which illustrates that
action on the source’s typical subspace, is close to some this can be the case. Suppose we have a two-level sys-
perfectly reversible operation C∗n consisting of “randomly tem in a state ρ and an initially uncorrelated four-level
picking a unitary into an orthogonal subspace.” Hence environment initially in the maximally mixed state I/4,
the entanglement fidelity of the total operation T is close so the total state of the joint system is
to that of T∗n , in which the encoding C n is replaced with I
C∗n . The linearity of the entanglement fidelity in the op- ρ⊗ . (11.2)
4
eration implies that for at least one of the unitaries Ui in
We fix an orthonormal basis |1i, |2i, |3i, |4i for the envi-
the random-unitaries representation of the perfectly re-
ronment. We assume that a unitary interaction between
versible operation C∗n , the entanglement fidelity is at least
the system and environment takes place, given by the
as good if the unitary is substituted for C∗n . Therefore,
unitary operator
arbitrary encodings C n are close to unitary encodings of
τ into a subspace of the channel’s Hilbert space. Thus U = I ⊗ |1ih1| + σx ⊗ |2ih2| +
the only nonunitarity which it is necessary to consider is σy ⊗ |3ih3| + σz ⊗ |4ih4|. (11.3)
the restriction to the source’s typical subspace.
The output of the channel is thus
I
XI. CHANNELS WITH A CLASSICAL ρ → N (ρ) ≡ trE U ρ ⊗ U† . (11.4)
4
OBSERVER
The quantum operation N can be given two particularly
In this section we consider a generalized version of the useful forms,
quantum noisy channel coding problem. Suppose that 1
in addition to a noisy interaction with the environment N (ρ) = (IρI + σx ρσx + σy ρσy + σz ρσz ) (11.5)
4
there is also a classical observer who is able to perform a I
measurement. This measurement may be on the channel = . (11.6)
2
or the environment of the channel, or possibly on both.
It is not difficult to show from the second form that
The result of the measurement is then sent to the de-
coder, who may use the result to assist in decoding. We C(N ) = 0, (11.7)
assume that this transmission of classical information is
and thus the channel capacity for the channel N is equal
done noiselessly, although it is also interesting to con-
to zero. Suppose now that an observer is introduced, who
sider what happens when the classical transmission also
is allowed to perform a measurement on the environment.
involves noise. It can be shown [11] that the state re-
This measurement is a Von Neumann measurement in the
ceived by the decoder is again related to the state ρ used
|1i, |2i, |3i, |4i basis, and yields a corresponding measure-
as input to the channel by a quantum operation Nm ,
ment result, m = 1, 2, 3, 4. Then the quantum operations
where m is the measurement result recorded by the clas-
corresponding to these four measurement outcomes are
sical observer,
1
Nm (ρ) N1 (ρ) = ρ (11.8)
ρ→ . (11.1) 4
tr(Nm (ρ)) 1
N2 (ρ) = σx ρσx (11.9)
4
The basic situation is illustrated in figure 8. The idea 1
is that by giving the decoder access to classical informa- N3 (ρ) = σy ρσy (11.10)
4
tion about the environment responsible for noise in the 1
channel it may be possible to improve the capacity of N4 (ρ) = σz ρσz . (11.11)
4
20
X
Each of these is unitary, up to a constant multiplying S(ρ) ≤ tr(Em (ρ))I(ρ, Em ) + 2 +
factor, so conditioned on knowing the measurement re- m
sult m, the corresponding channel capacity Cm is perfect. 4(1 − Fe (ρ, T )) log2 d, (11.13)
That is,
where
Cm = 1 (11.12) X
for all measurement outcomes m. This is an example T ≡ Dm ◦ Em . (11.14)
m
where the capacity of the observed channel is strictly
greater than for the unobserved channel. By the second step of the data processing inequality,
This result is particularly clear in the context of tele- (7.9), I(ρ, Em ) ≥ I(ρ, Dm ◦ Em ) for each m, and not-
portation. Nielsen and Caves [10] showed that the prob- ing also that by the trace-preserving property of Dm ,
lem of teleportation can be understood as the problem tr(Em (ρ)) = tr((Dm ◦ Em )(ρ)), we obtain
of a quantum noisy channel with an auxiliary classical
channel. In the single qubit teleportation scheme of Ben-
X
S(ρ) ≤ S(ρ) + [tr(Em (ρ))I(ρ, Em )−
nett et al [9] there are four quantum operations relating m
the state Alice wishes to teleport to the state Bob re- tr((Dm ◦ Em )(ρ))I(ρ, Dm ◦ Em )] . (11.15)
ceives, corresponding to each of the four measurement
results. In that scheme it happens that those four opera- Applying the generalized convexity theorem for coherent
tions are the Nm we have described above. Furthermore information (7.24) gives
in the absence of the classical channel, that is, when Alice X
does not send the result of her measurement to Bob, the − tr((Dm ◦ Em )(ρ))I(ρ, Dm ◦ Em ) ≤ −I(ρ, T ).
channel is described by the single operation N . Clearly, m
in order that causality be preserved we expect that the (11.16)
channel capacity be zero. On the other hand, in order
that teleportation be able to occur we expect that the We obtain
channel capacity Cm = 1, as was shown above. Telepor- X
tation understood in this way as a noisy channel with S(ρ) ≤ tr(Em (ρ))I(ρ, Em ) + S(ρ) − I(ρ, T ). (11.17)
a classical side channel offers a particularly elegant way m
21
an observed channel, in precisely the same way we earlier So an application of the bound (9.12) on the rate of trans-
used (7.34) to prove bounds for unobserved channels. mission through the unobserved channel M shows the ex-
We may derive the same bound in another fashion if pression on the right hand side of (11.21) which bounds
we associate observed channels with tracepreserving un- the capacity of the observed channel {Nm } also bounds
observed channels in the following fashion suggested by the capacity of M. This result provides some evidence for
examples in [8]. To an observed channel {Nm } we asso- the intuitively reasonable proposition that M and {Nm }
ciate a single tracepreserving operation M from Hc to are equivalent with respect to transmission of quantum
a larger Hilbert space Hc ⊗ M , where M is a “register” information.
Hilbert space. Each dimension of M corresponds to a dif- Bennett et. al [8] derive capacities for three simple
ferent measurement result, m. The operation is specified channels which may be viewed as taking the form (11.22).
by: The quantum erasure channel takes the input state to a
X fixed state orthogonal to the input state with probabil-
M(ρ) = Nm (ρ) ⊗ |mihm|, (11.22) ity ǫ; otherwise, it transmits the state undisturbed. An
m equivalent observed channel would with probability ǫ re-
place the input state with a standard pure state |0ih0|
where |mi is some set of orthogonal states corresponding
within the input subspace, and also provide classical in-
to the measurement results which may occur. This map
formation as to whether this replacement has occurred or
is an “all-quantum” version of the observed channel.
not. The phase erasure channel randomizes the phase of
Since our upper bounds to the capacity of an unob-
a qubit with probability δ, and otherwise transmits the
served channel apply also to channels with output Hilbert
state undisturbed; it also supplies classical information
spaces of different dimensionality than the input space,
as to which of these alternatives occurred. The mixed
they apply to this map as well. It is easily verified that
erasure/phase-erasure channel may either erase or phase-
the coherent information for the map M acting on ρ is
erase, with exclusive probabilities ǫ and δ. Bennett et.
the same as the average coherent information for the ob-
al. note that the capacity max(0, 1 − 2ǫ) of the erasure
served channel Nm acting on ρ, which appears in (11.13)
channel is in fact the one-shot maximal coherent infor-
and in the bound (11.21). To show this, define
mation. We have verified that the capacities they de-
pm ≡ tr(Nm (ρQ )), (11.23) rive for the phase-erasure channel (1 − δ) and the mixed
erasure/phase-erasure channel max(0, 1 − 2ǫ − δ) are the
where we are again working in the RQ picture of opera- same as the one-shot maximal average coherent infor-
′
tions. Then ρQ = M(ρQ ) is given by (11.22), so that mation for the corresponding observed channels, lending
some additional support to the view that the bounds we
′ X Nm (ρQ ) have derived here are in fact the capacities.
S(ρQ ) = H(pm ) + pm S( ) (11.24)
m
pm
since the density matrices Nm (ρQ )⊗|mihm| are mutually B. Relationship to Unobserved Channel
orthogonal. Also,
′ ′ X
∗
ρR Q = (I ⊗ Nm )(ρRQ ), (11.25) Suppose a quantum system passes through a chan-
m
nel, interacts with an environment, and then measure-
∗
where by definition Nm (ρ) = Nm (ρ) ⊗ |mihm|. Thus ments are performed on the environment alone. How is
the capacity of this observed channel related to the ca-
′ ′ X (I ⊗ Nm )(ρRQ ) pacity of the channel which results if no measurement
S(ρR Q ) = H(pm ) + pm S( ). (11.26) had been performed on the environment? Physically, it
m
pm
is clear that the capacity when measurements are per-
Hence the coherent information is formed must be at least as great as when no measure-
ments on the environment are performed, since the de-
X Nm (ρQ ) (I ⊗ Nm )(ρRQ ) coder can always ignore the result of the measurement.
I(ρQ , M) = pm [S( ) − S( )],
pm pm In this subsection we show that bounds we have derived
m
on channel capacity have this same property: observa-
(11.27)
tion of the environment can never decrease the bounds
we have obtained.
which can be rewritten as the average coherent informa-
tion for {Nm }, Suppose {Nm } are the operations describing the differ-
ent possible measurement outcomes. Then the operation
describing the same channel, but without any observation
X
I(ρQ , M) = pm I(ρQ , Nm ). (11.28)
m of the environment, is
22
X
N = NM . (11.29) Many new twists on the problem of the quantum noisy
m channel arise when an observer of the environment is al-
lowed. For example, one might consider the situation
Recall the expressions for the bound on the capacity
where the classical channel connecting the observer to
of the unobserved channel,
the decoder is noisy. What then are the resources re-
I(ρ, N ⊗n ◦ C n ) quired to transmit coherent quantum information?
C(N ) = lim max , (11.30) It may also be interesting to prove results relating the
n
n→∞ C ,ρ n
classical and quantum resources that are required to per-
and the observed channel, form a certain task. For example, in teleportation it can
be shown that one requires not only the quantum chan-
C({Nm }) = lim max
n nel, but also two bits of classical information, in order to
n→∞ C ,ρ
X transmit quantum information with perfect reliability.
tr((Nm1 ⊗ · · · ⊗ Nmn ◦ C n )(ρ))
m1 ,...mn
I(ρ, Nm1 ⊗ · · · ⊗ Nmn ◦ C n )
, (11.31) XII. CONCLUSION
n
But the generalized convexity theorem (7.24) for coher- In this paper we have shown that different informa-
ent information implies that tion transmission problems may result in different chan-
X nel capacities for the same noisy quantum channel. We
tr((Nm1 ⊗ · · · ⊗ Nmn ◦ C n )(ρ)) have developed some general techniques for proving up-
m1 ,...,mn
per bounds on the amount of information that may be
I(ρ, Nm1 ⊗ · · · ⊗ Nmn ◦ C n ) transmitted reliably through a noisy quantum channel.
n Perhaps the most interesting thing about the quantum
I(ρ, N ⊗n ◦ C n ) noisy channel problem is to discover what is new and es-
≤ , (11.32)
n sentially quantum about the problem. The following list
and thus summarizes what we believe are the essentially new fea-
tures:
C(N ) ≤ C({Nm }). (11.33)
1. The insight that there are many essentially differ-
To see that this inequality may sometimes be strict, ent information transmission problems in quantum
return to the example considered earlier in this section mechanics, all of them of interest depending on the
in the context of teleportation. In that case it is not application. These span a spectrum between two
difficult to verify that extremes:
23
can only be done using local operations and one- 3. Estimation of channel capacities for realistic chan-
way classical communication. This may give rise nels. This work could certainly be done theoret-
to a different channel capacity than occurs if we al- ically and perhaps also experimentally. Recent
low non-local encoding and decoding. Thus there work on quantum process tomography [27,28] points
are different noisy channel problems depending on the way toward experimental determination of the
what classes of encodings and decodings are al- quantum channel capacity. A related problem is
lowed. to analyze how stable the determination of channel
capacities is with respect to experimental error.
2. The use of quantum entanglement to construct ex-
amples where the quantum analogue of the classical 4. As suggested in subsection IX C it would be inter-
pipelining inequality H(X : Z) ≤ H(Y : Z) for a esting to see what channel capacities are attainable
Markov process X → Y → Z, fails to hold (cf. for different classes of allowable encodings and /
eqtn. (7.43)). or decodings, for example, encodings where the en-
coder is only allowed to do local operations and one-
3. The use of quantum entanglement to construct ex- way classical communication, or encodings where
amples where the subadditivity property of mutual the encoder is allowed to do local operations and
information, two-way classical communication. We have showed
X how to prove bounds on the channel capacity in
H(X1 , . . . , Xn : Y1 , . . . , Yn ) ≤ H(Xi : Yi ), these cases; whether these bounds are attainable is
i unknown.
(12.1) 5. The development of rigorous general techniques for
proving attainability of channel capacities, which
fails to hold (cf. eqtn. (7.54)). may be applied to different classes of allowed en-
codings and decodings.
There are many more interesting open problems asso-
ciated with the noisy channel problem than have been 6. Finding the capacity of a noisy quantum channel
addressed here. The following is a sample of those prob- for classical information. A related problem arises
lems which we believe to be particularly important: in the context of superdense coding, where one half
of an EPR pair can be used to send two bits of clas-
1. The development of an effective procedure for de- sical information. It would be interesting to know
termining channel capacities. We believe that this to what extent this performance is degraded if the
is the most important problem remaining to be ad- pair of qubits shard between sender and receiver is
dressed. Assuming our upper bound not an EPR pair, but rather the sharing is done
using a noisy quantum channel, leading to a de-
C(N ) = lim max I(ρ, N ⊗n ◦ C) (12.2) crease in the number of classical bits that can be
n→∞ ρ,C
sent. Given a noisy quantum channel, what is the
maximum amount of classical information that can
is, in fact, the channel capacity for general encod- be sent in this way?
ings, it still remains to find an effective procedure
for evaluating this quantity. Both maximizations 7. All work done thus far has been for discrete chan-
can be done relatively easily, since they are of a nels, that is, channels with finite dimensional state
continuous function over a compact set. However, spaces. It is an important and non-trivial problem
we do not yet understand the convergence of the to extend these results to channels with infinite di-
limit well enough to have an effective procedure for mensional state spaces.
evaluating this quantity.
8. A more thorough study of noisy channels which
2. Once an effective procedure has been obtained for have a classical side channel. Can the classical in-
evaluating channel capacities, it still remains to de- formation obtained by an observer be related to
velop good numerical algorithms for performing the changes in the channel capacity? What if the clas-
evaluation. Assuming that the evaluation involves sical side channel is noisy? Many other fascinat-
some kind of maximization of the coherent informa- ing problems, too many to enumerate here, suggest
tion, it becomes important to know whether the co- themselves in this context.
herent information is concave in ρ. Combined with There are many other ways the classical results on
the convexity of the coherent information in the noisy channels have been extended - considering chan-
operation this would give a powerful tool for the nels with feedback, developing rate-distortion theory, un-
development of numerical algorithms for the deter- derstanding networks consisting of more than one chan-
mination of channel capacity. nel, and so on. Each of these could give rise to highly
24
cm−n m
interesting work on noisy quantum channels. It is also ≥⌊ ⌋−1 (A6)
to be expected that interesting new questions will arise cn n
m
as experimental efforts in the field of quantum informa- ≥ − 2, (A7)
tion develop further. Perhaps of chief interest to us is n
to develop a still clearer understanding of the essential where ⌊x⌋ is the integer immediately below x. Plugging
differences between the quantum noisy channel and the the last inequality into (A5) gives
classical noisy channel problem.
cm cn n
≥ 1− . (A8)
m n m
ACKNOWLEDGMENTS
But −n/m > −ǫ and cn /n ≥ c − ǫ, so
We thank Carlton M. Caves, Isaac L. Chuang, cm
Richard Cleve, David P. DiVincenzo, Christo- ≥ (c − ǫ)(1 − ǫ). (A9)
m
pher A. Fuchs, E. Knill, and John Preskill for many
instructive and enjoyable discussions about quantum This equation holds for all sufficiently large m, and thus
information. This work was supported in part by the cn
Office of Naval Research (Grant No. N00014-93-1-0116). lim inf ≥ (c − ǫ)(1 − ǫ). (A10)
n n
We thank the Institute for Theoretical Physics for its
hospitality and for the support of the National Science But ǫ was an arbitrary number greater than 0, so letting
Foundation (Grant No. PHY94-07194). MAN acknowl- ǫ → 0 we see that
edges financial support from the Australian-American cn cn
Educational Foundation (Fulbright Commission). lim inf ≥ c = lim sup . (A11)
n n n n
This appendix contains a lemma that can be used to APPENDIX B: MAXIMA OF THE COHERENT
prove the existence of several limits that appear in this INFORMATION
paper.
Lemma: Suppose c1 , c2 , . . . is a nonnegative sequence Various convexity and concavity properties are useful
such that cn ≤ kn for some k ≥ 0, and in calculating classical channel capacities, and the same is
true in the quantum situation. This appendix is devoted
cm + cn ≤ cm+n , (A1) to an explication of the basic properties of convexity and
for all m and n. Then concavity related to the coherent information and the re-
lation of these properties to expressions such as (9.12).
cn A convex set, S, is a subset of a vector space such
lim (A2)
n→∞ n that given any two points s1 , s2 ∈ S and any λ such that
exists and is finite. 0 < λ < 1, then the convex combination, λs1 + (1 − λ)s2 ,
Proof is also an element of S. Geometrically, this means that
Define given any two points in the set, the line joining them is
also in the set. An extremal point of S is a point s which
cn
c ≡ lim sup . (A3) cannot be formed from the convex combination of any
n n other two points in the set. A convex function f on S
This always exists and is finite, since cn ≤ kn for some is a real-valued function such that for any λ satisfying
k ≥ 0. Fix ǫ > 0 and choose n sufficiently large that 0 < λ < 1,
cn f (λs1 + (1 − λ)s2 ) ≤ λf (s1 ) + (1 − λ)f (s2 ); (B1)
> c − ǫ. (A4)
n
a concave function satisfies the same condition but with
Suppose m is any integer strictly greater than
the inequality reversed.
max(n, n/ǫ). Then by (A1),
The first useful fact about maxima is the following:
Local maximum is a global maximum: Suppose f is a
cm cn n cm−n
≥ 1+ . (A5) concave function on a convex set S. Then a local maxi-
m n m cn
mum of f is also a global maximum of f .
Using the fact that lcn ≤ cln , (an immediate consequence This follows by supposing that s1 and s2 are distinct
of (A1)) with l = ⌊ m
n ⌋ − 1 gives local maxima. If f (s1 ) < f (s2 ), say, then
25
f (λs1 + (1 − λ)s2 ) ≥ λf (s1 ) + (1 − λ)f (s2 ) (B2) [1] C. E. Shannon, Bell System Tech. J. 27, 379 (1948).
> f (s1 ), (B3) [2] C. E. Shannon and W. Weaver, The Mathematical Theory
of Communication (University of Illinois Press, Urbana,
by concavity of f . By choosing sufficiently small values 1949).
[3] T. M. Cover and J. A. Thomas, Elements of Information
of λ we see that this violates the fact that s1 is a lo-
Theory (John Wiley and Sons, New York, 1991).
cal maximum. Thus f has the same value for all local
[4] B. W. Schumacher, Phys. Rev. A 54, 2614 (1996).
maxima, from which it follows that all local maxima are [5] B. W. Schumacher and M. A. Nielsen, Phys. Rev. A 54,
also global maxima for the function. If the coherent in- 2629 (1996).
formation turns out to be concave in the input density [6] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin, and W. K.
operator, this property will be useful in evaluating ca- Wootters, Phys. Rev. A 54, 3824 (1996).
pacity bounds such as (9.12). [7] S. Lloyd, LANL e-print quant-ph/9604015 (1996).
The following lemma, from [29], is extremely useful [8] C. H. Bennett, D. DiVincenzo, and J. Smolin, LANL e-
in computing the maxima of convex functions on convex print quant-ph/9701015 (1997).
sets. [9] C. H. Bennett et al., Phys. Rev. Lett. 70, 1895 (1993).
Convexity Lemma: Suppose f is a continuous convex [10] M. A. Nielsen and C. M. Caves, LANL e-print quant-
ph/9608001 (1996).
function on a compact, convex set, S. Then there is an
[11] K. Kraus, States, Effects, and Operations (Springer-
extremal point at which f attains its global maximum.
Verlag, Berlin, 1983).
The proof is obvious. The reason for our interest in [12] K. Hellwig and K. Kraus, Commun. Math. Phys. 16, 142
the proof is because for fixed ρ and trace-preserving op- (1970).
erations E, the coherent information I(ρ, E) is a convex, [13] M.-D. Choi, Linear Algebra and Its Applications 10, 285
continuous function of the operation E. The set of trace- (1975).
preserving quantum operations forms a compact, convex [14] M. A. Nielsen, H. Barnum, C. M. Caves, and B. W. Schu-
set, and thus by the convexity lemma, I(ρ, E) attains its macher, unpublished (1997).
maximum for a quantum operation E which is extremal [15] A. Peres, Quantum Theory: Concepts and Methods
in the set of all trace-preserving quantum operations. (Kluwer Academic, Dordrecht, 1993).
Choi [13] has proved that any extremal point in the [16] C. Cohen-Tannoudji, B. Diu, and F. Laloë, Quantum Me-
chanics (John Wiley and Sons, New York, 1977).
set of trace-preserving quantum operations has a set of
[17] R. I. G. Hughes, The Structure and Interpretation of
operation elements {Ai } such that
Quantum Mechanics (Harvard University Press, Cam-
bridge, 1989).
1. There are at most d elements Ai . This is to be [18] G. Lüders, Annalen der Physik 8, 323 (1951).
contrasted with the general situation, where there [19] R. Jozsa, J. Mod. Optics 41, 2315 (1995).
may be up to d2 elements. [20] E. Knill and R. Laflamme, Phys. Rev. A 55, 900 (1997).
[21] M. A. Nielsen, LANL e-print quant-ph/9606012 (1996).
2. The Ai are linearly independent.
[22] A. Wehrl, Rev. Mod. Phys. 50, 221 (1978).
[23] P. W. Shor and J. A. Smolin, LANL e-print quant-
This result provides a considerable saving in the class ph/9604006 (1996).
of quantum operations that must be optimized over in [24] B. Schumacher, Phys. Rev. A 51, 2738 (1995).
order to numerically calculate expressions of the form [25] R. Jozsa and B. Schumacher, J. Mod. Optics 41, 2343
(9.12). Unfortunately, this only takes us part of the way (1994).
towards proving that the expressions (9.12) and (9.2) are [26] H. Barnum, C. A. Fuchs, R. Jozsa, and B. Schumacher,
identically equal, or, alternatively, it suggests a starting Phys. Rev. A 54, 4707 (1996).
point for a search for counterexamples to the proposition [27] J. F. Poyatos, J. I. Cirac, and P. Zoller, Phys. Rev. Lett.
that the two quantities are equal. If the extremal points 78, 390 (1997).
of the set of quantum operations were the unitary oper- [28] I. L. Chuang and M. A. Nielsen, LANL e-print quant-
ations we would be done. However that is not the case, ph/9610001 (1996).
[29] M. Marcus and H. Minc, A Survey of Matrix Theory and
as the above theorem shows.
Matrix Inequalities (Dover, New York, 1992).
26