Towards a
Design Flow for
Reversible Logic
Robert Wille · Rolf Drechsler
Institute of Computer Science
University of Bremen
Bibliothekstr. 1
28359 Bremen
Germany
rwille@informatik.uni-bremen.de
drechsle@informatik.uni-bremen.de
Preface

The development of computing machines has seen great success in recent decades.
But the ongoing miniaturization of integrated circuits will reach its limits in the
near future. Shrinking transistor sizes and power dissipation are the major barriers
in the development of smaller and more powerful circuits. Reversible logic pro-
vides an alternative that may overcome many of these problems in the future. For
low-power design, reversible logic offers significant advantages since zero power
dissipation will only be possible if computation is reversible. Furthermore, quantum
computation profits from enhancements in this area, because every quantum circuit
is inherently reversible and thus requires reversible descriptions.
However, since reversible logic is subject to certain restrictions (e.g. fanout and
feedback are not directly allowed), the design of reversible circuits significantly
differs from the design of traditional circuits. Nearly all steps in the design flow
(like synthesis, verification, or debugging) must be redeveloped so that they become
applicable to reversible circuits as well. But research in reversible logic is still at the
beginning; no complete design flow exists so far.
In this book, contributions to a design flow for reversible logic are presented. This
includes advanced methods for synthesis, optimization, verification, and debugging.
Formal methods like Boolean satisfiability and decision diagrams are thereby ex-
ploited. By combining the techniques proposed in the book, it is possible to synthe-
size reversible circuits representing large functions. Optimization approaches en-
sure that the resulting circuits are of small cost. Finally, a method for equivalence
checking and automatic debugging makes it possible to verify the obtained results
and helps to accelerate the search for bugs in case of errors in the design. By
combining the respective approaches, a first design flow for reversible circuits of
significant size emerges.
This book addresses computer scientists and computer architects and does not
require previous knowledge about the physics of reversible logic or quantum com-
putation. The respective concepts as well as the used models are briefly introduced.
All approaches are described in a self-contained manner. The content of the book
not only conveys a coherent overview of current research results, but also builds
the basis for future work on a design flow for reversible logic.
This book is the result of more than three years of intensive research in the area
of reversible logic. During this time, we received a great deal of support from many
people, whom we would like to thank very much.
In particular, the Group of Computer Architecture at the University of Bremen
deserves many thanks for providing a comfortable and inspirational environment. Many
thanks go to Stefan Frehse, Daniel Große, Lisa Jungmann, Hoang M. Le, Sebastian
Offermann, and Mathias Soeken who actively helped in the development of the
approaches described in this book.
Sincere thanks also go to Prof. D. Michael Miller from the University of Victo-
ria, Prof. Gerhard W. Dueck from the University of New Brunswick, and Dr. Mehdi
Saeedi from the Amirkabir University of Technology in Tehran for very fruitful col-
laborations. In this context, we would like to thank the German Academic Exchange
Service (DAAD) which enabled the close contact with the groups in Canada.
Special thanks go to the German Research Foundation (DFG) which funded parts
of this work under the contract number DR 287/20-1.
Finally, we would like to thank Marc Messing who did a great job of proofreading
as well as Christiane and Shawn Mitchell who closely checked the manuscript for
English style and grammar.
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Reversible Functions . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Reversible Circuits . . . . . . . . . . . . . . . . . . . . . . 9
2.1.3 Quantum Circuits . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Decision Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Binary Decision Diagrams . . . . . . . . . . . . . . . . . . 17
2.2.2 Quantum Multiple-valued Decision Diagrams . . . . . . . 19
2.3 Satisfiability Solvers . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.1 Boolean Satisfiability . . . . . . . . . . . . . . . . . . . . 21
2.3.2 Extended SAT Solvers . . . . . . . . . . . . . . . . . . . . 22
6 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.1 Adding Lines to Reduce Circuit Cost . . . . . . . . . . . . . . . . 114
6.1.1 General Idea . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.1.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.1.3 Experimental Results . . . . . . . . . . . . . . . . . . . . 117
6.2 Reducing the Number of Circuit Lines . . . . . . . . . . . . . . . 124
6.2.1 General Idea . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2.3 Experimental Results . . . . . . . . . . . . . . . . . . . . 130
6.3 Optimizing Circuits for Linear Nearest Neighbor Architectures . . 131
6.3.1 NNC-optimal Decomposition . . . . . . . . . . . . . . . . 133
6.3.2 Optimizing NNC-optimal Decomposition . . . . . . . . . . 134
6.3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . 138
6.4 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . 138
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Acronyms
Chapter 1
Introduction
In the last decades, great achievements have been made in the development of com-
puting machines. While computers consisting of a few thousand components filled
whole rooms in the early 1970s, nowadays billions of transistors are built on a few
square millimeters. This is a result of the achievements made in the domain of
semiconductors, which continue to this day: The number of transistors in a circuit
doubles every 18 months (also known as Moore's Law, after the co-founder of Intel,
Gordon E. Moore, who formulated this as a prediction in 1965 [Moo65]).1 To this
day, this prediction has not lost any of its validity; each year more complex systems
and chips are introduced.
However, it is obvious that such exponential growth must reach its limits in
the future, at least when miniaturization reaches a level where single transistor
sizes approach the atomic scale. Besides that, power dissipation is increasingly
becoming a crucial issue in the design of high-performance digital circuits. In the
last decades, the amount of power dissipated in the form of heat to the surrounding
environment of a chip has increased by orders of magnitude. Since excessive heat may
decrease the reliability of a chip (or even destroy it), power dissipation is one of
the major barriers to the development of smaller and faster computer chips.
For these reasons, some researchers expect that from the 2020s on, doubling the
transistor density will no longer be possible.
To further satisfy the needs for more computational power, alternatives are
needed that go beyond the scope of “traditional” technologies like CMOS.2 Re-
versible logic marks a promising new direction where all operations are performed
in an invertible manner. That is, in contrast to traditional logic, all computations
can be reverted (i.e. the inputs can be obtained from the outputs and vice versa).
A simple standard operation like the logical AND already illustrates that reversibil-
ity is not guaranteed in traditional circuit systems. Indeed, it is possible to obtain the
inputs of an AND gate if the output is assigned to 1 (then both inputs must be assigned
to 1 as well). But it is not possible to determine the input values if the AND
outputs 0. In contrast, reversible logic allows bijective operations only, i.e. n-input,
n-output functions that map each possible input vector to a unique output vector.
This reversibility builds the basis for emerging technologies that may replace or at
least enhance the traditional computer chip.

1 Originally, Moore predicted a doubling every 12 months; ten years later he updated this to 18 months.
2 CMOS is the abbreviation for Complementary Metal Oxide Semiconductor, the technology
mainly used for today's integrated circuits.
Two examples of such technologies making use of reversible logic are sketched
in the following:
• Reversible Logic for Low-Power Design
As mentioned above, power dissipation and therewith heat generation is a serious
problem for today’s computer chips. A significant part of energy dissipation is
due to the non-ideal behaviors of transistors and materials. Here, higher levels of
integration and new fabrication processes reduced the heat generation in the last
decade. However, a more fundamental reason for power dissipation arises from
the observations made by Landauer in 1961 [Lan61]. Landauer proved that, using
traditional (irreversible) logic gates, computations always lead to energy dissipation
regardless of the underlying technology. More precisely, at least k · T · log 2 joules
of energy is dissipated for each "lost" bit of information during an irreversible
operation (where k is the Boltzmann constant, T is the temperature, and log denotes
the natural logarithm). While this amount of energy does not sound significant at
present, it may become crucial considering that (1) today millions of operations are
performed within seconds (i.e. increasing processor frequency multiplies this amount)
and (2) more and more operations are performed with smaller and smaller transistor
sizes (i.e. in a smaller area).
In contrast, Bennett showed that energy dissipation is reduced or even eliminated
if computation becomes information-lossless [Ben73]. This holds for reversible
logic, since data is bijectively transformed without losing any of the original infor-
mation. Bennett proved that circuits with zero power dissipation are only possible
if they are built from reversible gates. In 2002, the first reversible circuits exploiting
this observation were built [DV02]. In fact, these circuits were powered
by their input signals only (i.e. without additional power supplies). In the future,
such circuits may be an alternative that can cope with the heat generation prob-
lem of traditional chips. Furthermore, since reversible circuits already work with
low power, applications are also possible in domains where power is a limited
resource (e.g. for mobile computation).
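Landauer's bound, as used above, is straightforward to evaluate numerically. The following sketch computes the minimum energy per lost bit; the temperature value of 300 K is an assumption chosen for illustration:

```python
import math

# Boltzmann constant in joules per kelvin (CODATA exact value)
BOLTZMANN_K = 1.380649e-23

def landauer_limit(temperature_kelvin):
    """Minimum energy (in joules) dissipated per irreversibly lost bit: k*T*ln(2)."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)

# At an assumed room temperature of 300 K, erasing one bit costs about 2.871e-21 J.
print(f"{landauer_limit(300.0):.3e} J per lost bit")
```

Multiplying this value by the number of bit erasures per second yields a lower bound on the dissipated power, which illustrates why the effect grows with processor frequency and transistor density.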
• Reversible Logic as a Basis for Quantum Computation
Quantum circuits [NC00] offer a new kind of computation. Instead of logic sig-
nals 0 and 1, quantum circuits make use of qubits. A qubit is a two-level quantum
system, described by a two-dimensional complex Hilbert space. This allows not
only 0 and 1 to be represented, but also a superposition of both. As a result,
qubits may represent multi-
ple states at the same time enabling enormous speed-ups in computations. For
example, it has been shown that using a quantum circuit it is possible to solve
the factorization problem in polynomial time, while for traditional circuits only
exponential methods exist [Sho94, VSB+01].
But research in the area of quantum circuits is still at the beginning. Neverthe-
less, first promising results exist: At the University of Innsbruck, one of the first
quantum circuits, consisting of 8 qubits, was built in 2005. This has been further
improved so that today circuits with dozens of qubits exist, with an upward trend.
Even first commercial realizations of quantum circuits (e.g. a random number
generator) are available. Reversible logic is important in this area because ev-
ery quantum operation is inherently reversible. Thus, progress in the domain of
reversible logic can be directly applied to quantum logic.
Besides that, reversible logic additionally finds application in domains like
optical computing [CA87], DNA computing [TS05], as well as nanotechnolo-
gies [Mer93]. Also, cryptography or encoding/decoding methods (e.g. for music
and videos) can profit from enhancements in this area (see e.g. [ML01]). Further-
more, reversible operations are already used today in instruction sets for micropro-
cessors [SL00].
The basic concepts of reversible logic are not new; they were introduced in the
1960s by Landauer [Lan61] and further refined by Bennett [Ben73] and Tof-
foli [Tof80]. They observed that, due to the reversibility, a straightforward usage
of fanouts and feedback is not possible in reversible logic.
Furthermore, new libraries of (reversible) gates have been introduced to represent
invertible operations [Tof80, FT82, Per85, NC00] and it was stated that each re-
versible circuit must be a cascade of these reversible gates.
Even if this still represents the basis for research in the area of reversible logic,
the topic was not intensively studied by computer scientists before the year 2000.
The main reason for this may lie in the fact that applications of reversible logic (in
particular in the domain of quantum computation) were seen as "dreams of the
future". But this changed when factorization, a very important problem (it builds
the basis for most of today's encryption methods), was solved on a physically
implemented quantum circuit [Sho94, VSB+01]. Therewith, a proof of concept was
available, showing that quantum computing may in fact be one solution for future
computational problems. In particular, this achievement (together
with further ones e.g. in reversible CMOS design as mentioned above) significantly
moved the topic forward so that nowadays reversible logic is seen as a promising re-
search area. As a consequence, in the last years computer scientists have also started
to develop new methods e.g. for synthesis of reversible circuits.
However, no real design flow for reversible logic exists to date. This is crucial
since, due to the mentioned restrictions (e.g. no fanout and feedback), the design of
reversible circuits significantly differs from the design of traditional circuits. Nearly
all elaborated methods for synthesis, verification, debugging, and test available for
traditional circuit design must be redeveloped so that they become applicable to
reversible circuits as well. Now, while applications of reversible logic are starting
to become feasible and traditional technologies more and more suffer from the in-
creasing miniaturization, it is even more necessary to work towards such a flow.
Moreover, considering the traditional design flow, it can be concluded that, to date,
computer scientists cannot fully exploit the technical state of the art. That is,
the number of transistors that can be physically implemented on a chip grows faster
than the ability to design them in a useful manner (also known as the design gap).
This becomes even more crucial if, additionally, the ability to verify the correct-
ness of the designed circuits is considered (known as the verification gap). Once
reversible logic becomes feasible for large designs in the future, researchers will be
faced with similar challenges. Thus, it is worth working towards a design flow for
reversible logic already today.
First steps in this direction have been made in the domain of synthesis (see
e.g. [SPMH03, MDM05]), verification (see e.g. [VMH07, GNP08]), and test (see
e.g. [PHM04, PBL05, PFBH05]). However, they are all still far from covering
real design needs. As an example, most synthesis approaches are only applicable
to small functions and often produce circuits with relatively high cost. In contrast,
design methods to create complex circuits and to efficiently verify their correctness
are needed.
This book makes contributions to a future design flow for reversible logic by
proposing advanced methods for synthesis, optimization, verification, and debug-
ging. Figure 1.1 shows the interaction of the proposed steps in an integrated flow.
The left-hand side sketches the restrictions or challenges, respectively, to be solved
in comparison to traditional methods. By combining the techniques proposed in the
book, it is possible to synthesize reversible circuits representing large functions.
Optimization methods ensure that the resulting circuits are of small cost. Finally,
methods for equivalence checking and automatic debugging make it possible to
verify the obtained results and help to accelerate the search for bugs in case of
errors in the design. In the following, the respective contributions are briefly introduced in the
order they appear in this book. A more detailed description of the problems as well
as the proposed solutions is given at the beginning of each chapter.
Altogether, the contributions of this book to the design flow for reversible logic
can be summarized as follows:
• Synthesis methods for large functions (i.e. functions with more than 100 vari-
ables)
• A hardware description language for reversible logic
• Exact approaches for synthesizing minimal circuits that can later be used as build-
ing blocks
• Embedding methods to automatically realize circuits for irreversible functions
• Optimization approaches to reduce the cost with respect to the addressed technol-
ogy
• Equivalence checking of large circuits (i.e. circuits with several thousands of
gates)
• Automatic debugging and fixing of erroneous circuits
All proposed methods have been implemented and experimentally evaluated.
To this end, a uniform format for specifying reversible functions as well as re-
versible circuits has been defined, which was used in all experiments throughout
this book (see also the note on benchmarks on p. 26). Furthermore, all benchmark
functions as well as the circuits have been made available online at RevLib
(www.revlib.org). The resulting tools can be obtained at www.revkit.org. This al-
lows other researchers to compare their results with the ones obtained in this work.
The results, together with a discussion, related work, and future research directions,
are of course also given in the respective chapters.
According to the outline sketched above, the remainder of this book is structured
as depicted in Fig. 1.2. The next chapter gives a more detailed introduction to
both reversible and quantum logic and provides the basic notations and definitions
used in the rest of this book. Afterwards, the chapters about synthesis
(Chap. 3), optimization (Chap. 6), as well as verification and debugging (Chap. 7)
can be read independently of each other. Only for Chap. 4 about exact synthesis and
Chap. 5 about embedding irreversible functions is it recommended to read the pre-
vious chapters beforehand. Chapter 8 summarizes all findings and gives directions
for future work.
Chapter 2
Preliminaries
This chapter provides the basic definitions and notations to keep the remainder of
the book self-contained. The chapter is divided into three parts. In the first section,
Boolean functions, reversible functions, and the respective circuit descriptions are
introduced. This builds the basis for all approaches described in this book. Since
many of the proposed techniques exploit decision diagrams and satisfiability solvers,
the basic concepts of these core techniques are introduced in the last two sections.
All descriptions are kept brief. For a more in-depth treatment, references to further
reading are given in the respective sections.
2.1 Background
Reversible logic realizes bijective Boolean functions. Thus, first the basics regard-
ing Boolean functions are revisited and further extended by a description of the
properties specifically applied to reversible functions. Then, reversible circuits as
well as quantum circuits are introduced which are used as realizations of reversible
functions.
Table 2.1 Truth tables of the operations AND, OR, and NOT

x1 x2 | x1 ∧ x2     x1 x2 | x1 ∨ x2     x | ¬x
 0  0 |    0         0  0 |    0        0 |  1
 0  1 |    0         0  1 |    1        1 |  0
 1  0 |    0         1  0 |    1
 1  1 |    1         1  1 |    1
Example 2.1 Table 2.1 shows the truth tables of the operations AND, OR, and NOT,
respectively. Each truth table has 2ⁿ rows, showing the mapping of each input pat-
tern to the respective output pattern.
Taking AND, OR, and NOT as a basis, every Boolean function can be derived.
For example, the often used functions XOR, implication, and equivalence are de-
rived as follows:
• XOR: x1 ⊕ x2 := (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
• Implication: x1 ⇒ x2 := ¬x1 ∨ x2
• Equivalence: x1 ⇔ x2 := ¬(x1 ⊕ x2)
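These derivations can be verified mechanically by enumerating all input assignments; a small sketch (the function names are ad hoc):

```python
from itertools import product

# Derived operations expressed using AND (and), OR (or), and NOT (not) only
def xor_(x1, x2):
    return (x1 and not x2) or (not x1 and x2)

def implies(x1, x2):
    return (not x1) or x2

def equiv(x1, x2):
    return not xor_(x1, x2)

# Exhaustive check of the intended semantics over all 2^2 assignments
for x1, x2 in product([False, True], repeat=2):
    assert xor_(x1, x2) == (x1 != x2)
    assert equiv(x1, x2) == (x1 == x2)
print("all derivations verified")
```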
So far, single-output functions have been introduced. However, in practice also
multi-output functions are widely used.
Example 2.2 Table 2.2(a) shows the truth table of a 3-input, 2-output function rep-
resenting the adder function.
Example 2.3 Table 2.2(c) shows a 3-input, 3-output function. This function is re-
versible, since each input pattern maps to a unique output pattern. In contrast, the
function depicted in Table 2.2(a) is irreversible, since n ≠ m. Moreover, the func-
tion in Table 2.2(b) is also irreversible. Here, the number n of inputs is indeed equal
to the number m of outputs, but there is no unique input-output mapping. For exam-
ple, both inputs 000 and 001 map to the output 000.

Table 2.2 Truth tables of multi-output functions

(a) Adder               (b) Irreversible         (c) Reversible
x1 x2 x3 | carry sum    x1 x2 x3 | y1 y2 y3      x1 x2 x3 | y1 y2 y3
 0  0  0 |   0    0      0  0  0 |  0  0  0       0  0  0 |  0  0  0
 0  0  1 |   0    1      0  0  1 |  0  0  0       0  0  1 |  0  1  0
 0  1  0 |   0    1      0  1  0 |  0  1  0       0  1  0 |  1  0  0
 0  1  1 |   1    0      0  1  1 |  0  1  1       0  1  1 |  1  0  1
 1  0  0 |   0    1      1  0  0 |  1  0  0       1  0  0 |  0  0  1
 1  0  1 |   1    0      1  0  1 |  1  0  1       1  0  1 |  0  1  1
 1  1  0 |   1    0      1  1  0 |  1  1  1       1  1  0 |  1  1  0
 1  1  1 |   1    1      1  1  1 |  1  1  0       1  1  1 |  1  1  1
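The reversibility criterion from these examples (n = m and a unique output pattern for every input pattern) can be checked mechanically. A sketch, with the truth-table values transcribed from Table 2.2(c):

```python
def is_reversible(truth_table):
    """A multi-output function is reversible iff inputs and outputs have the
    same width and every input pattern maps to a unique output pattern."""
    same_width = all(len(i) == len(o) for i, o in truth_table.items())
    bijective = len(set(truth_table.values())) == len(truth_table)
    return same_width and bijective

# Function from Table 2.2(c): each input pattern maps to a unique output pattern
table_c = {"000": "000", "001": "010", "010": "100", "011": "101",
           "100": "001", "101": "011", "110": "110", "111": "111"}
print(is_reversible(table_c))   # → True
```

For the function of Table 2.2(b), where both 000 and 001 map to 000, the same check returns False.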
To realize traditional circuits, the gate library depicted in Fig. 2.1 is used. This
includes gates for the operations AND, OR, and NOT, based on which any Boolean
function can be realized. Furthermore, fanouts are applied to use signal values more
than once.
In contrast, to realize reversible logic some restrictions must be considered:
fanouts and feedback are not directly allowed, since they would destroy the re-
versibility of the computation [NC00]. Also, the gate library from above as well
as the traditional design flow cannot be utilized. As a result, a cascade structure
over reversible gates is the established model to realize reversible logic.
Example 2.4 Figure 2.2 shows a Toffoli gate (a), a Fredkin gate (b), and a Peres
gate (c), together with truth tables of their functionality. A ● is used to indicate a
control line, while a ⊕ (×) is used to denote the target line of a Toffoli and
Peres gate (Fredkin gate).
Remark 2.1 These definitions also provide the basis for other gate types. For exam-
ple, the Toffoli gate builds the basis for the NOT gate (a Toffoli gate with no control
lines, i.e. with C = ∅), for the controlled-NOT gate (a Toffoli gate with one control
line),1 as well as for the Toffoli gate as originally proposed in [Tof80]. In contrast,
the Fredkin gate builds the basis for a SWAP gate (a Fredkin gate with C = ∅, i.e. an
interchanging of two lines).
In the following, the notations MCT(C, xj ), MCF(C, xj1 , xj2 ), and P (xi , xj1 , xj2 )
are used to denote a Toffoli, Fredkin, and Peres gate, respectively. The number of
control lines a Toffoli (Fredkin) gate consists of defines the size of the gate.
Using these gate types, universal libraries can be composed. A gate library is
called universal, if it enables the realization of any reversible function. For example,
it has been proven that every reversible function can be realized using MCT gates
only [MD04b]. Also, the gate library consisting of NOT, CNOT, and two-controlled
Toffoli gates is universal [SPMH03]. In contrast, a library including only CNOT
gates allows the realization of linear reversible functions only [PMH08].
Example 2.5 Figure 2.3 shows reversible circuits realizing the function depicted in
Table 2.2(c) with the help of Toffoli and Fredkin gates, respectively.
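The behavior of such cascades can be sketched in a few lines of Python. The gate list below is an illustrative example, not the circuit of Fig. 2.3:

```python
def apply_mct(bits, controls, target):
    """Multiple-control Toffoli MCT(C, xt): invert the target iff all controls are 1."""
    if all(bits[c] for c in controls):
        bits[target] ^= 1

def apply_mcf(bits, controls, t1, t2):
    """Multiple-control Fredkin MCF(C, xt1, xt2): swap targets iff all controls are 1."""
    if all(bits[c] for c in controls):
        bits[t1], bits[t2] = bits[t2], bits[t1]

def run_cascade(gates, pattern):
    bits = list(pattern)
    for kind, controls, targets in gates:
        if kind == "MCT":
            apply_mct(bits, controls, targets[0])
        else:
            apply_mcf(bits, controls, targets[0], targets[1])
    return bits

# NOT (no controls), CNOT (one control), and Toffoli (two controls) are all MCT gates
cascade = [("MCT", [], (0,)), ("MCT", [0], (1,)), ("MCT", [0, 1], (2,))]
print(run_cascade(cascade, [0, 0, 0]))   # → [1, 1, 1]
```

Since each of these gates is self-inverse, running the cascade in reverse order maps the output pattern back to the input pattern, which mirrors the backward-propagation argument used later in this chapter.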
Table 2.3 Quantum cost of Toffoli and Fredkin gates with different numbers of control lines

Control lines | Toffoli | Fredkin
      0       |    1    |    3
      1       |    1    |    7
      2       |    5    |   15

The cost c of a circuit composed of d gates with individual gate costs c0, …, cd−1 is
given by

c = Σ_{i=0}^{d−1} c_i.
The concrete cost for a single gate of course depends on the respective type but
also on the addressed technology. In this book, the following cost metrics are used:
• Gate count denotes the number of gates the circuit consists of (i.e. ci = 1 and
c = d).
• Quantum cost denotes the effort needed to transform a reversible circuit to a quan-
tum circuit (see also next section). Table 2.3 shows the quantum cost for a selec-
tion of Toffoli and Fredkin gate configurations as introduced in [BBC+95] and
further optimized in [MD04a] and [MYDM05]. As can be seen, gates of larger
size are considerably more expensive than gates of smaller size. The Peres gate
represents a special case, since it has quantum cost of 4, while the realization with
two Toffoli gates would imply a cost of 6.
• Transistor cost denotes the effort needed to realize a reversible circuit in CMOS
according to [TG08]. The transistor cost of a reversible gate is 8 · s, where s is the
number of control lines.
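Using the values of Table 2.3 and the rules above, all three metrics can be computed for a given cascade. A sketch; the example cascade is hypothetical:

```python
# Quantum cost per (gate type, number of control lines), following Table 2.3
QUANTUM_COST = {("toffoli", 0): 1, ("toffoli", 1): 1, ("toffoli", 2): 5,
                ("fredkin", 0): 3, ("fredkin", 1): 7, ("fredkin", 2): 15}

def circuit_costs(gates):
    """gates: list of (gate type, number of control lines) pairs."""
    gate_count = len(gates)                              # c_i = 1 for each gate
    quantum_cost = sum(QUANTUM_COST[g] for g in gates)   # Table 2.3 values
    transistor_cost = sum(8 * s for _, s in gates)       # 8 * s per gate [TG08]
    return gate_count, quantum_cost, transistor_cost

# Hypothetical cascade: NOT, CNOT, Toffoli, SWAP, and Fredkin gates
cascade = [("toffoli", 0), ("toffoli", 1), ("toffoli", 2),
           ("fredkin", 0), ("fredkin", 1)]
print(circuit_costs(cascade))   # → (5, 17, 32)
```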
Example 2.6 Consider the circuits from Example 2.5 depicted in Fig. 2.3. The Tof-
foli circuit has a gate count of 6, quantum cost of 10, and transistor cost of 56, while
the Fredkin circuit has a gate count of 3, quantum cost of 13, and transistor cost
of 8.
As can be seen, the costs significantly differ depending on the applied cost model.
Even if the number of gates in a cascade is a very simple measure of its complexity,
it is the most technology-independent metric. Thus, the gate count is often used to
evaluate the quality of a reversible circuit. Besides that, the quantum cost metric is
also popular, because it represents a measure for the most intensely studied applica-
tion (namely quantum computation) and considers larger gates to be more costly. The
transistor cost model is a relatively new model that arose with the application of
reversible circuits to the area of low-power CMOS design. In this book, gate count
and quantum cost are primarily considered, as they allow a fair comparison of synthesis
results with respect to previous work. Transistor costs are additionally addressed
where appropriate.
Finally, a special property of reversible logic is reviewed:
Proof Each reversible gate realizes a reversible function. That is, for each input
pattern a unique output pattern, i.e. a one-to-one mapping, exists. Thus, calculating
the inverse of the function f for an output pattern is essentially the same operation
as propagating this pattern backwards through the circuit.
Definition 2.8 A qubit is a two-level quantum system, described by a two-dimen-
sional complex Hilbert space. The two orthogonal quantum states

|0⟩ ≡ (1, 0)ᵀ and |1⟩ ≡ (0, 1)ᵀ

are used to represent the Boolean values 0 and 1. Any state of a qubit may be written
as |Ψ⟩ = α|0⟩ + β|1⟩, where α and β are complex numbers with |α|² + |β|² = 1.
The quantum state of a single qubit is denoted by the vector (α, β)ᵀ.
The state of a quantum system with n > 1 qubits is given by an element of the
tensor product of the respective state spaces and can be represented as a normalized
vector of length 2ⁿ, called the state vector. The state vector is changed through mul-
tiplication with appropriate 2ⁿ × 2ⁿ unitary matrices. Thus, each quantum computation
is inherently reversible but manipulates qubits rather than pure logic values. At the
end of the computation, a qubit can be measured. Then, depending on the current
state of the qubit, either a 0 (with probability |α|²) or a 1 (with probability |β|²)
is returned. After the measurement, the state of the qubit is destroyed.
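The normalization condition and the measurement rule from Definition 2.8 can be sketched numerically; the amplitudes below are chosen arbitrarily for illustration:

```python
import math

def measurement_probabilities(alpha, beta):
    """Return (P(0), P(1)) for a qubit state alpha|0> + beta|1>,
    checking that the state vector is normalized."""
    p0, p1 = abs(alpha) ** 2, abs(beta) ** 2
    assert math.isclose(p0 + p1, 1.0), "state vector must be normalized"
    return p0, p1

# Equal superposition: alpha = beta = 1/sqrt(2), so 0 and 1 each occur
# with probability 1/2 when the qubit is measured.
amp = 1 / math.sqrt(2)
print(measurement_probabilities(amp, amp))   # ≈ (0.5, 0.5)
```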
In other words, using quantum computation and qubits in superposition, func-
tions can be evaluated with different possible input assignments in parallel. But,
it is not possible to obtain the current state of a qubit. Instead, if a qubit is mea-
sured, either 0 or 1 is returned depending on the respective probability. Nevertheless,
researchers exploited quantum computation (in particular superposition) to solve
many practically relevant problems faster than by traditional computing machines.
For example, it was possible to solve the factorization problem in polynomial time—
for traditional machines only exponential algorithms are known. Even if the research
in this area is still at the beginning (so far, quantum algorithms with only up to 28
qubits have been implemented), these first promising results motivate further re-
search in this area.
The focus of this book is how to design reversible and quantum circuits, respec-
tively. Thus, in the following the model for quantum circuits as used in this book
is introduced. For a more detailed treatment of the respective physical background,
the reader is referred to [Pit99, NC00, Mer07].
Example 2.7 Figure 2.6 shows a quantum circuit realizing the reversible function
depicted in Table 2.2(c).
All quantum gates are assumed to be the basic building blocks of each quantum
computation. This is also reflected in the cost metric.
Definition 2.10 Each quantum gate has cost of 1. Thus, the cost of a quantum circuit
is defined by the number d of its gates.
Remark 2.2 In previous work, an extended cost metric has also been applied: When
a CNOT and a V (or V+) gate are applied to the same two qubits, the cost of the
pair can be considered to be 1 as well [SD96, HSY+06]. The possible pairs (denoted
as double gates in the following) are shown in Fig. 2.7. In this book, primarily the
cost metric from Definition 2.10 is applied. However, all approaches can also be
extended to consider unit cost for double gates. This is shown exemplarily for the
exact synthesis of quantum circuits in Sect. 4.2.2.
Since quantum circuits are inherently reversible, every reversible circuit can be
transformed to a quantum circuit. To this end, each gate of the reversible circuit is
decomposed into a cascade of quantum gates.
Example 2.8 Figure 2.8(a) (Fig. 2.8(b)) shows the quantum gate cascade which can
be used to transform a Toffoli (Fredkin) gate to a quantum circuit. As can be seen,
the number of required quantum gates is equal to the quantum cost of the Toffoli
(Fredkin) gate as introduced in Table 2.3.
2.2 Decision Diagrams

Decision diagrams often allow the representation of large functions in a more com-
pact way than truth tables. In the past, several types of decision diagrams have been
introduced. In this book, Binary Decision Diagrams (BDDs) [Bry86] are considered
to represent Boolean functions. Quantum Multiple-valued Decision Diagrams
(QMDDs) [MT06, MT08] are used to represent reversible functions that may include
quantum operations. Both are briefly introduced in this section.
Definition 2.11 A Binary Decision Diagram (BDD) over Boolean variables X with
terminals T = {0, 1} is a directed acyclic graph G = (V , E) with the following prop-
erties:
1. Each node v ∈ V is either a terminal or a non-terminal.
2. Each terminal node v ∈ V is labeled by a value t ∈ T and has no outgoing edges.
3. Each non-terminal node v ∈ V is labeled by a Boolean variable xi ∈ X and rep-
resents a Boolean function f .
4. In each non-terminal node (labeled by xi), the Shannon decomposition [Sha38]

   f = ¬xi · f|xi=0 + xi · f|xi=1

is carried out, leading to two outgoing edges e ∈ E whose successors are denoted
by low(v) (for f|xi=0) and high(v) (for f|xi=1), respectively.
The size of a BDD is defined by the number of its (non-terminal) nodes.
A BDD is called free if each variable is encountered at most once on each path
from the root to a terminal node. A BDD is called ordered if in addition all variables
are encountered in the same order on all such paths. The respective order is defined
by π : {1, . . . , n} → {1, . . . , n}. Finally, a BDD is called reduced if it does neither
contain isomorphic sub-graphs nor redundant nodes. To achieve reduced BDDs,
reduction rules as depicted in Fig. 2.10 are applied. Applying the reduction rules
leads to shared nodes, i.e. nodes that have more than one predecessor.
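The two reduction rules and the Shannon decomposition can be made concrete with a small sketch (ours, not from the book; all names are illustrative): a node constructor that applies the redundant-node rule when both edges agree and a unique table that merges isomorphic sub-graphs, which yields shared nodes automatically.

```python
# Minimal reduced ordered BDD sketch: a terminal is the constant 0 or 1,
# a non-terminal is a tuple (var, low, high). Illustrative names only.
unique_table = {}

def make_node(var, low, high):
    if low == high:                  # redundant-node rule
        return low
    key = (var, low, high)
    return unique_table.setdefault(key, key)  # isomorphic-subgraph rule

def evaluate(node, assignment):
    """Follow the Shannon decomposition down to a terminal."""
    while isinstance(node, tuple):
        var, low, high = node
        node = high if assignment[var] else low
    return node

x1_node = make_node(1, 0, 1)     # represents the variable x1
f = make_node(0, 0, x1_node)     # f = x0 * x1 with order x0 < x1
assert make_node(2, f, f) is f   # both edges agree: node is dropped
assert evaluate(f, {0: 1, 1: 1}) == 1
assert evaluate(f, {0: 1, 1: 0}) == 0
```

The hash-consing in the unique table is exactly what makes reduced ordered BDDs canonical: structurally equal sub-graphs are represented once and become shared nodes.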
Example 2.10 Figure 2.11 shows two reduced ordered BDDs representing the function f = x1·x2 + x3·x4 + ··· + xn−1·xn. For the order x1, x2, ..., xn−1, xn, the BDD
depicted in Fig. 2.11(a) has a size of O(n), while the BDD depicted in Fig. 2.11(b)
with the order x1, x3, ..., xn−1, x2, x4, ..., xn has a size of O(2^n).
Remark 2.3 In the following, reduced ordered binary decision diagrams are called
BDDs for brevity. BDDs are canonical representations, i.e. for a given Boolean func-
tion and a fixed order, the BDD is unique [Bry86].
As shown by Example 2.10, BDDs are very sensitive to the chosen variable order.
It has been shown in [BW96] that deciding whether another order leads to a smaller
BDD size (i.e. proving that a given order is optimal) is NP-complete. As a
consequence, several heuristics to find good orders have been proposed. In particular,
sifting [Rud93] has been shown to be quite effective.
Further reductions of the BDD size can be achieved if complement edges
[BRB90] are applied. They make it possible to represent a function as well as its
complement by a single node. BDDs can also be used to represent multi-output
functions. Then, all BDDs for the respective functions are shared, i.e. isomorphic
sub-functions are represented by a single node as well.
For a more comprehensive introduction into BDDs, the reader is referred
to [DB98, EFD05]. For the application of BDDs in practice, many well-engineered
BDD packages (e.g. CUDD [Som01]) are available.
Example 2.11 Figure 2.12(a) shows a V gate in a 3-line circuit. The unitary matrix describing the behavior of this gate is given in Fig. 2.12(b), where v = (1+i)/2
and v̄ = (1−i)/2. The QMDD for this matrix is given in Fig. 2.12(c). The edges from each
non-terminal node point to four sub-matrices indexed 0, 1, 2, 3 from left to right.
Each edge has a complex-valued weight. For clarity, edges with weight 0 are indi-
cated as stubs. In fact, they point to the terminal node.
The key features of QMDDs are evident in this example. There is a single terminal node. Furthermore, each edge has a complex-valued weight. Each non-terminal
node represents a matrix partitioning. For example, the top node in Fig. 2.12(c) rep-
resents the partitioning shown in Fig. 2.12(b). The non-terminal nodes lower in the
diagram represent similar partitioning of the resulting sub-matrices. The represen-
tation of common sub-matrices is shared. To ensure the uniqueness of the represen-
tation, edges with weight 0 must point to the terminal node and normalization is
applied to non-terminal nodes so that the lowest indexed edge with non-zero weight
has weight 1.
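The partitioning-with-sharing idea can be sketched generically (this is not the QMDD package; edge weights and normalization are omitted, so the sketch only mimics the structural sharing of common sub-matrices):

```python
# Sketch: partition a 2^n x 2^n matrix into four quadrants (indexed
# 0, 1, 2, 3 as in the text) and hash-cons equal sub-matrices so that
# common blocks are represented only once, QMDD-style.
def quadrants(m):
    h = len(m) // 2
    return (tuple(row[:h] for row in m[:h]),   # 0: top-left
            tuple(row[h:] for row in m[:h]),   # 1: top-right
            tuple(row[:h] for row in m[h:]),   # 2: bottom-left
            tuple(row[h:] for row in m[h:]))   # 3: bottom-right

def build(m, table):
    if len(m) == 1:
        return m[0][0]                         # 1x1 block: a plain value
    subs = tuple(build(q, table) for q in quadrants(m))
    return table.setdefault(subs, subs)        # share equal partitionings

# Identity on two lines: the zero blocks and the diagonal blocks repeat.
I4 = ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))
table = {}
root = build(I4, table)
assert root[0] is root[3]   # the two diagonal quadrants are one shared object
assert root[1] is root[2]   # so are the two zero quadrants
```

A real QMDD additionally carries a complex weight on every edge and normalizes each node so that the lowest indexed non-zero edge has weight 1, which this structural sketch leaves out.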
As for BDDs, an efficient implementation also exists for QMDDs. However,
since QMDDs involve multiple edges from nodes and are applicable to both binary
and multiple-valued problems, the QMDD package is not built using a standard
decision diagram package. Nevertheless, the implementation employs well-known
decision diagram techniques like sharing, reordering, and so on. For a more com-
prehensive introduction into QMDDs, the reader is referred to [MT08].
The methods described in this book make use of techniques for solving the Boolean
satisfiability problem (SAT problem). The SAT problem is one of the central
NP-complete problems; in fact, it was the first problem proven to be NP-complete,
by Cook in 1971 [Coo71]. Despite this complexity, efficient solving algorithms
have been developed that found great success as proof engines for
many practically relevant problems. Today, there exist algorithms exploiting SAT
that solve many practical problem instances, e.g. in the domain of automatic test pat-
tern generation [Lar92, DEF+08], logic synthesis [ZSM+05], debugging [SVAV05],
and verification [BCCZ99, CBRZ01, PBG05].
In this section, the SAT problem, the respective solving algorithm, and its ap-
plication are introduced. Furthermore, extended SAT solvers additionally exploiting
bit-vector logic, quantifiers, or problem-specific modules, respectively, are briefly
reviewed. These engines are used later as core techniques for selected steps in the
proposed flow for reversible logic.
In other words, SAT asks whether ∃X h holds for a Boolean formula h over the
variables X and, if so, determines a satisfying assignment. In this context, the
Boolean formula h is often given in Conjunctive Normal Form (CNF). A CNF is a set
of clauses, each clause is a set of literals, and each literal is a Boolean variable or
its negation. A CNF formula is satisfied if all its clauses are satisfied, a clause is
satisfied if at least one of its literals is satisfied, and a positive literal is satisfied if
its variable is assigned 1 (a negated literal is satisfied under the assignment 0).
To solve SAT problems, in the past several (backtracking) algorithms (or SAT
solvers, respectively) have been proposed [DP60, DLL62, MS99, MMZ+01, GN02,
ES04]. Most of them apply the steps depicted in Fig. 2.13: While there are free
variables left (a), a decision is made (c) to assign a value to one of these variables.
Then, implications are determined due to the last assignment (d). This may cause
a conflict (e) that is analyzed. If the conflict can be resolved by undoing assign-
ments from previous decisions, backtracking is done (f). Otherwise, the instance is
unsatisfiable (g). If no further decision can be made, i.e. a value is assigned to all
variables and this assignment did not cause a conflict, the CNF is satisfied (b). Ad-
vanced techniques like e.g. efficient Boolean constraint propagation [MMZ+01] or
conflict analysis [MS99] as well as efficient decision heuristics [GN02] are common
in state-of-the-art SAT solvers today.
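The loop of Fig. 2.13 can be sketched as a recursive DPLL-style procedure; the comments mark the steps (b) to (g) from the figure. This is a deliberately naive illustration without watched literals, learned clauses, or conflict analysis:

```python
def dpll(clauses, assignment):
    """Return a satisfying assignment (a dict) or None. A clause is a
    list of non-zero ints; negative literals denote negated variables."""
    changed = True                           # (d) determine implications
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                     # clause already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if not unassigned:               # (e) conflict
                return None
            if len(unassigned) == 1:         # unit clause forces a value
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    free = {abs(l) for c in clauses for l in c} - set(assignment)
    if not free:                             # (b) all assigned: satisfied
        return assignment
    var = min(free)                          # (c) decision
    for value in (True, False):              # (f) backtrack to 2nd value
        trial = dict(assignment)
        trial[var] = value
        result = dpll(clauses, trial)
        if result is not None:
            return result
    return None                              # (g) unsatisfiable branch

# (x1 + x2)(!x1 + x2)(!x2 + x3) forces x2 = 1 and then x3 = 1
model = dpll([[1, 2], [-1, 2], [-2, 3]], {})
assert model[2] and model[3]
assert dpll([[1], [-1]], {}) is None         # trivially unsatisfiable
```

State-of-the-art solvers replace the plain recursion by non-chronological backtracking driven by conflict analysis, which this sketch intentionally omits.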
These techniques as well as the tremendous improvements in the performance of
the respective implementations [ES04] enable the consideration of problems with
more than hundreds of thousands of variables and clauses. Thus, SAT is widely used
in many application domains: the real-world problem is transformed into
CNF and then solved by using a SAT solver as a black box.
Despite their efficiency, Boolean SAT solvers have a major drawback: they work
on the Boolean level. However, many problems are formulated at a higher level of
abstraction and would benefit from a more general description. As a
consequence, researchers investigated the use of more expressive formulations than
CNF—by still exploiting the established SAT techniques. This leads (1) to the com-
bination of SAT solvers with decision procedures for decidable theories resulting
in SAT Modulo Theories (SMT) [BBC+05, DM06b] and (2) to the application of
quantifiers resulting in Quantified Boolean Formulas (QBF) [Bie05, Ben05]. Fur-
thermore, problem-specific knowledge is exploited during the solving process by
the SAT solver SWORD [WFG+07]. The respective concepts are briefly reviewed in
the following.
An SMT solver integrates a Boolean SAT solver with other solvers for specialized
theories (e.g. linear arithmetic or bit-vector logic). The SAT solver thereby works on
an abstract representation (still in CNF) of the problem and steers the overall search
process, while each (partial) assignment of this representation has to be validated by
the theory solver for the theory constraints. Thus, advanced SAT techniques together
with specialized theory solvers are exploited.
In this book, the theory of quantifier-free bit-vector logic (QF_BV) is utilized.
This logic is defined as follows:
Example 2.13 Let a, b, and c be three bit-vector variables with bit-width n = 3 and
(a ∨ b = c) ∧ (a + b = c) an SMT bit-vector instance over these variables. Then,
a = (010), b = (001), and c = (011) is a satisfying solution of this instance, since it
satisfies each constraint.
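For such small bit-widths, the semantics of the instance can be made concrete by brute-force enumeration (a toy substitute for an SMT solver, shown only for illustration; "+" is taken modulo 2^3, as usual for bit-vector addition):

```python
# Enumerate all 3-bit assignments of a, b, c and keep those satisfying
# both constraints (a | b == c) and (a + b == c), addition modulo 2^3.
solutions = [(a, b, c)
             for a in range(8) for b in range(8) for c in range(8)
             if (a | b) == c and (a + b) % 8 == c]

# the assignment from the example: a = (010) = 2, b = (001) = 1, c = (011) = 3
assert (2, 1, 3) in solutions
```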
Example 2.14 Let h = ∃x2 ∃x3 ∀x1 (x1 + x2 + x̄3)(x̄1 + x3)(x̄2 + x3). Then x2 = 1 and
x3 = 1 is a satisfying assignment for the QBF h. The value of x2 ensures that the
first clause becomes satisfied, while x3 ensures this for the remaining two clauses
for all possible assignments to x1 .
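The semantics of the quantifier prefix can likewise be checked exhaustively (an illustration only, not how QBF solvers actually proceed):

```python
# h = exists x2, x3 . forall x1 . (x1 + x2 + !x3)(!x1 + x3)(!x2 + x3)
def matrix(x1, x2, x3):
    return (x1 or x2 or not x3) and (not x1 or x3) and (not x2 or x3)

# h is true iff some assignment to (x2, x3) works for every x1 ...
assert any(all(matrix(x1, x2, x3) for x1 in (0, 1))
           for x2 in (0, 1) for x3 in (0, 1))
# ... and x2 = x3 = 1 in particular is such an assignment
assert all(matrix(x1, 1, 1) for x1 in (0, 1))
```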
Obviously, solving QBF problems is significantly harder than solving pure SAT
instances. In fact, it is PSPACE-complete [Pap93]. Nevertheless, QBF enables the
formulation of many problems in a more compact way. In this sense, complexity is
moved from the problem formulation to the solving engine, i.e. the task can be for-
mulated in a more compact way resulting in a more complex problem to be solved
by the solver. However, since the solving engines are usually well-engineered with
respect to the dedicated problem, this may lead to a faster solving process. Recent
solvers (e.g. [Bie05, Ben05]) exploit techniques like symbolic Skolemization
to solve QBF instances (i.e. converting the instance to a normal form which enables
simplifications).
Due to the translation of the problem into CNF (or QF_BV logic, respectively),
problem-specific knowledge is lost. Put differently, decisions, implications, and
learning schemes can only exploit the Boolean (bit-vector) description. In contrast,
with more problem-specific knowledge available, more options exist for controlling
the traversal of the search space. This observation is exploited by the problem-specific
SAT solver SWORD [WFG+07].2
SWORD represents the problem in terms of so-called modules. Each module defines an operation over bit vectors of module variables. Each module variable is a
Boolean variable. By this, structural and semantical knowledge is available which
can be exploited by special algorithms for each kind of module. Furthermore, this
2 SWORD has been co-developed by the authors of this book. Even if SWORD is focused on
problem-specific knowledge, it can also be used as an SMT solver and already participated in the
respective SMT competitions in 2008 [WSD08] and 2009 [JSWD09], respectively.
Due to the two operation levels, problem-specific strategies e.g. for decision mak-
ing and propagation can be exploited by the modules. For example, decision making
can be prioritized so that modules, which are assumed to be “more important” than
others, are selected for a decision with a higher priority than less important modules.
Furthermore, different modules can be equipped with different strategies. For a more
detailed description of SWORD, the reader is referred to [WFG+07, WFG+09].
Therewith, all preliminaries required for this book have been introduced. Be-
sides an introduction of reversible and quantum logic, also the applied core tech-
niques have been briefly described. With that as a basis, the contributions towards
a design flow for reversible logic are proposed in the following chapters. Decision
diagrams are thereby applied for synthesis (Sect. 3.2), partially for exact synthesis
(Sect. 4.3.2), and for verification (Sect. 7.1.2), while Boolean satisfiability is ex-
ploited in exact synthesis (especially in Sect. 4.2 and partially in Sect. 5.3 as well as
Sect. 6.3), verification (Sect. 7.1.3), and debugging (Sect. 7.2). The extended SAT
solvers (i.e. SMT solvers, QBF solvers, and SWORD) are used to improve exact
synthesis (Sect. 4.3).
Synthesis is the most important step when building complex circuits. Considering
the traditional design flow, synthesis is carried out in several individual steps such
as high-level synthesis, logic synthesis, mapping, and routing (see e.g. [SSL+92]).
To synthesize reversible logic, adjustments and extensions are needed. For example,
further tasks such as embedding of irreversible functions must be added. Further-
more, throughout the whole flow, the restrictions caused by the reversibility (no
fanout and feedback) and a completely new gate library must be considered as well.
In recent years, first approaches addressing some of these issues have been introduced (see e.g. [SPMH03, MMD03, MD04b, Ker04, MDM05, GAJ06, HSY+06,
MDM07]). The first section of this chapter briefly reviews existing methods for the
individual steps. However, research in this area is still at the beginning. So far,
the desired behavior of the circuit to be synthesized is given by function descriptions
like truth tables or permutations, respectively. As a result, current synthesis methods
are applicable to relatively small functions only and often need a significant amount
of run-time. This must be improved in order to design larger functions or complex
reversible systems in the future.
In this book, the wide area of reversible logic synthesis is covered by the following three chapters, each with its own detailed view on a particular aspect.
While Chap. 4 introduces exact (i.e. minimal) circuit synthesis, Chap. 5 discusses
aspects of embedding in detail. The present chapter builds the basis for them and additionally proposes new methods that allow a fast synthesis of significantly larger
functions and more complex circuits, respectively. Since Toffoli circuits as introduced in Sect. 2.1.2 generally build the basis for both reversible and quantum circuits, the focus in the following is on the synthesis of Toffoli cascades. Nevertheless,
quantum circuit synthesis is additionally considered where appropriate.
As already mentioned, the first part of this chapter builds the basis for all remaining synthesis sections. Here, it is shown how irreversible functions must be
embedded into reversible ones before existing synthesis methods can be applied to
them. Then, by using the example of the transformation-based approach introduced
in [MMD03], one of the previous synthesis methods is described and discussed.
Altogether, this briefly summarizes the basic synthesis steps for reversible logic as
they exist today.
R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, 27
DOI 10.1007/978-90-481-9579-4_3, © Springer Science+Business Media B.V. 2010
Motivated by this (in particular by the limitations of the current synthesis meth-
ods), the second part of this chapter introduces a new synthesis approach [WD09,
WD10] that exploits Binary Decision Diagrams (BDDs) [Bry86]. BDDs allow an
efficient representation of large Boolean functions that can be mapped into re-
versible cascades. As a result, for the first time Toffoli circuits for functions con-
taining over 100 variables can be derived efficiently.
Finally, how to specify and synthesize more complex reversible circuits at higher
levels of abstraction is considered in the third part of this chapter. For this purpose, a new
programming language (called SyReC) and a respective hierarchical synthesis approach are presented and evaluated [WOD10].
This section illustrates the current synthesis steps that use well-established meth-
ods. First, the problem of embedding irreversible functions is considered. Second,
the synthesis itself is introduced. For the latter, a widely known approach, namely
the transformation-based approach introduced in [MMD03], is used. Most of the re-
maining synthesis methods apply similar strategies (e.g. [Ker04, GAJ06, MDM07])
or are developed on top of this method (e.g. [MDM05]).
Table 3.1 shows the truth table of a 1-bit adder which is used as an example in this
section. The adder has three inputs (the carry-in cin as well as the two summands x
and y) and two outputs (the carry-out cout and the sum). The adder obviously is
irreversible, since
• the number of inputs differs from the number of outputs and
• there is no unique input-output mapping.
Even adding an additional output to the function (leading to the same number
of inputs and outputs) would not make the function reversible. Then, without loss
of generality, the first four lines of the truth table can be embedded with respect to
reversibility as shown in the rightmost column of Table 3.1. However, since cout = 0
and sum = 1 already appeared two times (marked bold), no unique embedding for
the fifth line is possible any longer. The same also holds for the lines marked italic.
This has already been observed in [MD04b]. Here, the authors came to the conclusion that at least ⌈log2(μ)⌉ additional (garbage) outputs are required to make
an irreversible function reversible, where μ is the maximum number of times an
output pattern is repeated in the truth table. Since for the adder at most three output patterns are repeated, ⌈log2(3)⌉ = 2 additional outputs are required to make the
function reversible.
cin x y | cout sum | emb.
 0  0 0 |  0   0  |  0
 0  0 1 |  0   1  |  0
 0  1 0 |  0   1  |  1
 0  1 1 |  1   0  |  0
 1  0 0 |  0   1  |  ?
 1  0 1 |  1   0  |  1
 1  1 0 |  1   0  |  ?
 1  1 1 |  1   1  |  1
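The counting argument behind this bound can be reproduced programmatically (a small illustration; the variable names are ours):

```python
from collections import Counter
from math import ceil, log2

# 1-bit adder: outputs (cout, sum) for all inputs (cin, x, y)
rows = [(cin, x, y) for cin in (0, 1) for x in (0, 1) for y in (0, 1)]
outputs = [((cin + x + y) >> 1, (cin + x + y) & 1) for cin, x, y in rows]

mu = max(Counter(outputs).values())   # most repeated output pattern
assert mu == 3                        # (0,1) and (1,0) occur three times
assert ceil(log2(mu)) == 2            # hence two garbage outputs
```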
0 cin x y | cout sum g1 g2
0  0  0 0 |  0   0   0  0
0  0  0 1 |  0   1   1  1
0  0  1 0 |  0   1   1  0
0  0  1 1 |  1   0   0  1
0  1  0 0 |  0   1   0  0
0  1  0 1 |  1   0   1  1
0  1  1 0 |  1   0   1  0
0  1  1 1 |  1   1   0  1
1  0  0 0 |  1   0   0  0
1  0  0 1 |  1   1   1  1
1  0  1 0 |  1   1   1  0
1  0  1 1 |  0   0   0  1
1  1  0 0 |  1   1   0  0
1  1  0 1 |  0   0   1  1
1  1  1 0 |  0   0   1  0
1  1  1 1 |  0   1   0  1
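Both properties of this embedding, reversibility and the recovery of the original adder, can be verified directly (rows transcribed from Table 3.2 as 4-bit integers):

```python
# Rows of Table 3.2, read as 4-bit integers (inputs 0, cin, x, y from
# the most significant bit down; outputs cout, sum, g1, g2 likewise).
table = [0b0000, 0b0111, 0b0110, 0b1001, 0b0100, 0b1011, 0b1010, 0b1101,
         0b1000, 0b1111, 0b1110, 0b0001, 0b1100, 0b0011, 0b0010, 0b0101]

# Reversibility: the mapping is a permutation of {0, ..., 15}.
assert sorted(table) == list(range(16))

# Embedding: with the constant input fixed to 0, the two leading output
# bits reproduce (cout, sum) of the adder for every input (cin, x, y).
for cin in (0, 1):
    for x in (0, 1):
        for y in (0, 1):
            out = table[(cin << 2) | (x << 1) | y]      # constant bit = 0
            assert out >> 3 == (cin + x + y) >> 1       # cout
            assert (out >> 2) & 1 == (cin + x + y) & 1  # sum
```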
Adding new lines causes constant inputs and garbage outputs. The value of the
constant inputs can be chosen by the designer. Garbage outputs are by definition
don’t cares and thus can be left unspecified leading to an incompletely specified
function. However, many synthesis approaches require a completely specified func-
tion so that often all don’t cares must be assigned to a concrete value.
As a result, the adder is embedded in a reversible function including four vari-
ables, one constant input, and two garbage outputs. A possible assignment to the
constant as well as the don’t care values is depicted in Table 3.2 (where the original
adder function is marked bold). In the following, a synthesis method is introduced
assuming a completely specified reversible function as input. However, the concrete
embedding of irreversible functions (in particular the concrete assignment to don’t
cares) can have a significant impact on the synthesis results (i.e. on the number of
gates in the resulting circuit). Thus, this issue is again considered in Chap. 5 which
also provides examples showing the effect of different embeddings.
In this section, synthesis of reversible logic is exemplarily described using the ap-
proach from [MMD03]. The basic idea is to traverse each line of the truth table and
to add gates to the circuit until the output values match the input values (i.e. until the
identity is achieved). Gates are thereby chosen so that they do not alter already considered lines. Furthermore, gates are added starting at the output side of the circuit
(this is because the output values are transformed until the identity is achieved).
In the following, the approach is described using the example of the embedded
adder from Table 3.2. Table 3.3 shows the respective steps. The first column denotes
the truth table line numbers, while the second and third column give the function
specification of the adder. For brevity, the inputs 0, cin , x, y and the outputs cout ,
sum, g1 , g2 are denoted by a, b, c, d, respectively. The remaining columns provide
the transformed output values for the respective steps.
The approach starts at truth table line 0. Since for this line the input is already
equal to the output (both are assigned to 0000), no gate has to be added. In con-
trast, to match the output with the input in line 1, the values for c and b must be
inverted. To this end, two gates MCT({d}, c) (1st step) and MCT({d}, b) (2nd step)
are added as depicted in Fig. 3.1. Due to the control line d, this does not affect
the previous truth table line. In line 2 and line 3, an MCT({c}, b) as well as an
MCT({c, d}, a) is added to match the values of b and a, respectively (step 3 and 4).
For the latter gate, two control lines are needed to keep the already traversed truth
table lines unaltered. Afterwards, only two more gates MCT({d, b}, a) (5th step)
and MCT({c, b}, a) (6th step) are necessary to achieve the input-output identity.
The resulting circuit is shown in Fig. 3.1. This circuit consists of six gates and has
quantum cost of 18.
In [MMD03], further variations of this approach are discussed. In fact, this trans-
formation can also be applied in the inverse direction (i.e. so that the input must
match the output) and in both directions simultaneously. Furthermore, in [MDM05]
the approach has been extended by the application of templates. These help to re-
duce the size of the resulting circuits and thus to achieve circuits with lower cost.
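The basic unidirectional (output-side) procedure described above can be sketched as follows. This is our simplified re-implementation of the idea, not the code of [MMD03]; rows are treated as integers, bit j stands for circuit line j, and gates are (controls, target) pairs, all of which are our conventions:

```python
def mmd_synthesis(perm, n):
    """Sketch of basic transformation-based synthesis: traverse the
    truth-table rows in order and add MCT gates at the output side
    until the transformed output column equals the identity."""
    out = list(perm)            # current (transformed) output column
    gates = []                  # (controls, target) pairs, in added order

    def add_gate(controls, target):
        gates.append((tuple(controls), target))
        for x in range(len(out)):
            if all(out[x] >> c & 1 for c in controls):
                out[x] ^= 1 << target

    for j in range(n):          # row 0: NOT gates for every set output bit
        if out[0] >> j & 1:
            add_gate([], j)
    for i in range(1, 2 ** n):
        for j in range(n):      # set bits that are 1 in i but 0 in out[i]
            if i >> j & 1 and not out[i] >> j & 1:
                add_gate([c for c in range(n) if out[i] >> c & 1], j)
        for j in range(n):      # clear bits that are 1 in out[i] but 0 in i
            if out[i] >> j & 1 and not i >> j & 1:
                add_gate([c for c in range(n) if i >> c & 1], j)
    assert out == list(range(2 ** n))       # identity reached
    return gates

def simulate(gates, x):
    """Apply the cascade to an input; gates were added from the output
    side, so they are applied in reverse order here."""
    for controls, target in reversed(gates):
        if all(x >> c & 1 for c in controls):
            x ^= 1 << target
    return x

perm = [0, 1, 2, 3, 4, 5, 7, 6]             # the Toffoli function itself
gates = mmd_synthesis(perm, 3)
assert all(simulate(gates, x) == perm[x] for x in range(8))
```

On this input the procedure recovers a single Toffoli gate. The sketch omits the bidirectional variant and the templates mentioned above, so its circuits are generally not as small as those reported in the text.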
Having this as a general introduction into synthesis of reversible logic, in the
following new synthesis approaches are proposed.
causing fanouts (which are not allowed in reversible logic), this may require addi-
tional circuit lines.
As a result, circuits composed of Toffoli or quantum gates, respectively, are obtained in time and with memory linear in the size of the BDD. Moreover, since the
size of the resulting circuit is bounded by the BDD size, theoretical results known
from BDDs (see e.g. [Weg00, LL92]) can be transferred to reversible circuits. The
experiments show significant improvements (with respect to the resulting circuit
cost as well as to the run-time) in comparison to previous approaches. Furthermore,
for the first time, large functions with more than one hundred variables can be synthesized at very low run-time.
In the remainder of this section, the BDD-based synthesis approach is introduced
as follows: In Sect. 3.2.1, the general idea and the resulting synthesis approach is de-
scribed in detail. How to exploit BDD optimizations is shown in Sect. 3.2.2, while
Sect. 3.2.3 briefly reviews some of the already known theoretical results from re-
versible logic synthesis and introduces bounds which follow from the new synthesis
approach. Finally, in Sect. 3.2.4 experimental results are given.
Example 3.1 Consider the BDD in Fig. 3.2(a). Applying the substitutions given in
Table 3.4 to each node of the BDD, the Toffoli circuit depicted in Fig. 3.2(b) results.
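Table 3.4 itself is not reproduced here, but the core of such a node substitution can be illustrated with a standard three-gate cascade that computes the Shannon decomposition of a node onto an additional zero-initialized line. This concrete cascade is our assumption for illustration; it matches the bound of at most three gates per node stated in Sect. 3.2.3:

```python
# g starts as a constant-0 line; the cascade g ^= xi*high, g ^= low,
# g ^= xi*low leaves g = (!xi * low) | (xi * high), because the two
# product terms of the Shannon decomposition are disjoint.
def node_cascade(xi, low, high):
    g = 0
    g ^= xi & high   # Toffoli: controls {xi, high}, target g
    g ^= low         # CNOT:    control  {low},      target g
    g ^= xi & low    # Toffoli: controls {xi, low},  target g
    return g

for xi in (0, 1):
    for low in (0, 1):
        for high in (0, 1):
            assert node_cascade(xi, low, high) == (high if xi else low)
```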
1 For the same reason, it is also not possible to preserve the values for low(v) or high(v), respectively.

Table 3.5 (Partial) truth tables for node v with high(v) = 0

(a) w/o add. line         (b) with additional line
xi low(f) | f  –          0 xi low(f) | f  xi low(f)
 0    0   | 0  0          0  0    0   | 0   0    0
 0    1   | 1  1          0  0    1   | 1   0    1
 1    0   | 0  1          0  1    0   | 0   1    0
 1    1   | 0  ?          0  1    1   | 0   1    1
To build compact BDDs, current state-of-the-art BDD packages exploit several op-
timization techniques such as shared nodes [Bry86], complement edges [BRB90],
or reordering [Bry86, Rud93]. In this section, it is shown how these techniques can
be applied to the proposed BDD-based synthesis.
If a node v has more than one predecessor, then v is called a shared node. The
application of shared nodes is common for nearly all BDD packages. Shared nodes
can be used to represent a sub-formula more than once without the need to rebuild
the whole sub-graph. In particular, functions f : B^n → B^m (i.e. functions with more
than one output) can be represented more compactly using shared nodes.
However, to apply shared nodes in reversible logic synthesis, the output value of
a respective node has to be preserved as long as it is needed. Considering
the substitutions depicted in Table 3.4, this holds for all cases where one of the edges
low(v) or high(v) leads to a terminal node. Here, all values of the inputs (in particular
of high(v) or low(v), which represent output values of other nodes) are preserved.
In contrast, this does not hold in the general case (first row of Table 3.4). Here,
only one value (namely the value of the select variable xi) is preserved. Thus, a
modified substitution for shared nodes without terminals as successors is required.
Figures 3.3(a) and 3.3(b) show one possible substitution to a reversible cascade
and a quantum cascade, respectively. Besides an additional constant circuit line, this
requires one additional reversible gate (three additional quantum gates) in comparison to
the substitution of Table 3.4. In return, shared nodes are supported. Moreover,
this substitution also allows the identity of a select variable (last row of
Table 3.4) to be represented by the respective input line of the circuit (i.e. without any additional gates
or lines). Previously, this was not possible, since the value of this circuit line was
not necessarily preserved (as an example, see Fig. 3.2, where the value of the identity
node f gets lost after node f is substituted).
Exploiting this, the synthesis algorithm proposed in the last section can be improved as follows: Again, a BDD for the function to be synthesized is built, which is
afterwards traversed in a depth-first manner. Then, for each node v ∈ V , the following checks are performed:
ing checks are performed:
1. Node v represents the identity of a primary input (i.e. the select input)
In this case no cascade of gates is added to the circuit, since the identity can be
represented by the same circuit line as the input itself.
2. Node v contains at least one edge (low(v) or high(v), respectively) leading to a
terminal
In this case substitutions as depicted in Table 3.4 are applied, since they often
need a smaller number of gates and additionally preserve the values of all input
signals.
3. The values of low(v) and high(v) are still needed, since they represent either
shared nodes or the identity of an input variable
In this case the substitutions depicted in Fig. 3.3 are applied, since they preserve
the values of all input signals.
4. Otherwise
The substitution as depicted in the first row of Table 3.4 is applied, since no
input values have to be preserved. In this case, the smaller cascade (with respect
to both the number of additional lines and the number of gates) is preferred.
Example 3.2 In Fig. 3.4(a) a partial BDD including a shared node f is shown.
Since the value of node f is used twice (by nodes f1 and f2 ), an additional line
(the second one in Fig. 3.4(b)) and the cascade of gates as depicted in Fig. 3.3 are
applied to substitute node f1 . Then, the value of f is still available such that the
substitution of node f2 can be applied. The resulting circuit is given in Fig. 3.4(b).
Figure 3.4(c) shows the resulting circuit for low(f ) = 0 and high(f ) = 1,
i.e. for f representing the identity of xj . In this case, no gates for f are added.
Instead, the fifth line is used to store the value for both xj and f . Besides that, the
remaining substitutions are equal to the ones described above.
Further reductions in BDD sizes can be achieved if complement edges [BRB90] are
applied. In particular, this allows a function as well as its negation to be represented by
a single node only. If there is a complement edge, e.g. between v and low(v), then
Shannon decomposition with an inverted value of low(v) is applied. To support
complement edges during synthesis, adjusted substitutions are applied; these are shown in Table 3.6.
Finally, different BDD orders may influence the synthesis results. It has been
shown that the order of the variables has a high impact on the size of the resulting BDD [Bry86] (see e.g. Fig. 2.11 on p. 19). Since reducing the number of nodes
may also reduce the size of the resulting circuits, reordering is considered in this
section.
In the past, several approaches have been proposed to achieve good orders
(e.g. sifting [Rud93]) or to determine exact results (e.g. [DDG00]) with respect to
the number of nodes. All these techniques can be directly applied to the BDD-based
synthesis approach and need no further adjustments of the already introduced sub-
stitutions.
Using these optimization techniques (i.e. shared nodes, complement edges, and
reordering), Sect. 3.2.4 considers how they influence the resulting Toffoli
or quantum circuits, respectively. Before that, it is briefly shown how the proposed
approach can be used to transfer theoretical results from BDDs to reversible logic.
In the past, first lower and upper bounds for the synthesis of reversible functions
containing n variables have been determined. In [MD04b], it has been shown that
there exists a reversible function that requires at least (2^n / ln 3) + o(2^n) gates (lower
bound). Furthermore, the authors proved that every reversible function can be realized with no more than n · 2^n gates (upper bound). For a restricted gate library
leading to smaller quantum cost and thus only consisting of NOT, CNOT, and
two-controlled Toffoli gates (the same as applied for the substitutions proposed
here), functions can be synthesized with at most n NOT gates, n^2 CNOT gates, and
9 · n · 2^n + o(n · 2^n) two-controlled Toffoli gates (according to [SPMH03]). A tighter
upper bound of n NOT gates, 2 · n^2 + o(n · 2^n) CNOT gates, and 3 · n · 2^n + o(n · 2^n)
two-controlled Toffoli gates has been proved in [MDM07]. In [PMH08], it has been
shown that linear reversible functions are synthesizable with CNOT gates only.
Moreover, their algorithm never needs more than Θ(n^2 / log n) CNOT gates for any
linear function f with n variables.
Table 3.6 Subst. of BDD nodes with complement edge to reversible/quantum circuits
Using the synthesis approach proposed in the last sections, reversible circuits
for a function f with a size dependent on the number of nodes in the BDD can
be constructed. More precisely, let f be a function with n primary inputs which
is represented by a BDD containing k nodes.2 Then, the resulting Toffoli circuit
consists of at most
• k + n circuit lines (since besides the input lines, for each node at most one addi-
tional line is added) and
• 3 · k gates (since for each node cascades of at most 3 gates are added according
to the substitutions of Table 3.4 and Fig. 3.3, respectively).
Asymptotically, the resulting reversible circuits are bounded by the BDD size.
Since for BDDs many theoretical results exist, using the proposed synthesis ap-
proach, these results can be transferred to reversible logic as well. In the following,
some results obtained by this observation are sketched.
• A BDD representing a single-output function has 2^n nodes in the worst case.
Thus, each function can be realized in reversible logic with at most 3 · 2^n gates
(where at most 2 · 2^n CNOTs and 2 · 2^n Toffoli gates are needed).
• A BDD representing a symmetric function has n·(n+1)/2 nodes in the worst case.
Thus, each symmetric function can be realized in reversible logic with a quadratic
number of gates (more precisely, a quadratic number of CNOTs and a quadratic
number of Toffoli gates are needed).
• A BDD representing specific functions, like AND, OR, or XOR has a linear size.
Thus, there exists a reversible circuit realizing these functions in linear size as
well.
• A BDD representing an n-bit adder has linear size. Thus, there exists a reversible
circuit realizing addition in linear size as well.
Further results (e.g. tighter upper bounds for general functions as well as for
respective function classes) are also known (see e.g. [Weg00, LL92]). Moreover,
in a similar way bounds for quantum circuits can be obtained. However, a detailed
analysis of the theoretical results that can be obtained by the BDD-based synthesis
is left for future work.
The BDD-based synthesis method together with the suggested improvements has
been implemented in C++ on top of the BDD package CUDD [Som01]. In this
section, first a case study is given evaluating the effect of the respective BDD opti-
mization techniques on the resulting reversible or quantum circuits. Afterwards, the
proposed approach is compared against two previously proposed synthesis methods.
To investigate the effect of the respective BDD optimization techniques the pro-
posed synthesis approach has been applied to the benchmarks with the respective
techniques enabled or disabled. In the following, for each optimization technique
(i.e. shared nodes, complement edges, and reordering) the respective results are pre-
sented and discussed.
Complement Edges Complement edges are supported by the CUDD package and can easily be enabled and disabled. For comparison, circuits from both BDDs with and BDDs without complement edges (denoted by WITH COMPL. EDGES and W/O COMPL. EDGES, respectively) are synthesized. In the latter case, the substitutions shown in Table 3.6 are applied whenever a successor is connected by a complement edge. Shared nodes are also applied, since they make complement edges more beneficial. The results are given in Table 3.8.³ The columns are labeled as described above for Table 3.7.
³ Compared to Table 3.7, benchmarks are also considered for which no result could be determined.

Table 3.8 (excerpt): REVLIB FUNCTIONS
decod24_10 2/4 7 7 21 <0.01 7 7 21 <0.01
4mod5_8 4/1 9 13 36 <0.01 9 13 36 <0.01
mini-alu_84 4/2 12 21 57 <0.01 11 20 52 <0.01
alu_9 5/1 15 30 73 <0.01 14 29 72 <0.01
rd53_68 5/3 31 85 212 <0.01 20 49 130 <0.01
hwb5_13 5/5 36 105 277 <0.01 32 91 238 <0.01
sym6_63 6/1 23 57 126 0.01 17 34 83 <0.01
hwb6_14 6/6 68 239 618 <0.01 53 167 437 <0.01
rd73_69 7/3 86 301 730 <0.01 38 105 272 <0.01
ham7_29 7/7 75 231 595 <0.01 36 88 224 <0.01
hwb7_15 7/7 136 526 1353 <0.01 84 284 744 <0.01
rd84_70 8/4 194 679 1650 0.01 52 140 373 <0.01
hwb8_64 8/8 277 1132 2903 0.02 129 456 1195 <0.01
sym9_71 9/1 104 325 724 <0.01 35 79 201 <0.01

Even if the cascades representing nodes with complement edges are larger in some cases (see Sect. 3.2.2), improvements in the circuit sizes can be observed (see e.g. rd84_70, 9sym, or cordic). In particular for the LGSynth functions, however, better circuits sometimes result when complement edges are disabled (see e.g. spla). Here, the larger cascades obviously cannot be compensated by the complement edge optimization. In contrast, for quantum circuits better realizations are obtained with complement edges enabled in nearly all cases. A reason for this is that, in nearly all cases, the quantum cascades for nodes with complement edges have the same size as the respective cascades for nodes without complement edges (see Table 3.4, Fig. 3.3, and Table 3.6, respectively). Thus, the advantage of complement edges (namely the possibility to create smaller BDDs) can be fully exploited without the drawback that the respective gate substitutions become larger.
Reordering of BDDs To evaluate the effect of reordering the BDD on the resulting circuit sizes, three techniques are considered: (1) an order given by the occurrences of the primary inputs in the function to be synthesized (denoted by ORIGINAL), (2) an optimized order achieved by sifting [Rud93] (denoted by SIFTING), and (3) an exact order [DDG00] which ensures that the BDD is minimal (denoted by EXACT). Again, all created BDDs exploit shared nodes. Furthermore, complement edges are enabled in this evaluation. After applying the synthesis approach, circuit sizes as summarized in Table 3.9 result. Here again, the columns are labeled as described above.

Table 3.9 (excerpt): REVLIB FUNCTIONS
decod24_10 2/4 7 7 21 <0.01 6 11 23 <0.01
4mod5_8 4/1 9 13 36 <0.01 8 16 37 <0.01
mini-alu_84 4/2 11 20 52 <0.01 10 22 49 <0.01
alu_9 5/1 14 29 72 <0.01 11 25 53 <0.01
rd53_68 5/3 20 49 130 <0.01 13 34 75 <0.01
hwb5_13 5/5 32 91 238 <0.01 27 85 201 <0.01
sym6_63 6/1 17 34 83 <0.01 14 29 69 <0.01
hwb6_14 6/6 53 167 437 <0.01 46 157 377 <0.01
rd73_69 7/3 38 105 272 <0.01 25 73 162 <0.01
ham7_29 7/7 36 88 224 <0.01 18 50 82 <0.01
hwb7_15 7/7 84 284 744 <0.01 74 276 665 <0.01
rd84_70 8/4 52 140 373 <0.01 34 104 229 <0.01
hwb8_64 8/8 129 456 1195 <0.01 116 442 1067 <0.01
sym9_71 9/1 35 79 201 <0.01 27 62 153 <0.01
The results show that the order has a significant effect on the circuit size. In particular for the LGSynth functions, the best results are achieved with the exact order. As a drawback, however, this requires a longer run-time. Besides that, also in this evaluation examples can be found showing that optimization of the BDD does not always lead to smaller circuits. Altogether, reordering is beneficial particularly for larger functions. In most cases it is thereby sufficient to perform sifting instead of exact reordering, since this leads to results of similar quality but without a notable increase in run-time. For the following evaluations, BDD-based synthesis with shared nodes, complement edges, and sifting has been applied.
3.2 BDD-based Synthesis 43
REVLIB FUNCTIONS
decod24_10 2/4 6 11 23 <0.01 6 11 23 <0.01 6 11 23 <0.01
4mod5_8 4/1 8 16 37 <0.01 7 8 18 <0.01 7 8 18 <0.01
mini-alu_84 4/2 10 22 49 <0.01 10 20 43 <0.01 10 20 43 <0.01
alu_9 5/1 11 25 53 <0.01 7 9 22 <0.01 7 9 22 <0.01
rd53_68 5/3 13 34 75 <0.01 13 34 75 <0.01 13 34 75 <0.01
hwb5_13 5/5 27 85 201 <0.01 28 88 205 0.01 28 88 205 0.01
sym6_63 6/1 14 29 69 <0.01 14 29 69 <0.01 14 29 69 <0.01
hwb6_14 6/6 46 157 377 <0.01 46 159 375 <0.01 46 159 375 0.01
rd73_69 7/3 25 73 162 <0.01 25 73 162 <0.01 25 73 162 <0.01
ham7_29 7/7 18 50 82 <0.01 21 61 107 <0.01 21 61 107 0.01
hwb7_15 7/7 74 276 665 <0.01 73 281 653 <0.01 76 278 658 0.01
rd84_70 8/4 34 104 229 <0.01 34 104 229 <0.01 34 104 229 <0.01
hwb8_64 8/8 116 442 1067 <0.01 112 449 1047 <0.01 114 440 1051 0.03
sym9_71 9/1 27 62 153 <0.01 27 62 153 <0.01 27 62 153 <0.01
Table 3.10 (Continued): REVLIB FUNCTIONS
FUNCTION PI/PO n | RMRLS [GAJ06] (d QC TIME) | RMS [MDM07] (d QC TIME) | BDD-BASED SYNTHESIS (n d QC dQua TIME) | ΔQC (RMRLS) ΔQC (RMS)
decod24_10 2/4 4 | 11 55 497.51 | 7 19 <0.01 | 6 11 27 23 <0.01 | −32 4
4mod5_8 4/1 5 | 9 25 0.86 | 5 9 <0.01 | 7 8 24 18 <0.01 | −7 9
mini-alu_84 4/2 5 | 21 173 495.61 | 36 248 <0.01 | 10 20 60 43 <0.01 | −130 −205
alu_9 5/1 5 | 9 49 122.48 | 9 25 0.01 | 7 9 29 22 0.01 | −27 −3
rd53_68 5/3 7 | – – >500.00 | 221 2646 0.14 | 13 34 98 75 <0.01 | – −2571
hwb5_13 5/5 5 | – – >500.00 | 42 214 0.01 | 28 88 276 205 0.01 | – −9
sym6_63 6/1 7 | 36 777 485.47 | 15 119 0.13 | 14 29 93 69 <0.01 | −708 −50
mod5adder_66 6/6 6 | 37 529 494.46 | 35 151 0.06 | 32 96 292 213 <0.01 | −316 62
hwb6_14 6/6 6 | – – >500.00 | 100 740 0.04 | 46 159 507 375 <0.01 | – −365
rd73_69 7/3 9 | – – >500.00 | 1344 20779 1.93 | 13 73 217 162 <0.01 | – −20617
hwb7_15 7/7 7 | – – >500.00 | 375 3378 0.18 | 73 281 909 653 <0.01 | – −2725
ham7_29 7/7 7 | – – >500.00 | 26 90 0.09 | 21 61 141 107 <0.01 | – 17
rd84_70 8/4 11 | – – >500.00 | 124 8738 9.92 | 34 104 304 229 <0.01 | – −8509
hwb8_64 8/8 8 | – – >500.00 | 229 3846 0.90 | 112 449 1461 1047 0.01 | – −2799
sym9_71 9/1 10 | – – >500.00 | 27 201 3.98 | 27 62 206 153 <0.01 | – −48
hwb9_65 9/9 9 | – – >500.00 | 2021 23311 1.45 | 170 699 2275 1620 0.02 | – −21691
cycle10_2_61 12/12 12 | 26 1435 491.87 | 41 1837 26.17 | 39 78 202 164 0.09 | −1271 −1673
plus63mod4096_79 12/12 12 | – – >500.00 | 24 4873 17.74 | 23 49 89 79 0.08 | – −4794
plus127mod8192_78 13/13 13 | – – >500.00 | 25 9131 57.16 | 25 54 98 86 0.21 | – −9045
plus63mod8192_80 13/13 13 | – – >500.00 | 28 9183 57.19 | 25 53 97 87 0.20 | – −9096
ham15_30 15/15 15 | – – >500.00 | – – >500.00 | 45 153 309 246 1.25 | – –
the quantum cost (QC), and the synthesis time (TIME) for the respective approaches (i.e. RMRLS, RMS, and the BDD-BASED SYNTHESIS) are reported.⁴ For BDD-based synthesis, additionally the resulting number of gates (and thus the quantum cost) when directly synthesizing quantum gate circuits is given in the column denoted by dQua. Furthermore, a “–” denotes that an embedding needed by the previous synthesis approaches could not be created within the given timeout. Finally, the last two columns (ΔQC) give the difference between the quantum cost of the circuits obtained by the BDD-based quantum circuit synthesis and those obtained by the RMRLS and RMS approach, respectively.
As a first result, one can conclude that for large functions it is not always feasible to create the reversible embedding needed by the previous approaches. Moreover, even if this is possible, both RMRLS and RMS need a significant amount of run-time to synthesize a circuit from the embedding. As a consequence, for most of the LGSynth benchmarks no result can be generated within the given timeout. In contrast, the BDD-based approach is able to synthesize circuits for all given functions within a few CPU seconds.
Furthermore, although BDD-based synthesis often leads to larger circuits with respect to gate count and number of lines, the resulting quantum cost is significantly lower in most cases (except for decod24_10, 4mod5_8, mod5adder_66, and ham7_29). As an example, for plus63mod4096_79 the BDD-BASED SYNTHESIS produces a circuit with twice the number of lines but with two orders of magnitude lower quantum cost in comparison to RMS. In the best cases (e.g. hwb9_65) a reduction of several thousands in quantum cost is achieved. Note that quantum cost is more meaningful than gate count, since it considers gates with more control lines to be more costly. Thus, even if the total number of circuit lines added by the BDD-BASED SYNTHESIS is higher than for previous approaches, significant improvements in the quantum cost are obtained. Furthermore, reversible logic for functions with more than 100 variables can be automatically synthesized. How to reduce the number of circuit lines is addressed later in Sect. 6.2.
Besides the synthesis of reversible functions, the realization of more complex circuits also has to be addressed in order to provide an efficient design flow. Thus, synthesis of reversible logic has to reach a level which allows the description of circuits at higher levels of abstraction. For this purpose, programming languages can be exploited. Considering traditional synthesis, approaches using languages like VHDL [LSU89], SystemC [GLMS02], or SystemVerilog [SDF04] have been established to specify and subsequently synthesize circuits. Even if first programming languages are also available in the reversible domain (see e.g. [Abr05, PHW06, YG07]), so far they have only been used to design reversible software. Similar approaches for reversible circuit synthesis are still missing.

⁴ TIME for BDD-BASED SYNTHESIS includes both the time to build the BDD as well as the time to derive the circuit.
In this section, the programming language SyReC is proposed, intended to specify and subsequently to automatically synthesize reversible logic. For this purpose, Janus [YG07]—an existing language designed to specify reversible software—is used as a basis and enriched by new concepts as well as operations aimed at specifying reversible circuits. A hierarchical approach is presented that automatically transforms the respective statements and operations of the new programming language into a reversible circuit. Experiments show that complex circuits can be efficiently generated with the help of SyReC. Moreover, a comparison to the BDD-based synthesis approach presented in the previous section shows the advantages of SyReC when more complex circuits rather than single functions are to be synthesized.
The remainder of this section is structured as follows: The SyReC programming
language as well as the new concepts, operations, and restrictions applied for hard-
ware synthesis are introduced in Sect. 3.3.1. Section 3.3.2 describes the hierarchi-
cal synthesis approach and explains in detail how reversible circuits specified in
SyReC can be generated. Finally, experimental results and conclusions are given in
Sect. 3.3.3.
As mentioned above, Janus [YG07] is used as a basis for the programming lan-
guage SyReC to specify reversible systems to be synthesized as circuits. This sec-
tion briefly reviews the syntax of the Janus language. Afterwards, the new concepts
and operations added to address circuit synthesis are introduced.
Janus is a reversible language that is simple yet powerful enough to design practical reversible software systems [YG07]. It provides fundamental constructs to define control and data operations while still preserving reversibility.
Figure 3.5 shows the syntax of Janus. Each Janus program (denoted by P) consists of variable declarations (denoted by D) and procedure declarations. The variables have non-negative integer values and are denoted by strings. They can be grouped into arrays. New variables are initialized to 0. Constants are denoted by c. Each procedure consists of a name (id) and a sequence of statements (denoted by S) including operations, reversible conditionals, reversible loops, as well as calls and uncalls of procedures (lines 4 to 7 in Fig. 3.5). Variables within statements are denoted by V.
In the following, a distinction is made between reversible assignment operations (denoted by ⊕) and (not necessarily reversible) binary operations (denoted by ). The former assign values to a variable on the left-hand side. Therefore, the respective variable must not appear in the expression on the right-hand side. Furthermore, only a restricted set of assignment operations exists, namely increase (+=), decrease (−=), and bit-wise XOR (^=), since these preserve reversibility (i.e. it is possible to compute these operations in both directions). In particular, the bit-wise XOR is of interest because a ^= b is equivalent to an assignment a = b if a is equal to 0.
In contrast, binary operations, i.e. arithmetic (+, ∗, /, %, ∗/), bit-wise (&, |, ^), logical (&&, ||), and relational (<, >, =, !=, <=, >=) operations, may not be reversible. Thus, they can only be used in right-hand expressions which preserve (i.e. do not modify) the values of the respective inputs. In doing so, all computations remain reversible, since the input values can be applied to revert any operation. For example, to specify a multiplication (i.e. a ∗ b) in Janus, a new free variable c must be introduced which is used to store the product (i.e. c ^= a ∗ b is applied). In comparison to common (irreversible) programming languages, this forbids statements like a = a ∗ b. Having this as a basis, Janus can be used to specify reversible programs and execute them in a reversible manner (i.e. forward and backward).
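The forward and backward execution of these reversible assignment operations can be illustrated with a small Python sketch (not the actual Janus interpreter; the helper names and the environment representation are made up): += and −= are mutual inverses, and ^= is its own inverse, so any statement sequence can be undone by running it in reverse.

```python
# Minimal sketch of reversible assignment semantics (illustrative only).
# Each reversible assignment has a known inverse: += <-> -=, ^= is self-inverse.

def forward(env, var, op, value):
    if op == '+=':
        env[var] = env[var] + value
    elif op == '-=':
        env[var] = env[var] - value
    elif op == '^=':
        env[var] ^= value

def backward(env, var, op, value):
    inverse = {'+=': '-=', '-=': '+=', '^=': '^='}
    forward(env, var, inverse[op], value)

env = {'a': 0, 'b': 5}
program = [('a', '^=', 5), ('a', '+=', 3)]   # a = 5, then a = 8
for step in program:
    forward(env, *step)
assert env['a'] == 8
for step in reversed(program):               # running backward restores a = 0
    backward(env, *step)
assert env['a'] == 0
```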
In the following, the programming language SyReC for the synthesis of reversible circuits is described. Janus is thereby used as a basis and enriched by further concepts (e.g. declaring circuit signals of different bit-widths) and operations (e.g. bit-access and shifts). Besides that, some restrictions are applied (e.g. dynamic loops are forbidden in hardware). Incorporating all these aspects results in the syntax of a programming language for reversible circuit synthesis as depicted in Fig. 3.6. More precisely, the following extensions and restrictions have been applied:
• The declaration of variables has been extended so that the designer can declare
variables with different bit-widths (line 2).
• Arrays are not allowed.
• Operators to access single bits (x.N), a range of bits (x.N:N), as well as the size (#V) of a variable have been added (line 3 and line 4).
• Since loops must be completely unrolled during synthesis, the number of itera-
tions has to be available before compilation. That is, dynamic loops (defined by
expressions) are not allowed (line 7).
• Macros for the SWAP operation (<=>) (line 5) as well as for the for-loop state-
ment (line 8) have been added.5
• Further operations used in hardware design (e.g. shifts <<) have been added (line 10 and line 14).
Example 3.3 Figure 3.7 shows a simple Arithmetic Logic Unit (ALU) illustrating the core concept of the resulting hardware programming language. The basic arithmetic operations can thereby be applied directly. Furthermore, control variables can be defined with a lower bit-width than data variables.
⁵ These extensions are not necessarily needed (i.e. they can also be expressed by the existing operations).
⁶ Figure 3.8(a) shows the notation for a single-bit operation. For larger bit-widths the notation is extended accordingly.
3.3 SyReC: A Reversible Hardware Language 51
Finally, the mapping for the decrease operation (e.g. a −= b) remains. Here, the realization from Fig. 3.8(c) is also applied, but fed with a negated variable value.
Binary operations include operations that are not necessarily reversible, so that their inputs have to be preserved to allow a (reversible) computation in both directions. To denote such operations, in the following the notation depicted in Fig. 3.9(a) is used. Again, solid lines represent the variable(s) whose values are preserved (i.e. in this case the input variables).
Synthesis of irreversible functions in reversible logic is not new, so for most of the respective operations a reversible circuit realization already exists. Additional lines with constant inputs are thereby applied to make an irreversible function reversible (see e.g. Sect. 3.1.1). As an example, Fig. 3.9(b) shows a reversible cascade that realizes an AND operation. As can be seen, this requires one additional circuit line with constant input 0. Similar mappings exist for all other operations.
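The reversible AND realization can be illustrated by simulating a Toffoli gate on an additional constant-0 line (a sketch, not the tool's implementation): the two inputs are preserved, and the ancilla line carries the result.

```python
# Sketch: an AND operation realized reversibly with a Toffoli gate and one
# ancilla line initialized to constant 0 (illustrative simulation only).

def toffoli(state, c1, c2, target):
    """Flip the target bit iff both control bits are 1."""
    if state[c1] and state[c2]:
        state[target] ^= 1
    return state

for a in (0, 1):
    for b in (0, 1):
        state = [a, b, 0]               # third line: constant-0 ancilla
        toffoli(state, 0, 1, 2)         # ancilla now carries a AND b
        assert state == [a, b, a & b]   # inputs preserved, result on ancilla
```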
However, since binary operations can be applied together with reversible assign-
ment operations (e.g. cˆ = a&b), sometimes a more compact realization is possible.
More precisely, additional (constant) circuit lines can be saved (at least for some
operations), if the result of a binary operation is applied to a reversible assignment
operation. As an example, Fig. 3.9(c) shows the realization for cˆ = a&b where no
constant input is needed but the circuit line representing c is used instead. However,
such a “combination” is not possible for all operations. As an example, Fig. 3.9(d)
shows a two-bit addition whose result is applied to a bit-wise XOR, i.e. cˆ = a + b.
Here, removing the constant lines and directly applying the XOR operation on the
lines representing c would lead to a wrong result. This is because intermediate re-
sults are stored at the lines representing the sum. Since these values are reused later,
performing the XOR operation “in parallel” would destroy the result. Thus, to have
a combined realization of a bit-wise XOR and an addition, a concrete embedding
for this case must be generated. Since finding and synthesizing respective embeddings for all affected operations and combinations is a non-trivial task, a more detailed consideration of this aspect is left for future work. So far, constant lines are applied to realize the desired functionality.
In this sense, most of the binary operations (in particular the bit-wise, logical, and relational operations as well as the addition) are synthesized. Besides that, the realization of the multiplication is of interest. Several possible realizations are described in [OWDD10]. Figure 3.9(e) briefly shows how multiplication is realized by the proposed synthesis method. As can be seen, partial products are applied. Considering one of the factors a, each time a respective bit of this factor (denoted by a_i) is equal to 1, the respective partial product is added to the product. This allows reusing the increase realization introduced in the previous section.
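The partial-product scheme can be sketched as follows, with plain Python arithmetic standing in for the reversible adder cascades (the function name is made up): every set bit a_i of factor a controls one increase by the shifted partial product.

```python
# Sketch of the partial-product scheme: for every bit a_i of factor a that is
# 1, the shifted partial product (b << i) is added to the result via the
# "increase" operation. In the circuit, reversible adders perform these steps.

def multiply_by_partial_products(a, b, width=8):
    product = 0
    for i in range(width):
        if (a >> i) & 1:            # bit a_i controls the addition
            product += b << i       # controlled "increase" by partial product
    return product

assert multiply_by_partial_products(6, 7) == 42
```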
The proposed synthesis approach for the programming language SyReC has been implemented in C++. In this section, experimental results obtained by this approach are provided. In particular, the different realizations of conditional statements are evaluated in more detail. Furthermore, the results obtained by the proposed approach are compared to the ones obtained by BDD-based synthesis.
54 3 Synthesis of Reversible Logic
⁷ A similar comparison to further work (e.g. [GAJ06, MDM07]) was not possible since, due to memory limitations, the respective benchmarks cannot be represented in terms of truth tables, which is required by these approaches.
avg16 8 136 12 754 1654 7832 0.01 s 12 754 1654 7832 0.01 s
avg16 16 272 20 1602 3462 16536 0.01 s 20 1602 3462 16536 0.01 s
avg16 32 544 36 3298 7078 33944 0.01 s 36 3298 7078 33944 0.01 s
In contrast to the heuristic approaches introduced in the last chapter for the synthesis of reversible logic, exact methods determine a minimal solution, i.e. a circuit with a minimal number of gates or minimal quantum cost, respectively. Ensuring minimality often causes an enormous computational overhead, and thus exact approaches are only applicable to relatively small functions. Nevertheless, it is worthwhile to consider exact methods, since they
• allow finding smaller circuits than the currently best known realizations,
• allow the evaluation of the quality of heuristic approaches, and
• allow the computation of minimal circuits as building blocks for larger circuits.
For example, improving heuristic results by 10% is significant if this leads to optimal results, but marginal if the generated results are still factors away from the optimum. Conclusions like this are only possible if the optimum is available. Another aspect is the computation of building blocks that can be reused to synthesize larger designs. For example, the substitutions used in the last chapter for the BDD-based synthesis have been generated using exact approaches.
However, only very little research has been done on the exact synthesis of reversible logic so far. A method based on a depth-first traversal with iterative deepening that uses circuit equivalences to rewrite a limited set of gates has been presented in [SPMH03]. The authors of [YSHP05] introduce an exact algorithm based on group theory. But for both approaches, only results for functions with up to three variables are reported. Furthermore, in [HSY+06] another exact synthesis method based on a reachability analysis has been proposed which is geared towards quantum gates. However, also here only functions with three variables and a couple of functions with four variables can be handled within acceptable run-time.
This chapter proposes methods based on Boolean satisfiability (SAT) that allow a faster exact synthesis and that are applicable to functions with up to six variables. The general idea is as follows: The synthesis problem is formulated as a sequence of decision problems. Then, each decision problem is encoded as a SAT instance and checked for satisfiability using an off-the-shelf SAT solver. If the instance is unsatisfiable, then no realization with d gates exists and a check for another value of d
R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, 57
DOI 10.1007/978-90-481-9579-4_4, © Springer Science+Business Media B.V. 2010
58 4 Exact Synthesis of Reversible Logic
is performed. Otherwise, the circuit can be obtained from the satisfying assignment.
Minimality is ensured by iteratively increasing d starting with d = 1.
In the following, the main flow and the respective SAT encodings for Tof-
foli circuit synthesis [GCDD07, GWDD09a] as well as for quantum logic synthe-
sis [GWDD08, GWDD09b] are introduced in detail in Sects. 4.1 and 4.2, respec-
tively. Since nowadays very powerful techniques for solving SAT instances exist
(see Sect. 2.3), already this enables efficient exact synthesis of reversible functions.
However, further improvements are possible, if (1) the problem is formulated and
solved on the SMT level [WGSD08], (2) additional knowledge provided by the ded-
icated solving engine SWORD is exploited [WG07, GWDD09a], or (3) quantified
Boolean satisfiability is used [WLDG08]. The respective encodings and methods are
described in Sect. 4.3. The last (and most efficient) method has also been applied and evaluated to synthesize reversible circuits including Fredkin and Peres gates.
Finally, the chapter is concluded and future work is sketched in Sect. 4.4.
Example 4.1 Consider the reversible function in Fig. 4.1(a). For this function, two Toffoli circuits are shown in Fig. 4.1(b). By exhaustive enumeration it has been proven that, even if there are realizations with d = 2 gates and with d = 4 gates, no realization with d = 3 gates exists.
Hence, if a realization with d gates has been found, minimality cannot be shown
by only proving that there is no realization with d − 1 gates. However, for Toffoli
circuits it is sufficient to prove that there are no realizations with d − 1 and d − 2
gates as the following lemma shows:
Lemma 4.1 Let f be a reversible function realizable by a Toffoli circuit with d gates. If there is no realization with d − 1 gates and no realization with d − 2 gates, then d is minimal.
Proof Assume that for a reversible function f a realization with d gates and a smaller realization with d − r gates (r > 0) exist. Then, as shown in Fig. 4.2, the smaller realization can be extended by two additional NOT gates so that the resulting circuit still realizes f. By cascading the respective NOT gates, it follows that there are realizations with d − r + 2 · s gates as well (s > 0). Thus, if there is a realization with d − r gates, there has to be at least one realization with d − 1 or with d − 2 gates.
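The padding argument can be checked directly on truth tables: a NOT gate on line l maps x to x XOR 2^l, and applying the same NOT twice is the identity, so appending a NOT pair never changes the realized function (a small sketch, not part of the book's tooling):

```python
# Sketch: a NOT gate on line `line` maps input x to x XOR (1 << line).
# Applying the same NOT twice is the identity, so any circuit with d - r
# gates can be padded to d - r + 2s gates without changing its function.

def apply_not(x, line):
    return x ^ (1 << line)

n = 3
for x in range(2 ** n):
    for line in range(n):
        # two NOTs on the same line cancel on every input pattern
        assert apply_not(apply_not(x, line), line) == x
```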
If quantum gate circuits are considered, this observation can be applied as well. Moreover, if at least one CNOT gate, V gate, or V+ gate occurs, a somewhat tighter extension is possible. Then, each of these gates can be “extended” as depicted in Fig. 4.3. This leads to valid realizations with cost d + r for any r ∈ N. As a result, it is sufficient to check for a realization with d − 1 gates to prove that d is minimal.
Thus, the minimal d can be approached by two methods: (1) start with d = 1 and iteratively increment d until a realization is found, or (2) determine a value for d (e.g. by heuristics or bounds) and non-iteratively modify d until a minimal realization (confirmed by Lemma 4.1) is found. In this context, non-iteratively means that if there exists a circuit with d gates, it is tried to find a better realization with d′ < d gates; otherwise, it is tried to find a circuit with d′ > d gates.
However, for the considered exact synthesis problem an iterative approach is chosen due to the complexity of solving the respective problem instances for large values of d. To illustrate this, Table 4.1 shows the results (RES) as well as the run-times (TIME, in CPU seconds) of the respective checks that have been performed to synthesize an optimal Toffoli circuit for the function mod5d1.¹

Table 4.1:
d | ITERATIVE (RES TIME) | NON-ITERATIVE (RES TIME)
1 | UNSAT 0.23 | – –
2 | UNSAT 1.92 | – –
3 | UNSAT 16.68 | – –
4 | UNSAT 36.62 | – –
5 | UNSAT 194.24 | UNSAT 194.24
6 | UNSAT 1625.88 | UNSAT 1625.88
7 | SAT 218.56 | SAT 218.56

The minimal circuit for this function includes d = 7 gates. Thus, using the iterative approach, seven checks are performed in total. In contrast, assuming the best case for the non-iterative approach (i.e. the minimal depth d = 7 is determined right at the beginning and additionally the two checks for d = 6 and d = 5 are performed), only three checks are necessary. However, since the run-times needed for the first checks of the iterative approach are small, the total run-time of the two approaches differs only slightly (by less than 3%). Hence, the non-iterative approach for reaching the minimal d is not beneficial in general. This particularly holds since this approach naturally requires checks with a d greater than the minimal d (which are obviously harder).
As a result, an iterative approach as shown in Fig. 4.4 is used for the exact synthesis of reversible logic. The input is the truth table of the reversible function f to be synthesized. The algorithm tries to find a circuit representation for f with one gate only, i.e. d is initialized to 1 and a respective SAT instance is created. If no realization with d gates exists, d is incremented. This procedure is repeated until a realization is found.

¹ For this purpose, the SAT-based encoding described in the next section has been used. However, the same behavior was observed for other encodings and functions.
The respective checks are thereby performed by
1. encoding the synthesis problem as an instance of Boolean satisfiability inst
(line 6) and
2. checking the instance for satisfiability using an off-the-shelf solver (line 7).
If there exists a satisfying assignment for inst, a circuit representing f has been found. This circuit is extracted from the assignment of the encoding given by the solver. If inst is unsatisfiable, it has been proven that no realization for f with d gates exists. By increasing d iteratively starting from d = 1, minimality is ensured.
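The main flow can be sketched as follows. For illustration, a brute-force enumeration of Toffoli cascades stands in for the SAT check (the actual encoding follows in the next section); all function names are made up, and the sketch is only feasible for very small n and d:

```python
from functools import reduce
from itertools import product

def apply_toffoli(x, t, c):
    # flip target bit t iff all control bits in mask c are set in x
    return x ^ (1 << t) if (x & c) == c else x

def realizes(gates, f, n):
    return all(reduce(lambda x, g: apply_toffoli(x, *g), gates, i) == f[i]
               for i in range(2 ** n))

def exact_synthesis(f, n):
    # all n * 2^(n-1) gate types: a target t plus a control mask avoiding t
    types = [(t, c) for t in range(n)
                    for c in range(2 ** n) if not (c >> t) & 1]
    d = 1
    while True:                  # start with d = 1 and increment -> minimality
        for gates in product(types, repeat=d):
            if realizes(list(gates), f, n):
                return list(gates)
        d += 1

# A CNOT (control line 0, target line 1) needs exactly one gate.
f = [0b00, 0b11, 0b10, 0b01]
assert len(exact_synthesis(f, 2)) == 1
```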
Using this as the main flow, the next sections introduce concrete encodings for
Toffoli and quantum circuit synthesis, respectively.
Having the main flow as a basis, the open question still is how to encode the decision problem “Is there a circuit with exactly d gates that realizes the given reversible function f?” as a SAT instance. In this section, the concrete SAT formulation as well as first results obtained with it are presented. Section 4.2.1 addresses Toffoli circuit synthesis, while Sect. 4.2.2 covers quantum circuit synthesis. These encodings allow an efficient handling of the embedding problem for irreversible functions (see Sect. 3.1.1), which is considered in more detail in Sect. 4.2.3. Finally, experimental results are given in Sect. 4.2.4.
The synthesis problem for Toffoli circuits is encoded so that the resulting instance is satisfiable iff a circuit with d gates realizing the given function f exists; otherwise, the instance must be unsatisfiable. To this end, Boolean variables (for brevity denoted by vectors) and constraints are used as described below.
First, the vectors defining the type of the Toffoli gate at an arbitrary depth k are introduced:²

• The vector t_k encodes the target line of the Toffoli gate at depth k, i.e. line t^k = [t_k]_2 becomes the target line.
• The vector c_k = (c^k_{n−1} · · · c^k_1) defines the control lines of the Toffoli gate at depth k. More precisely, assigning c^k_l = 1 (with 1 ≤ l < n) means that line (t^k + l) mod n becomes a control line of the Toffoli gate at depth k.

² The Toffoli gates in a circuit are enumerated from left to right (starting from 0). Furthermore, the term depth is used to refer to the respective position of a Toffoli gate in this enumeration.
Remark 4.1 In total there are n · 2^{n−1} different types of Toffoli gates for a reversible function with n variables. This holds since a Toffoli gate has exactly one target line, leaving n − 1 lines as possible control lines. Thus, there are n possible positions for the target line and 2^{n−1} combinations of control lines, respectively.
Example 4.2 Figure 4.5 shows all 3 · 2^{3−1} = 12 possible types of Toffoli gates for a circuit with n = 3 lines. For each gate, its assignments to the vectors t_k and c_k are also given. For example, the assignments t_k = (01) and c_k = (01) state that line [01]_2 = 1 is the target line. Furthermore, because c_1 is assigned 1, line (1 + 1) mod 3 = 2 becomes a control line. In contrast, because c_2 is assigned 0, line (1 + 2) mod 3 = 0 does not become a control line.
Furthermore, variables representing the inputs and outputs as well as the internal signals of the circuit to be synthesized are defined:
4.2 SAT-based Exact Synthesis 63
Example 4.3 Figure 4.6 shows the variables needed to formulate the synthesis problem for an (embedded) adder function³ with n = 4 variables and depth d = 4. The first row gives the variables for the first truth table line, the second row those for the second truth table line, and so on. Thus, for each of the 2^4 = 16 lines of the truth table, n = 4 circuit lines with the respective vectors for input, output, and internal variables are considered (i.e. overall 4 · 16 = 64 lines are considered). The positions of the Toffoli gates to be synthesized are marked by dashed rectangles. For each depth, all possible types of Toffoli gates can be defined by assigning the respective values to t_k and c_k.
Using these variables, the synthesis problem for a reversible function f with d Toffoli gates can be formulated as follows: Is there an assignment to all variables of the vectors t_k and c_k such that for each line i, x_0^i is equal to the left-hand side of the truth table, while x_d^i is equal to the corresponding right-hand side? This is encoded by the conjunction of the following three constraints:
1. The input/output constraints set the input and output of the truth table given by the function f to the respective variables x_0^i and x_d^i (see also the left-hand and right-hand side of Fig. 4.6), i.e.
⋀_{i=0}^{2^n−1} ( [x_i^0]_2 = i ∧ [x_i^d]_2 = f(i) ).

2. The functional constraints ensure that the gate at each depth k transforms the
line values x_i^k into the values x_i^{k+1}, i.e.

⋀_{i=0}^{2^n−1} ⋀_{k=0}^{d−1} x_i^{k+1} = t(x_i^k, t_k, c_k).
The function t(x_i^k, t_k, c_k) covers the functionality of a Toffoli gate with target line
[t_k]_2 and the control lines defined by c_k. As an example, consider t_k = (01)
and c_k = (100), i.e. with c_k^3 = 1. This assignment states that the Toffoli gate at
depth k has line [t_k]_2 = [01]_2 = 1 as target line and line

([t_k]_2 + 3) mod n = (1 + 3) mod 4 = 0

as control line. Hence, the constraints

x_{i0}^{k+1} = x_{i0}^k
∧ x_{i1}^{k+1} = x_{i1}^k ⊕ x_{i0}^k
∧ x_{i2}^{k+1} = x_{i2}^k
∧ x_{i3}^{k+1} = x_{i3}^k

are added for each truth table line i of a function with n = 4 variables. That
means, the values of the circuit lines 0, 2, and 3 are passed through, while the
output value of line 1 becomes inverted if line 0 is assigned to 1. Similar constraints are added for all remaining cases.
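The gate function t(x_i^k, t_k, c_k) used in the functional constraints above can be illustrated by a small simulation routine. This is a sketch, not the book's implementation; it assumes the gate has already been decoded into a target line and a list of control lines:

```python
def toffoli_step(x, target, controls):
    """Apply one Toffoli gate to a vector of line values:
    flip the target line iff all control lines carry the value 1."""
    y = list(x)
    if all(x[c] for c in controls):
        y[target] ^= 1
    return y
```

For the example above (target line 1, control line 0, n = 4), `toffoli_step([1, 0, 1, 1], 1, [0])` passes lines 0, 2, and 3 through and inverts line 1, since line 0 is assigned to 1.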
3. Finally, the exclusion constraints remove invalid assignments to the target line
vectors, i.e.

⋀_{k=0}^{d−1} [t_k]_2 < n.
For example, for a circuit consisting of n = 3 lines the target line is represented
by two variables tk = (t2 t1 ) as shown in Fig. 4.5. Here, the assignment tk = (11)
has to be excluded, since line [11]2 = 3 does not exist.
As a result, a formulation has been constructed which is satisfiable if there is
a valid assignment to t_k and c_k such that for all truth table lines the desired
input-output mapping is achieved. Then, the concrete Toffoli gates can be obtained by
the assignments to tk and ck as depicted in Fig. 4.5. If there is no such assignment
(i.e. the instance is unsatisfiable), then it has been proven that no circuit representing
the function with d gates exists.
As a last step, the proposed encoding has to be transformed from bit-vector logic
into a Conjunctive Normal Form (CNF)—the standard input format for SAT solvers
(see Sect. 2.3.1). This is a well understood process that can be done in time and
space linear in the size of the original formulation [Tse68]. A possible way is to define methods for clause generation for simple logic functions like AND, OR, etc. and
to extend this scheme to more complex logic like implications or comparisons.
Then, in particular the functional constraints can be mapped to CNF. The assign-
ments of the input/output constraints can be applied by using unit clauses. Finally,
the exclusion constraints can be expressed by explicitly enumerating all values that
are not allowed in terms of a blocking clause [McM02].
Having the formulation in CNF, the satisfiability of the instance (as well as the
satisfying assignments) can be efficiently determined.
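The clause-generation scheme just mentioned can be made concrete for a single AND gate. The following minimal helpers (invented for illustration, using DIMACS-style integer literals where a positive integer denotes a variable and a negative integer its complement) show the three ingredients named above: Tseitin-style gate clauses, unit clauses, and blocking clauses:

```python
def and_clauses(a, b, y):
    """Tseitin encoding of y <-> (a AND b) as three CNF clauses."""
    return [[-a, -b, y], [a, -y], [b, -y]]

def unit_clause(lit):
    """Input/output constraints are applied as unit clauses."""
    return [lit]

def blocking_clause(lits):
    """Exclude one forbidden assignment by negating all its literals."""
    return [-l for l in lits]
```

For example, `and_clauses(1, 2, 3)` encodes that variable 3 equals the conjunction of variables 1 and 2; any assignment violating this equivalence falsifies one of the three clauses.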
The values of the circuit lines are now encoded by two Boolean variables y_{ij}^k
and z_{ij}^k representing the four logic values as follows:

y_{ij}^k  z_{ij}^k  value
0         0         0
0         1         V0
1         0         1
1         1         V1
66 4 Exact Synthesis of Reversible Logic
The number g of all gate types is determined as follows: Each CNOT gate,
V gate, and V+ gate has exactly one target line and one control line leading to
3n(n−1) possible gate types for a circuit with n lines. Additionally, n NOT gates are
possible (one at each line). Thus, in total 3n(n − 1) + n different types of quantum
gates exist.
Remark 4.2 If additionally double gates are considered (see Sect. 2.1.3), for a circuit
with n lines in total g = 7n(n − 1) + n different types of quantum gates have to be
considered. This holds, since in total four double gates exist (namely the ones shown
in Fig. 2.7 on p. 16), leading to 4n(n − 1) additional types.
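The gate-type counts above amount to simple arithmetic; a minimal sketch (the helper name is invented) that reproduces them:

```python
def num_quantum_gate_types(n, with_double_gates=False):
    """Count quantum gate types for a circuit with n lines:
    3n(n-1) CNOT/V/V+ types (one target, one control) plus n NOT types;
    double gates add another 4n(n-1) types."""
    g = 3 * n * (n - 1) + n
    if with_double_gates:
        g += 4 * n * (n - 1)
    return g
```

For n = 3 this gives 21 quantum gate types, and 45 when double gates are additionally considered.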
Example 4.4 Figure 4.7 shows the variables needed to formulate the constraints for
an (embedded) adder function. In comparison to the variables needed for Toffoli
synthesis (see Example 4.3 or Fig. 4.6, respectively) the variables defining the type
of a gate at depth k and the variables representing the circuit line values have been
changed.
1. The input/output constraints now assign both encoding variables of each circuit
line, i.e.

⋀_{i=0}^{2^n−1} ⋀_{j=0}^{n−1} ( y_{ij}^0 = i[j] ∧ z_{ij}^0 = 0 ∧ y_{ij}^d = f(i)[j] ∧ z_{ij}^d = 0 ).

That is, each y_{ij}^0 (y_{ij}^d) is assigned to 1 or 0 according to the jth position in the
truth table line i of f. Furthermore, each z_{ij}^0 (z_{ij}^d) is assigned to 0, since Boolean
functions are synthesized. As an example, consider the left-hand and the right-hand side of Fig. 4.7.
Fig. 4.7 SAT formulation for quantum circuit synthesis with n = 4 and d = 4
2. The functional constraints are modified so that the functionality of the new gate
library is represented by a new function q(y_{ij}^k, z_{ij}^k, q_k), i.e.

⋀_{k=0}^{d−1} ⋀_{i=0}^{2^n−1} ⋀_{j=0}^{n−1} y_{ij}^{k+1} z_{ij}^{k+1} = q(y_{ij}^k, z_{ij}^k, q_k).
Therefore, a similar formulation as described in the last section for the Toffoli
gate library is possible.
3. And finally, illegal assignments to q_k are now excluded by

⋀_{k=0}^{d−1} [q_k]_2 < g.
If the resulting instance is satisfiable, the quantum circuit can be obtained from the
assignments to q_k. Even for the multi-valued encoding, this can be done efficiently
for many practically relevant functions. However, before the performance of the
encodings (for both Toffoli circuit synthesis and quantum circuit synthesis) is
considered in detail in Sect. 4.2.4, a beneficial modification for the exact synthesis
of (embedded) irreversible functions is introduced in the following section.
are added. Then, the variables for don’t care conditions are left unspecified and
are—if the instance is satisfiable—assigned by the SAT solver.
The same is done for all constant inputs. But, since a constant input must have
the same assignment in all truth table lines, an additional constraint
x_{0c}^0 = x_{1c}^0 = ··· = x_{(2^{n−|c|}−1)c}^0

or

y_{0c}^0 z_{0c}^0 = y_{1c}^0 z_{1c}^0 = ··· = y_{(2^{n−|c|}−1)c}^0 z_{(2^{n−|c|}−1)c}^0

is added for each constant input c, respectively. This restricts the SAT solver to
assign all input variables x_{ic}^0 (y_{ic}^0 z_{ic}^0) with the same value for each truth table line.
Furthermore, since the constant inputs are now modeled symbolically (the value of
each constant input is not fixed to 0 or 1), only 2^{n−|c|} truth table lines have to be
considered (where |c| is the number of constant inputs).
Example 4.5 Consider the incompletely embedded adder function shown in Table 4.2. The adder needs one additional variable to become reversible, leading to
a function with n = 4 variables, one constant input c, and two garbage outputs g1
and g2.
4 In principle, also embeddings with an arbitrary number of garbage outputs and different output
permutations are possible. However, in the following only embeddings with minimal garbage and a
fixed output order are considered. Chapter 5 provides a further consideration of different embeddings.
Table 4.2 Partial truth table of the embedded adder (– marks the constant input and the don't care garbage outputs):

c x1 x2 x3 | y1 y2 g1 g2
– 0  0  0  | 0  0  –  –
– 0  0  1  | 0  1  –  –
– 0  1  0  | 0  1  –  –
– 0  1  1  | 1  0  –  –
– 1  0  0  | 0  1  –  –
– 1  0  1  | 1  0  –  –
– 1  1  0  | 1  0  –  –
– 1  1  1  | 1  1  –  –
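The primary outputs shown in this truth table are simply the two-bit sum of the three non-constant inputs. A short sketch (assuming the first output is the carry bit and the second the sum bit) reproduces them:

```python
def adder_outputs(x1, x2, x3):
    """Two-bit result (carry, sum) of adding the three input bits."""
    s = x1 + x2 + x3
    return (s >> 1) & 1, s & 1

# Reproduce the eight specified rows of the partial truth table.
rows = [(x1, x2, x3, *adder_outputs(x1, x2, x3))
        for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1)]
```

For example, the input 011 yields the outputs 10, matching the fourth row above.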
In summary, these modifications not only simplify the SAT encoding (since
a smaller number of truth table lines is considered), but also reduce the number
of checks that have to be performed to find a minimal circuit. Normally, to ensure minimality, both values for each constant input have to be considered. Thus,
for a function with one constant input two checks (one with c = 0 and one with
c = 1) have to be performed. Moreover, for functions with more than one constant
input an exponential number of combinations has to be checked (e.g. values from
{00, 01, 10, 11} for a function with two constant inputs). For all these combinations,
a single instance must be encoded and separately solved by the solver. In contrast,
using the proposed modifications a single instance to be checked is sufficient to syn-
thesize a minimal result. This leads to significant speed-ups as the experiments in
the next section show.
5 Note that only a partial truth table is shown. Depending on the assignment to the constant input,
2^{n−1} = 8 truth table lines with don't care outputs are added either above or below the shown truth
table lines. For more details see Sect. 3.1.1.
The proposed approaches have been implemented in C++. To solve the resulting
instances, the SAT solver MiniSAT [ES04] has been used. This section provides
experimental results for both exact synthesis of quantum circuits and exact synthesis
of Toffoli circuits. More precisely, it is shown that exact synthesis can be applied
to functions with up to six variables. Improvements in the run-time can be obtained if irreversible
functions containing constant inputs are considered. Furthermore, a comparison to
heuristic approaches confirms the need for exact synthesis methods, both for finding
smaller circuits than the currently best known realizations and for evaluating the quality
of heuristic methods.
As benchmarks a wide range of functions from different domains has been used.
This includes reversible functions as well as embedded irreversible functions. All
benchmarks have been taken from RevLib [WGT+08]. The experiments have been
carried out on an AMD Athlon 3500+ with 1 GB of memory. This section starts with
an evaluation of the quantum circuit synthesis, followed by an evaluation of the
exact Toffoli circuit synthesis.
For exact synthesis of quantum circuits the results of three evaluations are presented.
First the modifications for irreversible functions as introduced in Sect. 4.2.3 are
studied in detail. Next, the effect of the application of double gates on the synthesis
results is observed. Finally, the presented approach is compared to a previously
introduced method for quantum circuit synthesis.
Half-adder (n = 3):
d  Time   d  Time
4  1.47   4  0.85
5  2.65   –  –

Reversible functions (number of gates d and run-time in CPU seconds for the two compared settings):

Function        n  d   Time     d  Time
3_17            3  10  1641.49  8  280.98
miller          3  8   15.49    6  11.60
fredkin         3  7   7.04     5  3.28
peres           3  4   0.33     4  1.21
toffoli         3  5   0.71     5  2.38
peres-double    3  6   11.32    6  175.86
toffoli-double  3  7   86.75    7  1121.68
graycode6       6  5   66.50    5  608.11
q4example       4  6   9.08     5  24.83
the optimal assignment. This can be observed for all functions in Table 4.3 ex-
cept decod24. It should be noted that the constraining of constant input variables
requires some computation time (i.e. run-times may be higher than those for solv-
ing the function with a fixed constant input assignment). However, this overhead
is easily compensated by the fact that only one instance needs to be solved. Since
the proposed modifications often lead to better results (with respect to run-time and
resulting circuit size), in the following, they are also applied in the remaining exper-
iments.
Effect of Double Gates In [HSY+06], double gates (as introduced in Sect. 2.1.3)
are assumed to have unit cost. However, other synthesis methods (e.g. [BBC+95,
MYDM05]) consider the cost of a double gate to be two, since they are composed
of two quantum gates. Hence, there are compelling reasons to also consider synthesis
that relies on (single) quantum gates only. As described above, the proposed
SAT-based formulation supports both synthesis with quantum gates only (denoted by
Qua. Gates) and synthesis additionally with double gates (denoted by Dbl. Gates). In
one evaluation, circuits with double gates enabled and with double gates disabled
have been considered. Disabling double gates reduces the number of possible gates
at each depth from 7n(n − 1) + n to 3n(n − 1) + n (making the instance more
compact). The results are summarized in Table 4.4. In the first two columns the name
Reversible functions (excerpt: run-times in CPU seconds and improvement factors):

miller   318.29  34.53  >9.2
fredkin  78.02   10.96  >7.1
peres    35.18   4.43   >7.9
toffoli  122.52  8.45   >14.5
of the function and its number of variables are given. The next columns provide
the number of gates (d) and the run-time in CPU seconds (T IME) for both cases
(quantum gates only and additionally with double gates).
In general, it is expected that more choices of possible gates at each level will
increase the time to find a correct solution. This can clearly be seen for the bench-
mark functions where the inclusion of double gates offers no advantage (i.e. both
values for d are the same). For example, the run-time for graycode6 increases by
one order of magnitude when double gates are considered—even though the results
are identical with respect to the costs. On the other hand, for some functions where
the inclusion of double gates leads to smaller circuits (e.g. 3_17), the run-time can
be reduced, since fewer instances have to be solved.
For toffoli-double, the minimal quantum gate circuits have already been found in [HSY+06].
Additionally, in case of q4-example the SAT-based approach synthesizes an optimal
quantum gate representation with costs 5 instead of the non-optimal circuit of size
6 obtained in [HSY+06].
Note again that all these benchmarks have been carried out on a slower system than the one
used in [HSY+06]. For absolute run-times on a faster machine see the rightmost
column of Table 4.4.
Reversible functions (best known realization vs. exact synthesis; d = number of gates, qc = quantum costs):

Function   n  d  qc  Ref.      d  qc  Time     Δd  Δqc
mod5mils   5  5  13  [MDM05]   5  13  48.28    0   0
ham3       3  5  9   [GAJ06]   5  9   0.60     0   0
ex-1       3  4  8   [MDM05]   4  8   0.12     0   0
graycode3  3  2  2   [MDM05]   2  2   0.01     0   0
graycode4  4  3  3   [MDM05]   3  3   0.64     0   0
graycode5  5  4  4   [MDM05]   4  4   22.08    0   0
graycode6  6  5  5   [GAJ06]   5  5   583.14   0   0
3_17       3  6  14  [GAJ06]   6  14  0.43     0   0
mod5d1     5  8  24  [WGT+08]  7  11  2094.13  1   13
mod5d2     5  8  16  [MDM05]   8  20  1616.07  0   −4
6 For some functions no results have been reported before. In this case, the approach of [MDM05]
has been applied to generate a heuristic result.
results for these benchmarks. However, exact synthesis additionally enables the realization of significantly smaller circuits. For example, the circuit for 4gt5 can be
reduced by more than two thirds. In absolute numbers, up to 12 gates can be saved
for some functions. Moreover, the proposed approach also improves the quantum
costs for many functions (only in one case, namely for mod5d2, the quantum costs increase).7
In the best case the quantum costs are reduced by 92.
It can be concluded that using SAT-based synthesis as proposed, exact results
can be produced for functions with up to six variables. The comparison to heuristic
approaches confirms the need for exact methods, both for finding smaller circuits
than the currently best known realizations and for evaluating the quality of heuristic
methods. However, synthesizing exact results still requires high computing times.
Thus, in the next section improvements are proposed that accelerate the synthesis
process.
7 This is because circuits are synthesized optimally with respect to the number of gates, not quantum
costs. In some (few) cases, circuits with a larger number of gates but lower quantum costs are
possible. For results with respect to quantum gates (and therewith with respect to quantum costs)
see the discussion above.
4.3 Improved Exact Synthesis 77
have been introduced. As can be seen, a large part of the problem formulation consists of bit-vector variables and bit-vector constraints, respectively. However, most
of this high level of abstraction is lost when the formulation is encoded as a pure
Boolean formula and afterwards solved by a Boolean SAT solver. Furthermore, this
transformation requires a large number of auxiliary variables, leading to additional
overhead. Thus, it is worthwhile to consider alternative encodings.
The emerging area of SAT Modulo Theories (SMT) (see Sect. 2.3.2) provides new
solving engines that directly support bit-vector logic and thus allow an encoding
that avoids the conversion to the Boolean level. As a result, all bit-vector variables
and most of the bit-vector operations are preserved; hardly any auxiliary variables
are needed. Furthermore, the formulation at this higher level of abstraction allows
stronger implications. As the experiments in Sect. 4.3.3 show, already this simple
“replacement” allows significant improvements in the resulting synthesis times.
The input of a SAT solver is a Boolean function in terms of clauses. The input of
an SMT solver is a description in bit-vector logic. Both solvers are optimized for
their particular problem representation. For example, common SAT solvers utilize
the two literal watching scheme to carry out implications, which exploits the special
structure of clauses [MMZ+01]. SMT solvers, on the other hand, use e.g. canonizing [BDL98] and term-rewriting [BB09] to efficiently handle bit-vector constraints.
Furthermore, highly optimized heuristics have been developed to decide the assign-
ment of variables if no more implications are possible. Strategies employed are
based on statistical information, for example occurrences or activities of variables
[Mar99].
All these techniques work very well if CNF formulas or bit-vector logic are con-
sidered in general. But, the respective solvers are not able to take specific properties
of the problem into account. For example, promising problem-specific strategies for
exact Toffoli circuit synthesis would be:
• The type of the Toffoli gates (represented by tk and ck ) near to the inputs should
be defined first, because the corresponding input variables are already assigned
by the truth table. This allows for early implications and helps to determine the
types of the remaining gates or to detect conflicts faster. Thus, tk and ck with
small k should be preferred in the decision procedure. Similarly, this observation
also holds for modules near to the outputs.
• If the assignment to an input line of a Toffoli gate is not equal to the assignment
to the corresponding output line of the same gate, this line has to be the target
line. This observation allows implying the assignment to the variables in t_k.
• If the target line of a Toffoli gate is known, the values of all remaining lines can
be implied if there is an assignment at the corresponding input or output.
These specific strategies cannot be provided by a standard SAT or SMT solver.
Moreover, extensions of standard solvers in this direction (e.g. by modifications of
the heuristics) are not possible in general, because most of the problem-specific
information is lost when encoding the instance. SAT and SMT solvers just have a
clause database or constraint database, respectively. Thus, strategies like the ones
described above can only be exploited with a solver that is based on a problem-
specific representation.8
8 In principle, this problem can be prevented by introducing additional constraints to the problem
instance. But then, the encoding becomes inefficient due to a very large number of constraints.
To overcome the limitations discussed above, the solver framework SWORD is applied. While SAT solvers provide strategies optimized for clauses and SMT solvers
for bit-vector constraints, respectively, SWORD makes problem-specific information available by using so-called modules. These modules enable the implementation of dedicated heuristics as well as implication strategies, while still utilizing
sophisticated SAT techniques such as conflict analysis or learning. In the following,
the application of SWORD to Toffoli circuit synthesis is described. Section 2.3.2.3
gives a brief overview of the underlying solving techniques (starting on p. 24).
For Toffoli circuit synthesis, dedicated modules have been developed that incor-
porate the problem-specific strategies described above. More precisely, a concrete
Toffoli synthesis instance for the reversible function f to be synthesized with d
gates includes d modules in a cascade structure—one module for each depth k.
Each module has access to its related variables t_k, c_k, x_i^k, and x_i^{k+1}. The functionality
of a Toffoli gate is defined by methods of the module, i.e. a concrete Toffoli gate
function is selected by assigning t_k and c_k. Then, each module realizes the decision
and implication strategies as described in the following.
Fig. 4.8 Propagate routine for the module at depth k:

for each (truth table line i)
    for each (circuit line j)
        if (x_ij^k ≠ x_ij^{k+1})            // input ≠ output
            imply(t_k);                     // line j must be the target line

    for each (circuit line j)
        if (j == [t_k]_2) continue;
        imply(x_ij^k or x_ij^{k+1});        // non-target lines pass values through

    flipTargetLine = true;
    for each (c_l^k ∈ c_k)
        if (c_l^k == 1 ∧ x_{i,(t_k+l) mod n}^k == 0)
            flipTargetLine = false;
            break;
    if (!flipTargetLine)
        imply(x_{i,t_k}^{k+1});             // use value of x_{i,t_k}^k
    else
        if (c_k completely defined)
            imply(x_{i,t_k}^{k+1});         // use value of x_{i,t_k}^k ⊕ 1
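The propagate routine above can be prototyped in ordinary code. The following Python sketch mirrors its implication rules; the interface (value lists with None for unassigned variables, returned fact tuples instead of solver callbacks) is invented purely for illustration:

```python
def propagate_gate(x_in, x_out, t, c_bits, n):
    """Simplified propagate step for the gate at one depth.

    x_in / x_out: values of x_i^k and x_i^{k+1} (None = unassigned);
    t: decoded target line (None if undecided); c_bits: the n-1 control
    bits of c_k (entries 0, 1, or None). Returns a list of implied facts."""
    implied = []

    # A line whose input differs from its output must be the target line.
    if t is None:
        for j in range(n):
            if None not in (x_in[j], x_out[j]) and x_in[j] != x_out[j]:
                implied.append(("target", j))
        return implied

    # Non-target lines pass their values through.
    for j in range(n):
        if j == t:
            continue
        if x_in[j] is not None and x_out[j] is None:
            implied.append(("out", j, x_in[j]))
        elif x_out[j] is not None and x_in[j] is None:
            implied.append(("in", j, x_out[j]))

    # The target does not flip if some selected control line carries a 0.
    flip = True
    for l in range(1, n):
        if c_bits[l - 1] == 1 and x_in[(t + l) % n] == 0:
            flip = False
            break
    if x_in[t] is not None:
        if not flip:
            implied.append(("out", t, x_in[t]))
        elif all(c is not None for c in c_bits):
            implied.append(("out", t, x_in[t] ^ 1))
    return implied
```

For instance, with undetermined target and input/output values 1/1 on line 0 but 0/1 on line 1, the routine implies that line 1 is the target; once the gate type is fixed, it propagates the values of all remaining lines.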
in Sect. 4.3.3. Nevertheless, the applied formulation still has to consider all truth
table lines of f , i.e. the formulation is still of exponential size. How to overcome
this drawback is described in the next section.
So far, for the respective checks “Is there a circuit with exactly d gates that realizes
the given reversible function f ?”, several encodings on different levels of abstrac-
tions, i.e. on Boolean level, bit-vector level, or problem-specific level have been
introduced. In all variations the problem is encoded for each truth table line sepa-
rately. That is, the respective constraints representing the circuit to be synthesized
are not built only for one truth table line, but they are duplicated for the remaining
2n − 1 truth table lines. Thus, the instances grow exponentially with respect to the
number n of variables.
In this section, an alternative problem formulation based on Quantified Boolean
Formulas (QBF) (see Sect. 2.3.2) is introduced. QBF allows encoding the synthesis
problem in polynomial size, i.e. the circuit to be synthesized is encoded only once
and the specification of the considered function f is enforced by quantification. In
doing so, complexity is moved from the problem description to the solving engine.
In the following, the concrete method is described using a new formulation based
on a universal gate type definition. This not only enables the synthesis of Toffoli
circuits, but also of reversible circuits consisting of Fredkin and Peres gates, respectively. Finally, it is shown how the resulting formulation can be solved using QBF
solvers and Binary Decision Diagrams (BDDs).
where
• X = {x_1, …, x_n} is the set of the inputs of the gate and
• Y = {y_1, …, y_⌈log q⌉} is the set of variables representing a binary encoding of a
natural number t, which defines the type g_t of the gate (in the following called
gate select inputs).
According to the assignments to the gate select inputs Y, a universal gate UGT acts
either as a gate from the given set GT or as the identity gate.
Remark 4.3 The variables Y = {y_1, …, y_⌈log q⌉} are comparable to the variables t_k
and c_k used in the previous sections to define the type of a Toffoli gate. However,
since now Fredkin and Peres gates are additionally considered, t_k and c_k cannot be
applied any longer and thus are replaced by Y. Furthermore, the identity gate has
been added to the definition of a universal gate to handle the case where the set GT
does not contain exactly a power of two gate types. In this case, GT is extended by
identity gates to fill the gap. In doing so, exclusion constraints are not needed any
longer.
Figure 4.9 shows the resulting cascade structure of the function F^d for d universal gates. Using this structure, any reversible circuit containing d gates can
be obtained by assigning the respective values to each of the gate select input
variables y_ij ∈ Y_i (0 < j ≤ ⌈log q⌉). In other words, if a circuit realization with
at most d gates for the reversible function f exists, there has to be at least
one assignment to all variables y_ij ∈ Y_i such that F^d is equal to f. More formally, if f is synthesizable with at most d gates, the quantified Boolean formula
∃y_11 … ∃y_d⌈log q⌉ ∀x_1 … ∀x_n (F^d = f) holds. This represents the new encoding of
the synthesis problem, which can be solved either by a QBF solver or by BDDs as
described in the following.
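For very small instances, the semantics of the quantified formula ∃Y ∀X (F^d = f) can be made concrete by brute-force enumeration. The following sketch uses a hypothetical two-element gate library on n = 2 lines (identity and a single CNOT, so one select bit per universal gate); it is an illustration of the formulation, not the encoding actually used:

```python
from functools import reduce
from itertools import product

def ugt(x, y):
    """Universal gate on n = 2 lines: select bit y = 0 chooses the
    identity, y = 1 a CNOT with control line 0 and target line 1."""
    a, b = x
    return (a, a ^ b) if y else (a, b)

def cascade(x, ys):
    """F^d: apply d universal gates, selected by ys, to the input x."""
    return reduce(ugt, ys, x)

def exists_forall(f, d):
    """Check the formula "exists Y forall X (F^d = f)" by enumeration:
    return a select assignment realizing f, or None if no d-gate
    circuit over this library exists."""
    for ys in product([0, 1], repeat=d):
        if all(cascade(x, ys) == f[x] for x in product([0, 1], repeat=2)):
            return ys
    return None
```

A satisfying select assignment directly corresponds to a concrete circuit, mirroring how the QBF solver's assignment to the y_ij variables is converted into a realization.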
4.3.2.2 Implementations
Based on the proposed QBF formulation for the synthesis problem, two approaches
can be applied to solve the formula: First, the problem is encoded as an instance
of quantified Boolean satisfiability, which is given to a QBF solver. Second, the
function F d = f is constructed as a BDD and thereafter the quantification is carried
out on the BDD. A solution exists if the final BDD is not the constant 0-function.
Moreover, all solutions can be extracted by traversing all paths to the 1-terminal.
For both approaches, the incremental nature of F d is exploited during the con-
struction of the formula. That is, first the formula F 0 = (x1 , . . . , xn ) is built for
depth d = 0. Then, for each iteration, the function F d is incrementally built by ap-
plying F d = U GT (U GT (. . . (U GT (F 0 , Y1 ), Y2 ) . . . , Yd−1 ), Yd ). Finally, the equation
to f is constrained.
The next two paragraphs describe the respective steps for both approaches in
more detail.
Using QBF Solvers To use a common QBF solver, the formula F d = f is trans-
formed into CNF, i.e. a representation that consists of Boolean variables and clauses.
Then, the resulting set of clauses represents a cascade of d universal gates which
has to meet the specification of f . The complete QBF instance is formed by adding
the respective existential and universal quantifiers, followed by an existential quantifier for the auxiliary variables added during the transformation into CNF (denoted as A in the following). Overall, this leads to the following quantification:
∃y_11 … ∃y_d⌈log q⌉ ∀x_1 … ∀x_n ∃A. Together with the CNF, this is afterwards passed to
a QBF solver. In the case that the instance is satisfiable, a circuit realization of the
function can be obtained from the assignments to the variables yij ∈ Yi . Otherwise
it has been proven that no circuit realizing f with d gates exists.
Using BDDs As shown later in the experiments, the performance of the QBF
solver approach is poor. Therefore, BDDs are used as an alternative. That is, in-
stead of building a quantified CNF and solving this instance with a QBF solver, the
synthesis is carried out on a BDD representation.
To this end, the BDD for the formula F^d = f is built. This can be done efficiently using a state-of-the-art BDD package (e.g. CUDD [Som01]). The fixed
variable order X, Y has thereby been applied. The alternative order Y, X leads to
a blow-up of the BDD representation, since in this case, the BDD for F^d would
already represent all possible functions in n variables which are synthesizable with
at most d gates. During the construction, isomorphic functions that result from the
n output functions of F^d are shared.
After the computation of the equality, the resulting BDD is a single output func-
tion. For this BDD, the universal quantification of all xi variables is carried out.
This is a standard operation available in a BDD package. The idea is to compute the
product of the positive co-factor and the negative co-factor for a universally quan-
tified variable, i.e. ∀x h(. . . , x, . . .) = h(. . . , 0, . . .) · h(. . . , 1, . . .). If the final BDD
consists of the 0-terminal, then no reversible circuit with the given depth d exists
for the function f . Otherwise, there is at least one path to the 1-terminal. Each of
those paths represents an assignment to all variables yij ∈ Yi and thus can be con-
verted into a concrete circuit realization. Since the BDD represents not only one
but all 1-paths, in fact all realizations with the given depth are found in one single
step. All solutions are of interest, since one can choose the best result with respect
to quantum cost which is discussed later in the experiments.
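The cofactor product ∀x h = h|x=0 · h|x=1 used above can be illustrated on an explicit truth-table representation instead of a BDD. A sketch (names invented), where a function over m variables is a dict from m-bit tuples to 0/1:

```python
from itertools import product

def forall(h, m, pos):
    """Universally quantify variable `pos` out of h, a truth table over
    m variables, via the product of the two cofactors:
    forall x: h(..., x, ...) = h(..., 0, ...) * h(..., 1, ...)."""
    out = {}
    for x in product([0, 1], repeat=m - 1):
        x0 = x[:pos] + (0,) + x[pos:]   # argument of the negative cofactor
        x1 = x[:pos] + (1,) + x[pos:]   # argument of the positive cofactor
        out[x] = h[x0] & h[x1]
    return out
```

For example, quantifying a out of h(a, b) = a ∨ b yields the function b, while quantifying a out of a ∧ b yields the constant 0-function; in the synthesis setting, repeating this for all x_i variables leaves a function over the select inputs Y whose 1-paths are exactly the valid circuit realizations.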
The proposed improvements for exact synthesis have been implemented in C++.
According to the respective encoding, the SMT solver MathSAT [BBC+05], the
solver framework SWORD [WFG+07] (see also Sect. 2.3.2.3), the QBF solver
sKizzo [Ben05], and the BDD package CUDD [Som01] have been used as solv-
ing engine, respectively. In this section, the described encodings are compared to
each other as well as to the SAT-based encoding with MiniSAT [ES04] as solver.
It is shown that higher levels of abstractions significantly improve the run-time
when performing exact synthesis. Moreover, using the quantified formulation, fur-
ther speed-ups can be documented and additionally the quality of the results can be
strengthened. As benchmarks, again functions from RevLib [WGT+08] have been
applied. All experiments have been carried out on an AMD Athlon 3500+ with
1 GB of memory. The timeout was set to 3000 CPU seconds.
The original SAT encoding for exact synthesis (denoted by SAT) has been lifted to
two higher levels of abstractions. First, instead of a Boolean formulation in CNF, the
problem has been encoded in bit-vector logic that can be handled by SMT solvers
(denoted by SMT). Second, problem-specific strategies developed within the solver
framework SWORD provide an alternative (denoted by SWORD).
Results obtained by both approaches are summarized in Table 4.8. The first col-
umn provides the name of the function. Column n denotes the number of variables
for each function, while in Column d the minimal number of Toffoli gates necessary
to synthesize the function is given. The following columns provide the run-time of
the respective synthesis approaches in CPU seconds (denoted by Time). Furthermore, the improvements of the SMT approach and the SWORD approach are given
in the last two columns, i.e. the run-time of MiniSAT divided by the run-time of
MathSAT/SWORD (denoted by Impr.SAT) and the run-time of MathSAT divided by
the run-time of the SWORD approach (denoted by Impr.SMT), respectively.
The results clearly show that the chosen encoding is crucial for the resulting run-
times. For most of the functions a corresponding Toffoli circuit can be synthesized
faster by the SMT approach than by using the SAT encoding. Only in some cases
the SAT-based approach is slightly better. However, this only holds for functions
that can be synthesized in less than one second, e.g. peres or fredkin. Overall, im-
provements of up to three orders of magnitude are achieved.
Moreover, it is evident that the problem-specific approach outperforms the other
methods. In many cases, the run-times are further reduced by a factor of ap-
prox. 30—in the best case by a factor of over 170. Furthermore, using the problem-
specific approach, Toffoli circuits for the functions alu-v2 and alu-v3 are synthesized
within the given timeout. In comparison to the SAT synthesis approach, speed-ups
of up to four orders of magnitude are reported.
Run-time Comparison The run-times of both the proposed QBF solver encoding
(denoted by QBF Solver) and the BDD formulation (denoted by BDDs)
are compared to the SAT-based and the SWORD-based approaches (denoted by SAT
and SWORD, respectively). A similar set of functions as in the previous sections
was applied. Only some trivial functions (e.g. peres, fredkin) have been omitted. In
contrast, two further functions, i.e. hwb4 and 4_49 (also taken from [WGT+08]), are
additionally considered. The results are given in Table 4.9. The first columns show
the name of the function as well as the number of lines (n) and the minimal number
of Toffoli gates (d) of the resulting circuit, respectively. In the remaining columns,
the run-times in CPU seconds (denoted by Time) and the improvements of the new
approaches with respect to the SAT solver (denoted by Impr.SAT) and with respect to
SWORD (denoted by Impr.SW.) are given, respectively. The improvement is thereby
obtained by dividing the run-time of the SAT/SWORD approach by the run-time of
the QBF solver/BDD approach.
Reversible functions (run-times in CPU seconds; the improvement columns relate the BDD approach to SAT and SWORD):

Function      n  d  SAT      SWORD   QBF Solver  BDDs   Impr.SAT  Impr.SW.
peres         3  2  0.01     0.03    0.33        <0.01  >1.00     >33.00
fredkin       3  3  0.03     0.12    0.25        <0.01  >3.00     >24.00
peres-double  3  4  2.35     0.36    6.53        0.01   235.00    36.00
miller        3  5  0.23     0.22    1.05        <0.01  >23.00    >22.00
mod5mils      5  5  48.28    3.81    12.67       0.08   603.50    47.63
ham3          3  5  0.60     0.29    2.07        0.01   60.00     29.00
ex-1          3  4  0.12     0.20    0.60        <0.01  >20.00    >12.00
graycode3     3  2  0.01     0.06    0.17        <0.01  >1.00     >6.00
graycode4     4  3  0.64     0.24    2.27        <0.01  >64.00    >24.00
graycode5     5  4  22.08    1.00    22.08       <0.01  >2208.00  >100
graycode6     6  5  583.14   3.25    179.43      0.12   4859.50   27.08
3_17          3  6  0.43     0.72    0.59        0.03   14.33     24.00
mod5d1        5  7  2094.13  135.36  15.47       11.21  186.80    12.07
mod5d2        5  8  1616.07  56.72   28.49       9.06   178.37    6.26
mini_alu      4  5  27.60    3.85    7.17        0.03   920.00    123.33
From the results, it is easy to see that utilizing QBF leads to significant improvements for both the QBF solver and the BDD approach in comparison to common SAT solving techniques. Only if additional knowledge is utilized, as done by SWORD, is the QBF solver method outperformed. However, the BDD approach for QBF leads to the smallest overall synthesis time for non-trivial functions. That is, for some functions the run-time is indeed higher than for SWORD, but this only holds for functions with an overall synthesis time of less than one second (e.g. graycode6 and decod24-v0). For all other functions, better run-times are documented. In the best case (hwb4), an improvement of more than a factor of 100 is achieved.
Quantum Costs of Resulting Circuits After the efficiency of the BDD approach has been shown with respect to run-time, further experiments demonstrate the quality of the obtained results. As described in the preliminaries, quantum costs provide a good measure of the complexity of the resulting circuits. The quantum costs thereby depend on the Toffoli gates used. Thus, it may be an advantage to determine not only one, but several Toffoli circuits for a given function. Then, by checking the resulting quantum costs for each of the obtained realizations, the cheapest one with respect to quantum costs can be selected.
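This selection step can be sketched in a few lines. The cost table below maps the number of control lines of a multiple-control Toffoli gate to commonly cited quantum cost values; the exact numbers depend on the decomposition used, and all names are mine, so treat this as an assumption-laden illustration rather than the book's implementation:

```python
# Sketch: compare candidate Toffoli circuits by quantum cost, keep cheapest.
# Cost table (controls -> quantum cost) uses commonly cited values for
# multiple-control Toffoli gates; exact values depend on the decomposition.
MCT_COST = {0: 1, 1: 1, 2: 5, 3: 13, 4: 29}

def quantum_cost(circuit):
    """circuit: list of control counts, one entry per Toffoli gate."""
    return sum(MCT_COST[c] for c in circuit)

def cheapest(realizations):
    """Pick the realization with minimal quantum cost."""
    return min(realizations, key=quantum_cost)

# Two hypothetical realizations of the same function:
r1 = [2, 2, 1]    # two 2-control Toffoli gates and a CNOT: cost 5 + 5 + 1
r2 = [1, 1, 1]    # three CNOTs: cost 3
assert cheapest([r1, r2]) == r2
```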
Previous approaches for minimal Toffoli circuit synthesis determine only one circuit in each run. In contrast, using BDDs as described in Sect. 4.3.2 leads to all possible circuits in parallel. The differences in the resulting quantum costs are documented in Table 4.10. Column #Sol denotes the number of solutions found by the BDD approach, while QC denotes the minimal as well as the maximal quantum costs of the determined realizations.
Considering the quantum costs of the obtained Toffoli circuits leads to further significant improvements. For example, circuits representing function 4_49 have quantum costs of 32 in the best case, while in the worst case quantum costs of more than 70 are required. Thus, in contrast to previous algorithms, the BDD-based synthesis is not only faster but also makes a further quality criterion, the resulting quantum costs, applicable.
Synthesis with Extended Libraries Finally, the application of further gate types in BDD-based synthesis is shown. This is done by extending the universal gate formula with further gates, i.e. Fredkin and Peres gates.
4.3 Improved Exact Synthesis 89
Table 4.10 Number of solutions and quantum costs of the resulting circuits

Function    d  #Sol  QC (min–max)
mod5mils    5    12  13–13
graycode6   5     1  5–5
3_17        6     7  14–14
mod5d1      7  1208  11–15
mod5d2      8   135  12–20
hwb4       11   264  23–39
4_49       12   374  32–72
The results are shown in Table 4.11. The respective depth (d), the run-time of the synthesis (Time), the number of solutions (#Sol), and the quantum costs (QC) are listed. MCT+MCF denotes the results for a gate set including multiple-control Toffoli and multiple-control Fredkin gates, MCT+P denotes the results for the gate set including multiple-control Toffoli and Peres gates, and MCT+MCF+P denotes the results for the set of all three gate types.
As expected, extending the gate library leads to smaller realizations, as for example the results for hwb4 show. While the minimal MCT circuit for this function consists of eleven gates, it can be reduced by three gates if Peres gates are additionally used. Furthermore, improvements with respect to the number of gates can be achieved for alu, 3_17, mod5d2, 4_49, rd32, and decod24, respectively.
However, with an increasing number of gate types to be considered, the run-times increase as well. This can be seen e.g. for the functions 4_49 and 4mod5. Only for those functions where the extension of the gate library leads to smaller circuits do the run-times sometimes decrease (e.g. for function alu with the MCT+MCF library), since fewer iterations of the main flow have to be performed (see Sect. 4.1).
Table 4.11 Synthesis results using other gate libraries
Even if they are only applicable to small functions, exact synthesis methods are important in the context of evaluating heuristic methods, determining minimal building blocks (e.g. for the BDD-based synthesis introduced in Sect. 3.2), and other aspects as well. In this chapter, several approaches based on satisfiability techniques have been introduced, enabling exact synthesis for functions with up to six variables and leading to circuits with up to twelve gates. A comparison with the results obtained by previous approaches showed that smaller circuits than the currently best known ones have been synthesized or, for the first time, the minimality of existing circuits was confirmed.
Furthermore, it was shown that the choice of encoding for exact Toffoli circuit synthesis is crucial to the resulting run-times. Lifting the originally proposed Boolean SAT encoding to the SMT level or to a problem-specific level accelerates the synthesis by three or four orders of magnitude, respectively. Complementarily, by applying quantifiers together with BDDs, a further speed-up of more than a factor of 100 can be observed in the best case.
With the approaches proposed in this chapter, the number of reversible gates as well as the number of quantum gates (and therewith also the quantum costs) have been considered. Depending on the addressed physical realization, further constraints (e.g. concurrent gates, adjacent gates, etc.) are important as well. Thus, exact synthesis with respect to further cost criteria might be a topic for future work. Chapter 6 discusses this aspect in detail and also proposes a respective exact approach for this aim.
The results of this chapter build the basis for future investigations. The QBF encoding leads to the best results (in particular with respect to run-time), since it does not need the exponential duplication of the instance for each truth table line. However, this encoding is still at the Boolean level, since a BDD is used as the underlying solving engine. Thus, lifting the quantified encoding to higher levels of abstraction (as done for the Boolean SAT encoding) would be a promising task for future work. For this, respective solving engines efficiently supporting quantifiers must become available first.
Additionally, the encoding itself can be improved. So far, all possible gate type combinations are tried by the solving engine, but many combinations are redundant and can be ignored. Identifying easy-to-detect redundancies and excluding them from the search space may accelerate the solving process. Also, special function classes (e.g. symmetric functions) can probably be synthesized faster if the respective properties are fully exploited in the encoding. Furthermore, the application of advanced solving techniques like incremental SAT (see e.g. [WKS01]) seems promising, since several iterations of very similar instances are solved sequentially in the proposed synthesis approach.
Chapter 5
Embedding of Irreversible Functions
Quite often, reversible logic has to be synthesized for irreversible functions. Thus, the problem of embedding is an important aspect. How to handle irreversible functions during synthesis has already been partially discussed in the previous chapters (see e.g. Sects. 3.1.1 and 4.2.3). Additional lines are thereby introduced, and the resulting constant inputs, garbage outputs, and don't care conditions are arbitrarily assigned to concrete values. Further options exist regarding how (i.e. in which order) to arrange the outputs in the circuit to be synthesized. Overall, functions can be embedded in different ways, whereby the concrete don't care assignments as well as the chosen output arrangement may have a significant impact on the resulting circuit size. As an example, in the BDD-based synthesis introduced in Sect. 3.2, different output orders are applied to the building-block functions, as they lead to better substitutions of the respective nodes. Since synthesis approaches (in particular the transformation-based approach and the exact synthesis method) have been described in the last chapters, they can now be used to evaluate the effect of different embeddings.
In this chapter, the different aspects of embedding mentioned above are investigated in detail. First, strategies for the don't care assignment [MDW09, MWD09] are proposed. More precisely, a greedy approach, a method based on the Hungarian algorithm, and an XOR-based strategy are introduced. Even though these strategies address don't care assignments of the outputs only, it can be shown that the chosen method is crucial to the synthesis results.
Afterwards, the order of the outputs in the function to be synthesized is considered. Usually, each output is set to a fixed position. But since, in general, the output order is irrelevant for a given reversible function f, a new synthesis paradigm [WGDD09] is proposed that determines an equivalent circuit realization for f modulo output permutation. That is, the result of the synthesis is a circuit whose outputs have been permuted. To this end, distinct methods to efficiently determine “good” output permutations are introduced. As a result, significantly smaller circuits (even smaller than the ones previously obtained by the exact approaches) can be synthesized if this new synthesis paradigm is applied.
In the following, the embedding problem together with the number of embedding possibilities is described in Sect. 5.1, which thereby provides the motivation for the remaining sections. Afterwards, the approaches for don't care determination (Sect. 5.2)
R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, 93
DOI 10.1007/978-90-481-9579-4_5, © Springer Science+Business Media B.V. 2010
94 5 Embedding of Irreversible Functions
and output permutation (Sect. 5.3) are proposed and evaluated. At the end of this chapter, all results are summarized and future work is sketched.

5.1 The Embedding Problem

Table 5.1 Embedding of the adder function

(a) Adder function

cin x y | cout sum
  0 0 0 |    0   0
  0 0 1 |    0   1
  0 1 0 |    0   1
  0 1 1 |    1   0
  1 0 0 |    0   1
  1 0 1 |    1   0
  1 1 0 |    1   0
  1 1 1 |    1   1

(b) With one constant input and two garbage outputs (– denotes a don't care)

0 cin x y | cout sum g1 g2
0   0 0 0 |    0   0  –  –
0   0 0 1 |    0   1  –  –
0   0 1 0 |    0   1  –  –
0   0 1 1 |    1   0  –  –
0   1 0 0 |    0   1  –  –
0   1 0 1 |    1   0  –  –
0   1 1 0 |    1   0  –  –
0   1 1 1 |    1   1  –  –
1   0 0 0 |    –   –  –  –
1   0 0 1 |    –   –  –  –
1   0 1 0 |    –   –  –  –
1   0 1 1 |    –   –  –  –
1   1 0 0 |    –   –  –  –
1   1 0 1 |    –   –  –  –
1   1 1 0 |    –   –  –  –
1   1 1 1 |    –   –  –  –

(c) Naive embedding

0 cin x y | cout sum g1 g2
0   0 0 0 |    0   0  0  0
0   0 0 1 |    0   1  0  0
0   0 1 0 |    0   1  0  1
0   0 1 1 |    1   0  0  0
0   1 0 0 |    0   1  1  0
0   1 0 1 |    1   0  0  1
0   1 1 0 |    1   0  1  0
0   1 1 1 |    1   1  0  0
1   0 0 0 |    0   0  0  1
1   0 0 1 |    0   0  1  0
1   0 1 0 |    0   0  1  1
1   0 1 1 |    0   1  1  1
1   1 0 0 |    1   0  1  1
1   1 0 1 |    1   1  0  1
1   1 1 0 |    1   1  1  0
1   1 1 1 |    1   1  1  1
Example 5.1 Consider the adder function shown in Table 5.1(a). This function has three inputs (the carry-in cin as well as the two summands x and y) and two outputs (the carry-out cout and the sum). The function is irreversible, because the number of inputs differs from the number of outputs. Since the output pattern 01 appears three times (as does the output pattern 10), adding one additional output (leading to the same number of inputs and outputs) cannot make the function reversible. In fact, ⌈log2(3)⌉ = 2 additional outputs (and therewith one constant input) must be added. This is shown in Table 5.1(b). But since this incompletely specified function is not applicable for many synthesis approaches, the don't cares must be assigned afterwards. One possible, albeit naive, embedding is shown in Table 5.1(c). This embedding was found by assigning the garbage outputs to the patterns 00, 01, and 10 in order for each of the output patterns in the top half of the table, and then completing the bottom half of the table using the remaining available output patterns in numerical order.
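The counting in this example can be reproduced in a few lines. The sketch below (helper names are mine) computes μ, the number of required garbage outputs, and the resulting number of constant inputs for the full adder:

```python
import math

# Reproducing the counting of Example 5.1: the number of additional garbage
# outputs is ceil(log2(mu)), where mu is the largest multiplicity of any
# output pattern of the irreversible function.
def adder(cin, x, y):
    s = cin + x + y
    return (s >> 1, s & 1)                    # (cout, sum)

patterns = [adder(c, x, y) for c in (0, 1) for x in (0, 1) for y in (0, 1)]
mu = max(patterns.count(p) for p in set(patterns))  # patterns 01 and 10 occur 3 times
garbage = math.ceil(math.log2(mu))                  # 2 additional outputs
lines = 2 + garbage                                 # 4 lines in the reversible circuit
constants = lines - 3                               # 1 constant input
assert (mu, garbage, constants) == (3, 2, 1)
```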
Remark 5.1 Not every synthesis approach requires a completely specified reversible
function. For example, the SAT-based approach introduced in the last chapter can
also handle don’t cares (see Sect. 4.2.3). However, most of the other synthesis ap-
proaches (as e.g. [Ker04, GAJ06, MDM07] and the transformation-based method
described in Sect. 3.1.2) need a completely specified function. For these approaches,
a completely specified embedding is required.
5.2 Don't Care Assignment

In this section, methods for don't care assignment are presented and evaluated that complete a reversible embedding of an irreversible function. It is assumed that always the minimal number of outputs (i.e. ⌈log2(μ)⌉, where μ denotes the maximal number of occurrences of an output pattern) is added. Furthermore, all constant inputs are assigned the value 0 and are always added as the most significant inputs of the truth table. This leads to a significant computational advantage, as shown below, and results in a circuit overhead of at most one NOT gate per constant input.
5.2.1 Methods
The first method for assigning don't cares is motivated by the basic operation of the transformation-based synthesis algorithms (see Sect. 3.1.2). Here, gates are chosen so that each input value of the truth table matches its respective output value (i.e. so that the identity is achieved). Each line of the truth table is thereby traversed sequentially. It is thus reasonable to conjecture that assigning the don't cares so that the Hamming distance of the output patterns to the corresponding input patterns is as small as possible should help to reduce the number of gates required. This first leads to a simple greedy approach.
The truth table is traversed downwards starting at the first row, and the following two steps are performed:
1. For each distinct output assignment in the embedding, identify the target set of
rows of the table containing that pattern. Then, determine the set of output as-
signments which are found by assigning the don’t cares in all possible ways. The
candidates are arranged in ascending numerical order.
2. For each row in the target set in turn, choose the first remaining candidate assign-
ment with minimal Hamming distance to the input assignment for that row.
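The steps above can be sketched in code. The following is a simplified per-row variant (all names are mine, not the book's): each row's don't cares are completed with the numerically first unused pattern of minimal Hamming distance to the row's input. On the top half of the embedded full adder this reproduces the greedy assignment shown in Table 5.2(a):

```python
# Simplified per-row sketch of the greedy don't-care assignment.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def greedy_assign(rows):
    """rows: list of (input, output) bit strings, '-' marking a don't care."""
    used, result = set(), []
    for inp, out in rows:
        # enumerate all completions of the don't cares in ascending order
        cands = ['']
        for bit in out:
            opts = ('0', '1') if bit == '-' else (bit,)
            cands = [c + b for c in cands for b in opts]
        # numerically first unused candidate with minimal Hamming distance
        best = min((c for c in cands if c not in used),
                   key=lambda c: (hamming(c, inp), c))
        used.add(best)
        result.append((inp, best))
    return result

# Top half of the embedded full adder (outputs: cout, sum, two don't cares).
rows = [('0000', '00--'), ('0001', '01--'), ('0010', '01--'), ('0011', '10--'),
        ('0100', '01--'), ('0101', '10--'), ('0110', '10--'), ('0111', '11--')]
outs = [o for _, o in greedy_assign(rows)]
# Matches the greedy embedding of Table 5.2(a) for these rows.
assert outs == ['0000', '0101', '0110', '1011', '0100', '1001', '1010', '1111']
```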
Example 5.2 Table 5.2(a) shows the embedding obtained by the greedy method for
the full adder. The circuits synthesized from this assignment as well as from the
naive assignment given in Table 5.1(c) are shown in Fig. 5.2(a) and Fig. 5.2(c),
Table 5.2 Embeddings for the full adder: (a) greedy method, (b) XOR-based method

0 cin x y | (a) cout sum g1 g2 | (b) cout sum g1 g2
0   0 0 0 |        0   0  0  0 |        0   0  0  0
0   0 0 1 |        0   1  0  1 |        0   1  1  1
0   0 1 0 |        0   1  1  0 |        0   1  1  0
0   0 1 1 |        1   0  1  1 |        1   0  0  1
0   1 0 0 |        0   1  0  0 |        0   1  0  0
0   1 0 1 |        1   0  0  1 |        1   0  1  1
0   1 1 0 |        1   0  1  0 |        1   0  1  0
0   1 1 1 |        1   1  1  1 |        1   1  0  1
1   0 0 0 |        1   0  0  0 |        1   0  0  0
1   0 0 1 |        0   0  0  1 |        1   1  1  1
1   0 1 0 |        0   0  1  0 |        1   1  1  0
1   0 1 1 |        0   0  1  1 |        0   0  0  1
1   1 0 0 |        1   1  0  0 |        1   1  0  0
1   1 0 1 |        1   1  0  1 |        0   0  1  1
1   1 1 0 |        1   1  1  0 |        0   0  1  0
1   1 1 1 |        0   1  1  1 |        0   1  0  1
respectively.¹ The greedy assignment method leads to a circuit with 7 gates and quantum costs of 27, while the naive embedding results in a circuit with 20 gates and quantum costs of 44.
Furthermore, the Hamming distance can be applied to formulate the don't care assignment problem as an instance of the assignment problem, which can be solved by the Hungarian algorithm [HL05]. To this end, let S be the set of truth table rows sharing a common output pattern in the irreversible function, and let T be the set of possible assignments to the don't cares to complete those rows. |T| is equal to 2^g, where g is the number of garbage lines added to permit the embedding of the irreversible function into a reversible one. Then, the don't care assignment problem is to associate each element of S with a unique element from T. Let K(Si, Tj) be the “cost” of associating the don't care assignment Tj with Si, for which the Hamming distance is applied. More precisely, K(Si, Tj) is the Hamming distance between the completely specified truth table output pattern and the corresponding input pattern when Si is completed using Tj.
This formulation can be expressed in tabular form with a row for each Si and a col-
umn for each Tj with each K(Si , Tj ) in the corresponding table entry. Assigning the
don’t cares to minimize the total Hamming distance is then a matter of choosing one
entry in each row such that those entries appear in unique columns and such that the
sum of the chosen entries is minimal. This is a standard assignment problem. The
Hungarian algorithm is a well-known method [HL05] for solving the assignment
problem in polynomial time and thus has been applied to solve this instance. The
only issue of note here is that storing the potentially very large assignment matrix
is avoided, since Hamming distance is easily computed as needed—in fact more
quickly than a matrix access.
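For a toy instance (the three adder rows sharing output pattern 01), the assignment problem is small enough to solve by brute force, which illustrates the cost model K(Si, Tj) without reproducing the Hungarian algorithm itself; variable names are mine:

```python
from itertools import permutations

# Toy instance of the assignment formulation: rows S of the adder sharing
# output pattern 01, candidate garbage completions T, Hamming distance as
# cost K(Si, Tj). The Hungarian algorithm solves this in polynomial time;
# exhaustive search suffices for illustration.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

S = ['0001', '0010', '0100']     # inputs mapped to output pattern 01
T = ['00', '01', '10', '11']     # possible don't-care completions (2^g = 4)

def K(s, t):
    return hamming('01' + t, s)  # completed output '01'+t vs. input s

best = min(permutations(T, len(S)),
           key=lambda assign: sum(K(s, t) for s, t in zip(S, assign)))
# Minimal total distance 2, realized by garbage patterns 01, 10, 00 (the
# same completions the greedy method chooses for these rows).
assert best == ('01', '10', '00')
```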
Example 5.3 Applying the Hungarian algorithm to the considered adder function,
the same assignment as for the greedy method results. This may happen since both
approaches use the Hamming distance as cost metric. Nevertheless, the experiments
in Sect. 5.2.2 show that both assignment methods lead to notable differences for
other (mainly larger) functions.
The third method proposed for don't care assignment is based on the observation that for many functions (in particular arithmetic ones) a good embedding of an irreversible function into a reversible one is obtained by setting the don't care outputs to XOR combinations of the primary inputs.
1 The transformation-based synthesis method from Sect. 3.1.2 has been used to synthesize these
circuits.
5.2 Don’t Care Assignment 99
Example 5.4 Table 5.2(b) shows the embedding obtained by the XOR-based
method for the full adder. The circuit obtained from this assignment is shown in
Fig. 5.2(b) (also synthesized using the transformation-based synthesis method). The
XOR-based method yields a circuit with five gates and quantum costs of 13, which
is significantly smaller than the circuits obtained with the greedy/Hungarian and the
naive embedding, respectively. Overall, these circuits clearly show the importance
of a good don’t care assignment.
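One completion consistent with Table 5.2(b) sets the garbage outputs to g1 = x ⊕ y and g2 = y; this is a reading inferred from the table, not stated explicitly in the text. A quick check confirms that this XOR completion is injective on the 2³ rows with constant input 0, so it extends to a reversible (bijective) function:

```python
# Inferred XOR-based completion for the full adder: g1 = x XOR y, g2 = y.
# The check confirms that all eight output patterns are distinct, so the
# completion extends to a reversible 4-bit function.
def embed(cin, x, y):
    s = cin + x + y
    cout, sm = s >> 1, s & 1
    return (cout, sm, x ^ y, y)   # (cout, sum, g1, g2)

images = [embed(c, x, y) for c in (0, 1) for x in (0, 1) for y in (0, 1)]
assert len(set(images)) == 8      # injective on the rows with constant 0
```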
As noted above, all constant inputs take the value 0 and are always the most significant inputs in the truth table. This means that the irreversible function is always embedded in the first rows of the reversible truth table, while the remaining rows are completely don't care (see e.g. Table 5.1(c)). In particular, if the original function has n primary inputs and, furthermore, c constant inputs are added, only the first 2^n truth table rows of the embedding are of interest. The remaining (2^c − 1) · 2^n rows can be ignored.

Given this construction, a synthesis method that works row by row from the top of the truth table (e.g. the transformation-based synthesis approach from Sect. 3.1.2 and its derivatives) can stop after transforming 2^n rows. Because of this, it is not necessary to complete a don't care assignment beyond the (2^n)-th row.²
In this section, experimental results obtained with the described don't care assignment approaches are documented. To this end, (irreversible) functions from
2 Note that when this simplification is employed, bidirectional synthesis methods cannot be applied,
because don’t cares occur in the latter truth table lines so that no definition of the inverse function
is possible.
RevLib [WGT+08] have been embedded using the proposed methods. Afterwards, the resulting embeddings have been passed to (1) the transformation-based synthesis from Sect. 3.1.2 (denoted by Transformation-based Algorithm) or (2) an extended version combining transformation-based synthesis with a search-based method as proposed in [MWD09] (denoted by Combined Synthesis Algorithm), respectively. An AMD Athlon 3500+ with 1 GB of memory was used for the experiments.
Table 5.3 presents the results for each of the synthesis approaches and don’t care
assignment methods, respectively. For the resulting circuits, the gate count (denoted
by d) and the quantum cost (denoted by QC) are shown. Furthermore, for each func-
tion, the best result with respect to quantum cost is highlighted in bold. Run-times
are not documented, since every circuit in Table 5.3 was found in less than one CPU
second.
The results show that the chosen embedding is crucial to the synthesis results.
For example, the quantum costs of the circuits representing function rd73_69 range
from 1112 to 184 using the combined synthesis approach. Thus, an improvement of
nearly one order of magnitude can be achieved only by modifying the assignment
of the don’t cares.
In the next section, the second aspect of embedding, namely permutation of out-
puts, is considered in detail, where similar results have been achieved.
Table 5.3 Synthesis results for the don't care assignment methods (n: circuit lines; c: constant inputs; g: garbage outputs; d: gate count; QC: quantum costs)

                            Combined synthesis algorithm      Transformation-based algorithm
                            Greedy     Hungarian   XOR        Greedy     Hungarian   XOR
Function           n  c  g  d    QC    d    QC    d    QC     d    QC    d    QC    d    QC
decod24_10         4  2  0  7    11    7    11    7    11     7    11    7    11    7    11
rd32_19            4  1  2  5    13    5    13    5    13     5    17    5    17    5    17
4gt10_22           5  1  4  3    47    3    47    4    40     3    47    3    47    3    47
4gt11_23           5  1  4  1     5    1     5    4    12     1     5    1     5    1     5
4gt12_24           5  1  4  3    55    3    55    6    62     3    55    3    55    3    55
4gt13_25           5  1  4  1    13    1    13    3    27     1    13    1    13    1    13
4gt4_20            5  1  4  6    58    6    58    9    65     7    79    7    79    7    79
4gt5_21            5  1  4  3    19    3    19    6    30     3    19    3    19    3    19
4mod5_8            5  1  4  7    19    7    19    5     9     9    25    9    25    9    25
4mod7_26           5  1  2  21   65    21   65    21   65     15   55    15   55    15   55
alu_9              5  0  4  9    65    16   72    28   224    12   64    16   68    32   252
mini-alu_84        5  1  3  22   110   26   114   25   125    22   98    28   108   22   110
one-two-three_27   5  2  2  9    33    9    33    9    33     9    33    9    33    9    33
decod24-enable_32  6  3  2  15   39    11   35    14   42     15   39    13   37    15   39
rd53_68            7  2  4  27   228   27   228   22   137    22   187   22   187   22   187
sym6_63            7  1  6  36   485   36   485   17   133    36   777   36   777   36   777
rd73_69            9  2  6  80   1112  80   1112  40   184    100  2187  100  2187  100  2187
sym9_71            10 1  9  76   1047  76   1047  51   573    210  4368  210  4368  210  4368
rd84_70            11 3  7  104  1823  104  1823  47   446    111  2100  111  2100  111  2100
5.3 Synthesis with Output Permutation

Table 5.4 Function specification

x1 x2 x3 | f1 f2 f3
 0  0  0 |  0  0  0
 0  0  1 |  0  1  0
 0  1  0 |  1  0  0
 0  1  1 |  1  1  1
 1  0  0 |  0  0  1
 1  0  1 |  0  1  1
 1  1  0 |  1  0  1
 1  1  1 |  1  1  0
Example 5.5 Consider the function specification shown in Table 5.4. The reversible
function maps (x1 , x2 , x3 ) to (x2 , x3 , x2 x3 ⊕ x1 ) = (f1 , f2 , f3 ). A minimal Toffoli
circuit for this function is shown in Fig. 5.3(a). This circuit consists of 6 gates.
Usually, the order of the outputs is irrelevant and can be swapped. As shown in
the following example, this can lead to a much more compact circuit.
Example 5.6 Figure 5.3(b) depicts a Toffoli circuit which computes the same reversible function as the Toffoli circuit shown in Fig. 5.3(a). In contrast, however, the three output functions have been reordered to other positions in the output vector. More precisely, the Toffoli circuit shown in Fig. 5.3(b) maps the input (x1, x2, x3) to the output (x2x3 ⊕ x1, x2, x3) = (f3, f1, f2). This reduces the overall number of gates from 6 to 1, i.e. 5 gates are saved.
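Examples 5.5 and 5.6 can be checked mechanically. The sketch below hard-codes the function and the one-gate circuit of Fig. 5.3(b) (as described in the text) and verifies that their outputs agree up to the stated permutation:

```python
from itertools import product

# f maps (x1, x2, x3) to (x2, x3, x2*x3 XOR x1). A single Toffoli gate with
# controls x2, x3 and target x1 produces the same three output functions in
# permuted order (f3, f1, f2).
def f(x1, x2, x3):
    return (x2, x3, (x2 & x3) ^ x1)

def one_gate_circuit(x1, x2, x3):
    return ((x2 & x3) ^ x1, x2, x3)     # the one-gate circuit of Fig. 5.3(b)

for x in product((0, 1), repeat=3):
    t = one_gate_circuit(*x)
    assert (t[1], t[2], t[0]) == f(*x)  # equal up to output permutation
```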
To exploit this degree of freedom, synthesis approaches can be extended in such a way that different (or all) output permutations are considered. This causes a significant increase in complexity, since in general all possible permutations have to be checked (resulting in n! different synthesis calls in total). This can be slightly reduced if a function containing garbage outputs is to be synthesized. Then, only n!/g! different permutations have to be considered, since permutations of the garbage outputs can be ignored.
Example 5.7 Figure 5.4 shows all n! possible permutations for a function with n = 3 variables and g = 2 garbage outputs (denoted by g1 and g2). Since the garbage outputs are left unspecified, the permutations that only swap garbage outputs can be skipped (i.e. the last three permutations of Fig. 5.4). Thus, only 3!/2! = 3 permutations instead of all 3! = 6 permutations are considered.
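The counting argument generalizes directly; a one-line helper (my naming) computes the number of permutations that actually have to be considered:

```python
from math import factorial

# With g unspecified garbage outputs, only n!/g! of the n! output
# permutations are distinct (Example 5.7).
def distinct_perms(n, g):
    return factorial(n) // factorial(g)

assert distinct_perms(3, 2) == 3    # 3!/2! = 3 instead of 3! = 6
```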
Lemma 5.1 The number of gates in a reversible circuit obtained by common syn-
thesis approaches may be up to 3 · (n − 1) higher than the number of gates in a
circuit where synthesis with output permutation is applied (with n being the number
of variables).
Proof Let d be the minimal number of gates of a circuit obtained by enabling output
permutation during synthesis. To move one output line to the position given by the
function three Toffoli gates are required (see Fig. 5.5). At most n − 1 lines need to be
moved. It follows that the cost of the minimal circuit, where no output permutation
is allowed, is less than or equal to d + 3 · (n − 1).
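The three-gate construction used in the proof corresponds to the classical XOR swap: three CNOT gates exchange the values of two lines. The sketch below verifies this for all input combinations:

```python
from itertools import product

# Three CNOTs realize a swap of two lines, which is why moving an output
# line costs at most 3 gates (see Fig. 5.5).
def swap_via_cnots(a, b):
    b ^= a    # CNOT: control a, target b
    a ^= b    # CNOT: control b, target a
    b ^= a    # CNOT: control a, target b
    return a, b

for a, b in product((0, 1), repeat=2):
    assert swap_via_cnots(a, b) == (b, a)
```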
Remark 5.2 Lemma 5.1 gives a best case improvement. Because of the heuristic nature of most synthesis approaches, circuits with a larger number of gates may of course also result.
This motivates the investigation of methods that exploit output permutation dur-
ing the synthesis. The next two sections show how this is realized using an exact
synthesis approach as well as a heuristic synthesis approach, respectively.
where
• inpi is a Boolean vector representing the inputs of the circuit to be synthesized
for truth table line i,
• outi is a Boolean vector representing the outputs of the circuit to be synthesized
for truth table line i, and
• Φ is a set of constraints representing the synthesis problem as described in
Sects. 4.2.1 and 4.3.1, respectively.
Using this vector, the SAT encoding is slightly extended: according to the assignment to p (set by the SAT solver), the current output permutation π_p is selected. Depending on this permutation, the respective output order is enforced during the search. More formally, the encoding of Definition 5.1 is extended as follows:

Φ ∧ ⋀_{i=0}^{2^n−1} ([inp_i]_2 = i ∧ [out_i]_2 = π_p(f(i))).
The extended encoding of the synthesis problem for the function specified in Table 5.4 is illustrated in Fig. 5.6(b). If the solver finds a satisfying assignment for this SWOP (synthesis with output permutation) instance, the circuit can be obtained from the result as described in Chap. 4, and the best permutation is provided by the assignment to p.
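The semantics of the extended encoding can be illustrated outside a SAT solver: the value of p indexes an output permutation π_p, and every truth table line must match the permuted function value. The lexicographic indexing of permutations below is an assumption for illustration only, not the book's encoding:

```python
from itertools import permutations

# p selects a permutation pi_p of the output positions; each truth table
# line must equal pi_p applied to f's output pattern.
def pi(p, n):
    return list(permutations(range(n)))[p]   # assumed lexicographic indexing

def apply_perm(perm, bits):
    return tuple(bits[i] for i in perm)

f_out = (0, 1, 1)                                  # some output pattern f(i)
assert apply_perm(pi(0, 3), f_out) == (0, 1, 1)    # pi_0 is the identity
assert apply_perm((2, 0, 1), f_out) == (1, 0, 1)   # a hypothetical pi_p
```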
Overall, this extension allows exact SWOP with only one synthesis call instead of n!/g! separate ones. Furthermore, since the variables of p are an integral part of the search space, the permutations are checked much more efficiently. Because of modern SAT techniques (in particular conflict analysis [MS99]), reasons for conflicts are learned during the search process. This learned information prevents the solver from re-entering non-solution search space, i.e. large parts of the search space are pruned. In contrast, this information is not available when each permutation is checked by a separate call of the solver. Thus, exact synthesis with output permutation is possible in feasible run-time when learning is exploited.
This section provides experimental results for SWOP. In total, four different as-
pects are studied: (1) the reduction of the complexity of SWOP when garbage out-
puts are considered, (2) the results of exact SWOP in comparison to the common
exact synthesis, (3) the results of heuristic SWOP in comparison to the common
transformation-based approach, and (4) the quality (with respect to the number of
gates) of the circuits synthesized by SWOP in comparison to the currently best
known realizations.
For exact synthesis, the SWORD approach introduced in Sect. 4.3.1 has been
used. The SWOP extension was implemented on top of this approach. As heuristic
approach, the transformation-based approach (including template matching as de-
scribed in [MDM05]) was applied. The respective benchmark functions have been
taken from RevLib [WGT+08]. All experiments have been carried out on an AMD
Athlon 3500+ with 1 GB of main memory. All run-times are given in CPU seconds.
The timeout was set to 3600 CPU seconds.
In a first series of experiments, the different complexities are compared which occur if Toffoli circuits for functions containing garbage outputs are synthesized. Here, instead of n! permutations, only n!/g! permutations are considered.
Table 5.5 shows the results of the exact SWOP approach with n! permutations and with n!/g! permutations for functions containing garbage outputs. The first three columns provide the name of the function, the number of circuit lines n, and the number of garbage outputs g, respectively. The minimal number of gates of the obtained Toffoli circuits is given in column d. Then, the run-times of SWOP with n! and with n!/g! permutations are given, respectively (denoted by Time). Furthermore, the improvement of the optimized SWOP (i.e. the synthesis with only n!/g! permutations) in comparison to SWOP with all n! permutations is provided (i.e. the run-time of SWOP divided by the run-time of Opt. SWOP).

As expected, the reduction of permutations leads to better run-times for all functions. Improvements of up to a factor of 45 can be achieved in the best case.
In this section, exact SWOP is compared to the exact approach from Sect. 4.3.1. The results are shown in Table 5.6. Here again, the first column provides the name of the function, while n and g denote the number of variables and the number of garbage outputs, respectively. The next columns give the minimal number d of gates determined by the two approaches and the corresponding run-times. The last column shows information relating to the complexity, i.e. the run-time overhead when output permutation is considered (the run-time of SWOP divided by the run-time of the plain synthesis) compared to the factor n!/g!.
It can be seen that for many functions, SWOP finds smaller circuits than the ones generated by the previous exact synthesis approach. Thus, removing the restriction on the output order leads to smaller circuits for many of the well-known benchmark functions.

As expected, the run-time for SWOP is higher in comparison to the run-time of the pure exact synthesis. This is because the search space is obviously larger due to the number of output permutations that can be chosen. However, the increase is not as high as the worst case complexity (n!/g!). This can be seen in the last column of Table 5.6. For all benchmarks (except 4mod5 and 3_17), the run-time of SWOP divided by the run-time of the previous synthesis approach is significantly smaller than n!/g!. As explained, this is due to search space pruning, which becomes possible when the encoding is extended so that all permutations are checked in parallel. Moreover, for some benchmarks (e.g. maj4_2 or alu), the run-time of SWOP is even smaller than for a single exact solution. This reduction is caused by the fact that smaller circuits are found and thus the synthesis terminates earlier.
Table 5.8 Comparison to the currently best known realizations (number of gates)

Function  Best known  SWOP  Impr.
decod24            6     5      1
gt5                3     1      2
3_17               6     5      1
4_49              16    14      2
aj-e13            40    28     12
hwb4              11    10      1
The largest improvement is observed for function hwb8. Here, 35 gates are saved in total when output permutation is applied.
But not only the improvements are of interest. Even a comparison of the best and the worst permutation (shown in column d for All Perms) gives some interesting insight. For example, consider the function hwb5. One output permutation results in a circuit with 38 gates, while another permutation results in 62 gates. Since a heuristic synthesis procedure is used, the results will most likely not be optimal. In fact, according to Lemma 5.1, the best case improvement (i.e. the difference between the best and the worst permutation) for hwb5 cannot be greater than 3 · (5 − 1) = 12 for minimal realizations, yet it is 24. This can be explained by the heuristic nature of the approach, which does not guarantee minimality.
Finally, it is shown that sifting provides good results in a fraction of the run-time. For most functions with more than six variables, it is not feasible to minimize the circuit considering all permutations. However, sifting offers significant improvements in these cases.
Finally, the quality (with respect to the number of gates) of some circuits synthesized by SWOP is compared to the currently best known realizations obtained by common synthesis approaches. Table 5.8 shows a selection of functions together with the gate count of the currently best known circuit realization (denoted by Best known). The gate count obtained when output permutation is considered is given in column SWOP.
Synthesis with output permutation enables the realization of smaller circuits than the currently best known ones. As an interesting example, the realizations of the hwb4 function are considered in more detail. For the original function, a minimal realization with 11 gates has been synthesized by the exact approach. Using output permutation, it is possible to synthesize an even smaller realization with only 10 gates using a heuristic approach.
5.4 Summary and Future Work
The primary task of synthesis approaches is to generate circuits that realize the
desired functions. Secondarily, it should be ensured that the resulting circuits are as
compact as possible. However, the results obtained by synthesis approaches often
are sub-optimal. For example, the transformation-based synthesis method described
in Sect. 3.1.2 tends to produce circuits with very costly Toffoli gates (i.e. Toffoli
gates with a large number of control lines). Hierarchical synthesis approaches like
the BDD-based method from Sect. 3.2 or the SyReC-based approach from Sect. 3.3
lead to circuits that are not optimal with respect to the number of lines. Besides that,
technology specific constraints are often not considered by synthesis approaches.
Consequently, in common design flows optimization approaches are applied after
synthesis.
For reversible logic, only first attempts at optimization have been made in
recent years. In particular, reducing the quantum cost of given circuits has been
considered. For example, template matching [IKY02, MDM05, MYDM05] is a search
method which looks for gate sequences that can be replaced by alternative cascades
of lower cost. For many circuits, substantial improvements are achieved using this
method. But for large circuits or a high number of applied templates, this
approach suffers from high run-times. As a second example, the work in [ZM06]
showed how analyzing cross-point faults can identify redundant control connections
in reversible circuits. Removing such control lines reduces the cost of the circuit.
However, the computational effort needed to determine redundant control connections is
extremely high.
In this chapter, three new optimization approaches are introduced, each with
its own focus on a particular cost metric. The first one considers the reduction of
the well-established quantum cost (used in quantum circuits) and the transistor cost
(used in CMOS implementations), respectively. To this end, a (small) number of
additional signal lines is added to the circuit; these are used to “buffer” factors of
control lines [MWD10]. These factors can then be reused by other gates in the circuit
which reduces the size of the gates and thus decreases the cost of the circuit. A fast
algorithm is presented along with results showing that even for a small number of
additional lines (even 1) a significant amount of cost can be saved.
R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, 113
DOI 10.1007/978-90-481-9579-4_6, © Springer Science+Business Media B.V. 2010
6 Optimization
The second approach considers the line count of a circuit. While adding a small
number of additional lines may be worthwhile to reduce, e.g., the quantum cost of a
circuit, this number should usually be kept small (in particular for quantum circuits,
where circuit lines or qubits, respectively, are a limited resource). But as already
mentioned above, in particular hierarchical approaches lead to a significant amount
of additional circuit lines. To reduce these lines, a post-synthesis approach is intro-
duced which re-synthesizes parts of the circuits so that lines with constant inputs
can be merged [WSD10]. In this way, notable line reductions can be achieved.
Finally, an optimization method is introduced which takes a cost metric be-
yond the established quantum cost, transistor cost, and line count into account.
This is motivated by new physical realizations of reversible and quantum circuits
(see e.g. [DV02, Mas07, RO08]) leading to further limitations and restrictions. By
means of the so-called Nearest Neighbor Cost (NNC), it is shown how reversible
circuits can be optimized with respect to this new cost metric [WSD09].
NNC is important if Linear Nearest Neighbor (LNN) architectures [FDH04, Kut06,
Mas07] are addressed as target technology. Here, only adjacent gates are allowed
(i.e. gates where control line and target line are on adjacent circuit lines). Since en-
suring adjacent gates in a naive way increases the quantum cost by about one order
of magnitude, optimization approaches are introduced that significantly reduce this
increase.
At the end of this chapter, all results are summarized and future work is sketched.
6.1 Adding Lines to Reduce Circuit Cost
This section shows how circuit cost can be significantly reduced if additional lines
are added to the circuit. To this end, the general idea is first introduced in Sect. 6.1.1,
before an algorithm exploiting this observation is proposed in Sect. 6.1.2. Finally,
the experimental results in Sect. 6.1.3 demonstrate the effect of the proposed
optimization approach.
Optimization approaches such as the two noted above preserve the number of
lines in the circuit to be optimized. In contrast, this section shows how extending
the circuit by additional signal lines can reduce the cost of a reversible circuit. The
added lines are denoted as helper lines in the following.
Having a helper line available, values can be “buffered” on this line so that they
can be later reused by other gates. In doing so, control lines can be saved as shown
by the following definition.
Definition 6.2 Let G be a reversible circuit and h be a helper line. Then, a gate
MCT(C, t) of G can be replaced by the sequence MCT(F, h), MCT({h} ∪ Ĉ, t),
MCT(F, h) where C = F ∪ Ĉ, F ∩ Ĉ = ∅, and F ≠ ∅. In the following, this
replacement is referred to as factoring the initial gate, where F is a factor of MCT(C, t).
Remark 6.1 The terms “factoring” and “factor” are natural, since partitioning
the control set C into F and Ĉ essentially factors the AND function over the
control lines. This factoring relies on the fact that 0 ⊕ x1 x2 . . . xk = x1 x2 . . . xk ,
i.e. that the result of a factor can be “buffered” on a constant line initialized to 0.
Applying Definition 6.2 to the gates of a circuit, control lines can be removed. Since
the number of control lines determines the cost of a gate, this may lead
to less costly circuits. However, this is only the case if the total cost of the added
gates is less than the cost saved by factoring the control lines. When only a
single gate is substituted, no saving is possible under the transistor cost model, but it is
under the quantum cost model. If more than one gate can be substituted, higher cost savings
are achieved (then, reductions are also observed for the transistor cost model).
These ideas are illustrated in the following example.
Example 6.1 Consider the cascade of Toffoli gates depicted in Fig. 6.1(a). The gates
in this cascade have a common control factor F = {x0 , x1 }. Hence, the cost of this
circuit can be reduced as shown in Fig. 6.1(b) by adding an additional line h (at
the top of the circuit) as well as the Toffoli gates MCT(F, h) before and after the
cascade. This leads to additional quantum cost of 2 · 5 = 10. However, the factored
gates reuse the result of F leading to a reduction of one control line per gate (dashed
rectangle in Fig. 6.1(b)). The removed control lines are shown as white circles. In
total this reduces the quantum cost from 104 to 59 and the transistor cost from 144
to 136, respectively.
Note that the added line is set to the constant input 0. Furthermore, the rightmost
Toffoli gate operating on the added line is only needed if the line is to be used for
another factor.
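The arithmetic behind such a replacement can be sketched as follows, assuming the simplified quantum cost of 2^(k+1) − 3 for a Toffoli gate with k ≥ 2 control lines (actual costs depend on the gate library and the available ancillae); the function names are illustrative.

```python
# Cost sketch for Definition 6.2, assuming the simplified quantum cost
# 2^(k+1) - 3 for an MCT gate with k >= 2 controls (1 otherwise).
def mct_cost(k):
    """Quantum cost of an MCT gate with k control lines (simplified model)."""
    return 1 if k <= 1 else 2 ** (k + 1) - 3

def factoring_saving(c, f):
    """Cost change when a factor of size f is split off an MCT gate with c
    controls: two MCT(F, h) gates plus one gate with c - f + 1 controls
    replace the original gate. Positive values mean a saving."""
    assert 0 < f < c
    return mct_cost(c) - (2 * mct_cost(f) + mct_cost(c - f + 1))
```

Under this model, factoring a single 3-control gate with a factor of size 2 costs more than it saves (13 versus 2 · 5 + 5 = 15), whereas a single 4-control gate already yields a saving (29 versus 2 · 5 + 13 = 23); when the two MCT(F, h) gates are shared among several gates, as in Example 6.1, their overhead is paid only once.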
Remark 6.2 In previous work, it has already been observed that more circuit lines
usually lead to lower (quantum) cost (see e.g. [BBC+95] or also the results of the
BDD-based approach discussed in Sect. 3.2.4). Moreover, the authors of [SPMH03]
even showed that some functions cannot be synthesized for certain gate libraries
unless one additional line is added. However, here these observations are exploited
for the first time by proposing a constructive post-synthesis optimization approach
for reversible logic.
6.1.2 Algorithm
Based on the ideas presented in the last section, now an algorithm is proposed that
adds one helper line and then employs a straightforward search procedure to use
that line for optimizing the circuit. More precisely, it is shown how to extract factors
from Toffoli and Fredkin gates in the circuit (the circuit may contain other types
of gates). The algorithm can be applied repeatedly to add more than one helper
line. It can also be iterated to add lines until adding a further line results in no cost
reduction. The transistor cost model or the quantum cost model can be used and in
fact the algorithm is readily adapted to any other gate-based cost model.
Consider a reversible circuit G consisting of the cascade of gates G = g0 g1 . . .
gd−1 . Let Ci denote the set of control lines of gi and let Ti denote the set of target
lines of gi . Then, four steps are performed in total.
1. Add a single helper line h.
2. Find the highest cost reducing factor across the circuit.
To this end, the whole circuit is traversed (i.e. every gate gi with 0 ≤ i < d is
considered). If gi is a reversible gate gi (Ci , Ti ) and the helper line h is available
(i.e. it is not used by a previously applied factor at this point in the circuit), then
for every partitioning of Ci into {F, Ĉ} with F non-empty:
a. Find the lowest j ≥ i such that j = d − 1 or F ∩ (Tj+1 ∪ {h}) ≠ ∅, i.e. find
the next gate gj+1 that manipulates one of the lines in F (or the helper line), so
that the value of the helper line cannot be reused any longer. If the outputs of
the circuit are reached, use gd−1 instead.
b. Determine the cost reduction that would result from applying this factor to
all applicable gates between gi and gj , including the cost of introducing two
instances of the factor gate MCT(F, h).
c. Keep a record of the factor and the gate range that leads to the largest cost
reduction.
3. If no cost reducing factor is found in Step 2, then terminate.
4. Otherwise, apply the best factor found and repeat from Step 2 on the revised
circuit.
Note that as already mentioned above, the rightmost MCT(F, h) gate operating
on the helper line is only added if the helper line is going to be used for another
factor.
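The search in Step 2 can be sketched as follows, again assuming the simplified quantum cost 2^(k+1) − 3 for an MCT gate with k ≥ 2 controls. This is a simplified single-pass version: it restricts F to proper subsets, omits the tracking of helper-line availability across previously applied factors, and the gate representation and names are illustrative.

```python
from itertools import combinations

def mct_cost(k):
    """Quantum cost of an MCT gate with k controls (simplified: 2^(k+1) - 3)."""
    return 1 if k <= 1 else 2 ** (k + 1) - 3

def best_factor(gates):
    """gates: list of (controls, target) pairs, controls a frozenset of line ids.
    Returns (saving, i, j, F) for the best cost-reducing factor, or None."""
    best = None
    for i, (Ci, _) in enumerate(gates):
        for size in range(1, len(Ci)):             # F: proper non-empty subset
            for F in map(frozenset, combinations(sorted(Ci), size)):
                j = i                              # Step 2a: extend the range
                while j + 1 < len(gates) and gates[j + 1][1] not in F:
                    j += 1                         # ... while no gate writes F
                saving = -2 * mct_cost(len(F))     # two MCT(F, h) instances
                for k in range(i, j + 1):          # Step 2b: saving per gate
                    Ck = gates[k][0]
                    if F <= Ck:                    # factor applies to this gate
                        saving += mct_cost(len(Ck)) - mct_cost(len(Ck) - len(F) + 1)
                if saving > 0 and (best is None or saving > best[0]):
                    best = (saving, i, j, F)       # Step 2c: keep the best record
    return best
```

For example, for three 3-control gates sharing the factor {0, 1}, each gate shrinks from cost 13 to cost 5, giving a total saving of 3 · 8 − 2 · 5 = 14.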
Example 6.2 Figure 6.2 shows the result of applying the algorithm to the circuit rep-
resenting the function rd53 (depicted in Fig. 6.2(a)) using the quantum cost metric.
The applied factors are highlighted by brackets at the bottom of Fig. 6.2(b) (with one
helper line) and Fig. 6.2(c) (with two helper lines), respectively. While the original
circuit has quantum cost of 128, this can be reduced to 83 with one helper line and
to 66 with two helper lines. Adding a third helper line does not reduce the quantum
cost of this circuit further.
The order in which factors are considered typically has an effect on the result.
Therefore, the algorithm is applied to the circuit as given and then to the circuit
obtained by reversing the gate order of the original circuit. The better of the two
final circuits is taken as the result. Thus, the presented algorithm is a heuristic.
But as the experiments in the next section show, this already leads to good results.
This section provides experimental results for the proposed approach. To this end,
the method described above has been implemented in C and was applied to all
benchmarks from RevLib [WGT+08]. All experiments have been carried out on
an AMD Athlon 3500+ with 1 GB of memory.
Since some of the circuits in RevLib have already been optimized using various
approaches (e.g. extensive template post-synthesis optimization, output permutation
optimization, and other techniques), all circuits have been optimized beforehand
to provide an even basis. To this end, the approach described in [MDM05] together
with a basic set of 14 templates has been used.1 Afterwards, the proposed
optimization method has been applied to the resulting circuits. In doing so, all considered
circuits have already gone through an optimization, and it can be shown that,
independently of this, further significant reductions can be achieved if helper lines and
the algorithm introduced above are used.2
1 This took over 10 hours of computation time. Furthermore, the application to the urf series of
circuits (which are quite large) has been aborted because they required too much run-time.
2 Of course, similar results are also achieved if the proposed approach is directly applied to non-
optimized circuits.
Fig. 6.2 Reversible circuits for rd53 with one helper line and two helper lines
Table 6.1 summarizes the obtained results for one and two helper lines. The first
three columns give the name of the circuit (including the unique identifier of the
circuit realization as used in RevLib), the number of circuit lines (n), and the number
of gates of the initial (already optimized) circuit (d). The following columns present
the results obtained under the quantum cost and transistor cost models. For this
purpose, the proposed approach has been applied with one and with two helper
lines, both to the circuit as given and to the reversed circuit. Afterwards, the better
result has been chosen. The re-
Table 6.1 Experimental results for RevLib circuits
CIRCUIT (name, n, d) | QUANTUM COST MODEL (INITIAL COST; ADD 1 LINE: OPTIMIZED COST, % IMPR.; ADD 2 LINES: OPTIMIZED COST, % IMPR.) | TRANSISTOR COST MODEL (INITIAL COST; ADD 1 LINE: OPTIMIZED COST, % IMPR.; ADD 2 LINES: OPTIMIZED COST, % IMPR.) | MAX TIME (s)
cycle17_3_112 20 48 6063 1877 69.04 1233 79.66 3272 2040 37.65 1768 45.97 1.63
cycle10_2_110 12 19 1202 420 65.06 290 75.87 800 568 29.00 512 36.00 0.17
plus63mod8192_164 13 492 38462 16902 56.06 11533 70.01 21840 18576 14.95 17560 19.60 4.19
plus63mod4096_163 12 429 25843 11804 54.32 9016 65.11 16200 13896 14.22 13336 17.68 3.30
urf2_153 8 638 17027 9570 43.80 7324 56.99 18240 16352 10.35 15712 13.86 3.28
hwb8_118 8 633 16510 9333 43.47 7334 55.58 17648 15920 9.79 15384 12.83 3.20
hwb8_115 8 610 14679 8302 43.44 6478 55.87 15304 13912 9.10 13424 12.28 2.83
urf5_159 9 499 24523 13979 43.00 8894 63.73 18640 15632 16.14 13704 26.48 3.16
urf2_154 8 620 16152 9298 42.43 7164 55.65 17432 15768 9.55 15184 12.90 3.16
urf3_156 10 2732 128172 75644 40.98 53584 58.19 103680 91600 11.65 83160 19.79 20.16
hwb8_114 8 614 11941 7066 40.83 5869 50.85 14488 13640 5.85 13376 7.68 2.83
urf1_150 9 1517 48952 29120 40.51 21511 56.06 48616 43040 11.47 40824 16.03 9.03
urf3_157 10 2674 121716 72562 40.38 51833 57.41 100544 88928 11.55 81296 19.14 19.64
urf1_151 9 1487 45855 27740 39.50 20765 54.72 47024 41784 11.14 39760 15.45 8.73
hwb9_123 9 1959 22482 13665 39.22 10996 51.09 28696 27760 3.26 27376 4.60 6.30
hwb8_113 8 637 13460 8251 38.70 6763 49.75 16896 15912 5.82 15648 7.39 3.19
hwb9_120 9 1538 44684 27425 38.62 20574 53.96 46400 41496 10.57 39584 14.69 8.68
hwb9_122 9 1535 44635 27402 38.61 20548 53.96 46336 41456 10.53 39544 14.66 8.62
hwb8_117 8 748 7010 4346 38.00 3776 46.13 10520 10192 3.12 10112 3.88 2.28
hwb8_116 8 749 6976 4338 37.82 3781 45.80 10536 10216 3.04 10144 3.72 2.31
plus127mod8192_162 13 910 61425 38710 36.98 28856 53.02 39984 35304 11.70 32648 18.35 8.36
hwb9_119 9 1544 35967 23269 35.30 18939 47.34 44344 41448 6.53 40376 8.95 8.70
hwb9_121 9 1541 35973 23280 35.28 18940 47.35 44304 41408 6.54 40336 8.96 8.65
mod8-10_177 5 14 84 55 34.52 44 47.62 144 144 0.00 144 0.00 0.05
hwb7_60 7 166 1754 1153 34.26 1010 42.42 3168 2960 6.57 2912 8.08 0.82
4gt4-v0_73 5 17 57 38 33.33 38 33.33 144 144 0.00 144 0.00 0.07
hwb7_59 7 289 3939 2632 33.18 2310 41.36 6800 6480 4.71 6400 5.88 1.33
hwb7_62 7 331 2608 1775 31.94 1624 37.73 4632 4512 2.59 4496 2.94 1.01
4gt12-v0_86 5 14 47 32 31.91 32 31.91 136 104 23.53 104 23.53 0.05
alu-v2_30 5 18 111 76 31.53 62 44.14 240 208 13.33 176 26.67 0.08
ham7_104 7 23 83 58 30.12 58 30.12 272 272 0.00 272 0.00 0.07
hwb6_56 6 126 1329 932 29.87 871 34.46 2456 2392 2.61 2392 2.61 0.51
alu-v2_31 5 13 45 32 28.89 32 28.89 144 128 11.11 128 11.11 0.06
hwb7_61 7 236 3261 2319 28.89 2105 35.45 5592 5432 2.86 5408 3.29 1.04
4gt4-v0_78 5 13 53 38 28.30 38 28.30 144 112 22.22 112 22.22 0.05
rd53_136 7 15 72 52 27.78 45 37.50 200 200 0.00 200 0.00 0.05
ham15_108 15 70 403 294 27.05 257 36.23 992 968 2.42 968 2.42 0.22
4gt12-v0_87 5 10 43 32 25.58 32 25.58 104 104 0.00 104 0.00 0.05
rd53_135 7 16 68 51 25.00 51 25.00 224 216 3.57 216 3.57 0.05
hwb6_57 6 65 433 326 24.71 299 30.95 976 928 4.92 928 4.92 0.28
rd53_130 7 30 230 190 17.39 174 24.35 344 344 0.00 344 0.00 0.08
mod5adder_127 6 21 121 100 17.36 100 17.36 216 216 0.00 216 0.00 0.06
one-two-three-v0_97 5 11 65 54 16.92 54 16.92 200 200 0.00 200 0.00 0.04
urf2_161 8 3250 20465 17235 15.78 16594 18.92 54416 53664 1.38 53664 1.38 12.03
ham15_109 15 109 206 176 14.56 169 17.96 1008 1008 0.00 1008 0.00 0.27
hwb5_53 5 55 286 247 13.64 242 15.38 824 824 0.00 824 0.00 0.18
rd53_132 7 27 114 99 13.16 99 13.16 176 176 0.00 176 0.00 0.06
4gt13_91 5 10 31 27 12.90 27 12.90 128 96 25.00 96 25.00 0.05
urf3_155 10 26468 132340 115731 12.55 113918 13.92 423488 414016 2.24 413656 2.32 182.19
hwb5_54 5 24 72 63 12.50 63 12.50 240 240 0.00 240 0.00 0.09
cnt3-5_180 16 20 120 105 12.50 105 12.50 320 320 0.00 320 0.00 0.09
rd53_133 7 12 73 64 12.33 64 12.33 240 232 3.33 232 3.33 0.05
urf1_149 9 11554 57770 51125 11.50 50424 12.72 184864 181816 1.65 181696 1.71 51.47
hwb5_55 5 24 98 87 11.22 87 11.22 296 296 0.00 296 0.00 0.07
sym9_148 10 210 1250 1126 9.92 1074 14.08 3448 3432 0.46 3416 0.93 1.11
urf2_152 8 5030 25150 22693 9.77 22518 10.47 80480 79880 0.75 79840 0.80 18.30
urf5_158 9 10276 51380 46513 9.47 45662 11.13 164416 162496 1.17 162408 1.22 44.47
4gt5_76 5 13 26 24 7.69 24 7.69 112 104 7.14 104 7.14 0.04
urf4_187 11 32004 160020 148333 7.30 146890 8.21 512064 501824 2.00 501280 2.11 2135.51
sys6-v0_111 10 20 71 69 2.82 68 4.23 280 264 5.71 256 8.57 0.06
rd53_138 8 12 43 42 2.33 42 2.33 176 168 4.55 168 4.55 0.04
rd73_140 10 20 75 74 1.33 74 1.33 288 280 2.78 280 2.78 0.06
sym9_146 12 28 108 107 0.93 107 0.93 384 376 2.08 376 2.08 0.09
rd84_142 15 28 111 110 0.90 110 0.90 408 400 1.96 400 1.96 0.08
sulting cost and the percentage improvement are shown for each case relative to
the initial circuit cost (i.e. the cost after template application). Finally, the last
column gives the maximum CPU time (in seconds) measured for a single run
for each benchmark. Results for small circuits with less than five lines and less
than ten gates are omitted. Furthermore, the circuits 4gt11_82, 4gt13_90, decod24-
enable_126, mod5adder_128, mod5adder_129, hwb6_58, ham7_105, ham7_106,
rd73_141, sys6-v0_144, sym9_147, 0410184_169, 0410184_170, rd84_143, cnt3-
5_179, and add8_172 gave no improvement and thus are not listed in Table 6.1.
Considering quantum cost, significant cost reductions can be observed for most
of the circuits, even if only a single line is added. Over all circuits (including the
ones that gave no improvement), adding a single line reduces the quantum cost by
22.51% on average; in the best case (cycle17_3_112) by just over 69%. This can be
further improved if another line is added, leading to additional reductions of 5.10%
on average. If transistor cost is considered, the reductions are somewhat smaller but
still significant. When adding a single line, the transistor cost is reduced by 5.83% on
average; in the best case (cycle17_3_112) by 37%. Adding a second line reduces
the transistor cost by a further 1.65%. Since the cost of additional lines is negligible
in CMOS technologies, this is a notable reduction as well. In addition, these
optimizations can be achieved in very short run-time. Even for circuits including
thousands of gates, the approach terminates after a few minutes, in most cases after
a few seconds.
Besides that, the effect of the number of added helper lines on the resulting
improvement has been evaluated in detail. More precisely, the proposed method has
been applied with one to five helper lines to all circuits from RevLib (including
the small ones that have been omitted from Table 6.1). Again, all these circuits had
already been optimized using templates as noted above. A total of 95 of the 177 circuits
show an improvement in quantum cost when a single helper line is added. Of the
remaining 82 circuits, 64 have very few lines (five or less)
and are already highly optimized due to their relatively small size.
Figure 6.3 shows the improvement in quantum cost for the remaining circuits
(both for the individual benchmarks in the plot and on average in the table). As
discussed above, a significant improvement can be observed if a single helper line
is added. This is further increased if more lines are applied. However, the
improvements diminish with an increasing number of helper lines. Finally, no
further improvement has been observed when a sixth line is applied. This is the
expected behavior, since multiple helper lines are only useful when multiple factors
sharing common gates are present.
Altogether, by applying the proposed approach, significant cost reductions can be
achieved if a single line is added to the circuit (even for already optimized
realizations). Further (diminishing) improvements result if more than one helper line is
applied. The most critical issue is the fact that additional lines must be added to
enable these optimizations. While this is negligible for reversible CMOS
technologies, for quantum circuits the designer must weigh whether these additional
expenses are worth the additional qubit(s). Since up to 70% of the quantum cost can
be saved, this may be the case for many circuits.
6.2 Reducing the Number of Circuit Lines
While adding a small number of additional lines may be worthwhile to reduce, e.g.,
the quantum cost of a circuit (as shown in the previous section), circuit lines are
usually a highly limited resource (caused by the fact that the number of circuit lines
corresponds to the number of qubits). Furthermore, a high number of lines (or qubits,
respectively) may decrease the reliability of the resulting system. Thus, this number
should be kept as small as possible. In the best case, only the minimal number of
circuit lines should be used. However, to ensure minimality of circuit lines, the
underlying function must be given in terms of a truth table or a similar description (see
Sect. 3.1.1). But if larger functions are to be synthesized, only hierarchical methods
(like the BDD-based method from Sect. 3.2 or the SyReC-based approach from
Sect. 3.3) are available so far. These often require a significant number of additional
circuit lines (with constant inputs) and thus lead to circuits with a large line count.
As an example, consider the reversible realizations of the AND function and
the OR function as shown in Fig. 6.4(a) and (b), respectively. Composing these
circuits (as done by hierarchical approaches) yields a realization with two additional
circuit lines (with constant inputs) (see Fig. 6.4(c)). But both functions combined
can be realized with only one additional circuit line (see Fig. 6.4(d)). Thus, the
question is how the number of additional lines in reversible circuits can be reduced.
In this section, a post-process optimization method is proposed that addresses
this problem. Garbage outputs (i.e. circuit lines whose output value is don’t care) are
thereby exploited. A multi-stage approach is introduced that (1) identifies garbage
outputs producing don’t cares, (2) re-synthesizes parts of the circuit so that instead
of these don’t cares concrete constant values are computed, and (3) connects the
resulting outputs with appropriate constant inputs. In other words, circuit structures
are modified so that they can be merged with constant inputs resulting in a line
reduction. For the respective re-synthesis step, existing synthesis methods are used.
Experimental results show that, by applying this approach, the number of circuit lines
can be reduced by 17% on average, in the best case by more than 40%. Furthermore,
depending on the synthesis approach used, these line reductions come at only
a small increase in the number of gates and the quantum cost, respectively.
In some cases, the cost can even be reduced. In this sense, the drawbacks of
scalable but line-costly synthesis approaches are mitigated.
The remainder of this section is structured as follows. Section 6.2.1 illustrates
the general idea of the proposed approach. Afterwards, the concrete algorithm
exploiting these observations is described in Sect. 6.2.2. Finally, Sect. 6.2.3 reports
experimental results.
In this section, the idea of how to reduce the number of lines in large reversible circuits
is presented. As discussed above, ensuring minimality of circuit lines is only
possible for small functions for which a truth table description can be derived. Thus,
line reduction is considered as a post-synthesis optimization problem. The proposed
approach exploits a structure that often occurs in circuits generated by scalable
synthesis approaches or composed from reversible sub-circuits. This is illustrated by the
following running example.
Example 6.3 Consider the circuit G = g1 . . . g12 depicted in Fig. 6.5(a), representing
a 3-bit adder that has been created by composing three single (minimal) 1-bit adders.
This circuit contains three additional circuit lines (with constant inputs). Not all
of them are necessarily required. Furthermore, there are several garbage outputs
whose values are don’t care.
Of particular interest in this circuit are the first use of a line with a constant
input and the last use of a line with a garbage output. For example, the constant
input at line 4 is firstly used by the fifth gate, while the value of the
last line is not needed anymore after the second gate. Since the value of the garbage
output does not matter (because it is a don’t care), this might offer the possibility
to merge the line with the constant input and the line with the garbage
output. More precisely, if it is possible to modify the circuit so that a garbage output
returns a constant value (instead of an arbitrary value), then this constant value can
be used in the rest of the circuit. At the same time, a constant input line can be
removed. More formally: a constant input line lc can be removed if there is a garbage
output line lg that is not needed anymore before lc is used for the first time, and the
preceding sub-circuit can be modified so that lg outputs a constant value; line lg
then takes over the role of lc in the rest of the circuit.
Note that the constant value of the selected line lc is thereby of no importance. If
necessary, the needed value can easily be generated by an additional NOT gate (i.e. a
Toffoli gate without any control lines). Furthermore, constant outputs can only be
produced if the considered circuit includes additional lines with constant inputs.
Example 6.4 Reconsider the adder circuit G = g1 . . . g12 from the running example.
The constant input at line 1 is firstly used by gate g9 , while the values of the garbage
outputs at lines 5, 6, 9, and 10 are not needed anymore after gate g8 . Since the
sub-circuit G_1^8 = g1 . . . g8 can be modified so that, e.g., the garbage output at
line 5 is assigned the constant value 0 (see the dashed rectangle in Fig. 6.5(b)),
line 1 can be removed and the newly created constant value from line 5 can be used
instead. The resulting circuit is depicted in Fig. 6.5(b). It consists of 9 instead of
10 lines.
Note that the respective modification of a sub-circuit is not always possible. For
example, consider the constant input at line 4 (firstly used by gate g5 ) and the
garbage outputs at lines 9 and 10 (not needed anymore after gate g4 ). This
might offer the possibility to remove one more circuit line. But the sub-circuit
G_1^4 = g1 . . . g4 cannot be modified accordingly, since a realization of the 1-bit
addition together with an additional constant output would require more garbage outputs.
Using these observations, an algorithm for reducing the number of lines in
reversible circuits can be formulated. The next section describes the respective steps
in detail. Afterwards, the experiments in Sect. 6.2.3 show that significant reductions
can be obtained with this approach.
6.2.2 Algorithm
Based on the ideas presented in the last section, an algorithm for circuit line
reduction is now proposed. The respective steps are illustrated by means of an example in
Fig. 6.6. At first, an appropriate sub-circuit is determined (a). Afterwards, an attempt
is made to re-synthesize the sub-circuit so that one of the (garbage) outputs returns a
constant value (b). If this is successful, the re-synthesized sub-circuit is inserted into
the original circuit (c). Finally, the newly created constant output is merged with a
line that has a constant input (d). The algorithm terminates when no appropriate
sub-circuit can be determined anymore. In the following, the respective steps are
described in detail.
In the considered context, appropriate sub-circuits are characterized by the fact that
they include at least one garbage output which can later be used to replace a constant
input. To this end, it is important to know when the lines of a circuit are used for the
first time and when they are not needed anymore. This is formalized by the
following two functions: for a line l of a circuit G = g1 . . . gd , let firstly_used(l) denote
the index of the first gate in which l occurs (as a control or target line), and let
lastly_used(l) denote the index of the last gate in which l occurs.
Using these functions, the flow to determine appropriate sub-circuits can be de-
scribed as follows:
1. Traverse all circuit lines lg of the circuit G = g1 . . . gd that include a garbage
output.
2. Check whether line lg can be merged with another line lc that has a constant input,
i.e. whether there is a constant input line lc with firstly_used(lc ) > lastly_used(lg ). If
this check fails, continue with the next garbage output line lg .
3. Check whether the sub-circuit G_1^k = g1 . . . gk with k = lastly_used(lg ) can be
modified so that line lg outputs a constant value. If this check fails, continue with
the next line lg in Step 2. Otherwise, G_1^k is an appropriate sub-circuit.
Note that the order in which the garbage output lines lg are considered typically
has an effect. Here, the line with the smallest value of lastly_used(lg ) is considered
first. This is motivated by the fact that firstly_used(lc ) > lastly_used(lg ) is a
necessary condition which, in particular, becomes true for small values of lastly_used(lg ).
Besides that, the check in Step 3 is strongly related to the re-synthesis of the sub-
circuit which is described next.
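The line-merging check above can be sketched as follows, assuming a circuit given as a list of gates, each represented by the set of line ids it touches (controls and target); gate indices are 0-based here, unlike the book's g1 . . . gd, and the helper names are illustrative.

```python
def firstly_used(gates, line):
    """Index of the first gate touching `line` (len(gates) if never used)."""
    return next((i for i, g in enumerate(gates) if line in g), len(gates))

def lastly_used(gates, line):
    """Index of the last gate touching `line` (-1 if never used)."""
    return max((i for i, g in enumerate(gates) if line in g), default=-1)

def merge_candidates(gates, constant_lines, garbage_lines):
    """Pairs (lg, lc, k) with firstly_used(lc) > lastly_used(lg), where k bounds
    the sub-circuit to be re-synthesized. Garbage lines with the smallest
    lastly_used value are considered first, as in the steps above."""
    pairs = []
    for lg in sorted(garbage_lines, key=lambda l: lastly_used(gates, l)):
        for lc in constant_lines:
            k = lastly_used(gates, lg)
            if firstly_used(gates, lc) > k:
                pairs.append((lg, lc, k))
                break  # one matching constant line per garbage line suffices
    return pairs
```

Whether the sub-circuit bounded by k can actually be re-synthesized with a constant output (Step 3) still has to be checked separately.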
Given an appropriate sub-circuit G_1^k , the next task is to re-synthesize it so that one
garbage output returns a constant value (instead of leaving it a don’t care). Generally,
any available synthesis approach can be applied for this purpose. But since the
number of circuit lines should be reduced, approaches that generate additional circuit
lines should be avoided. Thus, synthesis methods that require a truth table description
(and therewith ensure minimality with respect to circuit lines) are used.
Consequently, only sub-circuits with a limited number of primary inputs are considered.
To address this issue, not the whole sub-circuit Gk1 is re-synthesized. Instead, a
bounded cascade of gates which affects the respective garbage output is considered.
More precisely, starting at the output of line lg, the circuit is traversed towards the
inputs of the circuit. Each passed gate as well as the lines connected to it are
added to the following consideration.3 The traversal stops if the number of consid-
ered lines reaches a given threshold λ (in the experimental evaluations, it turned out
that λ = 6 is a good choice).
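The bounded backward traversal can be sketched as follows. The gate representation (sets of line indices) and the exact stopping rule when the threshold would be exceeded are assumptions of this sketch.

```python
def extract_cascade(gates, lg, lam=6):
    """Collect the cone of influence of garbage line lg.

    Walk from the outputs towards the inputs; every gate touching a line
    already in the cone is added together with its lines. Stop before a
    gate would push the number of considered lines beyond `lam`.
    Returns (gate_indices, lines) of the considered cascade, where
    `gates` is a list of sets of line indices (gates[i] = gate g_(i+1))."""
    lines = {lg}
    picked = []
    for i in range(len(gates) - 1, -1, -1):
        if gates[i] & lines:  # gate affects a line already in the cone
            if len(lines | gates[i]) > lam:
                break  # threshold reached; keep the cascade bounded
            lines |= gates[i]
            picked.append(i)
    return sorted(picked), lines
```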
From the resulting cascade, a truth table description is determined. Afterwards,
the truth table is modified, i.e. the former garbage output at line lg is replaced by
a constant output value. It is thereby important that the modification preserves the
reversibility of the function. If this is not possible, the sub-circuit is skipped and the
next line with a garbage output is considered (see Step 3 from above). Otherwise,
the modified truth table can be passed to a synthesis approach.
Note that the modification of the truth table is only possible if constant values
at the primary inputs of the whole circuit are incorporated. Constant inputs restrict
the number of possible assignments to the inputs of the considered cascade. This
enables a reversible embedding with a constant output.
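The modification can be illustrated by a hypothetical sketch: the former garbage output column is fixed to a constant and the remaining don't cares are completed so that the mapping stays injective on the reachable inputs. The greedy completion used here may fail where a cleverer assignment would succeed; it only illustrates the feasibility check.

```python
from itertools import product

def embed_constant_output(rows, out_bit, const, width):
    """Fix output column `out_bit` to `const` for every reachable input
    pattern and greedily complete the remaining don't cares with distinct
    output patterns, so the mapping stays injective (reversible on the
    reachable inputs). `rows` maps input tuples to output lists in which
    None marks a don't care. Returns the completed table, or None if the
    greedy pass finds no injective completion."""
    used, table = set(), {}
    candidates = [p for p in product((0, 1), repeat=width)
                  if p[out_bit] == const]
    for inp, out in rows.items():
        out = list(out)
        out[out_bit] = const  # the former garbage output becomes constant
        for cand in candidates:
            if cand not in used and all(o is None or o == c
                                        for o, c in zip(out, cand)):
                used.add(cand)
                table[inp] = cand
                break
        else:
            return None  # no injective completion found; skip this sub-circuit
    return table
```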
Example 6.6 Consider the cascade highlighted by the dashed rectangle in Fig. 6.6(a)
which is considered for re-synthesis. Incorporating the constant values at the pri-
mary inputs of the whole circuit, only the patterns shown in Table 6.2(a) have to be
considered. The outputs for the remaining patterns are not of interest. This function
can be modified so that one of the garbage outputs returns a constant value, while
still reversibility of the overall function is preserved (see Table 6.2(b)). Synthesizing
the modified function, the circuit shown on the right-hand side of Fig. 6.6(b) results.
This circuit can be used to remove a constant line.
3 In other words, the cone of influence of the garbage output line lg is considered.
Table 6.2 Truth tables for the considered cascade: (a) the function extracted from
the cascade, where only the input patterns reachable under the constant primary
inputs are specified and all other entries are don't cares; (b) the modified function,
in which one garbage output is replaced by a constant value while reversibility is
preserved.
If re-synthesis was successful, the last two steps are straightforward. At first, the
considered sub-circuit is replaced by the newly synthesized one. Afterwards, the
considered garbage output line lg is merged with the respective constant input line lc ,
i.e. the respective gate connections as well as possible primary outputs are adjusted.
Finally, line lc which is not needed anymore is removed.
Example 6.7 Consider the circuit shown in Fig. 6.6. Replacing the highlighted
sub-circuit with the re-synthesized one from Example 6.6, the circuit shown in
Fig. 6.6(c) results. Here, line 1 and line 5 can be merged leading to the circuit
depicted in Fig. 6.6(d) where line 5 has been removed.
The proposed approach for line reduction has been implemented in C++ and eval-
uated using a set of reversible circuits with a large number of constant inputs. As
synthesis method for step (b) of the optimization (see Sect. 6.2.2.2), two different
approaches have been evaluated, namely
1. an exact synthesis approach (based on the principles described in Chap. 4 and
denoted by exact synthesis in the following) that realizes a circuit with a minimal
number of gates but usually requires a significant amount of run-time and
2. a heuristic synthesis approach (namely the transformation-based method intro-
duced in Sect. 3.1.2; in the following denoted by heuristic synthesis) that does
not ensure minimality but is very efficient regarding run-time.
As benchmarks, reversible circuits obtained by the BDD-based synthesis ap-
proach (from Sect. 3.2) were used. These circuits include a significant number of
constant inputs that originated from the synthesis and thus cannot be easily removed.
The experiments have been carried out on an Intel Core 2 Duo 2.26 GHz with 3 GB
of main memory.
The results of the evaluation are presented in Table 6.3. The first four columns
give the name (Benchmark), the number of circuit lines4 (Lines), the gate count (d),
and the quantum cost (QC) of the original circuits. In the following columns, the
respective values after line reduction as well as the run-time needed for optimization
(in CPU seconds) are reported. It is thereby distinguished between results obtained
by applying exact synthesis and results obtained by applying heuristic synthesis in
Step (b).
As can be seen from the results, the number of lines can be significantly reduced for
all considered reversible circuits. On average, the number of lines is reduced
by 17%, in the best case (spla with exact synthesis) by more than 40%.5 As already
mentioned in Sect. 6.2.2.2, reducing the circuit lines might lead to an increase in the
number of gates as well as in the quantum costs. This is also observable in the
results.
In this sense, the differences between the applied synthesis approaches provide
interesting insights. While the application of exact synthesis leads to larger run-
times (in the worst case, more than 3 CPU hours are required), results from the
heuristic method are available within minutes. But the differences in the respec-
tive numbers of gates and quantum costs are significant. If exact
synthesis is applied, the increase in the number of gates and quantum cost can be kept
small; for some circuits (e.g. cordic and spla) even reductions have been achieved.
4 Including both the number of primary inputs/outputs and the number of additional circuit
lines.
5 Note that these numbers still include the primary inputs/outputs, which cannot be
reduced.
significantly increases the quantum cost of the resulting circuits (an important cost
criterion for LNN architectures as well), improvements are suggested in Sect. 6.3.2.
Finally, the effect of this new optimization method is experimentally evaluated in
Sect. 6.3.3.
As described in Sect. 2.1.3, quantum circuits can be obtained using reversible cir-
cuits as a basis which are afterwards mapped to a cascade of quantum gates. Al-
ternatively, quantum circuits can be directly addressed as e.g. done by the BDD-
based synthesis approach described in Sect. 3.2 or by the exact method described
in Sect. 4.2.2. However, the resulting quantum circuits often include non-adjacent
gates and thus are not applicable to LNN architectures. To provide a distinct measure
of this, the following definition introduces the new NNC cost metric.
Definition 6.4 Consider a 2-qubit quantum gate q where its control and target are
placed at the cth and the tth line (0 ≤ c, t < n), respectively. The Nearest Neighbor Cost
(NNC) of q is defined as |c − t| − 1, i.e. the number of lines between the control and
the target line. The NNC of a single-qubit gate is defined as 0. The NNC of a circuit is
defined as the sum of the NNCs of its gates. The optimal NNC for a circuit is 0, which
is achieved when all quantum gates are either 1-qubit gates or 2-qubit gates performed
on adjacent qubits.
Example 6.8 Figure 6.7(a) shows the standard decomposition of a Toffoli gate lead-
ing to an NNC value of 1.
Lemma 6.1 Consider a quantum gate q where its control and target are placed at
the cth and the tth line, respectively. Using adjacent SWAP gates as proposed, addi-
tional quantum cost of 6 · (|c − t| − 1) is needed.
Proof In total, |c − t| − 1 adjacent SWAP operations are required to move the control
line next to the target line so that both become adjacent. Another |c − t| − 1 SWAP
operations are needed to restore the original order. Considering quantum cost of
3 for each SWAP operation, this leads to the additional quantum cost of
6 · (|c − t| − 1).
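Definition 6.4 and Lemma 6.1 translate directly into code. A small sketch, assuming lines are indexed from 0 and 2-qubit gates are given as (control, target) pairs; the function names are choices of this illustration:

```python
def nnc(gate):
    """Nearest Neighbor Cost of one gate: |c - t| - 1, i.e. the number
    of lines between control and target. Single-qubit gates, given as
    one-element tuples, have NNC 0."""
    if len(gate) < 2:
        return 0
    c, t = gate
    return abs(c - t) - 1

def naive_swap_cost(gate):
    """Extra quantum cost of the naive decomposition (Lemma 6.1):
    |c - t| - 1 adjacent SWAPs before and after the gate, at quantum
    cost 3 each, i.e. 6 * (|c - t| - 1)."""
    return 6 * nnc(gate)

def circuit_nnc(gates):
    """NNC of a circuit: the sum over the NNCs of its gates."""
    return sum(nnc(g) for g in gates)
```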
In the rest of this section, this method is denoted by naive NNC-based decom-
position. Schemes like this have also been applied to construct circuits for LNN
architectures so far (see e.g. [CS07, Kha08]). However, this naive method might
lead to a significant increase in quantum cost. Thus, in the next section more elaborate
approaches for synthesizing NNC-optimal circuits are proposed.
Two improved approaches for NNC-optimal quantum circuit generation are intro-
duced. The first one exploits exact synthesis techniques, while the second one ma-
nipulates the circuit and specification, respectively.
In Chap. 4, exact synthesis approaches have been introduced that ensure minimality
of the resulting circuits. The synthesis problem is thereby expressed as a sequence of
Boolean satisfiability (SAT) instances. For a given function f , it is checked whether
where
• inpi is a Boolean vector representing the inputs of the circuit to be synthesized
for truth table line i,
• outi is a Boolean vector representing the outputs of the circuit to be synthesized
for truth table line i, and
• Φ is a set of constraints representing the synthesis problem for quantum circuit
as described in Sect. 4.2.2.
Applying this formulation for the synthesis of quantum circuits, NNC-optimality can
be ensured by modifying the constraints in Φ so that they do not represent the whole
set of quantum gates, but only adjacent gates. In doing so, exact synthesis is per-
formed that determines minimal circuits not only with respect to quantum gates, but
also with respect to NNC. Consequently, significantly better NNC-optimal decompo-
sitions than the one from Fig. 6.7(b) can be synthesized.
However, the applicability of such an exact method is limited to relatively small
functions. In this sense, the proposed method is sufficient to construct minimal de-
compositions for a set of Toffoli and Peres gate configurations as shown in Table 6.4.
But nevertheless, these results can be exploited to improve the naive NNC-based de-
composition: Once an exact NNC-optimal quantum circuit for a reversible gate is
available (denoted by macro in the following), it can be reused as shown by the
following example.
(with respect to both quantum cost and NNC) as shown in Fig. 6.7(c) is determined.
In comparison to the naive method (see Fig. 6.7(b)), this reduces the quantum cost
from 11 to 9 while still ensuring NNC optimality. Furthermore, the realization can
be reused as a macro while decomposing larger reversible circuits. For example,
consider the circuit shown in Fig. 6.8. Here, for the second gate the naive method
is applied (i.e. standard decomposition is performed and SWAPs are added), while
for the remaining ones the obtained macro is used. This enables a quantum cost
reduction from 96 to 92.
In total, 13 macros have been generated as listed in Table 6.4 together with the
respective costs in comparison to the costs obtained by using the naive method. As
can be seen, exploiting these macros reduces the cost for each gate by up to 63%.
The effect of these macros on the decomposition of reversible circuits is considered
in detail in the experiments.
Applying the approaches introduced so far always leads to an increase in the quan-
tum cost for each non-adjacent gate. In contrast, by modifying the order of the circuit
lines (similar to the SWOP approach introduced in Sect. 5.3), some of the additional
costs can be saved. As an example, consider the circuit in Fig. 6.9(a) with quantum
cost of 3 and an NNC value of 6. By reordering the lines as shown in Fig. 6.9(b),
the NNC value can be reduced to 1 without increasing the total quantum cost. To
determine which lines should be reordered, two heuristic methods are proposed in
the following. The former one changes the order of the circuit lines according to a
global view, while the latter one applies a local view to assign the line order.
operations as described in the previous sections are added for each non-adjacent
gate.
Example 6.11 Consider the circuit depicted in Fig. 6.10(a). After calculating the
NNC contributions, impx0 = 1.5, impx1 = 0, impx2 = 0.5, and impx3 = 1 result.
Thus, line x0 (highest impact) and line x2 (middle line) are swapped. Since further
swapping does not improve the NNC value, reordering terminates and SWAP gates
are added for the remaining non-adjacent gates. The resulting circuit is depicted in
Fig. 6.10(b) and has quantum cost of 9 in comparison to 21 that results if the naive
method is applied.
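A sketch of one global reordering step follows. The exact per-line NNC contribution used for the impact values is not reproduced here; splitting each gate's NNC equally between its two lines, and the function names, are assumptions of this sketch.

```python
def line_impacts(gates, n):
    """Assumed impact measure: split each gate's NNC (|c - t| - 1)
    equally between its control and target line."""
    imp = [0.0] * n
    for c, t in gates:
        d = abs(c - t) - 1
        if d > 0:
            imp[c] += d / 2
            imp[t] += d / 2
    return imp

def global_reorder_step(gates, n):
    """Swap the highest-impact line with the middle line if this lowers
    the total NNC; returns (gates, changed)."""
    imp = line_impacts(gates, n)
    hot, mid = imp.index(max(imp)), n // 2
    perm = list(range(n))                      # perm[old line] = new line
    perm[hot], perm[mid] = perm[mid], perm[hot]
    moved = [(perm[c], perm[t]) for c, t in gates]
    before = sum(abs(c - t) - 1 for c, t in gates)
    after = sum(abs(c - t) - 1 for c, t in moved)
    return (moved, True) if after < before else (gates, False)
```

The step would be repeated until no further swap improves the NNC value, after which SWAP gates are added for the remaining non-adjacent gates.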
Local Reordering In order to save SWAP gates, line reordering can also be ap-
plied according to a local schema as follows. The circuit is traversed from the inputs
to the outputs. As soon as there is a gate q with an NNC value greater than 0, a
SWAP operation is added in front of q to enable an adjacent gate. However, in con-
trast to the naive NNC-based decomposition, no SWAP operation is added after q.
Instead, the resulting order is used for the rest of the circuit (i.e. propagated through
the remaining circuit). This process is repeated until all gates are traversed.
Example 6.12 Reconsider the circuit depicted in Fig. 6.10(a). The first gate is not
modified, since it has an NNC of 0. For the second gate, a SWAP operation is applied
to make it adjacent. Afterwards, the new line order is propagated to all remaining
gates resulting in the circuit shown in Fig. 6.10(c). This procedure is repeated until
the whole circuit has been traversed. Finally, again a circuit with quantum cost of 9
(in contrast to 21) results.
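Local reordering can be sketched as follows; encoding added SWAPs as ('swap', a, a+1) tuples and the permutation bookkeeping are choices of this illustration, not the authors' implementation.

```python
def local_reorder(gates, n):
    """Traverse gates from inputs to outputs; whenever a gate is
    non-adjacent, prepend adjacent SWAPs that move the control next to
    the target, and keep the new line order for the rest of the circuit
    (no restoring SWAPs, in contrast to the naive decomposition)."""
    perm = list(range(n))  # perm[old line] = current position
    out = []
    for c, t in gates:
        pc, pt = perm[c], perm[t]
        step = 1 if pc < pt else -1
        while abs(pc - pt) > 1:
            # swap position pc with its neighbor towards pt
            out.append(('swap', min(pc, pc + step), max(pc, pc + step)))
            a, b = perm.index(pc), perm.index(pc + step)
            perm[a], perm[b] = perm[b], perm[a]
            pc += step
        out.append(('gate', pc, pt))
    return out
```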
In this section, experimental results obtained with the introduced approaches are
presented. The methods introduced in Sects. 6.3.1 and 6.3.2, respectively, are evalu-
ated by measuring the overhead needed to synthesize circuits with an optimal NNC
value of 0. The approaches have been implemented in C++ and applied to bench-
mark circuits from RevLib [WGT+08] using an AMD Athlon 3500+ with 1 GB of
main memory.
The results are shown in Table 6.5. The first column gives the names of the
circuits. Then, the number of circuit lines (n), the gate count (d), the quantum
cost (QC), and the NNC value of the original (reversible) circuits are shown. The fol-
lowing columns denote the quantum cost of the NNC-optimal circuits obtained by
using the naive method (NAIVE), by additionally exploiting macros (+MACROS), and
by applying reordering (GLOBAL, LOCAL, or both), respectively. The next column
gives the percentage of the best quantum cost reduction obtained by the improve-
ments in comparison to the naive method (BEST IMPR). The last column shows the
smallest overhead in terms of quantum cost needed to achieve NNC-optimality in
comparison to the original circuit (OVERHEAD FOR NNC OPTIMALITY). All run-
times are less than one CPU minute and thus are omitted in the table.
As can be seen, decomposing reversible circuits into NNC-optimal quantum
circuits is costly. Using the naive method, the quantum cost increases by one order
of magnitude on average. However, this can be significantly improved if macros
or reordering are applied. Even if reordering may worsen the results in a few
cases (e.g. for local reordering in 0410184_169 or add64_184), in total this leads
to an improvement of 50% on average; in the best case, an improvement of 83% was
observed. If the respective methods are considered separately, it can be concluded
that the combination of global and local reordering (i.e. GLOB.+LOC.) leads to the
best improvements over all benchmarks. As a result, NNC-optimal circuits can be
synthesized with a moderate increase of quantum cost.
Since synthesis results often are not optimal, optimization is an established part of
today’s design flows. In this chapter, three optimization approaches for reversible
logic have been introduced. While the first one reduces the circuit cost in general
(i.e. the sizes of the respective gates and therewith quantum or transistor cost, re-
spectively), the second one reduces the number of lines, and the third one addresses
a more dedicated technology specific cost criterion.
These approaches clearly show that post-synthesis optimization often has to be
done with respect to the desired technology. For example, if quantum circuits are ad-
dressed, the designer has to trade off whether a cost reduction of up to 70% justifies
adding a new circuit line (and therewith spending one more qubit). Moreover, the NNC-based
(Table 6.5 Results of NNC-optimal synthesis. For each circuit: n, d, QC, and NNC
of the original circuit; the quantum cost of the decomposed (NNC-optimal) circuits
for NAIVE, +MACROS, and reordering with GLOBAL, LOCAL, and GLOB.+LOC.;
the best improvement (BEST IMPR); and the overhead for NNC optimality.)
optimization is only needed, if quantum circuits for the mentioned LNN architec-
tures should be designed. Thus, optimization approaches should be available that
can be applied by the designer according to his current needs.
In future work, optimization approaches for further cost metrics are needed. The
cost metrics considered in this chapter are not complete. Besides quantum cost, tran-
sistor cost, number of lines, and nearest neighbor cost, many more cost metrics exist
(see e.g. [WSD09]). But so far, synthesis and optimization approaches take only
these metrics into account. Thus, extending these methods so that further crite-
ria are considered is a promising task for the future.
Chapter 7
Formal Verification and Debugging
This chapter introduces approaches for formal verification and debugging and there-
with completes the proposed approaches towards a design flow for reversible logic.
Verification is an essential step that checks whether the obtained designs in fact realize
the desired functionality. This is important since, with increasing complexity, the
risk of errors due to erroneous synthesis and optimization approaches as well as
imprecise specifications grows. Considering traditional circuits, verification has
become one of the most important steps in the design flow. As a result, in this domain
very powerful approaches have been developed, ranging from simulative verifica-
tion (see e.g. [YSP+99, Ber06, YPA06, WGHD09]) to formal equivalence checking
(see e.g. [Bra93, DS07]) and model checking (see e.g. [CGP99, BCCZ99]), respec-
tively. For a more comprehensive overview, the reader is referred to [Kro99, Dre04].
For reversible logic, verification is still at the beginning. Even if first approaches
in this area exist (e.g. [VMH07, GNP08, WLTK08]), they are often not applicable
(e.g. circuits representing incompletely-specified functions are not supported). Fur-
thermore, with new synthesis approaches (e.g. the BDD-based or the SyReC-based
method introduced in Chap. 3), circuits can be designed that contain 100 and more
circuit lines and tens of thousands of gates, with an upward trend. This increase in
circuit size and complexity cannot be handled manually and thus efficient automated
verification approaches are required. At the same time, existing approaches for tra-
ditional circuits cannot be directly applied, since they do not support the specifics of
reversible logic such as new gate libraries, quantum values, or different embeddings
of the same target function.
In the first part of this chapter, equivalence checkers [WGMD09] are proposed
that fulfill these requirements. More precisely, two approaches are introduced that
check whether two circuits realize the same target function regardless of how the
target function has been embedded or whether the circuit contains quantum logic
or pure Boolean logic. The first approach employs decision diagram techniques,
while the second one uses Boolean satisfiability. Experimental results show that for
both methods, circuits with up to 27,000 gates, as well as adders with more than
100 inputs and outputs, are handled in under three minutes with reasonable memory
requirements.
R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, 143
DOI 10.1007/978-90-481-9579-4_7, © Springer Science+Business Media B.V. 2010
However, while verification methods can be used to detect the existence of er-
rors, they do not provide any information about the source of the error. Thus, in
the second part of this chapter, it is shown how the debugging process can be ac-
celerated by using an automatic debugging method (where first results have been
published in [WGF+09]). This method takes an erroneous circuit as well as a set of
counterexamples as input and determines a set of gates (so-called error candidates)
whose replacement with other gates fixes the counterexamples. Having this set of
error candidates, the debugging problem is significantly simplified, since only small
parts of the circuit must be considered to find the error. The proposed debugging
approach thereby also uses Boolean satisfiability and is inspired by traditional cir-
cuit debugging [SVAV05]. Moreover, this approach is further extended so that also
the concrete error locations are determined, i.e. gate replacements that not only
fix the counterexamples but additionally ensure that the specification is preserved.
Experiments show and discuss the effect of these methods.
The following two sections describe both approaches in detail, i.e. equivalence
checking is addressed in Sect. 7.1 and automated debugging in Sect. 7.2, respec-
tively. Finally, the chapter is summarized and future work is sketched in Sect. 7.3.
In this section, two approaches to the equivalence checking problem are presented.
Realizations of both completely and incompletely specified functions are supported.
The circuits can be composed of reversible gates and quantum gates, and can thus
assume multiple internal values, but the primary inputs and outputs of the circuits
are restricted to pure (non-quantum) logic states.
The proposed approaches build on well-known proof techniques for formal ver-
ification of irreversible circuits, i.e. decision diagrams and satisfiability. The first
approach employs Quantum Multiple-valued Decision Diagrams (QMDDs) (see
Sect. 2.2.2). It involves the manipulation of unitary matrices describing the circuits
and additional matrices specifying the total or partial don’t cares. The second ap-
proach is based on Boolean satisfiability (SAT) (see Sect. 2.3.1). It is shown that
the equivalence checking problem can be transformed to a SAT instance supporting
constant inputs and garbage outputs. Additional constraints are added to deal with
total and partial don’t cares. Experiments on a large set of benchmarks show that
both approaches are very efficient, i.e. circuits with up to 27,000 gates, as well as
adders with more than 100 inputs and outputs, are handled in less than three minutes
with reasonable memory requirements.
The remainder of this section is structured as follows. The circuit equivalence
checking problem is defined in Sect. 7.1.1. Sections 7.1.2 and 7.1.3 present the
QMDD-based and the SAT-based approach, respectively. Finally, experimental re-
sults are given in Sect. 7.1.4.
The goal of equivalence checking is to prove whether two reversible (or quantum)
circuits designed to realize the same target functionality are equivalent or not. In
the latter case, additionally a counterexample is generated, i.e. an input assignment
showing the different output values of the two circuits. It is thereby assumed that
the two circuits have the same labels for the primary inputs and primary outputs,
respectively.
Since all considered circuits are reversible, circuits representing irreversible
functions (e.g. any n-input m-output function with n ≠ m) might contain constant
inputs, garbage outputs, and don't care conditions (see Sect. 3.1.1). Thus, five types
of functions must be considered:
• Completely specified: A completely specified reversible function is given.
• Constant input: At least one input is assigned to a fixed logic value. For the other
assignments to these inputs, all respective outputs are don’t cares.
• Garbage output: At least one output is unspecified for all input assignments.
• Total don’t care condition: The values of all outputs are unspecified for a given
assignment to the inputs.
• Partial don’t care conditions: The values of a proper subset of the outputs are
unspecified for a given assignment to the inputs.
Table 7.1 shows truth tables of a completely specified function, a function with
a constant input, a function with a garbage output, a function with total don’t care
conditions, and a function with partial don’t care conditions, respectively. A specifi-
cation with constant inputs, garbage outputs, or any don’t care conditions is denoted
as incompletely specified function. Total and partial don’t cares are inherited from
the irreversible function whereas constant input and garbage output don’t cares usu-
ally arise from embedding the irreversible function in a reversible specification.
In the next sections both, a QMDD-based and a SAT-based approach for checking
the equivalence of two circuits with respect to constant inputs, garbage outputs, and
don’t care conditions is proposed, respectively.
Given a reversible circuit with gates g0 g1 . . . gd−1 , the matrix describing the cir-
cuit is given by M = Md−1 × · · · × M1 × M0 where Mi is the matrix for gate gi .
Table 7.1 Truth tables of the considered function types

x0 x1 x2 | (a) f0 f1 f2 | (b) f0 f1 f2 | (c) f0 f1 f2
 0  0  0 |     1  1  1  |     –  –  –  |     1  1  –
 0  0  1 |     1  1  0  |     –  –  –  |     1  1  –
 0  1  0 |     1  0  1  |     –  –  –  |     1  0  –
 0  1  1 |     1  0  0  |     –  –  –  |     1  0  –
 1  0  0 |     0  1  1  |     0  1  1  |     0  1  –
 1  0  1 |     0  1  0  |     0  1  0  |     0  1  –
 1  1  0 |     0  0  1  |     0  0  1  |     0  0  –
 1  1  1 |     0  0  0  |     0  0  0  |     0  0  –

(a) Completely spec. (b) Constant input (c) Garbage output

x0 x1 x2 | (d) f0 f1 f2 | (e) f0 f1 f2
 0  0  0 |     1  1  1  |     1  1  1
 0  0  1 |     1  1  0  |     1  1  –
 0  1  0 |     –  –  –  |     1  0  1
 0  1  1 |     1  0  0  |     1  –  0
 1  0  0 |     0  1  1  |     0  1  1
 1  0  1 |     –  –  –  |     0  1  0
 1  1  0 |     0  0  1  |     –  0  1
 1  1  1 |     0  0  0  |     0  0  0

(d) Total don't cares (e) Partial don't cares
As reviewed in Sect. 2.2.2, QMDDs are a data structure for the representation and
manipulation of r^n × r^n complex-valued matrices, where r is the number of pure
logic states. Moreover, for a given order the QMDDs of two identical functions are
canonical [MT06, MT08].
Thus, for the completely specified case, two reversible circuits that realize the
same function and adhere to the same variable order have the same matrix descrip-
tion. Because of this uniqueness of QMDDs, to check the equivalence of two circuits
it is sufficient to verify that the top edges of the two QMDDs point to the same node
with the same weight. A traversal of the QMDDs is not required. Note that sorting is
required when the inputs or outputs, respectively, are not aligned in the two circuits.
Therefore, SWAP gates are added to achieve the same order for both circuits.
A constant input means that the input space is restricted to those assignments con-
taining that value, all others do not occur. To support this, the matrix is adjusted.
Consider the case when the constant input is the top-level partition variable with
constant value j. The following equations show the transformation of the input
space (denoted by γ) to the output space (denoted by δ) for the general case and
for the case with a constant input, respectively:1

( δ0 )   ( M0  M1 ) ( γ0 )        ( δ0 )   ( Mj    φ ) ( γj )
( δ1 ) = ( M2  M3 ) ( γ1 ) ,      ( δ1 ) = ( Mj+2  φ ) ( φ  ) .
where δ̂ denotes the output after removal of the garbage output. To explain the ad-
dition of sub-matrices, recall that the circuit inputs and outputs are assumed to be
in pure logic states, i.e. one element of γ is 1 and the others are 0. The same is true
for δ. Further, M is a permutation matrix (a special case of a unitary matrix).
In general, constant inputs and garbage outputs can correspond to any variables
in the circuit’s QMDD. This can be handled by performing a depth-first traversal
of the QMDD applying the above reductions to each node as it is encountered. In
a depth-first traversal, the reductions are applied to a node’s descendants before
applying them to the node itself. Note that a variable can be both a constant input
and a garbage output. The order of applying the two reductions is unimportant. This
traversal reduces sub-matrices as required throughout the full matrix.
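The effect of a constant input on the circuit matrix can be mimicked on plain matrices by zeroing the columns whose input index contradicts the constant value, since those inputs can never occur. A toy sketch on nested lists (the function name and bit-indexing convention are assumptions of this illustration, not the QMDD implementation):

```python
def restrict_constant_input(M, bit, val, n):
    """Zero the columns of the 2^n x 2^n matrix M whose input index does
    not carry `val` at position `bit` (bit 0 = top circuit line)."""
    size = 1 << n
    def keep(j):
        return ((j >> (n - 1 - bit)) & 1) == val
    return [[M[i][j] if keep(j) else 0 for j in range(size)]
            for i in range(size)]
```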
Let M̂ denote the matrix for a circuit after the constant input and garbage output re-
ductions are applied. To deal with total don’t cares in the target function, a diagonal
matrix D is constructed such that Di,i = 0 if the corresponding output position is a
total don’t care, and Di,i = 1 otherwise. Then M̂ × D is computed. The effect is to
force all total don’t care situations to 0 by ignoring the input states corresponding
to don’t care output assignments. This ensures that when the reduced matrices are
compared for two circuits, differences cannot arise in total don’t care positions. Note
that the easiest way to construct a QMDD for D is to start from a diagonal matrix
and then use a depth-first traversal to zero the diagonal elements corresponding to
total don’t cares.
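Right-multiplying by the diagonal matrix D zeroes exactly the columns with a 0 on the diagonal, so the comparison of two reduced matrices under total don't cares can be mimicked as follows. A toy sketch on plain nested lists, not the QMDD implementation; the function names are assumptions made here.

```python
def mask_columns(M, dead_cols):
    """Equivalent of computing M x D with D[j][j] = 0 at the total
    don't-care positions collected in `dead_cols`: those columns of M
    are forced to 0."""
    return [[0 if j in dead_cols else v for j, v in enumerate(row)]
            for row in M]

def equivalent_under_dont_cares(M1, M2, dead_cols):
    """Compare two reduced matrices after masking the don't-care columns,
    so differences cannot arise in total don't-care positions."""
    return mask_columns(M1, dead_cols) == mask_columns(M2, dead_cols)
```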
1 Note that both matrices already are partitioned to correspond to the QMDD partitioning.
Partial don’t care conditions can be handled in a similar fashion. The difference
is that partial don’t care conditions apply only to a subset and not to all outputs. The
simplest approach is to treat the outputs for which a set of partial don’t cares does
not apply as pseudo-garbage outputs and construct a new matrix for this situation by
reducing the pseudo-garbage. A diagonal matrix is then constructed for those don’t
cares and the equivalence check proceeds as above. This is repeated for each subset
of the outputs that have shared partial don’t cares.
In this section, the SAT-based equivalence checker for reversible logic is described.
The general idea is to encode the problem as an instance of Boolean satisfiability
to be solved by a SAT solver (see Sect. 2.3.1). If the SAT solver returns unsatisfi-
able, then the checked circuits are equivalent. Otherwise, a counterexample can be
extracted from the satisfying assignment of the instance.
Example 7.1 The miter structure for two circuits containing three lines is shown in
Fig. 7.1. Note that the added XOR and OR operations are only used in formulating
circuit equivalence checking as a SAT instance. They are not actually added to the
circuits.
To encode this miter as a SAT instance, a new free variable is introduced for each
signal in the circuit. Furthermore, each reversible gate as well as the additional XOR
and OR operations of the miter structure are represented by a set of clauses. Finally,
the output of the OR gate is constrained to the value 1. These transformations can be
performed in linear time and space with respect to the given circuit sizes, since only
local operations (adding a small number of clauses per gate) are required. In doing
so, the SAT instance becomes satisfiable iff there exists an input assignment to the
circuits where at least one pair of the corresponding outputs assumes different val-
ues. In this case, a counterexample can be extracted from the satisfying assignment
just by obtaining the assignments of all SAT variables representing circuit lines. If
both circuits are equivalent, then no such assignment exists and thus the SAT solver
must return unsatisfiable.
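The miter construction can be mimicked on small circuits by brute force: XOR corresponding outputs, OR the results, and report the first input assignment producing a 1 as a counterexample. A sketch under the assumption that circuits are given as functions from input tuples to output tuples (no actual SAT solver is invoked):

```python
from itertools import product

def miter_check(f1, f2, n):
    """Brute-force analogue of the miter SAT instance: XOR each pair of
    corresponding outputs and OR the results. A '1' corresponds to a
    satisfying assignment and yields a counterexample; otherwise the
    circuits are equivalent (the instance would be unsatisfiable)."""
    for bits in product((0, 1), repeat=n):
        if any(a ^ b for a, b in zip(f1(bits), f2(bits))):
            return bits  # counterexample
    return None  # equivalent
```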
If quantum circuits are considered, V and V+ gates may produce non-Boolean
values. Thus, the variables for the associated signals employ a multiple-valued rather
than a Boolean encoding. More precisely, each signal of the circuit is represented
by two Boolean variables y and z, where
• 00 represents the Boolean value 0,
• 01 represents the non-Boolean value V0 ,
• 10 represents the Boolean value 1, and
• 11 represents the non-Boolean value V1 .
So, both reversible as well as quantum circuits can be checked.
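The two-variable encoding listed above can be captured directly; the helper names are assumptions of this sketch.

```python
# Two Boolean variables (y, z) per circuit signal, as listed above.
ENCODING = {'0': (0, 0), 'V0': (0, 1), '1': (1, 0), 'V1': (1, 1)}
DECODING = {yz: v for v, yz in ENCODING.items()}

def is_boolean(y, z):
    """Pure Boolean values ('0' and '1') are exactly those with z = 0."""
    return z == 0
```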
Example 7.2 Figure 7.2 shows the extended miter for two circuits realizing an in-
completely specified function. A truth table showing the garbage output, the don’t
care conditions, as well as the resulting values for t and pfi is given in the left part
of Fig. 7.2. Note that the first half of the truth table includes don’t cares due to the
constant input.
This section provides experimental results. The QMDD package from [MT08] has
been used with the default sizes for the computational tables, a garbage collection
limit of 250,000, and a maximum of 200 circuit lines. For the SAT-based approach
the SAT solver MiniSAT [ES04] has been applied. In total, two kinds of experi-
ments have been conducted (typically using different gate libraries): experiments
with equivalent circuits and with non-equivalent circuits, respectively. The experi-
ments have been carried out on an AMD Athlon 3500+ with 1 GB of memory with
a timeout of 500 CPU seconds. All considered benchmark circuits were taken from
RevLib [WGT+08].
Table 7.2 shows the results obtained by comparing equivalent and non-equivalent
circuits, respectively. The first column gives the name of the circuit. For equivalent
circuits, two numbers following the name give the unique identifier of the circuit
realizations as used in RevLib. For non-equivalent circuits, only one number is given
which identifies the circuit from the corresponding equivalent test with the larger
number of gates. That circuit is used as given in RevLib and in a modified form by
arbitrarily altering, adding or deleting gates. Column DC shows the types of don’t
cares (see table note a for the coding). In column GT the gate types used in each
circuit are provided (see table note b for the coding). Column n presents the number
of inputs and column d gives the number of gates for the first and second circuit,
respectively. In the next three columns, the data for the QMDD-based approach is
shown, i.e. peak number of QMDD nodes, run-time in CPU seconds, and memory
in MByte. The peak number of nodes is the maximum number of active nodes at
any point in building the circuit QMDDs and checking for equivalence. Finally, the
last four columns provide the data for the SAT-based method. First, the number of
variables and the number of clauses of the SAT instance are shown. Then, the run-
time as well as the required memory are given.
Table 7.2 Experimental results

Columns: Name, DC^a, GT^b, n, d; QMDD-based: Nodes, Time, Mem.; SAT-based: Vars, Clses, Time, Mem.

Equivalent circuits
0410184 (170, 169) none NCV/NCT 14 74/46 2924 0.01 12.54 2843 4640 0.05 3.70
add16 (175, 174) CG CV/CT 49 96/64 7841 0.02 12.72 12788 18356 0.08 6.31
add32 (185, 183) CG CV/CT 97 192/128 28761 0.03 13.82 50148 70500 0.29 15.14
add64 (186, 184) CG CV/CT 193 384/256 109769 0.12 17.53 198596 276164 1.12 48.10
alu-v0 (26, 27) G NCT/NCT 5 6/6 171 0.01 12.53 72 159 <0.01 2.70
alu-v1 (28, 29) G NCT/NCT 5 7/7 188 <0.01 12.53 82 180 <0.01 2.70
alu-v2 (30, 33) G NCT/NCT 5 18/7 295 <0.01 12.53 145 340 <0.01 2.70
alu-v3 (34, 35) G NCT/NCT 5 7/7 177 <0.01 12.53 83 185 <0.01 2.70
alu-v4 (36, 37) G NCT/NCT 5 7/7 223 <0.01 12.53 84 189 <0.01 2.70
c2 (182, 181) none NCV/NCT 35 305/116 40767 0.08 14.36 26018 40774 0.52 10.67
urf1 (149, 151) none T/CT 9 11554/1487 250311 5.05 23.91 130419 302871 32.10 55.24
urf2 (152, 154) none T/CT 8 5030/620 250020 1.09 23.89 50849 119592 3.82 23.86
urf3 (155, 157) none T/CT 10 26468/2674 250441 22.71 23.91 320578 735809 86.81 134.67
urf5 (158, 159) none T/CT 9 10276/499 250061 3.32 23.91 107764 249191 19.15 48.18
urf6 (160, 160) none T/T 15 10740/10740 258140 97.95 24.25 343712 751876 >500.00 –
cnt3-5 (179, 180) CGT CT/CT 16 25/20 204874 0.64 21.95 2338 21976 0.43 9.56
decod24-v2 (43, 44) CT NCT/NCV 4 6/9 192 <0.01 12.52 115 214 <0.01 2.70
hwb4 (52, 49) none CT/CT 4 11/17 208 <0.01 12.52 133 336 0.01 2.82
hwb5 (55, 53) none NCT/CT 5 24/55 972 <0.01 12.52 448 1111 0.02 2.94
hwb6 (56, 58) none CT/CT 6 126/42 3588 0.01 12.53 1146 2846 0.03 3.21
hwb7 (62, 60) none NCT/F 7 331/166 22010 0.04 13.44 3730 9249 0.14 4.49
hwb8 (116, 115) none NCT/CTP 8 749/614 82942 0.26 16.19 11599 27786 1.02 7.42
hwb9 (123, 122) none NCT/CTP 9 1959/1541 250162 1.89 23.88 33454 79798 6.28 16.73
mod10 (171, 176) T NCT/NCT 4 10/7 122 <0.01 12.53 97 265 <0.01 2.70
mod8-10 (178, 177) GTP NCT/NCT 5 9/14 235 <0.01 12.53 161 491 0.01 2.70
rd53 (135, 134) CGT CT/TP 7 16/12 593 <0.01 12.53 224 523 0.01 2.83
sym9 (146, 147) CGT CT/CTP 12 28/28 8825 0.02 12.89 727 1582 0.02 3.37

Non-equivalent circuits
0410184 (170) none NCV/NCV 14 74/73 2550 <0.01 12.59 4318 6385 0.04 4.16
add16 (175) CG CV/NCV 49 96/97 9516 0.02 13.22 19270 23732 0.10 7.23
add32 (185) CG CV/NCV 97 192/193 21892 0.04 14.11 75398 90522 0.38 19.31
add64 (186) CG CV/NCV 193 384/386 91417 0.15 17.93 298632 353478 1.53 63.91
alu-v0 (26) G NCT/NCT 5 6/5 116 <0.01 12.52 67 148 <0.01 2.70
alu-v1 (28) G NCT/CT 5 7/6 119 <0.01 12.52 77 172 <0.01 2.70
alu-v2 (30) G NCT/NCT 5 18/16 346 <0.01 12.52 198 480 0.01 2.70
alu-v3 (34) G NCT/CT 5 7/6 202 <0.01 12.52 79 178 <0.01 2.70
alu-v4 (36) G NCT/CT 5 7/6 204 0.01 12.52 81 186 <0.01 2.70
c2 (182) none NCV/NCV 35 305/304 34107 0.15 14.21 43648 64262 0.32 15.84
urf1 (149) none T/CT 9 11554/11554 250311 9.53 23.91 231099 531527 4.58 97.09
urf2 (152) none T/T 8 5030/5029 229154 2.00 22.97 90549 211280 2.12 39.08
urf3 (155) none T/T 10 26468/26464 250441 44.14 23.91 582274 1323351 13.72 243.19
urf5 (158) none T/CT 9 10276/10276 250163 6.91 23.90 205539 472739 4.85 84.53
urf6 (160) none T/CT 15 10740/10740 260196 144.93 24.50 343711 751873 16.48 143.23
cnt3-5 (179) CGT CT/NCT 16 25/26 204745 0.63 21.94 2429 22158 0.43 9.74
decod24-v2 (43) CT NCT/NCT 4 6/5 89 <0.01 12.52 60 143 <0.01 2.70
hwb (4) none CT/NCT 4 11/18 216 <0.01 12.52 137 344 0.01 2.83
hwb5 (55) none NCT/CT 5 24/54 961 <0.01 12.53 442 1094 0.01 2.94
hwb6 (56) none CT/CT 6 126/125 3922 0.01 12.53 1733 4357 0.03 3.50
hwb7 (62) none NCT/NCT 7 331/331 22541 0.06 13.45 4865 11553 0.08 4.94
hwb8 (116) none NCT/NCT 8 749/750 61990 0.31 15.28 12438 28977 0.16 8.05
hwb9 (123) none NCT/NCT 9 1959/1958 250162 2.55 23.91 36244 83568 0.57 17.39
mod (10) T NCT/NCT 4 10/10 107 <0.01 12.53 109 296 <0.01 2.70
mod8 (10) GTP NCT/NCT 5 9/15 269 <0.01 12.52 166 501 0.01 2.70
rd53 (135) CGT CT/CT 7 16/16 611 <0.01 12.54 252 582 0.01 2.82
sym9 (146) CGT CT/CT 12 28/27 10045 0.02 12.93 714 1553 0.01 3.36

a Don’t-care: none = completely-specified, C = constant input, G = garbage output, T = total don’t-care, P = partial don’t-care
b Gate type: N = NOT, C = controlled-NOT, F = multiple control Fredkin, P = Peres, T = multiple control Toffoli, V = V or V+

Both approaches prove or disprove the equivalence for all considered benchmarks (except one for the SAT-based approach) very quickly. The maximum run-time observed was less than three minutes. Several experiments with more than 10,000 gates are included. The largest has nearly 27,000 gates. Even for these cases, the proof times are very fast. The largest circuit in terms of the number of inputs and outputs, add64 with n = 193, is a 64-bit ripple carry adder which includes 64 constant inputs and 128 garbage outputs to achieve reversibility. Comparing the run-times of the two approaches, the QMDD method is faster in the case of equivalent
circuits, while in the non-equivalent case the SAT-based approach often is faster.
Regarding memory usage, it can be seen that both approaches do not blow up even for instances containing tens of thousands of gates.
To keep the following descriptions self-contained, this section defines the debugging
problem and briefly reviews the SAT-based method for debugging of traditional cir-
cuits.
Like their classical counterparts, reversible circuits may contain errors, e.g. because of bugs in synthesis and optimization tools or because of imprecise specifications. These errors can be detected e.g. by verification as introduced in the last section. However, to find the source of an error, the circuit must be debugged—often a manual and time-consuming process.
Thus, automatic approaches are desired that support the designer in narrowing down the possible error locations. Therefore, error models have been defined that represent frequently occurring errors. Possible error models for reversible logic include, for example, wrong gate errors and control line errors such as missing control lines.
Remark 7.1 Note that some error models are supersets of other models. For exam-
ple, all control line errors are also wrong gate errors. As shown later, the automatic
debugging approaches can be improved if the designer is able to restrict the error
to a particular model.
Example 7.3 Figure 7.3 shows an erroneous circuit G together with a counterexam-
ple (applied to the inputs of G). At the outputs, the wrong values (determined by the counterexample) as well as the expected (i.e. the correct) values are annotated.
For this example, {g5 } is an error candidate, since replacing g5 with another gate
(namely a gate with one more control line) would correct the output values. In this
case, the counterexample detects a missing control error.
To determine error candidates, automatic approaches have been introduced for tradi-
tional circuits (see e.g. [LCC+95, VH99, HK00, SVAV05, ASV+05, FSVD06]). In
particular, methods based on Boolean satisfiability (SAT) have been demonstrated
to be very effective for debugging irreversible logic [SVAV05]. Here, the erroneous
circuit and a set of counterexamples are used to create a SAT instance. Solving this
instance using well engineered SAT solvers (see e.g. [ES04]) returns solutions from
which the desired set of error candidates can be determined.
The general structure of the debugging problem that is encoded as a SAT instance is shown in Fig. 7.4. For each counterexample, a copy of the circuit is created, whose inputs are assigned the values provided by the given counterexamples (denoted by cex0, . . . , cex|CEX|). The outputs are assigned the correct values. Furthermore, each gate gi is extended by additional logic, i.e. a multiplexor with a select line si.
In this section, the SAT-based debugging formulation for reversible circuits is pre-
sented. It is shown that the formulation for traditional circuits cannot be directly
applied. Nevertheless, a similar concept is exploited. Furthermore, for specific error
models (namely all control line errors) improvements are introduced.
The debugging approach described above has been demonstrated to be very effective
for determining error candidates of irreversible circuits. One-output gates for AND,
OR, XOR, etc. are thereby considered. Thus, only a single multiplexor as shown
in Fig. 7.5(a) is added to express whether a gate gi can become part of an error
candidate or not.
In contrast, reversible logic builds on a different gate library where each gate always has n outputs. Indeed, it is possible to convert Toffoli gates into respective AND and XOR gate combinations (see Fig. 7.5(b) for a Toffoli gate with two control lines) and to apply traditional debugging methods afterwards. However, this would lead to several drawbacks. First, more than one multiplexor, each with a different select line, is needed to express whether a Toffoli gate gi can become part of an error candidate. Thus, the value k no longer represents the number of error candidates. For example, a single missing control line error may show up on two multiplexors and hence complicates debugging. Furthermore, only one single line of the Toffoli gate is considered with this formulation. Thus, errors like misplaced target lines cannot be detected.
Alternatively, n multiplexors with the same select si might be added to the de-
bugging formulation as shown in Fig. 7.5(c). However, this also leads to meaningless
results as the following lemma shows.
Lemma 7.1 Let G be an erroneous circuit. Using the traditional debugging ap-
proach with the additional logic formulation depicted in Fig. 7.5(c) and an arbitrary
set of counterexamples, for each gate gi (0 ≤ i < d) of G a satisfying solution with
si = 1 exists.
As a result, each gate will be identified as an error candidate. This lemma shows
that the existing SAT-based debugging formulation for irreversible circuits is too
general for reversible circuits. In fact, assigning si to 1 should imply that the output
values of gate gi cannot be chosen arbitrarily, but with respect to the functionality
of Toffoli gates. The two main properties of Toffoli gates are:
• At most one line (the target line) is inverted if the respective control lines are
assigned to 1 and
• all remaining lines are passed through.
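These two properties can be captured operationally as follows; this is a behavioral sketch, not the CNF encoding itself, and the function and parameter names are ours.

```python
def suspect_outputs(passed, si, sel):
    """Behavioral model of the per-line multiplexor scheme: 'passed' holds the
    gate's original output values, si marks the gate as a suspect, and sel is
    the one-hot choice of the single line whose value may change."""
    # the cardinality constraint si^0 + ... + si^(n-1) = si from the formulation
    assert sum(sel) == (1 if si else 0)
    if not si:
        return [tuple(passed)]           # gate not suspected: pass everything through
    b = sel.index(1)
    # the selected line may take either value; all remaining lines pass through
    return [tuple(passed[:b] + [v] + passed[b + 1:]) for v in (0, 1)]

# suspecting a gate with line 1 selected: at most line 1 deviates
print(suspect_outputs([1, 0, 1], True, [0, 1, 0]))   # -> [(1, 0, 1), (1, 1, 1)]
```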
A new formulation respecting these properties is given in Fig. 7.6(a). For each output of a gate gi, a second multiplexor with a new select si^b is added (0 ≤ b < n). By restricting the sum si^0 + · · · + si^{n−1} to 1, it is ensured that the value of at most one line is modified if si is set to 1. All remaining values pass through. Therewith, the multi-output behavior including the reversibility is reflected in the debugging formulation.2 In the following example, the application of the debugging formulation is demonstrated.
Example 7.4 Consider the circuit realization of function 3_17 with an injected miss-
ing control error at gate g5 depicted in Fig. 7.7. The missing control error leads to
four counterexamples shown in the first four rows below the circuit in Fig. 7.7. For
k = 1, besides {g5 } the proposed debugging formulation also returns {g4 } as an error
candidate (marked by a dashed rectangle). This is because replacing g4 with a NOT gate at line c leads to correct output values for all counterexamples as shown in the
first four rows: The bold values will be inverted such that they match the propagated
values from the output of the circuit.
However, as in traditional debugging an error candidate always is an approxima-
tion and thus may not necessarily be the error location. In fact, g4 is not an error
location, since for the NOT gate replacement an incorrect output (with respect to the specification of 3_17) is computed for the input 011, as can be seen in the fifth line of the figure. Nevertheless, the number of gates that have to be considered to detect the error is reduced from 6 to 2.

2 Note that using multiplexors obviously makes the considered circuits non-reversible. However, this additional logic is only part of the debugging formulation and not of the circuit itself.
Lemma 7.2 Let G be a reversible circuit with a single missing control error and
|CEX| the total number of counterexamples for this error. Then, the erroneous gate
includes c = n − 1 − log2 |CEX| control lines.
Proof Let G be a reversible circuit with a missing control error in gate gi containing c control lines. To detect the erroneous behavior, (1) all control lines of gi have to be assigned to 1 and (2) another line of gi (the missing control line) has to be assigned to 0. Due to the reversibility, these values can be propagated to the inputs of the circuit, leading to |CEX| = 2^(n−c−1) different counterexamples in total. From this, one can conclude

|CEX| = 2^(n−c−1),
log2 |CEX| = log2 2^(n−c−1),
log2 |CEX| = n − c − 1,
c = n − 1 − log2 |CEX|.
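The counting argument can be checked by enumeration on a toy instance. Here the erroneous and the correct gate are compared directly, which matches the lemma because propagation through the rest of a reversible circuit is a bijection on the inputs; the gate sizes chosen are arbitrary.

```python
from itertools import product
from math import log2

def toffoli(state, controls, target):
    """Apply a multiple-control Toffoli gate to a tuple of line values."""
    if all(state[c] for c in controls):
        flipped = list(state)
        flipped[target] ^= 1
        return tuple(flipped)
    return state

# n = 4 lines; the erroneous gate has c = 2 controls, the correct one has 3
n, target = 4, 3
controls_bad, controls_good = [0, 1], [0, 1, 2]

# inputs on which the erroneous and the correct gate differ (the counterexamples)
cex = [x for x in product([0, 1], repeat=n)
       if toffoli(x, controls_bad, target) != toffoli(x, controls_good, target)]

c = len(controls_bad)
assert len(cex) == 2 ** (n - c - 1)      # |CEX| = 2^(n-c-1)
assert c == n - 1 - log2(len(cex))       # Lemma 7.2
print(len(cex))                          # -> 2
```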
Remark 7.2 If the total number of counterexamples is not available, Lemma 7.2 can still be used to obtain an upper bound. This is because the value of |CEX| can only increase, leading to a smaller value of c. Thus, all gates including more than n − 1 − log2 |CEX| control lines do not have to be considered during debugging (where in this case |CEX| is the number of available counterexamples and the logarithm is rounded up).
Exploiting Lemma 7.2, the number of gates that have to be considered can be
reduced significantly. In some cases, this reduction already leads to a single gate
and therewith to the desired error location (see experiments in Sect. 7.2.5). But even if the SAT-based part to determine error candidates additionally has to be invoked, improvements can be observed, since the additional logic depicted in Fig. 7.6(b) has to be added only to gates containing exactly c = n − 1 − log2 |CEX| control lines.
The debugging approach proposed in the last section basically uses a modified mul-
tiplexor formulation compared to debugging for irreversible circuits. However, tra-
ditional debugging approaches suffer from the problem that the obtained error can-
didates only approximate the error location(s), i.e. the verification engineer may be
pinpointed to misleading parts of the circuit which cannot be used for repair. In con-
trast, for reversible logic the debugging formulation can be extended to overcome
these limitations. As a result, the real source of an error can be calculated—the error
location.
In this section, first the limits of traditional debugging and error candidates are
discussed in more detail, followed by a more detailed description of error locations.
Then, the new debugging algorithm for computing error locations is presented.
Example 7.5 In Fig. 7.8, a circuit realization of the function alu is depicted. In this
circuit two missing control line errors have been injected: at gate g2 and at gate g3 ,
respectively. If the proposed debugging approach is applied, already for k = 1 a so-
lution (namely {g2 }) is returned. However, by exhaustive enumeration it has been
checked that no replacement for gate g2 exists such that the circuit realizes the func-
tion specification. In fact, an appropriate replacement of gate g2 only fixes the coun-
terexamples (similar to Example 7.4), while the correct behavior according to the
specification of the circuit is not preserved. Thus, this error candidate is misleading.
The example clearly demonstrates the need for strengthened error candidates.
This results in the formalization of error locations. An error location is an error
candidate where for each gate of the error candidate there is a single gate replace-
ment which not only fixes all counterexamples, but also preserves the overall spec-
ification. Having a single error location available, the real error is automatically
highlighted in the circuit and no further manual inspection is necessary. Since error
locations are strengthened error candidates, this concept guarantees to give better re-
sults than just determining error candidates. In the following, an automatic approach
for determining error locations is described.
7.2.3.2 Approach
The general idea of the debugging approach for calculating error locations is as
follows. For increasing sizes k of error candidates, it is checked whether an error
candidate is an error location or not. To determine the error candidates, first the
debugging formulation of Sect. 7.2.2 is applied. Then, a second SAT instance is
created that checks whether there are gate replacements such that the specification
is fulfilled or not. In the following, first the SAT formulation for this check is de-
scribed. Afterwards, the overall algorithm (that uses this formulation) is introduced
and illustrated by means of an example.
Usually, in the debugging process a reference circuit F (used to obtain the coun-
terexamples) is available. Having this, a method is applied which is inspired by
SAT-based equivalence checking (as introduced in Sect. 7.1.3) and exact synthesis
of reversible logic (as introduced in Chap. 4). To check the existence of appropriate
gates which replace each gate of a given error candidate of size k, a miter structure
as depicted in Fig. 7.9 is built. Note that this figure illustrates the structure only for
a concrete example, i.e. a circuit with three inputs/outputs and an error candidate of
size k = 2 containing the gates gp and gq .
By applying the same inputs to both the reference circuit F and the erroneous circuit G, corresponding outputs must be identical; this is enforced by XNOR gates. An additional AND gate, whose output is set to the value 1, constrains both circuits to produce the same outputs for the same input assignment.
Then, it is allowed that each gate gi of the current error candidate can be of any arbitrary type. To this end, free variables t_{⌈log2 n⌉}^i, . . . , t_1^i and c_1^i, c_2^i, . . . , c_{n−1}^i
are introduced (for brevity denoted by ti and ci in the following). According to the assignment to ti and ci, the gate gi is modified. Thereby, ti is used as a binary encoding of a natural number t^i ∈ {0, . . . , n − 1} which defines the chosen target line. In contrast, ci denotes the control lines. More precisely, assigning c_l^i = 1 (with 1 ≤ l ≤ n − 1) means that line (t^i + l) mod n becomes a control line of the Toffoli gate gi. The same encoding has already been used in Chap. 4 for exact synthesis.
Figure 4.5 on p. 62 gives some examples for assignments to ti and ci with their
respective Toffoli gate representation.
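The decoding step can be sketched as follows; this assumes the target bits are given most significant first, and the function name is ours.

```python
def decode_gate(t_bits, c_bits, n):
    """Decode the free variables ti and ci into a Toffoli gate, following the
    encoding described above: t_bits is the binary encoding of the target line,
    and c_bits[l-1] = 1 (1 <= l <= n-1) marks line (t + l) mod n as a control."""
    t = int("".join(map(str, t_bits)), 2)                    # target line t^i
    controls = [(t + l) % n for l in range(1, n) if c_bits[l - 1]]
    return t, sorted(controls)

# n = 4 lines: target encoded as 10 (line 2), controls at offsets 1 and 3
print(decode_gate([1, 0], [1, 0, 1], 4))                     # -> (2, [1, 3])
```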
Finally, this formulation is duplicated for each possible input of the circuit, i.e. 2^n times. The same variables ti and ci are thereby used for each duplication. In doing so, a functional description is constructed which is satisfiable iff there is a valid assignment to ti and ci (i.e. iff there is a gate replacement for all gates gi) such that
for all inputs the same input-output mapping results. Then, a fix can be extracted
from the assignments to ti and ci . If there is no such assignment (i.e. the instance is
unsatisfiable), it has been proven that the considered error candidate is not an error
location.
Note that, as an alternative to this SAT formulation, it is also possible to exhaustively enumerate all gate combinations for an error candidate. However, modern SAT solvers exploit sophisticated techniques (in particular, search space pruning by means of conflict analysis [MS99] as well as efficient implication techniques [MMZ+01]) that significantly accelerate the solving process. The worst case complexity still remains exponential, but as the experiments in Sect. 7.2.5 show, error locations can be determined for many reversible circuits.
Having this SAT formulation as a basis, the overall approach to determine error locations is summarized in Fig. 7.10. First, the aim is to find an error location including one gate only, i.e. error candidates with k = 1 are determined, while EC contains the current error candidate (lines 5 and 6). Then, it is checked whether EC is an error location using the SAT formulation described above. If this is the case (line 8), then EC is an error location and thus is returned (line 9). Otherwise, the remaining error candidates of the same size are checked. If no error location of size k has been found, then k is increased and the respective steps are repeated (line 10).
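The overall flow can be sketched as follows; the callback names are ours, and the toy stand-ins below merely take the place of the SAT-based candidate generation and the miter-based check.

```python
from itertools import combinations

def find_error_location(gates, get_candidates, is_location, max_k=None):
    """Sketch of the overall flow of Fig. 7.10: for growing candidate size k,
    enumerate error candidates and return the first one admitting a
    specification-preserving gate replacement."""
    max_k = max_k or len(gates)
    for k in range(1, max_k + 1):
        for ec in get_candidates(gates, k):   # SAT-based debugging (Sect. 7.2.2)
            if is_location(ec):               # miter-based check described above
                return ec
    return None                               # no error location of size <= max_k

# Toy stand-ins: every size-k subset is a candidate; {g2, g3} is the location
gates = ["g0", "g1", "g2", "g3"]
candidates = lambda gs, k: combinations(gs, k)
location = lambda ec: set(ec) == {"g2", "g3"}
print(find_error_location(gates, candidates, location))   # -> ('g2', 'g3')
```

To report all error locations (cf. the discussion of Fig. 7.11 below), the early return would simply be replaced by collecting every candidate that passes the check.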
Example 7.6 Consider again the circuit shown in Fig. 7.8 with two injected errors.
The debugging approach described in Sect. 7.2.2 first identifies EC = {g2 } as an
error candidate. However, using the SAT formulation introduced above it can be
verified that there is no gate replacement for EC that preserves the original circuit
specification. Thus, further error candidates are generated. Since this is not possible
for k = 1, k is increased leading to EC = {g2 , g3 }. Because an appropriate gate
replacement can be found for this candidate, EC is the desired error location.
Note that sometimes more than one error location is possible. As an example,
consider the circuit given in Fig. 7.11. Here, a single missing control error has been
injected at gate g1 . Nevertheless, in total there are two repairs for the erroneous
circuit: g0 and g1 , respectively. Thus, if the designer wants to know if more than a
single error location exists, the algorithm in Fig. 7.10 must not terminate at line 9 but must be iterated until the desired number of checks has been performed.
So far, the goal was to determine error candidates or locations that explain the erroneous behavior of a circuit. However, the reversibility of the considered circuits additionally allows for an easy computation of fixes (even easier than creating a fixing formulation and solving the corresponding SAT instance as described in the previous section). To this end, a single gate is replaced by a fixing cascade which—due to reversibility—can be computed in time linear in the size of the circuit. More precisely, fixes can be automatically generated by applying the following lemma.
Lemma 7.3 Let G = G1 Gi G2 be an erroneous circuit (where Gi consists of the single gate gi to be replaced) and let F be a circuit realizing the specification. Then

   G1 Gi^fix G2 = F
⇔ G1^−1 G1 Gi^fix G2 G2^−1 = G1^−1 F G2^−1
⇔ Gi^fix = G1^−1 F G2^−1.

Thus, replacing gi with Gi^fix fixes the erroneous circuit G.
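The construction Gi^fix = G1^−1 F G2^−1 can be sanity-checked by modeling circuits as permutations of their input space; this is a sketch under the convention that a cascade applies its parts left to right, with arbitrary random sub-cascades.

```python
import random

def compose(a, b):
    """Cascade 'a then b'; circuits are permutations of {0, ..., 2^n - 1}."""
    return [b[x] for x in a]

def inverse(a):
    """Inverse permutation, i.e. the reverse cascade of inverted gates."""
    inv = [0] * len(a)
    for x, y in enumerate(a):
        inv[y] = x
    return inv

random.seed(0)
perm = lambda: random.sample(range(8), 8)   # a random reversible function on n = 3 lines
G1, G2, F = perm(), perm(), perm()          # sub-cascades around gi, and the specification

# G_i^fix = G1^{-1} F G2^{-1} (Lemma 7.3)
G_fix = compose(compose(inverse(G1), F), inverse(G2))

assert compose(compose(G1, G_fix), G2) == F   # G1 G_i^fix G2 realizes F
```

The linear-time claim is visible here as well: the fix is obtained by two inversions and two compositions, without any search.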
At first glance, applying this lemma for fixing the erroneous circuit G, i.e. replacing some gi by Gi^fix (which includes the circuit F), leads to a larger circuit than F itself. However, in many cases Gi^fix can be reduced to a few gates. In particular, if the chosen gate gi is the location of a single error, Gi^fix can be simplified to a single gate. As a consequence, Lemma 7.3 can also be applied to determine error locations of single errors.

Table 7.3 Sizes of Gi^fix

Gate   |Gi^fix|   Simplified |Gi^fix|
g0     13         6
g1     13         3
g2     13         1
g3     13         3
g4     13         3
The application of Lemma 7.3 is illustrated by the following example.
Example 7.7 Consider the circuits F and G for function ham3 as depicted in
Figs. 7.12(a) and (b), respectively. While F realizes the desired function, G is an er-
roneous optimization. Applying Lemma 7.3 for each gate gi gives the results shown
in Table 7.3. The first column gives the considered gate. In the second column the number of gates of Gi^fix after applying the lemma is shown, and the last column provides the same information after simplification of Gi^fix. As can be seen, nearly all
fixes can be significantly reduced. For gate g2 even a reduction to a single gate can
be observed.
The proposed methods have been implemented in C++ and were evaluated on a set
of reversible circuits taken from RevLib [WGT+08]. In this section, the results of
the experimental studies are presented. First, the behavior of the various debugging
methods applied to single errors is discussed, followed by a consideration for mul-
tiple errors. Afterwards, the results of the automatic fixing approach are presented.
For all benchmark circuits, single and multiple errors have been randomly injected. More precisely, a gate has been replaced
with another gate (leading to a wrong gate error) or a control line has been re-
moved (leading to a missing control error), respectively. Counterexamples showing
the errors were generated using the SAT-based equivalence checker introduced in
Sect. 7.1.3.
For solving the respective instances, the SAT solver MiniSAT [ES04] has been
used. The documented run-times include the times for instance generation and solv-
ing. All experiments have been carried out on an AMD Athlon 3500+ with 1 GB of
main memory. The timeout was set to 5000 CPU seconds.
In a first series of experiments, the debugging approaches are considered for determining single error candidates. For debugging wrong gate errors, the approach proposed in Sect. 7.2.2 has been used (denoted by DbgEC). For missing control errors, the improvements are additionally applicable, namely the consideration of target lines only (denoted by Target lines only) and the application of Lemma 7.2 (denoted by |CEX|-based Reduction).
The results are summarized in Table 7.4. Column Circuit gives the circuit name. Columns d, n, and |CEX| give the number of gates, the number of lines, and the number of counterexamples (the number within the brackets denotes the number of counterexamples for which the circuit has been duplicated), respectively. Furthermore, for each approach the number of obtained error candidates (denoted by Cand.) and the overall run-time in CPU seconds (denoted by Time) are provided. Column Cand. for |CEX|-based Reduction includes two values: the first denotes the remaining gates after applying Lemma 7.2, the second gives the final number of error candidates after running the SAT-based debugging approach. Finally, column Red. lists the best reduction obtained by the approaches, i.e. the percentage of gates that are identified as non-relevant (meaning the error is not located at these gates).
As shown in the table, a significant number of gates can be automatically identified as non-relevant for debugging the error. Reductions of at least two thirds—for larger circuits of more than 90%—are achieved. As an example, for the wrong gate
error in circuit hwb9 with 1544 gates, two error candidates are obtained in less
than 100 CPU seconds. The quality of the resulting set of error candidates often
depends on the used strategy. For example, to identify the missing control error
in circuit hwb7, still 250 (out of 289) error candidates have to be considered after applying the DbgEC approach. Here, restricting the error model and using the improvements leads not only to a speed-up, but also to a smaller number of error candidates. This is also effective for other circuits (e.g. hwb4 and urf1). Here, just by applying Lemma 7.2, the set of error candidates is reduced to the single erroneous gate and no further SAT call is required.
For determining error locations (instead of error candidates) under the single error assumption, two approaches can be used: the SAT-based formulation introduced in Sect. 7.2.3 (using the extended miter formulation) and the approach based on Lemma 7.3 (i.e. generate Gi^fix for an error candidate determined by the debugging approach and try to simplify it to a single gate). Table 7.5 summarizes the results of both approaches, where the former is denoted by DbgEL and the latter by DbgEC+Fix. Again, it is distinguished between wrong gate errors and missing control errors (where the |CEX|-based Reduction can be applied). Columns Circuit, d, n, and Cand. denote the name, the number of gates, the number of lines, and the number of obtained error candidates of each benchmark, respectively. The documented overall run-times of the respective approaches are given in column Time.3
As can be clearly seen, error locations for single errors can be determined even for circuits consisting of more than 25,000 gates. Applying Lemma 7.3 (i.e. DbgEC+Fix) is thereby more efficient than using the miter formulation (i.e. DbgEL). In contrast, the DbgEL approach is also applicable to multiple errors, which is considered in the next section.
If multiple errors occur, error candidates may be misleading. This has been observed in a second series of experiments, where circuits including multiple errors have been given to the method for error candidate determination described in Sect. 7.2.2 (denoted by DbgEC) and the method for error location determination described in Sect. 7.2.3 (denoted by DbgEL), respectively. Results for multiple missing control errors injected into a set of circuits are given in Table 7.6. Results are thereby presented for the General Case (without an error assumption) and for the case where a control line error is assumed, so that Target lines only can be considered. The denotation of the respective columns is analogous to those in the previous tables.
First of all, it can be seen that determining error candidates only (DbgEC) is in fact misleading in many cases. For example, for the benchmarks 3_17, ham7, hwb5, 3_17-3, hwb5-3, and hwb7-3, error candidates with lower cardinality k than for the
approved error location result. Thus, replacing these gates would fix the counterexamples but does not preserve the correct behavior according to the specification of the circuit (as also discussed in Example 7.5). These examples confirm the need for determining error locations.

3 Note that in both cases this also includes the debugging run-time needed to obtain the error candidates.

Table 7.6 Determining error locations and error candidates for multiple errors

Columns: Circuit, d, n, |CEX|; DbgEL (General Case): Pos., k, Time; DbgEL (Target lines only): Pos., k, Time; DbgEC (Target lines only): Cand., k, Time.

2 injected errors
3_17 6 3 4(2) 1 2 0.00 1 2 0.00 2 1 <0.01
4_49 16 4 10(4) 1 2 0.03 1 2 0.00 3 2 <0.01
4gt4 6 5 8(4) 1 2 0.04 1 2 0.00 3 2 <0.01
ham3 5 3 4(2) 1 2 0.00 1 2 0.00 1 2 <0.01
ham7 23 7 30(12) 1 2 0.68 1 2 0.08 4 1 <0.01
hwb4 17 4 12(5) 1 2 0.04 1 2 <0.01 2 2 <0.01
hwb5 55 5 3(2) 1 2 0.52 1 2 0.04 2 1 <0.01
hwb6 126 6 22(9) 1 2 27.77 1 2 0.37 2 2 0.29
hwb7 289 7 80(32) 1 2 1204.28 1 2 5.49 13 2 0.87
hwb8 637 8 42(17) – – >5000.00 1 2 59.18 4 2 45.39
hwb9 1544 9 36(15) – – >5000.00 1 2 348.51 2 2 332.51
urf1 1517 9 30(12) – – >5000.00 – – >5000.00 4 2 1007.09
urf3 2674 10 141(57) – – >5000.00 – – >5000.00 4 2 2932.35

3 injected errors
3_17-3 6 3 4(2) 1 3 0.00 1 3 <0.01 2 1 <0.01
4_49-3 16 4 10(4) 1 3 1147 1 3 0.01 79 3 <0.01
4gt4-3 6 5 18(8) 1 3 0.64 1 3 0.00 6 3 <0.01
ham7-3 23 7 46(19) 1 3 8.94 1 3 0.09 4 3 <0.01
hwb4-3 17 4 12(5) 1 3 0.48 1 3 0.05 65 3 <0.01
hwb5-3 55 5 24(10) 1 3 4241.11 1 3 1.03 2 2 0.18
hwb6-3 126 6 41(17) – – >5000.00 1 3 7.38 4 3 6.82
hwb7-3 289 7 94(38) – – >5000.00 1 3 311.04 7 2 1.10
hwb8-3 637 8 97(39) – – >5000.00 1 3 2575.31 12 3 2606.05
However, as also shown in Table 7.6, determining error locations is expensive.
For some benchmarks (urf1 and urf3) it was not possible to obtain the error locations
within the given timeout. For other benchmarks (hwb8, hwb9, hwb6-3, hwb7-3, and
hwb8-3), this was only possible if the TARGET LINES ONLY improvement has ad-
ditionally been applied. Nevertheless, for all other benchmarks error locations can
be determined. In comparison to traditional debugging (where only approaches that
determine error candidates exist), this is a significantly stronger result.
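The gap between error candidates and approved error locations can be made concrete with a small, purely illustrative sketch. The Python snippet below is a brute-force stand-in for the SAT-based formulations discussed above; the three-line circuits and gate choices are invented for illustration. A control line is removed from one gate, error candidates are then determined against a single observed counterexample, while approved error locations must admit a replacement gate that restores the complete specification.

```python
from itertools import combinations, product

LINES = 3

def simulate(circuit, inp):
    """Simulate a cascade of Toffoli gates given as (controls, target) pairs."""
    state = list(inp)
    for controls, target in circuit:
        if all(state[c] for c in controls):   # invert target iff all controls are 1
            state[target] ^= 1
    return tuple(state)

# invented example: the specification is the correct circuit itself
spec_circuit = [((0, 1), 2), ((0,), 1), ((2,), 0)]
# faulty version: the control on line 1 is missing in the first gate
faulty = [((0,), 2), ((0,), 1), ((2,), 0)]

inputs = list(product((0, 1), repeat=LINES))
spec = {i: simulate(spec_circuit, i) for i in inputs}

# all single Toffoli gates on LINES lines (possible replacement gates)
all_gates = [(ctrls, t) for t in range(LINES)
             for k in range(LINES)
             for ctrls in combinations([l for l in range(LINES) if l != t], k)]

counterexamples = [i for i in inputs if simulate(faulty, i) != spec[i]]

def fixable(pos, test_inputs):
    """Can replacing the gate at pos make the circuit correct on test_inputs?"""
    for gate in all_gates:
        patched = faulty[:pos] + [gate] + faulty[pos + 1:]
        if all(simulate(patched, i) == spec[i] for i in test_inputs):
            return True
    return False

# error candidates: a replacement only has to fix the observed counterexample
candidates = {p for p in range(len(faulty)) if fixable(p, counterexamples[:1])}
# approved error locations: a replacement must fix the circuit on ALL inputs
locations = {p for p in range(len(faulty)) if fixable(p, inputs)}

print("counterexamples:", counterexamples)
print("error candidates:", sorted(candidates))
print("approved error locations:", sorted(locations))
```

In this toy case the last gate turns out to be an error candidate, since a replacement fixes the observed counterexample, although only the first gate is an approved error location; this mirrors the misleading cases visible in Table 7.6.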
Single errors
4_49      11    16  4     1    24  0.01
4gt4       6    17  5     1     3  0.01
hwb5      23    55  5     1    75  0.01
hwb6      41   126  6     1   157  0.02
hwb7     235   331  7     1   384  0.05
hwb8     613   749  8     1   678  0.14
hwb9    1543  1959  9     1  2645  0.84
urf1    1516 11554  9     1  2229  0.95
urf2    3249  5030  8     1  1176  0.41

Double errors
4_49      11    16  4     3    22  0.01
4gt4       5    17  5    16    22  0.01
hwb5      23    55  5     3    77  0.01
hwb6      41   126  6    24   190  0.02
hwb7     235   331  7    69   431  0.05
hwb8     613   749  8     3  1202  0.21
hwb9    1543  1959  9   489  2689  0.83
urf1    1516 11554  9   361  2655  1.03
urf2    3249  5030  8   904  1251  0.40
Traditional technologies increasingly suffer from the ongoing miniaturization
and the exponential growth of the number of transistors in integrated circuits.
To face the upcoming challenges, alternatives are needed. Reversible logic provides
such an alternative that may replace or at least enhance traditional computer chips.
In the areas of quantum computation and low-power design, first very promising
results have already been obtained. Nevertheless, research in reversible logic is
still at the beginning. No continuous design flow exists so far. Instead, approaches
only for individual steps (e.g. for synthesis) have been proposed. Moreover, most of
these methods are applicable only to very small functions or circuits, respectively.
This is not sufficient to design complex reversible systems.
In this book, first steps towards a design flow for reversible logic have been proposed.
That is, methods for synthesis, embedding, optimization, verification,
and debugging have been introduced and experimentally evaluated. With BDD-based
synthesis, it is possible to synthesize functions with more than 100 variables.
More complex reversible systems can be realized using the SyReC language. For
this purpose, techniques for exact synthesis as well as for embedding have also been
utilized to determine the respective building blocks. Three optimization approaches
have been proposed that lead to more compact circuits with respect to different criteria
(i.e. quantum cost, transistor cost, number of lines, or nearest neighbor cost)
and thus with respect to different technologies. To prove that e.g. an optimization
was correct, equivalence checking based on decision diagrams or Boolean
satisfiability, respectively, has been introduced. In case of a failed verification,
approaches have been proposed that help the designer to find the error or to fix the
circuit. Altogether, using the approaches introduced in this book, complex functions
can be synthesized, and circuits with thousands of gates can be optimized, verified,
and debugged, all in a very efficient, automated way.
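To give an impression of the property an equivalence checker must establish, the following sketch compares an invented Toffoli cascade against a reordered variant by exhaustive simulation. This is only a brute-force stand-in: the decision-diagram- and SAT-based methods mentioned above establish the same result without enumerating all inputs.

```python
from itertools import product

def simulate(circuit, inp):
    """Simulate a cascade of Toffoli gates given as (controls, target) pairs."""
    state = list(inp)
    for controls, target in circuit:
        if all(state[c] for c in controls):   # invert target iff all controls are 1
            state[target] ^= 1
    return tuple(state)

def equivalent(c1, c2, lines):
    # brute force over all 2^lines inputs; symbolic checkers avoid this enumeration
    return all(simulate(c1, i) == simulate(c2, i)
               for i in product((0, 1), repeat=lines))

# invented circuits: the first two gates share their control line and commute
original  = [((0,), 1), ((0,), 2), ((0, 1), 2)]
optimized = [((0,), 2), ((0,), 1), ((0, 1), 2)]

print(equivalent(original, optimized, 3))     # the reordering is sound
print(equivalent(original, original[:2], 3))  # dropping a gate is not
```

The first check succeeds because neither of the swapped gates modifies the shared control line; the second fails, producing the kind of counterexample that the debugging methods of Chap. 7 start from.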
Combining the respective approaches, a first design flow results that can already
handle functions and circuits of notable size. The uniform RevLib format
for reversible functions and circuits (see www.revlib.org) thereby builds the basis
for linking the respective steps together. The resulting tools can be obtained at
www.revkit.org. Thus, designers of reversible circuits have a first continuous and
consistent flow to create their circuits.
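To give an impression of how such a common format links the steps together, the following sketch reads a minimal subset of a RevLib .real description (only the .variables declaration, the .begin/.end body, and Toffoli gates t<n>, whose last named line is the target) and simulates the resulting cascade. The example circuit is invented, and the full format defines further keywords and gate types not handled here.

```python
def parse_real(text):
    """Parse a minimal subset of the RevLib .real format into a gate list."""
    variables, gates = [], []
    in_body = False
    for line in text.splitlines():
        line = line.split('#')[0].strip()   # strip comments and whitespace
        if not line:
            continue
        if line.startswith('.variables'):
            variables = line.split()[1:]
        elif line == '.begin':
            in_body = True
        elif line == '.end':
            in_body = False
        elif in_body and line[0] == 't':    # Toffoli gate, e.g. "t3 a b c"
            names = line.split()[1:]
            *controls, target = [variables.index(n) for n in names]
            gates.append((tuple(controls), target))
    return variables, gates

def simulate(gates, state):
    """Apply each Toffoli gate: invert the target iff all controls are 1."""
    state = list(state)
    for controls, target in gates:
        if all(state[c] for c in controls):
            state[target] ^= 1
    return tuple(state)

example = """
.version 2.0
.numvars 3
.variables a b c
.begin
t3 a b c
t1 a
.end
"""
variables, gates = parse_real(example)
print(simulate(gates, (1, 1, 0)))   # the Toffoli gate flips c, the NOT gate flips a
```

Since synthesis, optimization, verification, and debugging tools all read and write this representation, their input and output can be passed along the flow without conversion.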
R. Wille, R. Drechsler, Towards a Design Flow for Reversible Logic, 175
DOI 10.1007/978-90-481-9579-4_8, © Springer Science+Business Media B.V. 2010
176 8 Summary and Conclusions
Besides that, the methods proposed in this book build the basis for further extensions
towards a design flow that covers more elaborate design needs. In particular,
extensions “on the top” and “on the bottom” of the flow are promising.
More precisely, synthesis of reversible logic should reach the system level.
For this, it is vital to have appropriate hardware description languages as well
as corresponding synthesis approaches. Only then will the design of complex reversible
circuits be possible. The SyReC language proposed in Sect. 3.3 is a first promising step
in this direction. Following this, new verification issues will also emerge. In
particular for complex circuits specified using e.g. hardware description languages, it
often cannot be ensured that the design was implemented as intended. Thus,
developing methods for property checking is a promising next step.
Furthermore, questions related to the test of reversible circuits will emerge in the
future. Already today, first models and approaches in this area exist (see e.g. [PHM04,
PBL05, PFBH05]). But due to the absence of large physical realizations, it is hard to
evaluate their suitability. Additionally, existing approaches cover only some
possible technologies. With ongoing progress in the development of further (and
larger) physical quantum computing or reversible CMOS realizations, new models
and approaches are needed to efficiently test them. Then, at the latest, the design
flow for reversible logic will also need a comprehensive consideration of testing issues.
Besides this “global view” on upcoming challenges in this domain, further ideas
for future work have already been discussed in the respective chapters. Overall,
the development of an elaborate design flow comparable to the one for
traditional circuit design (which has been developed over the last 25 years) will take
further years of research. In this context, the contributions in this book provide a
good starting point.
References
[CDKM05] S.A. Cuccaro, T.G. Draper, S.A. Kutin, D.P. Moulton, A new quantum ripple-carry
addition circuit, in Workshop on Quantum Information Processing (2005)
[CGP99] E.M. Clarke, O. Grumberg, D. Peled, Model Checking (MIT Press, Cambridge,
1999)
[Coo71] S.A. Cook, The complexity of theorem proving procedures, in Symposium on The-
ory of Computing (1971), pp. 151–158
[CS07] A. Chakrabarti, S. Sur-Kolay, Nearest neighbour based synthesis of quantum
Boolean circuits. Eng. Lett. 15, 356–361 (2007)
[DB98] R. Drechsler, B. Becker, Binary Decision Diagrams—Theory and Implementation
(Kluwer Academic, Dordrecht, 1998)
[DBW+07] S. Deng, J. Bian, W. Wu, X. Yang, Y. Zhao, EHSAT: An efficient RTL satisfiability
solver using an extended DPLL procedure, in Design Automation Conf. (2007),
pp. 588–593
[DDG00] R. Drechsler, N. Drechsler, W. Günther, Fast exact minimization of BDDs. IEEE
Trans. CAD 19(3), 384–389 (2000)
[DEF+08] R. Drechsler, S. Eggersglüß, G. Fey, A. Glowatz, F. Hapke, J. Schloeffel, D. Tille,
On acceleration of SAT-based ATPG for industrial designs. IEEE Trans. CAD 27,
1329–1333 (2008)
[DLL62] M. Davis, G. Logemann, D. Loveland, A machine program for theorem proving.
Commun. ACM 5, 394–397 (1962)
[DM06a] B. Dutertre, L. de Moura, A fast linear-arithmetic solver for DPLL(T), in Computer
Aided Verification. LNCS, vol. 4114 (Springer, Berlin, 2006), pp. 81–94
[DM06b] B. Dutertre, L. de Moura, The YICES SMT solver (2006). Available at http://
yices.csl.sri.com/
[DP60] M. Davis, H. Putnam, A computing procedure for quantification theory. J. ACM 7,
506–521 (1960)
[Dre04] R. Drechsler, Advanced Formal Verification (Kluwer Academic, Dordrecht, 2004)
[DS07] S. Disch, C. Scholl, Combinational equivalence checking using incremental SAT
solving, output ordering, and resets, in ASP Design Automation Conf. (2007),
pp. 938–943
[DV02] B. Desoete, A. De Vos, A reversible carry-look-ahead adder using control gates.
Integr. VLSI J. 33(1–2), 89–104 (2002)
[EFD05] R. Ebendt, G. Fey, R. Drechsler, Advanced BDD Optimization (Springer, Berlin,
2005)
[ES04] N. Eén, N. Sörensson, An extensible SAT solver, in SAT 2003. LNCS, vol. 2919
(Springer, Berlin, 2004), pp. 502–518
[FDH04] A.G. Fowler, S.J. Devitt, L.C.L. Hollenberg, Implementation of Shor’s algorithm on
a linear nearest neighbour qubit array. Quantum Inf. Comput. 4, 237–245 (2004)
[FSVD06] G. Fey, S. Safarpour, A. Veneris, R. Drechsler, On the relation between simulation-
based and SAT-based diagnosis, in Design, Automation and Test in Europe (2006),
pp. 1139–1144
[FT82] E.F. Fredkin, T. Toffoli, Conservative logic. Int. J. Theor. Phys. 21(3/4), 219–253
(1982)
[FWD10] S. Frehse, R. Wille, R. Drechsler, Efficient simulation-based debugging of re-
versible logic, in Int’l Symp. on Multi-Valued Logic (2010), pp. 156–161
[GAJ06] P. Gupta, A. Agrawal, N.K. Jha, An algorithm for synthesis of reversible logic cir-
cuits. IEEE Trans. CAD 25(11), 2317–2330 (2006)
[GCDD07] D. Große, X. Chen, G.W. Dueck, R. Drechsler, Exact SAT-based Toffoli network
synthesis, in ACM Great Lakes Symposium on VLSI (2007), pp. 96–101
[GD07] V. Ganesh, D.L. Dill, A decision procedure for bit-vectors and arrays, in Computer
Aided Verification (2007), pp. 519–531
[GLMS02] T. Grötker, S. Liao, G. Martin, S. Swan, System Design with SystemC (Kluwer
Academic, Dordrecht, 2002)
[GN02] E. Goldberg, Y. Novikov, BerkMin: a fast and robust SAT-solver, in Design, Au-
tomation and Test in Europe (2002), pp. 142–149
[GNP08] S. Gay, R. Nagarajan, N. Papanikolaou, QMC: A model checker for quantum sys-
tems, in Computer Aided Verification (2008), pp. 543–547
[GWDD08] D. Große, R. Wille, G.W. Dueck, R. Drechsler, Exact synthesis of elementary quan-
tum gate circuits for reversible functions with don’t cares, in Int’l Symp. on Multi-
Valued Logic (2008), pp. 214–219
[GWDD09a] D. Große, R. Wille, G.W. Dueck, R. Drechsler, Exact multiple control Toffoli net-
work synthesis with SAT techniques. IEEE Trans. CAD 28(5), 703–715 (2009)
[GWDD09b] D. Große, R. Wille, G.W. Dueck, R. Drechsler, Exact synthesis of elementary quan-
tum gate circuits. J. Mult.-Valued Log. Soft Comput. 15(4), 270–275 (2009)
[HK00] D.W. Hoffmann, T. Kropf, Efficient design error correction of digital circuits, in
Int’l Conf. on Comp. Design (2000), pp. 465–472
[HL05] F.S. Hillier, G.J. Lieberman, Introduction to Operations Research (McGraw-Hill,
New York, 2005)
[HSY+06] W.N.N. Hung, X. Song, G. Yang, J. Yang, M. Perkowski, Optimal synthesis of
multiple output Boolean functions using a set of quantum gates by symbolic reach-
ability analysis. IEEE Trans. CAD 25(9), 1652–1663 (2006)
[IKY02] K. Iwama, Y. Kambayashi, S. Yamashita, Transformation rules for designing
CNOT-based quantum circuits, in Design Automation Conf. (2002), pp. 419–424
[JFWD10] J.C. Jung, S. Frehse, R. Wille, R. Drechsler, Enhancing debugging of multiple miss-
ing control errors in reversible logic, in Great Lakes Symp. VLSI (2010)
[JSWD09] J.C. Jung, A. Sülflow, R. Wille, R. Drechsler, SWORD v1.0, Satisfiability Modulo
Theories Competition (2009)
[Ker04] P. Kerntopf, A new heuristic algorithm for reversible logic synthesis, in Design
Automation Conf. (2004), pp. 834–837
[Kha08] M.H.A. Khan, Cost reduction in nearest neighbour based synthesis of quantum
Boolean circuits. Eng. Lett. 16, 1–5 (2008)
[Kro99] T. Kropf, Introduction to Formal Hardware Verification (Springer, Berlin, 1999)
[Kut06] S.A. Kutin, Shor’s algorithm on a nearest-neighbor machine, in Asian Conference
on Quantum Information Science (2006). arXiv:quant-ph/0609001v1
[Lan61] R. Landauer, Irreversibility and heat generation in the computing process. IBM J.
Res. Dev. 5, 183 (1961)
[Lar92] T. Larrabee, Test pattern generation using Boolean satisfiability. IEEE Trans. CAD
11, 4–15 (1992)
[LCC+95] C.-C. Lin, K.-C. Chen, S.-C. Chang, M. Marek-Sadowska, K.-T. Cheng, Logic syn-
thesis for engineering change, in Design Automation Conf. (1995), pp. 647–651
[LL92] H. Liaw, C. Lin, On the OBDD-representation of general Boolean functions. IEEE
Trans. Comput. 41, 661–664 (1992)
[LSU89] R. Lipsett, C. Schaefer, C. Ussery, VHDL: Hardware Description and Design
(Kluwer Academic, Dordrecht, 1989)
[Mar99] J.P. Marques-Silva, The impact of branching heuristics in propositional satisfiability
algorithms, in 9th Portuguese Conference on Artificial Intelligence (EPIA) (1999)
[Mas07] D. Maslov, Linear depth stabilizer and quantum Fourier transformation circuits with
no auxiliary qubits in finite neighbor quantum architectures. Phys. Rev. A 76, 052310
(2007)
[McM02] K.L. McMillan, Applying SAT methods in unbounded symbolic model checking,
in Computer Aided Verification (2002), pp. 250–264
[MD04a] D. Maslov, G.W. Dueck, Improved quantum cost for n-bit Toffoli gates. IEE Elec-
tron. Lett. 39, 1790 (2004)
[MD04b] D. Maslov, G.W. Dueck, Reversible cascades with minimal garbage. IEEE Trans.
CAD 23(11), 1497–1509 (2004)
[MDM05] D. Maslov, G.W. Dueck, D.M. Miller, Toffoli network synthesis with templates.
IEEE Trans. CAD 24(6), 807–817 (2005)
[MDM07] D. Maslov, G.W. Dueck, D.M. Miller, Techniques for the synthesis of reversible
Toffoli networks. ACM Trans. Des. Autom. Electron. Syst. 12(4), 42 (2007)
[MDW09] D.M. Miller, G.W. Dueck, R. Wille, Synthesising reversible circuits from irre-
versible specifications using Reed-Muller spectral techniques, in Int’l Workshop on
Applications of the Reed-Muller Expansion in Circuit Design (2009), pp. 87–96
[Mer93] R.C. Merkle, Reversible electronic logic using switches. Nanotechnology 4, 21–40
(1993)
[Mer07] N.D. Mermin, Quantum Computer Science: An Introduction (Cambridge University
Press, Cambridge, 2007)
[MK04] M.M. Mano, C.R. Kime, Logic and Computer Design Fundamentals (Pearson Ed-
ucation, Upper Saddle River, 2004)
[ML01] J.P. McGregor, R.B. Lee, Architectural enhancements for fast subword permuta-
tions with repetitions in cryptographic applications, in Int’l Conf. on Comp. Design
(2001), pp. 453–461
[MMD03] D.M. Miller, D. Maslov, G.W. Dueck, A transformation based algorithm for re-
versible logic synthesis, in Design Automation Conf. (2003), pp. 318–323
[MMZ+01] M.W. Moskewicz, C.F. Madigan, Y. Zhao, L. Zhang, S. Malik, Chaff: Engineering
an efficient SAT solver, in Design Automation Conf. (2001), pp. 530–535
[Moo65] G.E. Moore, Cramming more components onto integrated circuits. Electronics
38(8), 114–117 (1965)
[MS99] J.P. Marques-Silva, K.A. Sakallah, GRASP: A search algorithm for propositional
satisfiability. IEEE Trans. Comput. 48(5), 506–521 (1999)
[MT06] D.M. Miller, M.A. Thornton, QMDD: A decision diagram structure for reversible
and quantum circuits, in Int’l Symp. on Multi-Valued Logic (2006), p. 6
[MT08] D.M. Miller, M.A. Thornton, Multiple-Valued Logic: Concepts and Representa-
tions (Morgan and Claypool, San Rafael, 2008)
[MWD09] D.M. Miller, R. Wille, G. Dueck, Synthesizing reversible circuits for irreversible
functions, in EUROMICRO Symp. on Digital System Design (2009), pp. 749–756
[MWD10] D.M. Miller, R. Wille, R. Drechsler, Reducing reversible circuit cost by adding
lines, in Int’l Symp. on Multi-Valued Logic (2010)
[MYDM05] D. Maslov, C. Young, G.W. Dueck, D.M. Miller, Quantum circuit simplification
using templates, in Design, Automation and Test in Europe (2005), pp. 1208–1213
[NC00] M. Nielsen, I. Chuang, Quantum Computation and Quantum Information (Cam-
bridge Univ. Press, Cambridge, 2000)
[OWDD10] S. Offermann, R. Wille, G.W. Dueck, R. Drechsler, Synthesizing multiplier in re-
versible logic, in IEEE Symp. on Design and Diagnostics of Electronic Circuits and
Systems (2010), pp. 335–340
[Pap93] C.H. Papadimitriou, Computational Complexity (Addison Wesley, Reading, 1993)
[PBG05] M.R. Prasad, A. Biere, A. Gupta, A survey of recent advances in SAT-based formal
verification. Softw. Tools Technol. Transf. 7(2), 156–173 (2005)
[PBL05] M. Perkowski, J. Biamonte, M. Lukac, Test generation and fault localization for
quantum circuits, in Int’l Symp. on Multi-Valued Logic (2005), pp. 62–68
[Per85] A. Peres, Reversible logic and quantum computers. Phys. Rev. A 32, 3266–3276
(1985)
[PFBH05] I. Polian, T. Fiehn, B. Becker, J.P. Hayes, A family of logical fault models for
reversible circuits, in Asian Test Symp. (2005), pp. 422–427
[PHM04] K.N. Patel, J.P. Hayes, I.L. Markov, Fault testing for reversible circuits. IEEE Trans.
CAD 23(8), 1220–1230 (2004)
[PHW06] A. De Pierro, C. Hankin, H. Wiklicky, Reversible combinatory logic. Math. Struct.
Comput. Sci. 16(4), 621–637 (2006)
[Pit99] A.O. Pittenger, An Introduction to Quantum Computing Algorithms (Birkhauser,
Basel, 1999)
[PMH08] K. Patel, I. Markov, J. Hayes, Optimal synthesis of linear reversible circuits. Quan-
tum Inf. Comput. 8(3–4), 282–294 (2008)
[RO08] M. Ross, M. Oskin, Quantum computing. Commun. ACM 51(7), 12–13 (2008)
[Rud93] R. Rudell, Dynamic variable ordering for ordered binary decision diagrams, in Int’l
Conf. on CAD (1993), pp. 42–47
[SD96] J.A. Smolin, D.P. DiVincenzo, Five two-bit quantum gates are sufficient to
implement the quantum Fredkin gate. Phys. Rev. A 53(4), 2855–2856 (1996)
[SDF04] S. Sutherland, S. Davidmann, P. Flake, System Verilog for Design and Modeling
(Kluwer Academic, Dordrecht, 2004)
[SFBD08] A. Sülflow, G. Fey, R. Bloem, R. Drechsler, Using unsatisfiable cores to debug
multiple design errors, in Great Lakes Symp. VLSI (2008), pp. 77–82
[Sha38] C.E. Shannon, A symbolic analysis of relay and switching circuits. Trans. AIEE 57,
713–723 (1938)
[Sho94] P.W. Shor, Algorithms for quantum computation: discrete logarithms and factoring,
in Foundations of Computer Science (1994), pp. 124–134
[SL00] Z. Shi, R.B. Lee, Bit permutation instructions for accelerating software cryptogra-
phy, in Int’l Conf. on Application-Specific Systems, Architectures, and Processors
(2000), pp. 138–148
[Som01] F. Somenzi, CUDD: Cu Decision Diagram Package Release 2.3.1 (University of
Colorado at Boulder, Boulder, 2001)
[SPMH03] V.V. Shende, A.K. Prasad, I.L. Markov, J.P. Hayes, Synthesis of reversible logic
circuits. IEEE Trans. CAD 22(6), 710–722 (2003)
[SSL+92] E. Sentovich, K. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj,
P. Stephan, R. Brayton, A. Sangiovanni-Vincentelli, SIS: A system for sequential
circuit synthesis. Technical Report, University of Berkeley (1992)
[SVAV05] A. Smith, A.G. Veneris, M.F. Ali, A. Viglas, Fault diagnosis and logic debugging
using Boolean satisfiability. IEEE Trans. CAD 24(10), 1606–1621 (2005)
[TG08] M.K. Thomsen, R. Glück, Optimized reversible binary-coded decimal adders.
J. Syst. Archit. 54, 697–706 (2008)
[TK05] Y. Takahashi, N. Kunihiro, A linear-size quantum circuit for addition with no ancil-
lary qubits. Quantum Inf. Comput. 5, 440–448 (2005)
[Tof80] T. Toffoli, Reversible computing, in Automata, Languages and Programming, ed.
by W. de Bakker, J. van Leeuwen (Springer, Berlin, 1980), p. 632. Technical Memo
MIT/LCS/TM-151, MIT Lab. for Comput. Sci.
[TS05] H. Thapliyal, M.B. Srinivas, The need of DNA computing: reversible designs of
adders and multipliers using Fredkin gate, in SPIE (2005)
[Tse68] G. Tseitin, On the complexity of derivation in propositional calculus, in Studies
in Constructive Mathematics and Mathematical Logic, Part 2 (Nauka, Leningrad,
1968), pp. 115–125. (Reprinted in: J. Siekmann, G. Wrightson (eds.), Automation
of Reasoning, vol. 2, Springer, Berlin, 1983, pp. 466–483)
[VH99] A. Veneris, I.N. Hajj, Design error diagnosis and correction via test vector simula-
tion. IEEE Trans. CAD 18(12), 1803–1816 (1999)
[VMH07] G.F. Viamontes, I.L. Markov, J.P. Hayes, Checking equivalence of quantum circuits
and states, in Int’l Conf. on CAD (2007), pp. 69–74
[VSB+01] L.M.K. Vandersypen, M. Steffen, G. Breyta, C.S. Yannoni, M.H. Sherwood,
I.L. Chuang, Experimental realization of Shor’s quantum factoring algorithm us-
ing nuclear magnetic resonance. Nature 414, 883 (2001)
[WD09] R. Wille, R. Drechsler, BDD-based synthesis of reversible logic for large functions,
in Design Automation Conf. (2009), pp. 270–275
[WD10] R. Wille, R. Drechsler, Effect of BDD optimization on synthesis of reversible and
quantum logic. Electron. Notes Theor. Comput. Sci. 253(6), 57–70 (2010). Pro-
ceedings of the Workshop on Reversible Computation (RC 2009)
[Weg00] I. Wegener, Branching Programs and Binary Decision Diagrams: Theory and Ap-
plications (Society for Industrial and Applied Mathematics, Philadelphia, 2000)
[WFG+07] R. Wille, G. Fey, D. Große, S. Eggersglüß, R. Drechsler, SWORD: A SAT like
prover using word level information, in VLSI of System-on-Chip (2007), pp. 88–93
[WFG+09] R. Wille, G. Fey, D. Große, S. Eggersglüß, R. Drechsler, SWORD: A SAT like
prover using word level information, in VLSI-SoC: Advanced Topics on Systems
on a Chip: A Selection of Extended Versions of the Best Papers of the Fourteenth
Index

A
ALU, 48
Arithmetic Logic Unit, see ALU

B
BDD, 17
Binary Decision Diagram, see BDD
Bit-vector logic, 23
Boolean function, 7
  multi-output, 8
  reversible, 8
  single-output, 7
Boolean satisfiability, 21

C
Circuit
  quantum, 14
  reversible, 10
  traditional, 9
Circuit composition, 124
Circuit cost, 12
CNF, 21
CNOT gate, 10
Complement edge, 18, 36
Conjunctive Normal Form, see CNF
Constant input, 9
Control line, 10
Cost, 12
Counterexample, 145

D
Debugging, 155
  reversible, 157
  traditional, 156
Decomposition
  NNC-optimal (exact), 134
  NNC-optimal (improved), 134
  NNC-optimal (naive), 133
  quantum, 16
  Shannon, 17
Double gates, 16

E
Embedding, 28, 93
Equivalence checking, 145
  QMDD-based, 145
  SAT-based, 148
Error candidate, 155
Error location, 162
Error models, 155
Exact synthesis, 57
  QBF-based, 81
  SAT-based, 58, 61
  SMT-based, 77
  SWORD-based, 79

F
Factoring reversible circuits, 115
Fixing, 165
Fredkin gate, 10

G
Garbage output, 9, 29
Gate
  CNOT, 10, 14
  double, 16
  Fredkin, 10
  NOT, 10, 14
  Peres, 10