
Towards a Design Flow for Reversible Logic

Robert Wille  Rolf Drechsler

Robert Wille
Institute of Computer Science
University of Bremen
Bibliothekstr. 1
28359 Bremen
Germany
rwille@informatik.uni-bremen.de

Rolf Drechsler
Institute of Computer Science
University of Bremen
Bibliothekstr. 1
28359 Bremen
Germany
drechsle@informatik.uni-bremen.de

ISBN 978-90-481-9578-7 e-ISBN 978-90-481-9579-4


DOI 10.1007/978-90-481-9579-4
Springer Dordrecht Heidelberg London New York

Library of Congress Control Number: 2010932404

© Springer Science+Business Media B.V. 2010


No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written
permission from the Publisher, with the exception of any material supplied specifically for the purpose
of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Cover design: eStudio Calamar S.L.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


Preface

The development of computing machines has seen great success in recent decades.
However, the ongoing miniaturization of integrated circuits will reach its limits in the
near future. Shrinking transistor sizes and power dissipation are the major barriers
in the development of smaller and more powerful circuits. Reversible logic pro-
vides an alternative that may overcome many of these problems in the future. For
low-power design, reversible logic offers significant advantages since zero power
dissipation will only be possible if computation is reversible. Furthermore, quantum
computation profits from enhancements in this area, because every quantum circuit
is inherently reversible and thus requires reversible descriptions.
However, since reversible logic is subject to certain restrictions (e.g. fanout and
feedback are not directly allowed), the design of reversible circuits significantly
differs from the design of traditional circuits. Nearly all steps in the design flow
(like synthesis, verification, or debugging) must be redeveloped so that they become
applicable to reversible circuits as well. But research in reversible logic is still in its
early stages, and no complete design flow exists so far.
In this book, contributions to a design flow for reversible logic are presented. This
includes advanced methods for synthesis, optimization, verification, and debugging.
Formal methods like Boolean satisfiability and decision diagrams are thereby ex-
ploited. By combining the techniques proposed in the book, it is possible to synthe-
size reversible circuits representing large functions. Optimization approaches en-
sure that the resulting circuits are of small cost. Finally, a method for equivalence
checking and automatic debugging makes it possible to verify the obtained results and helps
to accelerate the search for bugs in case of errors in the design. Combining the
respective approaches, a first design flow for reversible circuits of significant size
results.
This book addresses computer scientists and computer architects and does not
require previous knowledge about the physics of reversible logic or quantum com-
putation. The respective concepts as well as the used models are briefly introduced.


All approaches are described in a self-contained manner. The content of the book
not only conveys a coherent overview of current research results, but also
builds the basis for future work on a design flow for reversible logic.

Bremen Robert Wille


Rolf Drechsler
Acknowledgements

This book is the result of more than three years of intensive research in the area
of reversible logic. During this time, we received much support from different
people, whom we would like to thank very much.
In particular, the Group of Computer Architecture at the University of Bremen
deserves many thanks for providing a comfortable and inspirational environment. Many
thanks go to Stefan Frehse, Daniel Große, Lisa Jungmann, Hoang M. Le, Sebastian
Offermann, and Mathias Soeken who actively helped in the development of the
approaches described in this book.
Sincere thanks also go to Prof. D. Michael Miller from the University of Victo-
ria, Prof. Gerhard W. Dueck from the University of New Brunswick, and Dr. Mehdi
Saeedi from the Amirkabir University of Technology in Tehran for very fruitful col-
laborations. In this context, we would like to thank the German Academic Exchange
Service (DAAD) which enabled the close contact with the groups in Canada.
Special thanks go to the German Research Foundation (DFG) which funded parts
of this work under the contract number DR 287/20-1.
Finally, we would like to thank Marc Messing who did a great job of proofreading
as well as Christiane and Shawn Mitchell who closely checked the manuscript for
English style and grammar.

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Reversible Functions . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Reversible Circuits . . . . . . . . . . . . . . . . . . . . . . 9
2.1.3 Quantum Circuits . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Decision Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Binary Decision Diagrams . . . . . . . . . . . . . . . . . . 17
2.2.2 Quantum Multiple-valued Decision Diagrams . . . . . . . 19
2.3 Satisfiability Solvers . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.1 Boolean Satisfiability . . . . . . . . . . . . . . . . . . . . 21
2.3.2 Extended SAT Solvers . . . . . . . . . . . . . . . . . . . . 22

3 Synthesis of Reversible Logic . . . . . . . . . . . . . . . . . . . . . . 27


3.1 Current Synthesis Steps . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.1 Embedding Irreversible Functions . . . . . . . . . . . . . . 28
3.1.2 Transformation-based Synthesis . . . . . . . . . . . . . . . 30
3.2 BDD-based Synthesis . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 General Idea . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.2 Exploiting BDD Optimization . . . . . . . . . . . . . . . . 34
3.2.3 Theoretical Consideration . . . . . . . . . . . . . . . . . . 37
3.2.4 Experimental Results . . . . . . . . . . . . . . . . . . . . 39
3.3 SyReC: A Reversible Hardware Language . . . . . . . . . . . . . 46
3.3.1 The SyReC Language . . . . . . . . . . . . . . . . . . . . 47
3.3.2 Synthesis of the Circuits . . . . . . . . . . . . . . . . . . . 50
3.3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . 53
3.4 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . 56

4 Exact Synthesis of Reversible Logic . . . . . . . . . . . . . . . . . . . 57


4.1 Main Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 SAT-based Exact Synthesis . . . . . . . . . . . . . . . . . . . . . 61

4.2.1 Encoding for Toffoli Circuits . . . . . . . . . . . . . . . . 61


4.2.2 Encoding for Quantum Circuits . . . . . . . . . . . . . . . 65
4.2.3 Handling Irreversible Functions . . . . . . . . . . . . . . . 68
4.2.4 Experimental Results . . . . . . . . . . . . . . . . . . . . 70
4.3 Improved Exact Synthesis . . . . . . . . . . . . . . . . . . . . . . 76
4.3.1 Exploiting Higher Levels of Abstractions . . . . . . . . . . 77
4.3.2 Quantified Exact Synthesis . . . . . . . . . . . . . . . . . 81
4.3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . 84
4.4 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . 91

5 Embedding of Irreversible Functions . . . . . . . . . . . . . . . . . . 93


5.1 The Embedding Problem . . . . . . . . . . . . . . . . . . . . . . 94
5.2 Don’t Care Assignment . . . . . . . . . . . . . . . . . . . . . . . 96
5.2.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.2.2 Experimental Results . . . . . . . . . . . . . . . . . . . . 99
5.3 Synthesis with Output Permutation . . . . . . . . . . . . . . . . . 100
5.3.1 General Idea . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3.2 Exact Approach . . . . . . . . . . . . . . . . . . . . . . . 104
5.3.3 Heuristic Approach . . . . . . . . . . . . . . . . . . . . . 105
5.3.4 Experimental Results . . . . . . . . . . . . . . . . . . . . 106
5.4 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . 111

6 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.1 Adding Lines to Reduce Circuit Cost . . . . . . . . . . . . . . . . 114
6.1.1 General Idea . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.1.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.1.3 Experimental Results . . . . . . . . . . . . . . . . . . . . 117
6.2 Reducing the Number of Circuit Lines . . . . . . . . . . . . . . . 124
6.2.1 General Idea . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2.3 Experimental Results . . . . . . . . . . . . . . . . . . . . 130
6.3 Optimizing Circuits for Linear Nearest Neighbor Architectures . . 131
6.3.1 NNC-optimal Decomposition . . . . . . . . . . . . . . . . 133
6.3.2 Optimizing NNC-optimal Decomposition . . . . . . . . . . 134
6.3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . 138
6.4 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . 138

7 Formal Verification and Debugging . . . . . . . . . . . . . . . . . . . 143


7.1 Equivalence Checking . . . . . . . . . . . . . . . . . . . . . . . . 144
7.1.1 The Equivalence Checking Problem . . . . . . . . . . . . . 145
7.1.2 QMDD-based Equivalence Checking . . . . . . . . . . . . 145
7.1.3 SAT-based Equivalence Checking . . . . . . . . . . . . . . 148
7.1.4 Experimental Results . . . . . . . . . . . . . . . . . . . . 150
7.2 Automated Debugging and Fixing . . . . . . . . . . . . . . . . . . 154
7.2.1 The Debugging Problem . . . . . . . . . . . . . . . . . . . 155
7.2.2 Determining Error Candidates . . . . . . . . . . . . . . . . 157

7.2.3 Determining Error Locations . . . . . . . . . . . . . . . . 161


7.2.4 Fixing Erroneous Circuits . . . . . . . . . . . . . . . . . . 165
7.2.5 Experimental Results . . . . . . . . . . . . . . . . . . . . 167
7.3 Summary and Future Work . . . . . . . . . . . . . . . . . . . . . 173

8 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . 175

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Acronyms

BDDs Binary Decision Diagrams


CMOS Complementary Metal Oxide Semiconductor
CNF Conjunctive Normal Form
CNOT Controlled-NOT
d Number of gates (depth) of a circuit
HDL Hardware Description Language
LNN Linear Nearest Neighbor
NNC Nearest Neighbor Cost
MCF Multiple control Fredkin
MCT Multiple control Toffoli
P Peres
QMDD Quantum Multiple-valued Decision Diagram
QF_BV Quantifier free bit-vector logic
QBF Quantified Boolean Formulas
SAT Boolean satisfiability
SMT SAT Modulo Theories
SWOP Synthesis with Output Permutation

Chapter 1
Introduction

In the last decades, great achievements have been made in the development of com-
puting machines. While computers consisting of a few thousand components
filled whole rooms in the early 1970s, nowadays billions of transistors are built on
a few square millimeters. This is a result of the achievements made in the domain
of semiconductors, which still continue: the number of transistors in a cir-
cuit doubles every 18 months (also known as Moore’s Law, named after
the co-founder of Intel, Gordon E. Moore, who formulated this as a prediction in
1965 [Moo65]).1 Until today, this prediction has not lost any of its validity; each
year more complex systems and chips are introduced.
However, it is obvious that such exponential growth must reach its limits in
the future, at the latest when miniaturization reaches a level where single transistor
sizes approach the atomic scale. Besides that, power dissipation is more and
more becoming a crucial issue for designing high-performance digital circuits. In the
last decades, the amount of power dissipated in the form of heat to the surrounding
environment of a chip has increased by orders of magnitude. Since excessive heat may
decrease the reliability of a chip (or even destroy it), power dissipation is one of
the major barriers to the development of smaller and faster computer chips.
For these reasons, some researchers expect that from the 2020s on, a further doubling
of transistor density will no longer be possible.
To further satisfy the needs for more computational power, alternatives are
needed that go beyond the scope of “traditional” technologies like CMOS.2 Re-
versible logic marks a promising new direction where all operations are performed
in an invertible manner. That is, in contrast to traditional logic, all computations
can be reverted (i.e. the inputs can be obtained from the outputs and vice versa).
A simple standard operation like the logical AND already illustrates that reversibil-
ity is not guaranteed in traditional circuit systems. Indeed, it is possible to determine the
inputs of an AND gate if the output is 1 (then both inputs must be assigned to 1 as well).

1 Originally, Moore predicted a doubling every 12 months; ten years later he updated to 18 months.
2 CMOS is the abbreviation for Complementary Metal Oxide Semiconductor, the technology
mainly used for today’s integrated circuits.


However, it is not possible to determine the input values if the AND
outputs 0. In contrast, reversible logic allows bijective operations only, i.e. n-input
n-output functions that map each possible input vector to a unique output vector.
This reversibility builds the basis for emerging technologies that may replace or at
least enhance the traditional computer chip.
Two examples of such technologies making use of reversible logic are sketched
in the following:
• Reversible Logic for Low-Power Design
As mentioned above, power dissipation and therewith heat generation is a serious
problem for today’s computer chips. A significant part of energy dissipation is
due to the non-ideal behaviors of transistors and materials. Here, higher levels of
integration and new fabrication processes reduced the heat generation in the last
decade. However, a more fundamental reason for power dissipation arises from
the observations made by Landauer in 1961 [Lan61]. Landauer proved that using
traditional (irreversible) logic, gates always lead to energy dissipation regardless
of the underlying technology. More precisely, exactly k · T · log 2 Joule of energy
is dissipated for each “lost” bit of information during the irreversible operation
(where k is the Boltzmann constant and T is the temperature; a rough numeric
estimate of this bound is sketched after this list). While this amount
of energy currently does not sound significant, it may become crucial when one
additionally considers that (1) today millions of operations are performed within
seconds (i.e. increasing processor frequency multiplies this amount) and (2) more and
more operations are performed with smaller and smaller transistor sizes (i.e. in a
smaller area).
In contrast, Bennett showed that energy dissipation is reduced or even eliminated
if computation becomes information-lossless [Ben73]. This holds for reversible
logic, since data is bijectively transformed without losing any of the original infor-
mation. Bennett proved that circuits with zero power dissipation are only possible
if they are built from reversible gates. In 2002, the first reversible circuits exploiting
this observation were built [DV02]. In fact, these circuits were powered
by their input signals only (i.e. without additional power supplies). In the future,
such circuits may be an alternative that can cope with the heat generation prob-
lem of traditional chips. Furthermore, since reversible circuits already work with
low power, applications are also possible in domains where power is a limited
resource (e.g. for mobile computation).
• Reversible Logic as a Basis for Quantum Computation
Quantum circuits [NC00] offer a new kind of computation. Instead of logic sig-
nals 0 and 1, quantum circuits make use of qubits. A qubit is a two level quantum
system, described by a two dimensional complex Hilbert space. This allows not only 0
and 1 to be represented, but also a superposition of both. As a result, qubits may represent multi-
ple states at the same time enabling enormous speed-ups in computations. For
example, it has been shown that using a quantum circuit it is possible to solve
the factorization problem in polynomial time, while for traditional circuits only
exponential methods exist [Sho94, VSB+01].

But research in the area of quantum circuits is still in its early stages. Neverthe-
less, first promising results exist: at the University of Innsbruck, one of the first
quantum circuits, consisting of 8 qubits, was built in 2005. This has been further
improved so that today circuits with dozens of qubits exist, with an upward trend.
Even first commercial realizations of quantum circuits (e.g. a random number
generator) are available. Reversible logic is important in this area because ev-
ery quantum operation is inherently reversible. Thus, progress in the domain of
reversible logic can be directly applied to quantum logic.
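
To give a feeling for the magnitude of Landauer's bound mentioned in the first item of this list, the following short Python sketch evaluates k · T · log 2 numerically. The temperature of 300 K and the rate of 10^9 lost bits per second are illustrative assumptions only, not values taken from this book.

```python
import math

# Boltzmann constant in Joule per Kelvin
k = 1.380649e-23
# assumed operating temperature in Kelvin (room temperature; illustrative value)
T = 300.0

# Landauer's bound: energy dissipated per "lost" bit of information
energy_per_lost_bit = k * T * math.log(2)

# assumed number of lost bits per second (illustrative value only)
lost_bits_per_second = 1e9

print(f"energy per lost bit: {energy_per_lost_bit:.3e} J")
print(f"dissipation at 1e9 lost bits/s: {energy_per_lost_bit * lost_bits_per_second:.3e} W")
```

Even under these assumptions the resulting dissipation is tiny in absolute terms, which is why the bound only becomes relevant in combination with ever higher operation counts and ever smaller areas, as argued above.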
Besides that, reversible logic additionally finds application in domains like
optical computing [CA87], DNA computing [TS05], as well as nanotechnolo-
gies [Mer93]. Also, cryptography or encoding/decoding methods (e.g. for music
and videos) can profit from enhancements in this area (see e.g. [ML01]). Further-
more, already today reversible operations are used in instruction sets for micropro-
cessors [SL00].
The basic concepts of reversible logic are not new: they were already
introduced in the 1960s by Landauer [Lan61] and further refined by Ben-
nett [Ben73] and Toffoli [Tof80]. They observed that, due to the reversibility, a
straightforward usage of fanouts and feedback is not possible in reversible logic.
Furthermore, new libraries of (reversible) gates have been introduced to represent
invertible operations [Tof80, FT82, Per85, NC00] and it was stated that each re-
versible circuit must be a cascade of these reversible gates.
Even if this still represents the basis for research in the area of reversible logic,
the topic was not intensively studied by computer scientists before the year 2000.
The main reason for that may lie in the fact that applications of reversible logic (in
particular in the domain of quantum computation) have been seen as “dreams of
the future”. But this changed when factorization, a very important problem (fac-
torization builds the basis for most of today’s encryption methods), was solved
on a physically implemented quantum circuit [Sho94, VSB+01]. Therewith, a proof
of concept was available showing that quantum computing, in fact, may be one so-
lution for future computational problems. In particular, this achievement (together
with further ones e.g. in reversible CMOS design as mentioned above) significantly
moved the topic forward so that nowadays reversible logic is seen as a promising re-
search area. As a consequence, in the last years computer scientists have also started
to develop new methods e.g. for synthesis of reversible circuits.
However, no real design flow for reversible logic exists to date. This is a serious gap
since, due to the mentioned restrictions (e.g. no fanout and feedback), the design of
reversible circuits significantly differs from the design of traditional circuits. Nearly
all elaborated methods for synthesis, verification, debugging, and test available for
traditional circuit design must be redeveloped so that they become applicable to
reversible circuits as well. Now, while applications of reversible logic are starting
to become feasible and traditional technologies more and more suffer from the in-
creasing miniaturization, it is even more necessary to work towards such a flow.
Moreover, considering the traditional design flow, it can be concluded that until to-
day, computer scientists cannot fully exploit the technical state-of-the-art. That is,
the number of transistors that can be physically implemented on a chip grows faster
than the ability to design them in a useful manner (also known as the design gap).

Fig. 1.1 Proposed design flow

This becomes even more crucial if, additionally, the ability to verify the correct-
ness of the designed circuits is considered (known as the verification gap). Once
reversible logic becomes feasible for large designs in the future, researchers will be
faced with similar challenges. Thus, it is worth working towards a design flow for
reversible logic already today.
First steps in this direction have been made in the domain of synthesis (see
e.g. [SPMH03, MDM05]), verification (see e.g. [VMH07, GNP08]), and test (see
e.g. [PHM04, PBL05, PFBH05]). However, they are all still far away from covering
real design needs. As an example, most synthesis approaches are only applicable
for small functions and often produce circuits with relatively high cost. In contrast,
design methods to create complex circuits and to efficiently verify their correctness
are needed.
This book makes contributions to a future design flow for reversible logic by
proposing advanced methods for synthesis, optimization, verification, and debug-
ging. Figure 1.1 shows the interaction of the proposed steps in an integrated flow.
The left-hand side sketches the restrictions or challenges, respectively, to be solved
in comparison to traditional methods. By combining the techniques proposed in the
book, it is possible to synthesize reversible circuits representing large functions.
Optimization methods ensure that the resulting circuits are of small cost. Finally,
methods for equivalence checking and automatic debugging allow to verify the ob-
tained results and help to accelerate the search for bugs in case of errors in the
design. In the following, the respective contributions are briefly introduced in the
order they appear in this book. A more detailed description of the problems as well
as the proposed solutions is given at the beginning of each chapter.

As a starting point, synthesis is considered in Chap. 3. Currently, the synthesis of
reversible logic and quantum circuits is limited: in the past, only meth-
ods applicable to relatively small functions have been proposed, i.e. func-
tions with at most 30 variables. In addition, these methods often require an enor-
mous amount of run-time. After reviewing the reasons for these limitations, a new
synthesis method based on Binary Decision Diagrams (BDDs) is proposed. This en-
ables the synthesis of functions containing over 100 variables and thus is a major step
towards the design of complex systems in reversible logic. Additionally, a hardware
description language is introduced that allows complex reversible circuits to be
specified and subsequently synthesized.
The problem of exact synthesis is considered in Chap. 4. Exact synthesis meth-
ods generate minimal circuits for a given function. Naturally, exact synthesis ap-
proaches are only applicable for very small functions. But, the resulting circuit re-
alizations can be used later e.g. as building blocks for heuristic approaches. Never-
theless, run-time is the limiting factor here. The chapter describes how techniques
of Boolean satisfiability (SAT) can be exploited for efficient exact synthesis of re-
versible circuits. Further approaches incorporating problem-specific knowledge as
well as quantification are then introduced. These methods allow further accelera-
tions of the exact synthesis.
With these different synthesis approaches as a basis, the problem of embedding is
addressed in Chap. 5. Usually, most of the synthesis approaches require a reversible
function as input. But, basic functions like AND or addition are inherently irre-
versible. Thus, before synthesis, these functions must be embedded into reversible
ones. This requires the addition of extra circuit signals and therewith constant in-
puts, garbage outputs, as well as don’t care conditions at the outputs. Furthermore,
the order of outputs may be chosen arbitrarily. All this affects the generated syn-
thesis result. In Chap. 5, methods for finding good embeddings are proposed and
evaluated.
After synthesis, the resulting circuits often are of high cost. In particular, dedi-
cated technology-specific constraints are not considered by synthesis approaches. To
address this, three different optimization methods are introduced in Chap. 6, each
with its own focus on a particular cost metric. The first one considers the reduction
of the well-established quantum cost (used in quantum circuits) and the transistor
cost (used in CMOS implementations), respectively. The second one addresses the
number of lines in a circuit which particularly is important for all quantum real-
izations. Finally, an approach is introduced that takes a new cost metric based on
a dedicated physical realization of quantum circuits into account. This allows
designers to automatically optimize their circuits with respect to the special needs
of the addressed technology.
To ensure that the respective results (e.g. obtained by optimization) still repre-
sent the desired functionality, verification is applied. For this purpose, in Chap. 7
equivalence checkers are introduced that can handle circuits with several thousands
of gates in a very short time. Furthermore, an automatic approach for debugging is
proposed. Instead of manually searching for the source of an error, this method al-
lows a fast calculation of a reduced set of error candidates to be considered, or it can
even fix the erroneous circuit automatically.

Fig. 1.2 Structure of the book

Altogether, the contributions of this book to the design flow for reversible logic
can be summarized as follows:
• Synthesis methods for large functions (i.e. functions with more than 100 vari-
ables)
• A hardware description language for reversible logic
• Exact approaches for synthesizing minimal circuits that can later be used as build-
ing blocks
• Embedding methods to automatically realize circuits for irreversible functions
• Optimization approaches to reduce the cost with respect to the addressed technol-
ogy
• Equivalence checking of large circuits (i.e. circuits with several thousands of
gates)
• Automatic debugging and fixing of erroneous circuits
All proposed methods have been implemented and experimentally evaluated.
To this end, a uniform format for specifying reversible functions as well as re-
versible circuits has been defined which was used in all experiments throughout
this book (see also the note on benchmarks on p. 26). Furthermore, all benchmark
functions as well as the circuits have been made available online at RevLib under
www.revlib.org. The resulting tools can be obtained under www.revkit.org. This al-
lows other researchers to compare their results with the ones obtained in this work.
However, the results together with a discussion, related work, and future research
directions are, of course, also given in the respective chapters.
According to the outline sketched above, the remainder of this book is structured
as depicted in Fig. 1.2. The next chapter gives a more detailed introduction into
both reversible as well as quantum logic and provides the basic notations and def-
initions as used in the rest of this book. Afterwards, the chapters about synthesis
(Chap. 3), optimization (Chap. 6), as well as verification and debugging (Chap. 7)
can be read independently of each other. Only for Chap. 4 about exact synthesis and
Chap. 5 about embedding irreversible functions is it recommended to read the pre-
vious chapters beforehand. Chapter 8 summarizes all findings and gives directions
for future work.
Chapter 2
Preliminaries

This chapter provides the basic definitions and notations to keep the remainder of the book
self-contained. The chapter is divided into three parts. In the first section, Boolean
functions, reversible functions, and the respective circuit descriptions are intro-
duced. This builds the basis for all approaches described in this book. Since many
of the proposed techniques exploit decision diagrams and satisfiability solvers, re-
spectively, the basic concepts of these core techniques are also introduced in the last
two sections. All descriptions are thereby kept brief. For a more in-depth treatment,
references to further reading are given in the respective sections.

2.1 Background

Reversible logic realizes bijective Boolean functions. Thus, first the basics regard-
ing Boolean functions are revisited and further extended by a description of the
properties specifically applied to reversible functions. Then, reversible circuits as
well as quantum circuits are introduced which are used as realizations of reversible
functions.

2.1.1 Reversible Functions

Every logic computation can be defined as a function over Boolean variables
taking their values from B = {0, 1}. More precisely:

Definition 2.1 A Boolean function is a mapping f : B^n → B with n ∈ N. A func-


tion f is defined over its input variables X = {x1 , x2 , . . . , xn } and hence is also
denoted by f (x1 , x2 , . . . , xn ). The concrete mapping is described in terms of
Boolean expressions which are formed over the variables from X and the opera-
tions ∧ (AND), ∨ (OR), as well as ¬ (NOT).

Table 2.1 Boolean functions

(a) AND
x1 x2 | x1 ∧ x2
0  0  |    0
0  1  |    0
1  0  |    0
1  1  |    1

(b) OR
x1 x2 | x1 ∨ x2
0  0  |    0
0  1  |    1
1  0  |    1
1  1  |    1

(c) NOT
x1 | ¬x1
0  |  1
1  |  0

Example 2.1 Table 2.1 shows the truth tables of the operations AND, OR, and NOT,
respectively. Each truth table has 2^n rows, showing the mapping of each input pat-
tern to the respective output pattern.

Taking AND, OR, and NOT as a basis, every Boolean function can be derived.
For example, the often used functions XOR, implication, and equivalence are de-
rived as follows:
• XOR: x1 ⊕ x2 := (x1 ∧ ¬x2 ) ∨ (¬x1 ∧ x2 )
• Implication: x1 ⇒ x2 := ¬x1 ∨ x2
• Equivalence: x1 ⇔ x2 := ¬(x1 ⊕ x2 )
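
The three derivations above can be checked mechanically by enumerating all input combinations. The following Python sketch is only an illustration (it is not part of the book's tooling) and encodes AND, OR, and NOT exactly as introduced above.

```python
from itertools import product

def NOT(x):
    return 1 - x

def AND(x, y):
    return x & y

def OR(x, y):
    return x | y

for x1, x2 in product((0, 1), repeat=2):
    xor = OR(AND(x1, NOT(x2)), AND(NOT(x1), x2))   # XOR as derived above
    implication = OR(NOT(x1), x2)                  # implication as derived above
    equivalence = NOT(xor)                         # equivalence as complement of XOR

    assert xor == (x1 ^ x2)
    assert implication == (0 if (x1 == 1 and x2 == 0) else 1)
    assert equivalence == (1 if x1 == x2 else 0)

print("all derived operations verified")
```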
So far, single-output functions have been introduced. However, in practice also
multi-output functions are widely used.

Definition 2.2 A multi-output Boolean function is a mapping f : B^n → B^m with
n, m ∈ N. More precisely, it is a system of Boolean functions fi (x1 , x2 , . . . , xn )
with 1 ≤ i ≤ m. In the following, multi-output functions are also termed n-input,
m-output functions or n × m functions, respectively.

Example 2.2 Table 2.2(a) shows the truth table of a 3-input, 2-output function rep-
resenting the adder function.

This book considers reversible functions. Reversible functions are a subset of


multi-output functions and are defined as follows:

Definition 2.3 A multi-output function f : B^n → B^m is a reversible function iff
• its number of inputs is equal to the number of outputs (i.e. n = m) and
• it maps each input pattern to a unique output pattern.

In other words, each reversible function is a bijection that performs a permutation


of the set of input patterns. A function that is not reversible is termed irreversible.

Example 2.3 Table 2.2(c) shows a 3-input, 3-output function. This function is re-
versible, since each input pattern maps to a unique output pattern. In contrast, the
function depicted in Table 2.2(a) is irreversible, since n ≠ m. Moreover, the

Table 2.2 Multi-output functions

(a) Irreversible (Adder)
x1 x2 x3 | f1 f2
0  0  0  | 0  0
0  0  1  | 0  1
0  1  0  | 0  1
0  1  1  | 1  0
1  0  0  | 0  1
1  0  1  | 1  0
1  1  0  | 1  0
1  1  1  | 1  1

(b) Irreversible
x1 x2 x3 | f1 f2 f3
0  0  0  | 0  0  0
0  0  1  | 0  0  0
0  1  0  | 0  1  0
0  1  1  | 0  1  1
1  0  0  | 1  0  0
1  0  1  | 1  0  1
1  1  0  | 1  1  1
1  1  1  | 1  1  0

(c) Reversible
x1 x2 x3 | f1 f2 f3
0  0  0  | 0  0  0
0  0  1  | 0  1  0
0  1  0  | 1  0  0
0  1  1  | 1  0  1
1  0  0  | 0  0  1
1  0  1  | 0  1  1
1  1  0  | 1  1  0
1  1  1  | 1  1  1

function in Table 2.2(b) is irreversible. Here, the number n of inputs indeed is equal
to the number m of outputs, but there is no unique input-output mapping. For exam-
ple, both inputs 000 and 001 map to the output 000.
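
The criterion of Definition 2.3 is straightforward to check programmatically: a function given as a truth table is reversible iff n = m and no output pattern occurs twice. The following Python sketch applies this test to the functions of Tables 2.2(b) and 2.2(c); the dictionaries simply transcribe the truth tables shown above and are used here for illustration only.

```python
def is_reversible(truth_table, n, m):
    """truth_table maps input tuples to output tuples."""
    if n != m:
        return False
    outputs = list(truth_table.values())
    # bijective iff no output pattern occurs twice
    return len(set(outputs)) == len(outputs)

# Table 2.2(b): irreversible, e.g. 000 and 001 both map to 000
f_b = {
    (0,0,0): (0,0,0), (0,0,1): (0,0,0), (0,1,0): (0,1,0), (0,1,1): (0,1,1),
    (1,0,0): (1,0,0), (1,0,1): (1,0,1), (1,1,0): (1,1,1), (1,1,1): (1,1,0),
}

# Table 2.2(c): reversible
f_c = {
    (0,0,0): (0,0,0), (0,0,1): (0,1,0), (0,1,0): (1,0,0), (0,1,1): (1,0,1),
    (1,0,0): (0,0,1), (1,0,1): (0,1,1), (1,1,0): (1,1,0), (1,1,1): (1,1,1),
}

print(is_reversible(f_b, 3, 3))  # False
print(is_reversible(f_c, 3, 3))  # True
```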

Quite often, (irreversible) multi-output Boolean functions should be represented


by reversible circuits. This necessitates embedding the irreversible function into
a reversible one, which requires the addition of constant inputs and garbage outputs
defined as follows:

Definition 2.4 A constant input of a reversible function is an input that is set to a


fixed value (either 0 or 1).

Definition 2.5 A garbage output of a reversible function is an output which is a


don’t care for all possible input conditions.

The problem of embedding is an integral part of synthesis which is described


later in this book. In particular, Sect. 3.1.1 and Chap. 5 cover the respective aspects
in detail.
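
As a small illustration of these two notions (embedding itself is treated in detail in Sect. 3.1.1 and Chap. 5), consider the irreversible 2-input AND. The 3-bit mapping (x1, x2, x3) → (x1, x2, x3 ⊕ (x1 ∧ x2)) is bijective; setting x3 to the constant value 0 makes the third output compute the AND, while the first two outputs are garbage. The Python sketch below merely verifies these two properties; this particular embedding is chosen here for illustration only.

```python
from itertools import product

def embedded_and(x1, x2, x3):
    # 3x3 reversible embedding of the 2-input AND
    return (x1, x2, x3 ^ (x1 & x2))

# 1) the 3-bit mapping is reversible (bijective)
outputs = [embedded_and(*pattern) for pattern in product((0, 1), repeat=3)]
assert len(set(outputs)) == len(outputs)

# 2) with the constant input x3 = 0, the third output realizes the AND;
#    the first two outputs merely reproduce x1 and x2 (garbage outputs)
for x1, x2 in product((0, 1), repeat=2):
    g1, g2, f = embedded_and(x1, x2, 0)
    assert f == (x1 & x2)

print("embedding verified")
```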
Reversible functions can be realized by reversible logic. Due to its special prop-
erties, reversible logic found large interest in several domains like low-power design
or quantum computation (see Chap. 1). As a result, synthesis of reversible functions
has become an intensively studied topic in the last years. Therefore, new kinds of
circuits have been proposed that are introduced and compared to traditional circuits
in the next section.

2.1.2 Reversible Circuits

A circuit realizes a Boolean function. Usually, a circuit is composed of signal lines


and a set of basic gates (called gate library). For traditional circuits, often the gate

Fig. 2.1 Traditional circuit elements

library depicted in Fig. 2.1 is used. This includes gates for the operations AND,
OR, and NOT, based on which any Boolean function can be realized. Furthermore,
fanouts are applied to use signal values more than once.
In contrast, to realize reversible logic some restrictions must be considered:
fanouts and feedback are not directly allowed, since they would destroy the re-
versibility of the computation [NC00]. Also, the gate library from above as well
as the traditional design flow cannot be utilized. As a result, a cascade structure
over reversible gates is the established model to realize reversible logic.

Definition 2.6 A reversible circuit G over inputs X = {x1 , x2 , . . . , xn } is a cas-


cade of reversible gates gi , i.e. G = g0 g1 · · · gd−1 where d is the number of gates.
A reversible gate has the form g(C, T ), where C = {xi1 , . . . , xik } ⊂ X is the set of
control lines and T = {xj1 , . . . , xjl } ⊂ X with C ∩ T = ∅ is the set of target lines.
C may be empty. The gate operation is applied to the target lines iff all control lines
meet the required control conditions. Control lines and unconnected lines always
pass through the gate unaltered.

In the literature, three types of reversible gates have been established:


• A (multiple control) Toffoli gate (MCT) [Tof80] has a single target line xj and
maps (x1 , x2 , . . . , xj , . . . , xn ) to (x1 , x2 , . . . , xi1 xi2 · · · xik ⊕ xj , . . . , xn ). That is,
a Toffoli gate inverts the target line iff all control lines are assigned to 1.
• A (multiple control) Fredkin gate (MCF) [FT82] has two target lines xj1 and
xj2 . The gate interchanges the values of the target lines iff the conjunction of all
control lines evaluates to 1.
• A Peres gate (P) [Per85] has a control line xi , a target line xj1 , and a line xj2
that serves as both, control and target. It maps (x1 , x2 , . . . , xj1 , . . . , xj2 , . . . , xn )
to (x1 , x2 , . . . , xi xj2 ⊕xj1 , . . . , xi ⊕xj2 , . . . , xn ) and thus is a cascade of two MCT
gates.

Example 2.4 Figure 2.2 shows a Toffoli gate (a), a Fredkin gate (b), and a Peres
gate (c), each together with a truth table of its functionality. A ● is used to indicate a
control line, while a ⊕ (×) is used to denote the target line of a Toffoli or
Peres gate (Fredkin gate).
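
To make the gate semantics concrete, the following Python sketch simulates Toffoli and Fredkin gates on Boolean line assignments. The representation of a gate as a (type, controls, targets) tuple over 0-based line indices is an assumption made only for this illustration.

```python
def apply_gate(gate, values):
    """Apply a reversible gate to a tuple of line values (0/1)."""
    kind, controls, targets = gate
    values = list(values)
    # the gate operation fires iff all control lines carry 1
    if all(values[c] == 1 for c in controls):
        if kind == "MCT":            # Toffoli: invert the single target line
            (t,) = targets
            values[t] ^= 1
        elif kind == "MCF":          # Fredkin: interchange the two target lines
            t1, t2 = targets
            values[t1], values[t2] = values[t2], values[t1]
    return tuple(values)

def simulate(circuit, inputs):
    for gate in circuit:
        inputs = apply_gate(gate, inputs)
    return inputs

# a Toffoli gate controlled by lines 0 and 1, targeting line 2
toffoli = ("MCT", (0, 1), (2,))
print(simulate([toffoli], (1, 1, 0)))  # (1, 1, 1): target inverted
print(simulate([toffoli], (1, 0, 0)))  # (1, 0, 0): controls not satisfied
```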

Remark 2.1 These definitions also provide the basis for other gate types. For exam-
ple, the Toffoli gate builds the basis for the NOT gate (a Toffoli gate with no control
lines, i.e. with C = ∅), for the controlled-NOT gate (a Toffoli gate with one control

Fig. 2.2 Reversible gates

Fig. 2.3 Reversible circuits

line),1 as well as for the Toffoli gate as originally proposed in [Tof80]. In contrast,
the Fredkin gate builds the basis for a SWAP gate (a Fredkin gate with C = ∅, i.e. an
interchanging of two lines).

In the following, the notations MCT(C, xj ), MCF(C, xj1 , xj2 ), and P (xi , xj1 , xj2 )
are used to denote a Toffoli, Fredkin, and Peres gate, respectively. The number of
control lines a Toffoli (Fredkin) gate consists of defines the size of the gate.
Using these gate types, universal libraries can be composed. A gate library is
called universal, if it enables the realization of any reversible function. For example,
it has been proven that every reversible function can be realized using MCT gates
only [MD04b]. Also, the gate library consisting of NOT, CNOT, and two-controlled
Toffoli gates is universal [SPMH03]. In contrast, a library including only CNOT
gates allows the realization of linear reversible functions only [PMH08].

Example 2.5 Figure 2.3 shows reversible circuits realizing the function depicted in
Table 2.2(c) with the help of Toffoli and Fredkin gates, respectively.

As for their traditional counterparts, the complexity of reversible circuits is mea-


sured by means of different cost metrics. More precisely, the cost of the respective
circuits is defined as follows:

1 The controlled-NOT gate is also known as CNOT or Feynman gate.



Table 2.3 Quantum cost for Toffoli and Fredkin gates

0 control lines:  Toffoli 1;  Fredkin 3
1 control line:   Toffoli 1;  Fredkin 7
2 control lines:  Toffoli 5;  Fredkin 15
3 control lines:  Toffoli 13;
                  Fredkin 28 if at least 2 lines are unconnected, 31 otherwise
4 control lines:  Toffoli 26 if at least 2 lines are unconnected, 29 otherwise;
                  Fredkin 40 if at least 3 lines are unconnected,
                          54 if 1 or 2 lines are unconnected, 63 otherwise
5 control lines:  Toffoli 38 if at least 3 lines are unconnected,
                          52 if 1 or 2 lines are unconnected, 61 otherwise;
                  Fredkin 52 if at least 4 lines are unconnected,
                          82 if 1, 2 or 3 lines are unconnected, 127 otherwise
6 control lines:  Toffoli 50 if at least 4 lines are unconnected,
                          80 if 1, 2 or 3 lines are unconnected, 125 otherwise;
                  Fredkin 64 if at least 5 lines are unconnected,
                          102 if 1, 2, 3 or 4 lines are unconnected, 255 otherwise

Definition 2.7 A reversible circuit G = g0 g1 · · · gd−1 has cost of

    c = Σ_{i=0}^{d−1} ci ,

where ci denotes the cost of gate gi .

The concrete cost for a single gate of course depends on the respective type but
also on the addressed technology. In this book, the following cost metrics are used:
• Gate count denotes the number of gates the circuit consists of (i.e. ci = 1 and
c = d).
• Quantum cost denotes the effort needed to transform a reversible circuit to a quan-
tum circuit (see also next section). Table 2.3 shows the quantum cost for a selec-
tion of Toffoli and Fredkin gate configurations as introduced in [BBC+95] and
further optimized in [MD04a] and [MYDM05]. As can be seen, gates of larger
size are considerably more expensive than gates of smaller size. The Peres gate

represents a special case, since it has quantum cost of 4, while the realization with
two Toffoli gates would imply a cost of 6.
• Transistor cost denotes the effort needed to realize a reversible circuit in CMOS
according to [TG08]. The transistor cost of a reversible gate is 8 · s where s is the
number of control lines.

Example 2.6 Consider the circuits from Example 2.5 depicted in Fig. 2.3. The Tof-
foli circuit has a gate count of 6, quantum cost of 10, and transistor cost of 56, while
the Fredkin circuit has a gate count of 3, quantum cost of 13, and transistor cost
of 8, respectively.

As can be seen, the costs significantly differ depending on the applied cost model.
Even if the number of gates in a cascade is a very simple measure of its complexity,
it is the most technology-independent metric. Thus, the gate count is often used to
evaluate the quality of a reversible circuit. Besides that, also the quantum cost metric
is popular because it represents a measure for the most intensely studied application
(namely quantum computation) and considers larger gates to be more costly. The
transistor cost model is a relatively new model that arose with the application of
reversible circuits to the area of low-power CMOS design. In this book, gate count
and quantum cost are primarily considered, as they allow a fair comparison of synthesis
results with respect to previous work. Transistor costs are additionally addressed
where appropriate.
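
The following Python sketch computes the gate count and the transistor cost of a Toffoli circuit given as a list of gate sizes (numbers of control lines). The small quantum-cost lookup covers only the unconditional Toffoli entries of Table 2.3 (0, 1, and 2 control lines); larger gates additionally depend on the number of unconnected lines and are deliberately left out of this sketch.

```python
# quantum cost of MCT gates with 0, 1, and 2 control lines (Table 2.3);
# gates with more controls also depend on the number of unconnected lines
# and are therefore not covered by this simple lookup
MCT_QUANTUM_COST = {0: 1, 1: 1, 2: 5}

def gate_count(control_counts):
    return len(control_counts)

def transistor_cost(control_counts):
    # 8 * s per gate, where s is the number of control lines
    return sum(8 * s for s in control_counts)

def quantum_cost(control_counts):
    return sum(MCT_QUANTUM_COST[s] for s in control_counts)

# example: a cascade of three Toffoli-type gates with 1, 2, and 2 controls
sizes = [1, 2, 2]
print(gate_count(sizes), quantum_cost(sizes), transistor_cost(sizes))
# -> 3 11 40
```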
Finally, a special property of reversible logic is reviewed:

Lemma 2.1 If the cascade of MCT gates G = g0 g1 · · · gd−1 realizes a reversible
function f , then the reverse cascade G′ = gd−1 gd−2 · · · g0 realizes the inverse func-
tion f −1 .

Proof Each reversible gate realizes a reversible function. That is, for each input
pattern a unique output pattern, i.e. a one-to-one mapping, exists. Thus, calculating
the inverse of the function f for an output pattern is essentially the same operation
as propagating this pattern backwards through the circuit. 
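
Lemma 2.1 can also be checked by simulation. The sketch below is a pure illustration: it encodes an arbitrary small MCT cascade, applies it followed by its reversed cascade (each MCT gate is its own inverse), and confirms that every input pattern is mapped back to itself.

```python
from itertools import product

def apply_mct(controls, target, values):
    values = list(values)
    if all(values[c] == 1 for c in controls):
        values[target] ^= 1
    return tuple(values)

def simulate(circuit, values):
    for controls, target in circuit:
        values = apply_mct(controls, target, values)
    return values

# an arbitrary MCT cascade G on three lines, given as (controls, target) pairs
G = [((0,), 2), ((1, 2), 0), ((), 1)]
# the reverse cascade realizes the inverse function
G_rev = list(reversed(G))

for pattern in product((0, 1), repeat=3):
    assert simulate(G_rev, simulate(G, pattern)) == pattern

print("reverse cascade realizes the inverse function")
```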

This lemma is particularly exploited during synthesis of reversible logic as de-


scribed later in this book.
The next section considers quantum circuits and how they are derived from re-
versible logic.

2.1.3 Quantum Circuits

Quantum computation [NC00] is a promising application of reversible logic. Every


quantum circuit works on qubits instead of bits. In contrast to Boolean logic, qubits
do not only allow to represent Boolean 0’s and Boolean 1’s, but also the superposi-
tion of both. More formally:

Definition 2.8 A qubit is a two level quantum system, described by a two dimen-
sional complex Hilbert space. The two orthogonal quantum states

    |0⟩ ≡ (1, 0)^T   and   |1⟩ ≡ (0, 1)^T

are used to represent the Boolean values 0 and 1. Any state of a qubit may be written
as |Ψ⟩ = α|0⟩ + β|1⟩, where α and β are complex numbers with |α|^2 + |β|^2 = 1.
The quantum state of a single qubit is thus denoted by the vector (α, β)^T.

The state of a quantum system with n > 1 qubits is given by an element of the
tensor product of the respective state spaces and can be represented as a normalized
vector of length 2^n , called the state vector. The state vector is changed through mul-
tiplication of appropriate 2^n × 2^n unitary matrices. Thus, each quantum computation
is inherently reversible but manipulates qubits rather than pure logic values. At the
end of the computation, a qubit can be measured. Then, depending on the current
state of the qubit, a 0 (with probability of |α|^2 ) or a 1 (with probability of |β|^2 )
is returned, respectively. After the measurement, the state of the qubit is destroyed.
In other words, using quantum computation and qubits in superposition, func-
tions can be evaluated with different possible input assignments in parallel. But,
it is not possible to obtain the current state of a qubit. Instead, if a qubit is mea-
sured, either 0 or 1 is returned depending on the respective probability. Nevertheless,
researchers exploited quantum computation (in particular superposition) to solve
many practically relevant problems faster than by traditional computing machines.
For example, it was possible to solve the factorization problem in polynomial time—
for traditional machines only exponential algorithms are known. Even if the research
in this area is still at the beginning (so far, quantum algorithms with only up to 28
qubits have been implemented), these first promising results motivate further re-
search in this area.
The focus of this book is how to design reversible and quantum circuits, respec-
tively. Thus, in the following the model for quantum circuits as used in this book
is introduced. For a more detailed treatment of the respective physical background,
the reader is referred to [Pit99, NC00, Mer07].

Definition 2.9 A quantum circuit Q is a cascade of quantum gates qi , i.e. Q =


q0 · · · qd−1 .

In this book, the following quantum gates are considered:


• Inverter (NOT): A single qubit is inverted.
• Controlled inverter (CNOT): The target qubit is inverted if the control qubit is 1.
• Controlled V gate: A V operation is performed on the target qubit if the control
qubit is 1. The V operation is also known as the square root of NOT, since two
consecutive V operations are equivalent to an inversion.

Fig. 2.4 Quantum gates

Fig. 2.5 State transitions for NOT, CNOT, V, and V+ operations

• Controlled V+ gate: A V+ operation is performed on the target qubit if the


control qubit is 1. The V+ gate performs the inverse operation of the V gate,
i.e. V + ≡ V −1 .
The notation for these gates along with their corresponding 2^n × 2^n unitary ma-
trices is shown in Fig. 2.4.
In the following, the input to a quantum circuit as well as to each control line
of a gate is restricted to 0 and 1. This has the effect that the value of each qubit is
restricted to one value of the set {0, 1, V0 , V1 }, i.e. a 4-valued logic with

    V0 = ((1 + i)/2) · (1, −i)^T   and   V1 = ((1 + i)/2) · (−i, 1)^T

is applied. Figure 2.5 shows the resulting transitions with respect to the possible
NOT, CNOT, V, and V+ operations. By restricting the quantum circuit model in this
way, physical effects like superposition (and entanglement [NC00]) are excluded
from the following consideration so that automated approaches (e.g. for synthe-
sis, optimization, verification, etc.) become applicable. Nevertheless, the restricted
model remains realistic for many applications. As an example, many of today’s
quantum algorithms (e.g. Deutsch’s algorithm or Grover’s algorithm [NC00]) in-
clude quantum realizations of reversible (Boolean) functions. Thus, the mentioned
restrictions are common in the design of quantum circuits.
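
The claim that V is the square root of NOT, as well as the relation V+ ≡ V^−1, can be verified with a few lines of Python. The explicit 2 × 2 matrix used below is assembled from the states V0 and V1 given above (its columns are V applied to |0⟩ and to |1⟩, i.e. exactly V0 and V1); writing the matrix out this way is an assumption made only for this check.

```python
def matmul(A, B):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

v = (1 + 1j) / 2              # the value (1 + i)/2 from the text
V = [[v, v.conjugate()],      # columns are the states V0 and V1 from above
     [v.conjugate(), v]]
NOT = [[0, 1], [1, 0]]
I = [[1, 0], [0, 1]]

def close(A, B, eps=1e-12):
    return all(abs(A[i][j] - B[i][j]) < eps for i in range(2) for j in range(2))

# two consecutive V operations are equivalent to an inversion (NOT)
assert close(matmul(V, V), NOT)
# V+ (the conjugate transpose of V) is the inverse of V
assert close(matmul(dagger(V), V), I)
print("V * V = NOT and V+ * V = I verified")
```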

Fig. 2.6 Quantum circuit

Fig. 2.7 Pairs of quantum gates with unit cost

Example 2.7 Figure 2.6 shows a quantum circuit realizing the reversible function
depicted in Table 2.2(c).

All quantum gates are assumed to be the basic blocks of each quantum computa-
tion. This is also reflected in the cost metric.

Definition 2.10 Each quantum gate has cost of 1. Thus, the cost of a quantum circuit
is defined by the number d of its gates.

Remark 2.2 In previous work, also an extended cost metric has been applied: When
a CNOT and a V (or V+) gate are applied to the same two qubits, the cost of the
pair can be considered unit as well [SD96, HSY+06]. The possible pairs (denoted
by double gates in the following) are shown in Fig. 2.7. In this book, primarily the
cost metric from Definition 2.10 is applied. However, all approaches can also be
extended to consider unit cost of double gates. Exemplarily, this is shown for exact
synthesis of quantum circuits in Sect. 4.2.2.

Since quantum circuits are inherently reversible, every reversible circuit can be
transformed to a quantum circuit. To this end, each gate of the reversible circuit is
decomposed into a cascade of quantum gates.

Example 2.8 Figure 2.8(a) (Fig. 2.8(b)) shows the quantum gate cascade which can
be used to transform a Toffoli (Fredkin) gate to a quantum circuit. As can be seen,
the number of required quantum gates is equal to the quantum cost of the Toffoli
(Fredkin) gate as introduced in Table 2.3.

Exploiting these decompositions, synthesis of quantum circuits can be ap-


proached from two different angles: (1) Targeting quantum gates directly during the
synthesis process or (2) synthesizing reversible circuits first that are later mapped
into quantum circuits.

Fig. 2.8 Decomposition of reversible gates to quantum circuits

2.2 Decision Diagrams

To represent Boolean (including reversible) functions and circuits, decision dia-
grams can be applied. They provide an efficient data-structure that can represent
large functions in a more compact way than truth tables. In the past, several types
of decision diagrams have been introduced. In this book, Binary Decision Dia-
grams (BDDs) [Bry86] are considered to represent Boolean functions. Quantum
Multiple-valued Decision Diagrams (QMDDs) [MT06, MT08] are used to repre-
sent reversible functions that may include quantum operations. Both are briefly in-
troduced in this section.

2.2.1 Binary Decision Diagrams

A Boolean function f : B^n → B can be represented by a graph-structure defined as


follows:

Definition 2.11 A Binary Decision Diagram (BDD) over Boolean variables X with
terminals T = {0, 1} is a directed acyclic graph G = (V , E) with the following prop-
erties:
1. Each node v ∈ V is either a terminal or a non-terminal.
2. Each terminal node v ∈ V is labeled by a value t ∈ T and has no outgoing edges.
3. Each non-terminal node v ∈ V is labeled by a Boolean variable xi ∈ X and rep-
resents a Boolean function f .
4. In each non-terminal node (labeled by xi ), the Shannon decomposition [Sha38]

    f = ¬xi · fxi=0 + xi · fxi=1

is carried out, leading to two outgoing edges e ∈ E whose successors are denoted
by low(v) (for fxi=0 ) and high(v) (for fxi=1 ), respectively.
The size of a BDD is defined by the number of its (non-terminal) nodes.

Example 2.9 Figure 2.9 shows a BDD representing the function f = x1 ⊕ x2 · x3 .


Edges leading to a node fxi =0 (fxi =1 ) are marked by a 0 (1). This BDD has a size
of 5.
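
A minimal flavor of how such a (reduced, ordered) BDD is obtained can be given in a few lines of Python: nodes are created by recursive Shannon decomposition along a fixed variable order, redundant nodes are skipped, and a unique table ensures that isomorphic sub-graphs are shared. This toy sketch expands the complete truth table and is therefore only an illustration, not the kind of implementation used in this book (see the remark on BDD packages such as CUDD at the end of this section).

```python
unique_table = {}   # (var, low, high) -> node id, ensures sharing
nodes = {}          # node id -> (var, low, high); ids 0 and 1 are the terminals

def mk_node(var, low, high):
    if low == high:                 # redundant node: both children are equal
        return low
    key = (var, low, high)
    if key not in unique_table:     # reuse isomorphic sub-graphs
        node_id = len(nodes) + 2    # ids 0 and 1 are reserved for terminals
        unique_table[key] = node_id
        nodes[node_id] = key
    return unique_table[key]

def build(f, order, assignment=()):
    """Recursive Shannon decomposition along the given variable order."""
    i = len(assignment)
    if i == len(order):
        return 1 if f(*assignment) else 0
    low = build(f, order, assignment + (0,))   # cofactor x_i = 0
    high = build(f, order, assignment + (1,))  # cofactor x_i = 1
    return mk_node(order[i], low, high)

# the function of Example 2.9: f = x1 xor (x2 and x3)
f = lambda x1, x2, x3: x1 ^ (x2 & x3)
root = build(f, ("x1", "x2", "x3"))
print("BDD size:", len(nodes))   # 5 non-terminal nodes, as in Fig. 2.9
```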

Fig. 2.9 BDD representing f = x1 ⊕ x2 · x3

A BDD is called free if each variable is encountered at most once on each path
from the root to a terminal node. A BDD is called ordered if in addition all variables
are encountered in the same order on all such paths. The respective order is defined
by π : {1, . . . , n} → {1, . . . , n}. Finally, a BDD is called reduced if it contains
neither isomorphic sub-graphs nor redundant nodes. To achieve reduced BDDs,
reduction rules as depicted in Fig. 2.10 are applied. Applying the reduction rules
leads to shared nodes, i.e. nodes that have more than one predecessor.

Example 2.10 Figure 2.11 shows two reduced ordered BDDs representing the func-
tion f = x1 ·x2 +x3 ·x4 +· · ·+xn−1 ·xn . For the order x1 , x2 , . . . , xn−1 , xn , the BDD
depicted in Fig. 2.11(a) has a size of O(n), while the BDD depicted in Fig. 2.11(b)
with the order x1 , x3 , . . . , xn−1 , x2 , x4 , . . . , xn has a size of O(2^n ).

Remark 2.3 In the following, reduced ordered binary decision diagrams are called
BDDs for brevity. BDDs are canonical representations, i.e. for a given Boolean func-
tion and a fixed order, the BDD is unique [Bry86].

As shown by Example 2.10, BDDs are very sensitive to the chosen variable order.
It has been shown in [BW96] that proving the existence of a BDD with a lower
number of nodes (i.e. proving that no other order leads to a smaller BDD size) is
NP-complete. As a consequence, several heuristics to find good orders have been
proposed. In particular, sifting [Rud93] has been shown to be quite effective.
Further reductions of the BDD size can be achieved if complement edges
[BRB90] are applied. They allow a function as well as its comple-
ment to be represented by one single node. BDDs can also be used to represent multi-output
functions. Then, all BDDs for the respective functions are shared, i.e. isomorphic
sub-functions are represented by a single node as well.
For a more comprehensive introduction to BDDs, the reader is referred
to [DB98, EFD05]. For the application of BDDs in practice, many well-engineered
BDD packages (e.g. CUDD [Som01]) are available.

Fig. 2.10 Reduction rules for BDDs

Fig. 2.11 BDDs with different variable orders

2.2.2 Quantum Multiple-valued Decision Diagrams

As described in Sect. 2.1.3, quantum operations are defined by 2^n × 2^n unitary
matrices (consider again Fig. 2.4 on p. 15 for examples). Thus, to represent func-
tions including quantum operations, an adjusted data-structure is needed. Quantum
Multiple-valued Decision Diagrams (QMDDs) [MT06, MT08] provide for the rep-
resentation and manipulation of r^n × r^n complex-valued matrices with r pure logic
states. This includes unitary matrices and thus QMDDs can be applied to represent
quantum gates and circuits. Since in this book QMDDs are used as black box only
(in contrast to BDDs), a formal definition of QMDDs is omitted and instead they
are introduced by exemplarily describing the general idea.

Fig. 2.12 QMDD representing the matrix of a single V gate

A QMDD structure is based on partitioning an r^n × r^n matrix M into r^2 sub-
matrices, each of dimension r^(n−1) × r^(n−1), as shown in the following equation:

        ⎛ M0         M1          ···   Mr−1   ⎞
        ⎜ Mr         Mr+1        ···   M2r−1  ⎟
    M = ⎜ ⋮          ⋮                 ⋮      ⎟ .
        ⎝ Mr²−r      Mr²−r+1     ···   Mr²−1  ⎠
In the following, the concepts of QMDDs are briefly presented by way of the
following example of a single V gate.

Example 2.11 Figure 2.12(a) shows a V gate in a 3-line circuit. The unitary ma-
trix describing the behavior of this gate is given in Fig. 2.12(b), where v = (1 + i)/2
and v′ = (1 − i)/2. The QMDD for this matrix is given in Fig. 2.12(c). The edges from each
non-terminal node point to four sub-matrices indexed 0, 1, 2, 3 from left to right.
Each edge has a complex-valued weight. For clarity, edges with weight 0 are indi-
cated as stubs. In fact, they point to the terminal node.

The key features of QMDD are evident in this example. There is a single termi-
nal node. Furthermore, each edge has a complex-valued weight. Each non-terminal
node represents a matrix partitioning. For example, the top node in Fig. 2.12(c) rep-
resents the partitioning shown in Fig. 2.12(b). The non-terminal nodes lower in the
diagram represent similar partitioning of the resulting sub-matrices. The represen-
tation of common sub-matrices is shared. To ensure the uniqueness of the represen-
tation, edges with weight 0 must point to the terminal node and normalization is
applied to non-terminal nodes so that the lowest indexed edge with non-zero weight
has weight 1.
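
To convey the flavor of this recursive partitioning for the binary case (r = 2), the following Python sketch splits a matrix into its four sub-matrices and builds a node structure in which identical sub-matrices are shared via a unique table. Edge weights and the normalization described above are deliberately omitted, so this is only a structural illustration and not a QMDD implementation; the CNOT matrix is used as an arbitrary example.

```python
def quadrants(M):
    """Split a 2^n x 2^n matrix (nested lists) into its four sub-matrices."""
    h = len(M) // 2
    return [[row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]]]

unique = {}   # key -> node id (shares identical sub-matrices)
nodes = {}    # node id -> tuple of four successor ids (or a leaf value)

def build(M):
    if len(M) == 1:
        key = ("leaf", M[0][0])
    else:
        key = ("node", tuple(build(sub) for sub in quadrants(M)))
    if key not in unique:
        unique[key] = len(unique)
        nodes[unique[key]] = key
    return unique[key]

# matrix of a CNOT gate (control on the first, target on the second qubit)
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

root = build(CNOT)
print("number of distinct nodes:", len(nodes))
```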
As for BDDs, an efficient implementation exists also for QMDDs. However,
since QMDDs involve multiple edges from nodes and are applicable to both binary

and multiple-valued problems, the QMDD package is not built using a standard
decision diagram package. Nevertheless, the implementation employs well-known
decision diagram techniques like sharing, reordering, and so on. For a more com-
prehensive introduction into QMDDs, the reader is referred to [MT08].

2.3 Satisfiability Solvers

The methods described in this book make use of techniques for solving the Boolean
satisfiability problem (SAT problem). The SAT problem is one of the central
NP-complete problems. In fact, it was the first known NP-complete problem that
was proven by Cook in 1971 [Coo71]. Despite this proven complexity, efficient solv-
ing algorithms have been developed that found great success as proof engines for
many practically relevant problems. Today there exists algorithms exploiting SAT
that solve many practical problem instances, e.g. in the domain of automatic test pat-
tern generation [Lar92, DEF+08], logic synthesis [ZSM+05], debugging [SVAV05],
and verification [BCCZ99, CBRZ01, PBG05].
In this section, the SAT problem, the respective solving algorithm, and its ap-
plication are introduced. Furthermore, extended SAT solvers additionally exploiting
bit-vector logic, quantifiers, or problem-specific modules, respectively, are briefly
reviewed. These engines are used later as core techniques for selected steps in the
proposed flow for reversible logic.

2.3.1 Boolean Satisfiability

The Boolean satisfiability problem (SAT problem) is defined as follows:

Definition 2.12 Let h : B^n → B be a Boolean function. Then, the SAT problem is
to find an assignment to the variables of h such that h evaluates to 1 or to prove that
no such assignment exists.

In other words, SAT asks if ∃Xh for an h over variables X and determines a
satisfying assignment in this case. In this context, the Boolean formula h is often
given in Conjunctive Normal Form (CNF). A CNF is a set of clauses, each clause
is a set of literals, and each literal is a Boolean variable or its negation. The CNF
formula is satisfied if all clauses are satisfied; a clause is satisfied if at least one of
its literals is satisfied; and a positive literal is satisfied if its variable is assigned 1
(a negated literal is satisfied under the assignment 0).

Example 2.12 Let h = (x1 + x2 + x̄3)(x̄1 + x3)(x̄2 + x3). Then, x1 = 1, x2 = 1, and
x3 = 1 is a satisfying assignment for h. The values of x1 and x2 ensure that the first
clause becomes satisfied, while x3 ensures this for the remaining two clauses.
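For illustration, a CNF can be represented as a list of clauses, each clause being a list of signed literals (following the common convention that a positive integer i denotes variable xi and a negative integer its negation). The following sketch, which is an illustrative representation rather than the input format of any particular solver, encodes the formula of Example 2.12 and checks the given assignment.

```python
# CNF of Example 2.12: h = (x1 + x2 + ~x3)(~x1 + x3)(~x2 + x3)
cnf = [[1, 2, -3], [-1, 3], [-2, 3]]

def satisfies(cnf, assignment):
    """Check whether an assignment (dict xi -> 0/1) satisfies the CNF."""
    def literal_value(lit):
        value = assignment[abs(lit)]
        return value == 1 if lit > 0 else value == 0
    # The CNF is satisfied if every clause contains at least one satisfied literal.
    return all(any(literal_value(lit) for lit in clause) for clause in cnf)

print(satisfies(cnf, {1: 1, 2: 1, 3: 1}))  # True, as stated in Example 2.12
print(satisfies(cnf, {1: 1, 2: 1, 3: 0}))  # False: the second clause is violated
```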

Fig. 2.13 Solving algorithm in modern SAT solvers

To solve SAT problems, several backtracking algorithms (i.e. SAT
solvers) have been proposed in the past [DP60, DLL62, MS99, MMZ+01, GN02,
ES04]. Most of them apply the steps depicted in Fig. 2.13: While there are free
variables left (a), a decision is made (c) to assign a value to one of these variables.
Then, implications are determined due to the last assignment (d). This may cause
a conflict (e) that is analyzed. If the conflict can be resolved by undoing assign-
ments from previous decisions, backtracking is done (f). Otherwise, the instance is
unsatisfiable (g). If no further decision can be made, i.e. a value is assigned to all
variables and this assignment did not cause a conflict, the CNF is satisfied (b). Ad-
vanced techniques like e.g. efficient Boolean constraint propagation [MMZ+01] or
conflict analysis [MS99] as well as efficient decision heuristics [GN02] are common
in state-of-the-art SAT solvers today.
These techniques, as well as the tremendous improvements in the performance of
the respective implementations [ES04], enable the consideration of problems with
hundreds of thousands of variables and clauses. Thus, SAT is widely used
in many application domains. To this end, the real-world problem is transformed into
CNF and then solved using a SAT solver as a black box.
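The solving scheme of Fig. 2.13 can be sketched in a few lines. The recursion below is a deliberately simplified variant (plain backtracking search with clause simplification; the efficient constraint propagation, conflict analysis, and decision heuristics of modern solvers are omitted), and all names are chosen for illustration only rather than taken from any particular solver.

```python
def dpll(cnf, assignment):
    """Return a satisfying assignment (dict) or None if the CNF is unsatisfiable."""
    cnf = simplify(cnf, assignment)
    if cnf is None:                       # (e) conflict: a clause became empty
        return None
    if not cnf:                           # (b) all clauses satisfied
        return assignment
    var = abs(cnf[0][0])                  # (c) decision: pick a free variable
    for value in (1, 0):
        result = dpll(cnf, {**assignment, var: value})
        if result is not None:            # (d) implications handled by simplify
            return result
    return None                           # (f)/(g) backtrack or report unsat

def simplify(cnf, assignment):
    """Simplify the CNF under the assignment; None signals a conflict."""
    simplified = []
    for clause in cnf:
        new_clause, satisfied = [], False
        for lit in clause:
            var = abs(lit)
            if var in assignment:
                if (lit > 0) == (assignment[var] == 1):
                    satisfied = True
                    break
            else:
                new_clause.append(lit)
        if satisfied:
            continue
        if not new_clause:                # all literals falsified
            return None
        simplified.append(new_clause)
    return simplified

print(dpll([[1, 2, -3], [-1, 3], [-2, 3]], {}))   # e.g. {1: 1, 3: 1}
```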

2.3.2 Extended SAT Solvers

Despite their efficiency, Boolean SAT solvers have a major drawback: they work
at the Boolean level. However, many problems are formulated at a higher level of
abstraction and would benefit from a more general description. As a
consequence, researchers have investigated the use of more expressive formulations than
CNF while still exploiting the established SAT techniques. This leads (1) to the
combination of SAT solvers with decision procedures for decidable theories, resulting
in SAT Modulo Theories (SMT) [BBC+05, DM06b] and (2) to the application of
quantifiers resulting in Quantified Boolean Formulas (QBF) [Bie05, Ben05]. Fur-
thermore, problem-specific knowledge is exploited during the solving process by
the SAT solver SWORD [WFG+07]. The respective concepts are briefly reviewed in
the following.

2.3.2.1 SMT Solvers for Bit-vector Logic

An SMT solver integrates a Boolean SAT solver with other solvers for specialized
theories (e.g. linear arithmetic or bit-vector logic). The SAT solver thereby works on
an abstract representation (still in CNF) of the problem and steers the overall search
process, while each (partial) assignment of this representation has to be validated by
the theory solver for the theory constraints. Thus, advanced SAT techniques together
with specialized theory solvers are exploited.
In this book, the theory of quantifier free bit-vector logic (QF_BV) is utilized.
This logic is defined as follows:

Definition 2.13 A bit-vector is an element b = (b_{n−1}, . . . , b_0) ∈ B^n. The index
operation [·] : B^n × [0, n) → B maps a bit-vector b and an index i to the i-th component
of the vector, i.e. b[i] = b_i. Conversion from (to) a natural number is defined by
nat : B^n → N (bv : N → B^n) with N = [0, 2^n) ⊂ ℕ and nat(b) := Σ_{i=0}^{n−1} b_i · 2^i
(bv := nat^{−1}).
Problems can be constrained by using bit-vector operations as well as arithmetic
operations. Let a, b ∈ B^n be two bit-vectors. Then, the bit-vector operation ◦ ∈
{∧, ∨, . . .} is defined by a ◦ b := (a[n − 1] ◦ b[n − 1], . . . , a[0] ◦ b[0]). An arithmetic
operation • ∈ {·, +, . . .} is defined by a • b := nat(a) • nat(b).

Example 2.13 Let a, b, and c be three bit-vector variables with bit-width n = 3 and
(a ∨ b = c) ∧ (a + b = c) an SMT bit-vector instance over these variables. Then,
a = (010), b = (001), and c = (011) is a satisfying solution of this instance, since it
satisfies each constraint.
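The semantics of such an instance can be made explicit by simply enumerating all bit-vector values, as the following brute-force check of Example 2.13 illustrates. An SMT solver of course avoids this exhaustive enumeration; the snippet only demonstrates what the constraints mean.

```python
n = 3                         # bit-width of a, b, and c in Example 2.13

solutions = []
for a in range(2 ** n):
    for b in range(2 ** n):
        for c in range(2 ** n):
            # (a | b = c) is the bit-vector constraint, (a + b = c) the
            # arithmetic constraint over nat(.) as in Definition 2.13.
            if (a | b) == c and a + b == c:
                solutions.append((a, b, c))

# (2, 1, 3) corresponds to a = (010), b = (001), c = (011) from the example.
print((2, 1, 3) in solutions)   # True
```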

To solve SMT instances in QF_BV logic, either (1) a combination of a traditional
SAT solver and a specialized (bit-vector) theory solver is applied (see e.g. [BBC+05,
DM06a]), (2) the instance is pre-processed exploiting the higher level of abstraction
before the resulting (simplified) instance is bit-blasted to a traditional SAT solver
(see e.g. [GD07, BB09]), or (3) a specialized solver that directly works on the bit-
level of the problem is used (see e.g. [DBW+07]).
Having an efficient solver available, the real-world problem is, similar to Boolean
SAT, transformed into a QF_BV instance. But instead of a description in terms
of clauses, the higher level representation in terms of bit-vectors is used. Then, the
resulting instance is passed to the solver which is again used as a black box. The
higher abstraction which is now available can be exploited to accelerate the solving
process.

2.3.2.2 QBF Solvers

Another generalization of the SAT problem is given by QBF satisfiability. Here,
the variables of the Boolean function h can additionally be universally or existentially
quantified. More formally:

Definition 2.14 Let h : B^n → B be a Boolean function over the variables X (usually
given in CNF). Then, Q1 X1 . . . Qt Xt h with pairwise disjoint Xi ⊂ X and Qi ∈ {∃, ∀} is a
Quantified Boolean Formula (QBF). The QBF problem is to find an assignment to
the variables of h such that h evaluates to 1 with respect to the quantifiers, or to
prove that no such assignment exists.

Example 2.14 Let ∃x2, x3 ∀x1 (x1 + x2 + x̄3)(x̄1 + x3)(x̄2 + x3) be given. Then, x2 = 1 and
x3 = 1 is a satisfying assignment for this QBF. The value of x2 ensures that the
first clause becomes satisfied, while x3 ensures this for the remaining two clauses
for all possible assignments to x1.
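The semantics of the quantifier prefix can again be illustrated by exhaustive enumeration. The following sketch checks the instance of Example 2.14: it searches for values of the existentially quantified x2 and x3 such that the matrix is satisfied for all values of the universally quantified x1. A QBF solver, of course, proceeds far more cleverly; the code only illustrates the semantics.

```python
from itertools import product

# Matrix of Example 2.14: (x1 + x2 + ~x3)(~x1 + x3)(~x2 + x3)
cnf = [[1, 2, -3], [-1, 3], [-2, 3]]

def satisfies(cnf, assignment):
    """Same clause semantics as in the plain SAT sketch above."""
    return all(
        any((assignment[abs(l)] == 1) if l > 0 else (assignment[abs(l)] == 0)
            for l in clause)
        for clause in cnf)

# exists x2, x3  forall x1 : cnf
witnesses = [
    (x2, x3)
    for x2, x3 in product((0, 1), repeat=2)
    if all(satisfies(cnf, {1: x1, 2: x2, 3: x3}) for x1 in (0, 1))
]
print(witnesses)   # contains (1, 1), the assignment given in Example 2.14
```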

Obviously, solving QBF problems is significantly harder than solving pure SAT
instances. In fact, it is PSPACE-complete [Pap93]. Nevertheless, QBF enables the
formulation of many problems in a more compact way. In this sense, complexity is
moved from the problem formulation to the solving engine: the task can be stated
more compactly, at the price of a harder problem for the solver. However, since
solving engines are usually well-engineered for their dedicated problem, this may
nevertheless lead to a faster overall solving process. Recent
solvers (e.g. [Bie05, Ben05]) exploit techniques like symbolic Skolemization
(i.e. converting the instance into a normal form which enables simplifications)
to solve QBF instances.

2.3.2.3 SWORD Solver

Due to the translation of the problem into CNF (or QF_BV logic, respectively),
problem-specific knowledge is lost. More precisely, decisions, implications, and
learning schemes can only exploit the Boolean (bit-vector) description. In contrast,
with more problem-specific knowledge available, more options exist for controlling
the traversal of the search space. This observation is exploited by the problem-specific
SAT solver SWORD [WFG+07].2
SWORD represents the problem in terms of so called modules. Each module de-
fines an operation over bit vectors of module variables. Each module variable is a
Boolean variable. By this, structural and semantical knowledge is available which
can be exploited by special algorithms for each kind of module. Furthermore, this

2 SWORD has been co-developed by the authors of this book. Even if SWORD is focused on
problem-specific knowledge, it can also be used as an SMT solver and already participated in the
respective SMT competitions in 2008 [WSD08] and 2009 [JSWD09], respectively.

Fig. 2.14 SWORD algorithm

leads to a more compact problem formulation, since representing complex operations
in terms of modules substitutes a significant number of clauses.

Example 2.15 Consider an n × n multiplier. This multiplier can be represented
by n² AND gates and n − 1 adders [MK04]. Furthermore, a single AND gate can be
modeled by three clauses and one auxiliary variable. Thus, just to encode
the AND gates, a CNF with Θ(n²) auxiliary variables and clauses is required.
In contrast, using SWORD only 3n module variables (for the two inputs and
the output of the multiplication) and a single (multiplier) module are needed to rep-
resent the whole multiplication.
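The clause blow-up mentioned in the example can be made concrete. In a standard CNF encoding, an AND gate y = a ∧ b is represented by one auxiliary variable y and the three clauses (ā + b̄ + y)(a + ȳ)(b + ȳ). The following sketch, in which the variable numbering and all names are chosen freely for illustration, generates these clauses for the n² partial-product AND gates of an n × n multiplier.

```python
def and_gate_clauses(a, b, y):
    """CNF clauses for y <-> (a AND b); variables are positive integers."""
    return [[-a, -b, y], [a, -y], [b, -y]]

def partial_product_clauses(n):
    """Encode the n*n AND gates computing the partial products of an
    n x n multiplier.  Inputs: variables 1..n and n+1..2n; one fresh
    auxiliary variable per gate (numbering is an arbitrary choice)."""
    clauses = []
    aux = 2 * n                      # next free variable index
    for i in range(1, n + 1):        # bit i of the first factor
        for j in range(1, n + 1):    # bit j of the second factor
            aux += 1
            clauses += and_gate_clauses(i, n + j, aux)
    return clauses

for n in (4, 8, 16):
    print(n, len(partial_product_clauses(n)))   # 48, 192, 768 -> 3*n^2 clauses
```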

Given a SAT instance including modules, the overall algorithm depicted in
Fig. 2.14 is used to solve the problem. This algorithm is similar to the procedure
as applied in standard SAT solvers (see Fig. 2.13 on p. 22): While free variables
remain (a), a decision is made (c), implications resulting from this decision are car-
ried out (d), and if a conflict occurs, it is analyzed (f). The important difference is
that SWORD has two operation levels: The global algorithm controls the overall
search process and calls local procedures of modules for decision and implication.
Thus, decision making and the implication engine can be adjusted for each type of
module.
In more detail, the solver first chooses a particular module based on a global
decision heuristic (c.1). Then, this module chooses a value for one of its variables
according to a local decision heuristic (c.2). Afterwards, the solver calls the local
implication procedures (d.2) of all modules that are potentially affected (d.1) by the
previous decision or implication. Here, a variable watching scheme similar to the
one presented in [MMZ+01] is used which can efficiently determine these modules.
The chosen modules imply further assignments and detect conflicts.

Due to the two operation levels, problem-specific strategies e.g. for decision mak-
ing and propagation can be exploited by the modules. For example, decision making
can be prioritized so that modules, which are assumed to be “more important” than
others, are selected for a decision with a higher priority than less important modules.
Furthermore, different modules can be equipped with different strategies. For a more
detailed description of SWORD, the reader is referred to [WFG+07, WFG+09].
Therewith, all preliminaries required for this book have been introduced. Besides
an introduction to reversible and quantum logic, the applied core techniques have
also been briefly described. With that as a basis, the contributions towards
a design flow for reversible logic are proposed in the following chapters. Decision
diagrams are thereby applied for synthesis (Sect. 3.2), partially for exact synthesis
(Sect. 4.3.2), and for verification (Sect. 7.1.2), while Boolean satisfiability is ex-
ploited in exact synthesis (especially in Sect. 4.2 and partially in Sect. 5.3 as well as
Sect. 6.3), verification (Sect. 7.1.3), and debugging (Sect. 7.2). The extended SAT
solvers (i.e. SMT solvers, QBF solvers, and SWORD) are used to improve exact
synthesis (Sect. 4.3).

Note on Benchmarks In the following chapters, the respective contributions are
introduced and evaluated in detail. Different scopes are thereby considered that are
also reflected in the benchmark sets used to experimentally evaluate the respec-
tive methods. Synthesis approaches are evaluated using large functions (for the
proposed heuristic method), small functions (for exact synthesis), and irreversible
functions (to evaluate the different embeddings). In contrast, the methods target-
ing optimization, verification, and debugging work on a given circuit description.
Furthermore, different timeouts are applied in the respective evaluations (e.g. ex-
act synthesis normally requires more run-time than heuristic synthesis). As already
stated in the introduction, the benchmarks used in this book are publicly available
at www.revlib.org. The resulting tools can be obtained under www.revkit.org.
Chapter 3
Synthesis of Reversible Logic

Synthesis is the most important step while building complex circuits. Considering
the traditional design flow, synthesis is carried out in several individual steps such
as high-level synthesis, logic synthesis, mapping, and routing (see e.g. [SSL+92]).
To synthesize reversible logic, adjustments and extensions are needed. For example,
further tasks such as embedding of irreversible functions must be added. Further-
more, throughout the whole flow, the restrictions caused by the reversibility (no
fanout and feedback) and a completely new gate library must be considered as well.
In recent years, first approaches addressing some of these issues have been in-
troduced (see e.g. [SPMH03, MMD03, MD04b, Ker04, MDM05, GAJ06, HSY+06,
MDM07]). The first section of this chapter briefly reviews existing methods for the
individual steps. However, the research in this area is still at the beginning. So far,
the desired behavior of the circuit to be synthesized is given by function descriptions
like truth tables or permutations, respectively. As a result, current synthesis methods
are applicable to relatively small functions only and often need a significant amount
of run-time. This must be improved in order to design larger functions or complex
reversible systems in the future.
In this book, the wide area of reversible logic synthesis is covered by the follow-
ing three chapters, each with its own detailed view on a particular aspect.
While Chap. 4 introduces exact (i.e. minimal) circuit synthesis, Chap. 5 discusses
aspects of embedding in detail. The present chapter builds the basis for them and
additionally proposes new methods that allow fast synthesis of significantly larger
functions and more complex circuits, respectively. Since Toffoli circuits as introduced
in Sect. 2.1.2 generally build the basis for both reversible and quantum circuits,
the focus in the following is on the synthesis of Toffoli cascades. Nevertheless,
quantum circuit synthesis is additionally considered when it is appropriate.
As already mentioned, the first part of this chapter builds the basis for all re-
maining synthesis sections. Here, it is shown how irreversible functions must be
embedded into reversible ones before existing synthesis methods can be applied.
Then, using the example of the transformation-based approach introduced
in [MMD03], one of the previous synthesis methods is described and discussed.
Altogether, this briefly summarizes the basic synthesis steps for reversible logic as
they exist today.

Motivated by this (in particular by the limitations of the current synthesis meth-
ods), the second part of this chapter introduces a new synthesis approach [WD09,
WD10] that exploits Binary Decision Diagrams (BDDs) [Bry86]. BDDs allow an
efficient representation of large Boolean functions that can be mapped into re-
versible cascades. As a result, for the first time Toffoli circuits for functions con-
taining over 100 variables can be derived efficiently.
Finally, how to specify and synthesize more complex reversible circuits at higher
abstractions is considered in the third part of this chapter. For this purpose, a new
programming language (called SyReC) and a respective hierarchical synthesis ap-
proach are presented and evaluated [WOD10].

3.1 Current Synthesis Steps

This section illustrates the current synthesis steps that use well-established meth-
ods. First, the problem of embedding irreversible functions is considered. Second,
the synthesis itself is introduced. For the latter, a widely known approach, namely
the transformation-based approach introduced in [MMD03], is used. Most of the re-
maining synthesis methods apply similar strategies (e.g. [Ker04, GAJ06, MDM07])
or are developed on top of this method (e.g. [MDM05]).

3.1.1 Embedding Irreversible Functions

Table 3.1 shows the truth table of a 1-bit adder which is used as an example in this
section. The adder has three inputs (the carry-in cin as well as the two summands x
and y) and two outputs (the carry-out cout and the sum). The adder obviously is
irreversible, since
• the number of inputs differs from the number of outputs and
• there is no unique input-output mapping.
Even adding an additional output to the function (leading to the same number
of inputs and outputs) would not make the function reversible. Then, without loss
of generality, the first four lines of the truth table can be embedded with respect to
reversibility as shown in the rightmost column of Table 3.1. However, since the output
pattern cout = 0 and sum = 1 has already appeared twice (marked bold), no unique
embedding for the fifth line is possible any longer. The same also holds for the lines
marked in italics.
This has already been observed in [MD04b]. Here, the authors came to the conclusion
that at least ⌈log2(μ)⌉ additional (garbage) outputs are required to make
an irreversible function reversible, where μ is the maximum number of times an
output pattern is repeated in the truth table. Since for the adder an output pattern is
repeated at most three times, ⌈log2(3)⌉ = 2 additional outputs are required to make the
function reversible.
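This minimum number of additional outputs can be computed directly from the truth table of the irreversible function, as the following small sketch shows for the 1-bit adder. The helper name is chosen freely; the subsequent assignment of constant inputs and don't cares is a separate step.

```python
from collections import Counter
from math import ceil, log2

# Output part (cout, sum) of the adder's truth table from Table 3.1,
# listed for the inputs (cin, x, y) = 000, 001, ..., 111.
adder_outputs = [(0, 0), (0, 1), (0, 1), (1, 0), (0, 1), (1, 0), (1, 0), (1, 1)]

def required_garbage_outputs(outputs):
    """ceil(log2(mu)), where mu is the maximal multiplicity of an output pattern."""
    mu = max(Counter(outputs).values())
    return ceil(log2(mu)) if mu > 1 else 0

print(required_garbage_outputs(adder_outputs))   # 2, since mu = 3
```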

Table 3.1 Truth table of an adder

cin x y   cout sum   (add. output)
 0  0 0    0    0        0
 0  0 1    0    1        0
 0  1 0    0    1        1
 0  1 1    1    0        0
 1  0 0    0    1        ?
 1  0 1    1    0        1
 1  1 0    1    0        ?
 1  1 1    1    1        1

Table 3.2 Truth table of an embedded adder

0 cin x y   cout sum g1 g2
0  0  0 0    0   0   0  0
0  0  0 1    0   1   1  1
0  0  1 0    0   1   1  0
0  0  1 1    1   0   0  1
0  1  0 0    0   1   0  0
0  1  0 1    1   0   1  1
0  1  1 0    1   0   1  0
0  1  1 1    1   1   0  1
1  0  0 0    1   0   0  0
1  0  0 1    1   1   1  1
1  0  1 0    1   1   1  0
1  0  1 1    0   0   0  1
1  1  0 0    1   1   0  0
1  1  0 1    0   0   1  1
1  1  1 0    0   0   1  0
1  1  1 1    0   1   0  1

Adding new lines causes constant inputs and garbage outputs. The value of the
constant inputs can be chosen by the designer. Garbage outputs are by definition
don’t cares and thus can be left unspecified leading to an incompletely specified
function. However, many synthesis approaches require a completely specified func-
tion so that often all don’t cares must be assigned to a concrete value.
As a result, the adder is embedded in a reversible function including four vari-
ables, one constant input, and two garbage outputs. A possible assignment to the
constant as well as the don’t care values is depicted in Table 3.2 (where the original
adder function is marked bold). In the following, a synthesis method is introduced
assuming a completely specified reversible function as input. However, the concrete
embedding of irreversible functions (in particular the concrete assignment to don’t

Table 3.3 MDM procedure


line input output 1st step 2nd step 3rd step 4th step 5th step 6th step
(i) abcd abcd abcd abcd abcd abcd abcd abcd

0 0000 0000 0000 0000 0000 0000 0000 0000


1 0001 0111 0101 0001 0001 0001 0001 0001
2 0010 0110 0110 0110 0010 0010 0010 0010
3 0011 1001 1011 1111 1011 0011 0011 0011
4 0100 0100 0100 0100 0100 0100 0100 0100
5 0101 1011 1001 1101 1101 1101 0101 0101
6 0110 1010 1010 1010 1110 1110 1110 0110
7 0111 1101 1111 1011 1111 0111 1111 0111
8 1000 1000 1000 1000 1000 1000 1000 1000
9 1001 1111 1101 1001 1001 1001 1001 1001
10 1010 1110 1110 1110 1010 1010 1010 1010
11 1011 0001 0011 0111 0011 1011 1011 1011
12 1100 1100 1100 1100 1100 1100 1100 1100
13 1101 0011 0001 0101 0101 0101 1101 1101
14 1110 0010 0010 0010 0110 0110 0110 1110
15 1111 0101 0111 0011 0111 1111 0111 1111

cares) can have a significant impact on the synthesis results (i.e. on the number of
gates in the resulting circuit). Thus, this issue is again considered in Chap. 5 which
also provides examples showing the effect of different embeddings.

3.1.2 Transformation-based Synthesis

In this section, synthesis of reversible logic is exemplarily described using the ap-
proach from [MMD03]. The basic idea is to traverse each line of the truth table and
to add gates to the circuit until the output values match the input values (i.e. until the
identity is achieved). Gates are thereby chosen so that they don’t alter already con-
sidered lines. Furthermore, gates are added starting at the output side of the circuit
(this is because output values are transformed until the identity is achieved).
In the following, the approach is described using the example of the embedded
adder from Table 3.2. Table 3.3 shows the respective steps. The first column denotes
the truth table line numbers, while the second and third column give the function
specification of the adder. For brevity, the inputs 0, cin , x, y and the outputs cout ,
sum, g1 , g2 are denoted by a, b, c, d, respectively. The remaining columns provide
the transformed output values for the respective steps.
The approach starts at truth table line 0. Since for this line the input is already
equal to the output (both are assigned to 0000), no gate has to be added. In con-
trast, to match the output with the input in line 1, the values for c and b must be

Fig. 3.1 Circuit obtained by transformation-based synthesis

inverted. To this end, two gates MCT({d}, c) (1st step) and MCT({d}, b) (2nd step)
are added as depicted in Fig. 3.1. Due to the control line d, this does not affect
the previous truth table line. In line 2 and line 3, an MCT({c}, b) as well as an
MCT({c, d}, a) is added to match the values of b and a, respectively (step 3 and 4).
For the latter gate, two control lines are needed to keep the already traversed truth
table lines unaltered. Afterwards, only two more gates MCT({d, b}, a) (5th step)
and MCT({c, b}, a) (6th step) are necessary to achieve the input-output identity.
The resulting circuit is shown in Fig. 3.1. This circuit consists of six gates and has
quantum cost of 18.
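The effect of the obtained gate cascade can be checked by simulation: applying the six gates, in the order in which they were added, to each output pattern of Table 3.2 must yield the corresponding input pattern, i.e. the identity, exactly as traced in Table 3.3. The following sketch performs this check; it is a plain simulation for illustration and not part of the synthesis algorithm itself.

```python
def mct(controls, target, pattern):
    """Apply a multiple-controlled Toffoli gate to a bit pattern (dict a..d -> 0/1)."""
    if all(pattern[c] == 1 for c in controls):
        pattern = dict(pattern)
        pattern[target] = 1 - pattern[target]
    return pattern

# Gates in the order in which they were added during the transformation.
gates = [({'d'}, 'c'), ({'d'}, 'b'), ({'c'}, 'b'),
         ({'c', 'd'}, 'a'), ({'d', 'b'}, 'a'), ({'c', 'b'}, 'a')]

# Embedded adder of Table 3.2: (a, b, c, d) = (0, cin, x, y) -> (cout, sum, g1, g2).
spec = {
    '0000': '0000', '0001': '0111', '0010': '0110', '0011': '1001',
    '0100': '0100', '0101': '1011', '0110': '1010', '0111': '1101',
    '1000': '1000', '1001': '1111', '1010': '1110', '1011': '0001',
    '1100': '1100', '1101': '0011', '1110': '0010', '1111': '0101',
}

for inp, out in spec.items():
    pattern = dict(zip('abcd', map(int, out)))
    for controls, target in gates:
        pattern = mct(controls, target, pattern)
    transformed = ''.join(str(pattern[v]) for v in 'abcd')
    assert transformed == inp    # every output is mapped back to its input

print("identity reached for all 16 lines")
```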
In [MMD03], further variations of this approach are discussed. In fact, this trans-
formation can also be applied in the inverse direction (i.e. so that the input must
match the output) and in both directions simultaneously. Furthermore, in [MDM05]
the approach has been extended by the application of templates. These help to re-
duce the size of the resulting circuits and thus to achieve circuits with lower cost.
Having this as a general introduction into synthesis of reversible logic, in the
following new synthesis approaches are proposed.

3.2 BDD-based Synthesis


The strategy introduced in the last section (namely selecting reversible gates so that
the chosen function representation becomes the identity) has been adopted and ex-
tended by many other researchers. More precisely, more compact data-structures
like decision diagrams [Ker04], positive-polarity Reed-Muller expansion [GAJ06],
or Reed-Muller spectra [MDM07] have been applied. But even if complementary
approaches are used (e.g. [SPMH03]), so far all approaches are applicable only to
relatively small functions, i.e. functions with at most 30 variables [GAJ06]. More-
over, often a significant amount of run-time is needed to achieve these results. Thus,
current synthesis methods are limited.
These limitations are caused by the underlying techniques. The existing synthesis
approaches often rely on truth tables (or similar descriptions like permutations) of
the function to be synthesized (e.g. in [SPMH03, MMD03]). But even if alternative
data-structures (e.g. the ones mentioned above) are used, the same limitations can
be observed.
In this section, a synthesis method that can cope with significantly larger func-
tions is introduced. The basic idea is as follows: First, for the function to be syn-
thesized a BDD (see Sect. 2.2.1) is built. This can be efficiently done for large
functions using existing well-developed techniques. Then, each node of the BDD is
substituted by a cascade of reversible gates. Since BDDs may include shared nodes

causing fanouts (which are not allowed in reversible logic), this may require addi-
tional circuit lines.
As a result, circuits composed of Toffoli or quantum gates, respectively, are obtained
in time and memory linear in the size of the BDD. Moreover, since the
size of the resulting circuit is bounded by the BDD size, theoretical results known
from BDDs (see e.g. [Weg00, LL92]), can be transferred to reversible circuits. The
experiments show significant improvements (with respect to the resulting circuit
cost as well as to the run-time) in comparison to previous approaches. Furthermore,
for the first time, large functions with more than a hundred variables can be
synthesized with very low run-time.
In the remainder of this section, the BDD-based synthesis approach is introduced
as follows: In Sect. 3.2.1, the general idea and the resulting synthesis approach is de-
scribed in detail. How to exploit BDD optimizations is shown in Sect. 3.2.2, while
Sect. 3.2.3 briefly reviews some of the already known theoretical results from re-
versible logic synthesis and introduces bounds which follow from the new synthesis
approach. Finally, in Sect. 3.2.4 experimental results are given.

3.2.1 General Idea


In this section, the general idea of the BDD-based synthesis is proposed. The aim of
the approach is to determine a circuit realization for a given Boolean function. It is
well known that Boolean functions can be efficiently represented by BDDs. Given
a BDD G = (V , E), a reversible circuit can be derived by traversing the decision
diagram and substituting each node v ∈ V with a cascade of reversible gates. The
concrete cascade of gates depends on whether the successors of the node v are ter-
minals or not. For the general case (no terminals), the first row of Table 3.4 shows
a substitution with two Toffoli gates or five quantum gates, respectively. The fol-
lowing rows give the substitutions for the remaining cases. These cascades can be
applied to derive a complete Toffoli circuit (or quantum circuit, respectively) from
a BDD without shared nodes.

Example 3.1 Consider the BDD in Fig. 3.2(a). Applying the substitutions given in
Table 3.4 to each node of the BDD, the Toffoli circuit depicted in Fig. 3.2(b) results.

Remark 3.1 As shown in Table 3.4, an additional (constant) line is necessary if
one of the edges low(v) or high(v) leads to a terminal node. This is because of the
reversibility which has to be ensured when synthesizing reversible logic. As an ex-
ample consider a node v with high(v) = 0 (second row of Table 3.4). Without loss of
generality, the first three lines of the corresponding truth table can be embedded with
respect to reversibility as depicted in Table 3.5(a). However, since f is 0 in the last
line, no reversible embedding for the whole function is possible. Thus, an additional
line is required to make the respective substitution reversible (see Table 3.5(b)).1

1 For the same reason, it is also not possible to preserve the values for low(v) or high(v),
respectively, in the substitution depicted in the first row of Table 3.4.



Table 3.4 Substitution of BDD nodes to reversible/quantum circuits



Fig. 3.2 BDD and Toffoli circuit for f = x1 ⊕ x2

Table 3.5 (Partial) Truth tables for node v with high(v) = 0

(a) w/o add. line        (b) with additional line
xi low(f)   f  –         0 xi low(f)   f xi low(f)
 0    0     0  0         0  0    0     0  0    0
 0    1     1  1         0  0    1     1  0    1
 1    0     0  1         0  1    0     0  1    0
 1    1     0  ?         0  1    1     0  1    1

Based on these substitutions, a method for synthesizing Boolean functions in
reversible or quantum logic can be formulated: First, a BDD for the function f to be
synthesized is created. This can be done efficiently using state-of-the-art BDD pack-
ages (e.g. CUDD [Som01]). Next, the resulting BDD G = (V , E) is processed by
a depth-first traversal. For each node v ∈ V , cascades as depicted in Table 3.4 are
added to the circuit. As a result, circuits are synthesized that realize the given func-
tion f .
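The overall procedure can be sketched as follows. The snippet below illustrates only the traversal and the bookkeeping: a depth-first pass over a (shared-node-free) BDD that allocates one additional circuit line per node and records, as a placeholder, the cascade that Table 3.4 prescribes for the respective case. The Node class, the line bookkeeping, and the placeholder gate entries are illustrative assumptions; the actual implementation works on CUDD BDDs and inserts the concrete cascades of Table 3.4, which are not reproduced here.

```python
class Node:
    """A BDD node: select variable xi with low/high successors (0/1 are terminals)."""
    def __init__(self, var, low, high):
        self.var, self.low, self.high = var, low, high

def synthesize(root):
    """Depth-first traversal emitting one cascade per node (sketch only)."""
    gates, lines = [], {}               # lines: node/variable -> circuit line index
    next_line = [0]

    def new_line(label):
        lines[label] = next_line[0]
        next_line[0] += 1
        return lines[label]

    def visit(node):
        if node in (0, 1) or node in lines:     # terminal or already substituted
            return
        visit(node.low)
        visit(node.high)
        if node.var not in lines:               # primary input gets its own line
            new_line(node.var)
        new_line(node)                          # additional line carrying f_v
        # Placeholder for the cascade of Table 3.4 realizing the Shannon
        # decomposition f_v = (not x_i)*f_low + x_i*f_high on the new line.
        gates.append(("cascade(Table 3.4)", node.var, node.low, node.high, node))

    visit(root)
    return lines, gates

# Example: BDD for f = x1 XOR x2 (cf. Fig. 3.2(a)), without shared nodes.
f_low  = Node("x2", 0, 1)
f_high = Node("x2", 1, 0)
f      = Node("x1", f_low, f_high)
lines, gates = synthesize(f)
print(len(gates))   # 3 nodes -> 3 cascades
print(len(lines))   # 5 circuit lines: 2 primary inputs + 3 node lines
```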

3.2.2 Exploiting BDD Optimization

To build compact BDDs, current state-of-the-art BDD packages exploit several op-
timization techniques such as shared nodes [Bry86], complement edges [BRB90],
or reordering [Bry86, Rud93]. In this section, it is shown how these techniques can
be applied to the proposed BDD-based synthesis.

3.2.2.1 Shared Nodes

If a node v has more than one predecessor, then v is called a shared node. The
application of shared nodes is common for nearly all BDD packages. Shared nodes
can be used to represent a sub-formula more than once without the need to rebuild
the whole sub-graph. In particular, functions f : Bn → Bm (i.e. functions with more
than one output) can be represented more compactly using shared nodes.

Fig. 3.3 Substitution for shared nodes without terminals as successors

However, to apply shared nodes in reversible logic synthesis, the output value of
a respective node has to be preserved until it is not needed any longer. Considering
the substitutions depicted in Table 3.4, this holds for all cases where one of the edges
low(v) or high(v) leads to a terminal node. Here, all values of the inputs (in particular
of high(v) or low(v) that represent output values of other nodes) are preserved.
In contrast, this is not the case for the general case (first row of Table 3.4). Here,
only one value (namely the value from the select variable xi ) is preserved. Thus, a
modified substitution for shared nodes without terminals as successors is required.
Figures 3.3(a) and 3.3(b) show one possible substitution to a reversible cascade
and a quantum cascade, respectively. Besides an additional constant circuit line, this
requires one (three) additional reversible gates (quantum gates) in comparison to
the substitution of Table 3.4. But therefore, shared nodes are supported. Moreover,
this substitution also allows the identity of a select variable (last row of Table 3.4)
to be represented by the respective input line of the circuit (i.e. without any additional gates
or lines). Previously, this was not possible, since the value of this circuit line was
not necessarily preserved (as an example see Fig. 3.2, where the value of the identity
node f′ gets lost after node f is substituted).
Exploiting this, the synthesis algorithm proposed in the last section can be im-
proved as follows: Again, a BDD for the function to be synthesized is built, which is
afterwards traversed in a depth-first manner. Then, for each node v ∈ V , the follow-
ing checks are performed:
1. Node v represents the identity of a primary input (i.e. the select input)
In this case no cascade of gates is added to the circuit, since the identity can be
represented by the same circuit line as the input itself.
2. Node v contains at least one edge (low(v) or high(v), respectively) leading to a
terminal
In this case substitutions as depicted in Table 3.4 are applied, since they often
need a smaller number of gates and additionally preserve the values of all input
signals.
3. The values of low(v) and high(v) are still needed, since they represent either
shared nodes or the identity of an input variable
In this case the substitutions depicted in Fig. 3.3 are applied, since they preserve
the values of all input signals.
4. Otherwise
The substitution as depicted in the first row of Table 3.4 is applied, since no
input values must be preserved or a terminal successor occurs, respectively. In
this case, the smaller cascades (with respect to both the number of additional
lines and the number of gates) are preferred.

Fig. 3.4 Toffoli circuits for shared BDD

Example 3.2 In Fig. 3.4(a) a partial BDD including a shared node f′ is shown.
Since the value of node f′ is used twice (by nodes f1 and f2), an additional line
(the second one in Fig. 3.4(b)) and the cascade of gates as depicted in Fig. 3.3 are
applied to substitute node f1. Then, the value of f′ is still available such that the
substitution of node f2 can be applied. The resulting circuit is given in Fig. 3.4(b).
Figure 3.4(c) shows the resulting circuit for low(f′) = 0 and high(f′) = 1,
i.e. for f′ representing the identity of xj. In this case no gates for f′ are added.
Instead, the fifth line is used to store the value for both xj and f′. Besides that, the
remaining substitutions are equal to the ones described above.

3.2.2.2 Complement Edges

Further reductions in BDD sizes can be achieved if complement edges [BRB90] are
applied. In particular, this allows to represent a function as well as its negation by
a single node only. If there is a complement edge e.g. between v and low(v), then
Shannon decomposition with an inverted value of low(v) is applied. To support

complement edges in the proposed synthesis approach, adjusted substitutions have
to be used that take the inversion of complemented edges into account.
Table 3.6 shows the resulting cascades used in the proposed synthesis approach.
Note that complements have to be considered only at the low edges of the nodes,
since complements at high-edges can be mapped to them and vice versa. In some
cases, this leads to larger cascades in comparison to the substitution without comple-
ment edges (e.g. compare the second row of Table 3.6 to the first row of Table 3.4).
How far this can be compensated by the possible BDD reductions is discussed in
the experimental evaluation in Sect. 3.2.4.

3.2.2.3 Reordering of BDDs

Finally, different BDD orders may influence the synthesis results. It has been
shown that the order of the variables has a high impact on the size of the result-
ing BDD [Bry86] (see e.g. Fig. 2.11 on p. 19). Since reducing the number of nodes
may also reduce the size of the resulting circuits, reordering is considered in this
section.
In the past, several approaches have been proposed to achieve good orders
(e.g. sifting [Rud93]) or to determine exact results (e.g. [DDG00]) with respect to
the number of nodes. All these techniques can be directly applied to the BDD-based
synthesis approach and need no further adjustments of the already introduced sub-
stitutions.
Using these optimization techniques (i.e. shared nodes, complement edges, and
reordering), in Sect. 3.2.4 it is considered how they influence the resulting Toffoli
or quantum circuits, respectively. But first, it is briefly shown how the proposed
approach can be used to transfer theoretical results from BDDs to reversible logic.

3.2.3 Theoretical Consideration

In the past, first lower and upper bounds for the synthesis of reversible functions
containing n variables have been determined. In [MD04b], it has been shown that
there exists a reversible function that requires at least (2^n / ln 3) + o(2^n) gates (lower
bound). Furthermore, the authors proved that every reversible function can be real-
ized with no more than n · 2^n gates (upper bound). For a restricted gate library
leading to smaller quantum cost and thus only consisting of NOT, CNOT, and
two-controlled Toffoli gates (the same as applied for the substitutions proposed
here), functions can be synthesized with at most n NOT gates, n² CNOT gates, and
9 · n · 2^n + o(n · 2^n) two-controlled Toffoli gates (according to [SPMH03]). A tighter
upper bound of n NOT gates, 2 · n² + o(n · 2^n) CNOT gates, and 3 · n · 2^n + o(n · 2^n)
two-controlled Toffoli gates has been proved in [MDM07]. In [PMH08] it has been
shown that linear reversible functions are synthesizable with CNOT gates only.
Moreover, their algorithm never needs more than Θ(n² / log n) CNOT gates for any
linear function f with n variables.
Table 3.6 Subst. of BDD nodes with complement edge to reversible/quantum circuits

Using the synthesis approach proposed in the last sections, reversible circuits
for a function f with a size dependent on the number of nodes in the BDD can
be constructed. More precisely, let f be a function with n primary inputs which
is represented by a BDD containing k nodes.2 Then, the resulting Toffoli circuit
consists of at most
• k + n circuit lines (since besides the input lines, for each node at most one addi-
tional line is added) and
• 3 · k gates (since for each node cascades of at most 3 gates are added according
to the substitutions of Table 3.4 and Fig. 3.3, respectively).
Asymptotically, the resulting reversible circuits are bounded by the BDD size.
Since for BDDs many theoretical results exist, using the proposed synthesis ap-
proach, these results can be transferred to reversible logic as well. In the following,
some results obtained by this observation are sketched.
• A BDD representing a single-output function has 2^n nodes in the worst case.
Thus, each function can be realized in reversible logic with at most 3 · 2^n gates
(where at most 2 · 2^n CNOTs and 2 · 2^n Toffoli gates are needed).
• A BDD representing a symmetric function has n · (n + 1)/2 nodes in the worst case.
Thus, each symmetric function can be realized in reversible logic with a quadratic
number of gates (more precisely, a quadratic number of CNOTs and a quadratic
number of Toffoli gates are needed).
• A BDD representing specific functions, like AND, OR, or XOR has a linear size.
Thus, there exists a reversible circuit realizing these functions in linear size as
well.
• A BDD representing an n-bit adder has linear size. Thus, there exists a reversible
circuit realizing addition in linear size as well.
Further results (e.g. tighter upper bounds for general functions as well as for
respective function classes) are also known (see e.g. [Weg00, LL92]). Moreover,
in a similar way bounds for quantum circuits can be obtained. However, a detailed
analysis of the theoretical results that can be obtained by the BDD-based synthesis
is left for future work.

3.2.4 Experimental Results

The BDD-based synthesis method together with the suggested improvements has
been implemented in C++ on top of the BDD package CUDD [Som01]. In this
section, first a case study is given evaluating the effect of the respective BDD opti-
mization techniques on the resulting reversible or quantum circuits. Afterwards, the
proposed approach is compared against two previously proposed synthesis methods.

2 For simplicity, it is assumed that no complement edges are applied.



Benchmark functions provided by RevLib [WGT+08] (including most of the
functions which have been previously used to evaluate existing reversible synthe-
sis approaches) as well as from the LGSynth package [Yan91] (a benchmark suite
for evaluating traditional synthesis) have been used. All experiments have been car-
ried out on an AMD Athlon 3500+ with 1 GB of memory. The timeout was set to
500 CPU seconds.

3.2.4.1 Effect of BDD Optimization

To investigate the effect of the respective BDD optimization techniques the pro-
posed synthesis approach has been applied to the benchmarks with the respective
techniques enabled or disabled. In the following, for each optimization technique
(i.e. shared nodes, complement edges, and reordering) the respective results are pre-
sented and discussed.

Shared Nodes Shared nodes can be enabled or disabled by manipulating the
unique table. Then, depending on the respective case, the substitutions of Table 3.4
or additionally of Fig. 3.3 are applied.
The results are summarized in Table 3.7. The first two columns give the name
of the benchmark (F UNCTION) as well as the number of primary inputs and out-
puts (PI/PO). Then, the number of resulting circuit lines (n), Toffoli gates (dTof ),
or quantum gates (dQua ), as well as the run-time of the synthesis approach (in CPU
seconds) is given for the naive approach (denoted by W / O S HARED N ODES) and the
approach that exploits shared nodes (denoted by WITH S HARED N ODES).
One can clearly conclude that the application of shared nodes leads to better
realizations for reversible and quantum logic. Both the number of lines and the
number of gates can be significantly reduced. In particular, for the number of lines
this might not be obvious, since additional lines are required to support shared nodes
(see Sect. 3.2.2). But due to the fact that shared nodes also decrease the number of
terminal nodes (which require additional lines as well), this effect is compensated.

Complement Edges Complement edges are supported by the CUDD package and
can easily be disabled and enabled. For comparison, circuits from both, BDDs with
and BDDs without complement edges (denoted by WITH C OMPL . E DGES and W / O
C OMPL . E DGES, respectively), are synthesized. In the latter case, the substitutions
shown in Table 3.6 are applied whenever a successor is connected by a complement
edge. Shared nodes are also applied, since they make complement edges more ben-
eficial. The results are given in Table 3.8.3 The columns are labeled as described
above for Table 3.7.
Even if the cascades representing nodes with complement edges are larger in
some cases (see Sect. 3.2.2), improvements in the circuit sizes can be observed (see

3 Compared to Table 3.7, also benchmarks are considered for which no result could be determined

using the W / O S HARED N ODES approach.



Table 3.7 Effect of shared nodes

FUNCTION       PI/PO   W/O SHARED NODES              WITH SHARED NODES
                       n    dTof   dQua   TIME       n    dTof   dQua   TIME

RevLib functions
decod24_10 2/4 7 7 21 <0.01 7 7 21 <0.01
4mod5_8 4/1 9 13 36 <0.01 9 13 36 <0.01
mini-alu_84 4/2 12 21 57 <0.01 11 20 52 <0.01
alu_9 5/1 15 30 73 <0.01 14 29 72 <0.01
rd53_68 5/3 31 85 212 <0.01 20 49 130 <0.01
hwb5_13 5/5 36 105 277 <0.01 32 91 238 <0.01
sym6_63 6/1 23 57 126 0.01 17 34 83 <0.01
hwb6_14 6/6 68 239 618 <0.01 53 167 437 <0.01
rd73_69 7/3 86 301 730 <0.01 38 105 272 <0.01
ham7_29 7/7 75 231 595 <0.01 36 88 224 <0.01
hwb7_15 7/7 136 526 1353 <0.01 84 284 744 <0.01
rd84_70 8/4 194 679 1650 0.01 52 140 373 <0.01
hwb8_64 8/8 277 1132 2903 0.02 129 456 1195 <0.01
sym9_71 9/1 104 325 724 <0.01 35 79 201 <0.01

LGSynth functions
xor5 5/1 17 40 98 <0.01 10 19 48 <0.01
bw 5/28 125 381 935 0.01 97 286 747 <0.01
9sym 9/1 104 325 724 <0.01 35 79 201 <0.01

e.g. rd84_70, 9sym, or cordic). But in particular for the LGSynth functions, some-
times better circuits result, when complement edges are disabled (see e.g. spla).
Here, the larger cascades obviously cannot be compensated by complement edge
optimization. In contrast, for quantum circuits in nearly all cases better realizations
are obtained with complement edges enabled. A reason for that is that the quan-
tum cascades for nodes with complement edges have the same size as the respective
cascades for nodes without complement edges in nearly all cases (see Table 3.4,
Fig. 3.3, and Table 3.6, respectively). Thus, the advantage of complement edges
(namely the possibility to create smaller BDDs) can be fully exploited without the
drawback that the respective gate substitutions become larger.

Reordering of BDDs To evaluate the effect of reordering the BDD on the result-
ing circuit sizes, three techniques are considered: (1) An order given by the occur-
rences of the primary inputs in the function to be synthesized (denoted by O RIG -
INAL ), (2) an optimized order achieved by sifting [Rud93] (denoted by S IFTING ),
and (3) an exact order [DDG00] which ensures the BDD to be minimal (denoted
by E XACT). Again, all created BDDs exploit shared nodes. Furthermore, comple-
ment edges are enabled in this evaluation. After applying the synthesis approach,

Table 3.8 Effect of complement edges

FUNCTION       PI/PO   W/O COMPL. EDGES              WITH COMPL. EDGES
                       n    dTof   dQua   TIME       n    dTof   dQua   TIME

RevLib functions
decod24_10 2/4 7 7 21 <0.01 6 11 23 <0.01
4mod5_8 4/1 9 13 36 <0.01 8 16 37 <0.01
mini-alu_84 4/2 11 20 52 <0.01 10 22 49 <0.01
alu_9 5/1 14 29 72 <0.01 11 25 53 <0.01
rd53_68 5/3 20 49 130 <0.01 13 34 75 <0.01
hwb5_13 5/5 32 91 238 <0.01 27 85 201 <0.01
sym6_63 6/1 17 34 83 <0.01 14 29 69 <0.01
hwb6_14 6/6 53 167 437 <0.01 46 157 377 <0.01
rd73_69 7/3 38 105 272 <0.01 25 73 162 <0.01
ham7_29 7/7 36 88 224 <0.01 18 50 82 <0.01
hwb7_15 7/7 84 284 744 <0.01 74 276 665 <0.01
rd84_70 8/4 52 140 373 <0.01 34 104 229 <0.01
hwb8_64 8/8 129 456 1195 <0.01 116 442 1067 <0.01
sym9_71 9/1 35 79 201 <0.01 27 62 153 <0.01

LGSynth functions
xor5 5/1 10 19 48 <0.01 6 8 8 <0.01
bw 5/28 97 286 747 <0.01 91 317 732 <0.01
ex5p 8/63 276 680 1676 0.02 233 706 1520 0.02
9sym 9/1 35 79 201 <0.01 27 62 153 <0.01
pdc 16/40 648 2074 4844 0.12 631 2109 4803 0.12
spla 16/46 567 1422 3753 0.09 559 1728 3799 0.09
cordic 23/2 76 177 448 0.02 53 109 265 0.02

circuit sizes as summarized in Table 3.9 result. Here again, the columns are labeled
as described above.
The results show that the order has a significant effect on the circuit size. In par-
ticular for the LGSynth functions, the best results are achieved with the exact order.
But as a drawback, this requires a longer run-time. Besides that, also in this evaluation
examples can be found showing that optimizing the BDD does not always lead
to smaller circuits. Altogether, particularly for larger functions reordering is benefi-
cial. In most of the cases it is thereby sufficient to perform sifting instead of exact
reordering, since this leads to results of similar quality but without a notable increase
in run-time. For the following evaluations, BDD-based synthesis with shared nodes,
complement edges, and sifting has been applied.

Table 3.9 Effect of variable ordering

FUNCTION       PI/PO   ORIGINAL                    SIFTING                     EXACT
                       n   dTof  dQua  TIME        n   dTof  dQua  TIME        n   dTof  dQua  TIME

RevLib functions
decod24_10 2/4 6 11 23 <0.01 6 11 23 <0.01 6 11 23 <0.01
4mod5_8 4/1 8 16 37 <0.01 7 8 18 <0.01 7 8 18 <0.01
mini-alu_84 4/2 10 22 49 <0.01 10 20 43 <0.01 10 20 43 <0.01
alu_9 5/1 11 25 53 <0.01 7 9 22 <0.01 7 9 22 <0.01
rd53_68 5/3 13 34 75 <0.01 13 34 75 <0.01 13 34 75 <0.01
hwb5_13 5/5 27 85 201 <0.01 28 88 205 0.01 28 88 205 0.01
sym6_63 6/1 14 29 69 <0.01 14 29 69 <0.01 14 29 69 <0.01
hwb6_14 6/6 46 157 377 <0.01 46 159 375 <0.01 46 159 375 0.01
rd73_69 7/3 25 73 162 <0.01 25 73 162 <0.01 25 73 162 <0.01
ham7_29 7/7 18 50 82 <0.01 21 61 107 <0.01 21 61 107 0.01
hwb7_15 7/7 74 276 665 <0.01 73 281 653 <0.01 76 278 658 0.01
rd84_70 8/4 34 104 229 <0.01 34 104 229 <0.01 34 104 229 <0.01
hwb8_64 8/8 116 442 1067 <0.01 112 449 1047 <0.01 114 440 1051 0.03
sym9_71 9/1 27 62 153 <0.01 27 62 153 <0.01 27 62 153 <0.01

LGSynth functions
xor5 5/1 6 8 8 <0.01 6 8 8 <0.01 6 8 8 <0.01
bw 5/28 91 317 732 <0.01 87 307 693 <0.01 84 306 667 <0.01
ex5p 8/63 233 706 1520 0.02 206 647 1388 0.02 206 647 1388 0.06
9sym 9/1 27 62 153 <0.01 27 62 153 <0.01 27 62 153 0.01
pdc 16/40 631 2109 4803 0.12 619 2080 4781 0.13 619 2087 4850 66.38
spla 16/46 559 1728 3799 0.09 489 1709 4372 0.09 483 1687 4322 86.92
cordic 23/2 53 109 265 0.02 52 101 247 0.03 50 95 237 6.90

3.2.4.2 Comparison to Previous Synthesis Approaches

In this section, circuits synthesized by the BDD-based approach are compared to
the results generated by (1) the RMRLS approach (described in [GAJ06] using ver-
sion 0.2 in the default settings) and (2) the RMS approach (based on the concepts
of [MDM07] in its most recent version including improved handling of don’t care
conditions at the output). Since previous approaches (i.e. RMRLS and RMS) require
reversible functions as input, non-reversible functions are embedded into reversible
ones (based on the concepts introduced in Sect. 3.1.1). For BDD-based synthesis,
the original function description has been used which automatically leads to an em-
bedding.
The results are summarized in Table 3.10. The first columns give the name as
well as the number of the primary inputs (PI) and primary outputs (PO) of the origi-
nal function. In the following columns, the number of lines (n), the gate count (dTof ),
Table 3.10 Comparison of BDD-based synthesis to previous methods

FUNCTION       PI/PO   PREVIOUS APPROACHES                        BDD-BASED SYNTHESIS           ΔQC        ΔQC
                       RMRLS [GAJ06]         RMS [MDM07]                                        (RMRLS)    (RMS)
                       n   dTof  QC   TIME   dTof  QC    TIME     n    dTof  QC    dQua  TIME

RevLib functions
decod24_10 2/4 4 11 55 497.51 7 19 <0.01 6 11 27 23 <0.01 −32 4
4mod5_8 4/1 5 9 25 0.86 5 9 <0.01 7 8 24 18 <0.01 −7 9
mini-alu_84 4/2 5 21 173 495.61 36 248 <0.01 10 20 60 43 <0.01 −130 −205
alu_9 5/1 5 9 49 122.48 9 25 0.01 7 9 29 22 0.01 −27 −3
rd53_68 5/3 7 – – >500.00 221 2646 0.14 13 34 98 75 <0.01 – −2571
hwb5_13 5/5 5 – – >500.00 42 214 0.01 28 88 276 205 0.01 – −9
sym6_63 6/1 7 36 777 485.47 15 119 0.13 14 29 93 69 <0.01 −708 −50
mod5adder_66 6/6 6 37 529 494.46 35 151 0.06 32 96 292 213 <0.01 −316 62
hwb6_14 6/6 6 – – >500.00 100 740 0.04 46 159 507 375 <0.01 – −365
rd73_69 7/3 9 – – >500.00 1344 20779 1.93 13 73 217 162 <0.01 – −20617
hwb7_15 7/7 7 – – >500.00 375 3378 0.18 73 281 909 653 <0.01 – −2725
ham7_29 7/7 7 – – >500.00 26 90 0.09 21 61 141 107 <0.01 – 17
rd84_70 8/4 11 – – >500.00 124 8738 9.92 34 104 304 229 <0.01 – −8509
hwb8_64 8/8 8 – – >500.00 229 3846 0.90 112 449 1461 1047 0.01 – −2799
sym9_71 9/1 10 – – >500.00 27 201 3.98 27 62 206 153 <0.01 – −48
hwb9_65 9/9 9 – – >500.00 2021 23311 1.45 170 699 2275 1620 0.02 – −21691
cycle10_2_61 12/12 12 26 1435 491.87 41 1837 26.17 39 78 202 164 0.09 −1271 −1673
plus63mod4096_79 12/12 12 – – >500.00 24 4873 17.74 23 49 89 79 0.08 – −4794
plus127mod8192_78 13/13 13 – – >500.00 25 9131 57.16 25 54 98 86 0.21 – −9045
plus63mod8192_80 13/13 13 – – >500.00 28 9183 57.19 25 53 97 87 0.20 – −9096
ham15_30 15/15 15 – – >500.00 – – >500.00 45 153 309 246 1.25 – –
Table 3.10 (Continued)

FUNCTION       PI/PO   PREVIOUS APPROACHES                        BDD-BASED SYNTHESIS           ΔQC        ΔQC
                       RMRLS [GAJ06]         RMS [MDM07]                                        (RMRLS)    (RMS)
                       n   dTof  QC   TIME   dTof  QC    TIME     n    dTof  QC    dQua  TIME

LGSynth functions
xor5 5/1 6 27 387 484.11 8 68 0.01 6 8 8 8 <0.01 −379 −60
bw 5/28 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 87 307 943 693 <0.01 – –
ex5p 8/63 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 206 647 1843 1388 0.02 – –
9sym 9/1 10 – – >500.00 27 201 4.00 27 62 206 153 <0.01 – −48
pdc 16/40 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 619 2080 6500 4781 0.14 – –
spla 16/46 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 489 1709 5925 4372 0.10 – –
cordic 23/2 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 52 101 325 247 0.02 – –
cps 24/109 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 930 2676 8136 6301 0.10 – –
apex2 39/3 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 498 1746 5922 4435 0.24 – –
seq 41/35 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 1617 5990 19362 14259 1.14 – –
e64 65/65 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 195 387 907 713 0.04 – –
apex5 117/88 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 1147 3308 11292 8387 0.14 – –
ex4p 128/28 ∼ ∼ ∼ ∼ ∼ ∼ ∼ 510 1277 4009 3093 0.03 – –

the quantum cost (QC), and the synthesis time (TIME) for the respective approaches
(i.e. RMRLS, RMS, and the BDD-based synthesis) are reported.4 For BDD-
based synthesis, additionally the resulting number of gates (and thus the quantum
cost) when directly synthesizing quantum gate circuits is given in the column de-
noted by dQua . Furthermore, a “∼” denotes that an embedding needed by the pre-
vious synthesis approaches could not be created within the given timeout. Finally,
the last two columns (Δ QC) give the absolute difference of the quantum cost for
the resulting circuits obtained by the BDD-based quantum circuit synthesis and the
RMRLS and RMS approach, respectively.
As a first result, one can conclude that for large functions to be synthesized it
is not always feasible to create a reversible embedding needed by the previous ap-
proaches. Moreover, even if this is possible, both RMRLS and RMS need a signif-
icant amount of run-time to synthesize a circuit from the embedding. As a conse-
quence, for most of the LGSynth benchmarks no result can be generated within the
given timeout. In contrast, the BDD-based approach is able to synthesize circuits
for all given functions within a few CPU seconds.
Furthermore, although BDD-based synthesis often leads to larger circuits with
respect to gate count and number of lines, the resulting quantum cost is significantly
lower in most of the cases (except for decod24_10, 4mod5_8, mod5adder_66, and
ham7_29). As an example, for plus63mod4096_79 the BDD-based synthesis
produces a circuit with twice the number of lines but with two orders of magnitude
lower quantum cost in comparison to RMS. In the best cases (e.g. hwb9_65)
a reduction of several thousands in quantum cost is achieved. Note that quantum
cost is a more meaningful metric than gate count, since it considers gates with more
control lines to be more costly. Thus, even if the total number of circuit lines added
by the BDD-based synthesis is higher than for previous approaches,
significant improvements in quantum cost are obtained. Furthermore, reversible
logic for functions with more than 100 variables can be automatically synthesized.
How to reduce the number of circuit lines is addressed later in Sect. 6.2.

3.3 SyReC: A Reversible Hardware Language

Besides synthesis of reversible functions, also how to realize more complex cir-
cuits has to be addressed in order to provide an efficient design flow. Thus, synthe-
sis of reversible logic has to reach a level which allows the description of circuits
at higher abstractions. For this purpose, programming languages can be exploited.
Considering traditional synthesis, approaches using languages like VHDL [LSU89],
SystemC [GLMS02], or SystemVerilog [SDF04] have been established to specify
and subsequently synthesize circuits. Even if first programming languages are also
available in the reversible domain (see e.g. [Abr05, PHW06, YG07]), so far they

4 TIME for the BDD-based synthesis includes both the time to build the BDD and the time to
derive the circuit from it.



only have been used to design reversible software. Similar approaches for reversible
circuit synthesis are still missing.
In this section, the programming language SyReC is proposed, intended to specify
and subsequently to automatically synthesize reversible logic. For this purpose,
Janus [YG07], an existing language designed to specify reversible software, is
used as a basis and enriched by new concepts as well as operations aimed at specifying
reversible circuits. A hierarchical approach is presented that automatically trans-
forms the respective statements and operations of the new programming language
into a reversible circuit. Experiments show that complex circuits can be efficiently
generated with the help of SyReC. Moreover, a comparison to the BDD-based syn-
thesis approach presented in the previous section shows the advantages of SyReC,
if more complex circuits instead of single functions should be synthesized.
The remainder of this section is structured as follows: The SyReC programming
language as well as the new concepts, operations, and restrictions applied for hard-
ware synthesis are introduced in Sect. 3.3.1. Section 3.3.2 describes the hierarchi-
cal synthesis approach and explains in detail how reversible circuits specified in
SyReC can be generated. Finally, experimental results and conclusions are given in
Sect. 3.3.3.

3.3.1 The SyReC Language

As mentioned above, Janus [YG07] is used as a basis for the programming lan-
guage SyReC to specify reversible systems to be synthesized as circuits. This sec-
tion briefly reviews the syntax of the Janus language. Afterwards, the new concepts
and operations added to address circuit synthesis are introduced.

3.3.1.1 The Software Language Janus

Janus is a reversible language that is simple yet powerful enough to design prac-
tical reversible software systems [YG07]. It provides fundamental constructs to de-
fine control and data operations while still preserving reversibility.
Figure 3.5 shows the syntax of Janus. Each Janus program (denoted by P) con-
sists of variable declarations (denoted by D) and procedure declarations. The vari-
ables have non-negative integer values and are denoted by strings. They can be
grouped as arrays. New variables are initially assigned to 0. Constants are denoted
by c. Each procedure consists of a name (id) and a sequence of statements (denoted
by S) including operations, reversible conditionals, reversible loops, as well as call
and uncall of procedures (lines 4 to 7 in Fig. 3.5). Variables within statements are
denoted by V .
In the following, a distinction is made between reversible assignment operations (de-
noted by ⊕) and (not necessarily reversible) binary operations. The
former ones assign values to a variable on the left-hand side. Therefore, the respec-
tive variable must not appear in the expression on the right-hand side. Furthermore,
only a restricted set of assignment operations exists, namely increase (+=), decrease
(−=), and bit-wise XOR (^=), since they preserve the reversibility (i.e. it is
possible to compute these operations in both directions). In particular, the bit-wise
XOR is of interest because a ^= b is equal to an assignment a = b if a is equal to 0.
In contrast, binary operations, i.e. arithmetic (+, ∗, /, %, ∗/), bit-wise (&, |, ˆ),
logical (&&, ||), and relational (<, >, =, !=, <=, >=) operations, may not be reversible.
Thus, they can only be used in right-hand expressions which preserve
(i.e. do not modify) the values of the respective inputs. In doing so, all computations
remain reversible since the input values can be applied to revert any operation.
For example, to specify a multiplication (i.e. a ∗ b) in Janus, a new free variable c
must be introduced which is used to store the product (i.e. c ^= a ∗ b is applied).
In comparison to common (irreversible) programming languages, this forbids statements
like a = a ∗ b. Having this as a basis, Janus can be used to specify reversible
programs and execute them in a reversible manner (i.e. forward and backward).
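To make this forward/backward execution concrete, the following minimal Python sketch (an illustration only, not Janus code; bit-widths and overflow are ignored) runs a sequence of reversible assignments and then undoes it by applying the inverse operations in reverse order:

    # Reversible assignment operations and their inverses (+= <-> -=, ^= is self-inverse).
    FORWARD = {"+=": lambda a, b: a + b,
               "-=": lambda a, b: a - b,
               "^=": lambda a, b: a ^ b}
    INVERSE = {"+=": "-=", "-=": "+=", "^=": "^="}

    def run(statements, state, backward=False):
        """Execute (or undo) a list of (op, target, source) assignments.
        The source variable is only read, so its value stays preserved."""
        seq = reversed(statements) if backward else statements
        for op, target, source in seq:
            applied = INVERSE[op] if backward else op
            state[target] = FORWARD[applied](state[target], state[source])
        return state

    program = [("^=", "c", "a"), ("+=", "c", "b")]   # e.g. c ^= a; c += b
    state = run(program, {"a": 3, "b": 5, "c": 0})
    assert state["c"] == (0 ^ 3) + 5
    assert run(program, state, backward=True)["c"] == 0   # backward execution restores c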

3.3.1.2 The Hardware Language SyReC

In the following, the programming language SyReC for synthesis of reversible cir-
cuits is described. Janus is thereby used as a basis and enriched by further concepts
(e.g. declaring circuit signals of different bit-width) and operations (e.g. bit-access
and shifts). Besides that, some restrictions are applied (e.g. dynamic loops are for-
bidden in hardware). Incorporating all these aspects results in the syntax of a programming
language for reversible circuit synthesis as depicted in Fig. 3.6. More precisely,
the following extensions and restrictions have been applied:
• The declaration of variables has been extended so that the designer can declare
variables with different bit-widths (line 2).
• Arrays are not allowed.
• Operators to access single bits (x.N), a range of bits (x.N:N), as well as the size
(#V) of a variable have been added (line 3 and line 4).
• Since loops must be completely unrolled during synthesis, the number of itera-
tions has to be available before compilation. That is, dynamic loops (defined by
expressions) are not allowed (line 7).
• Macros for the SWAP operation (<=>) (line 5) as well as for the for-loop state-
ment (line 8) have been added.5
• Further operations used in hardware design (e.g. shifts) have been added
(line 10 and line 14).

Example 3.3 Figure 3.7 shows a simple Arithmetic Logic Unit (ALU) illustrating
the core concept of the resulting hardware programming language. The basic arithmetic
operations can thereby be applied directly. Furthermore, control variables can
be defined with a lower bit-width than data variables.

5 These extensions are not necessarily needed (i.e. they can also be expressed by the existing operations),
but they allow a more intuitive programming of reversible circuits.


Fig. 3.5 Syntax of the software language Janus

(1)  P ::= D∗ (procedure id S+)+
(2)  D ::= x | x[c]
(3)  V ::= x | x[E]
(4)  S ::= V ⊕= E |
(5)        if E then S else S fi E |
(6)        from E do S loop S until E |
(7)        call id | uncall id | skip
(8)  E ::= c | V | (E ⊙ E)
(9)  ⊕ ::= + | − | ˆ
(10) ⊙ ::= ⊕ | ∗ | / | % | ∗/ | & | | | && | || |
(11)       < | > | = | != | <= | >=

Fig. 3.6 Syntax of the hardware language SyReC

(1)  P ::= D∗ (procedure id S+)+
(2)  D ::= x | x(c)
(3)  V ::= x | x.N:N | x.N
(4)  N ::= c | #V
(5)  S ::= V <=> V | V ⊕= E |
(6)        if E then S else S fi E |
(7)        from N do S loop S until N |
(8)        for N do S until N |
(9)        call id | uncall id | skip
(10) E ::= N | V | (E ⊙ E) | (E ≪ N)
(11) ⊕ ::= + | − | ˆ
(12) ⊙ ::= ⊕ | ∗ | / | % | ∗/ | & | | | && | || |
(13)       < | > | = | != | <= | >=
(14) ≪ ::= << | >>

Fig. 3.7 SyReC example: ALU

op(2) x0 x1 x2
procedure alu
  if (op = 0) then
    x0 ^= (x1 + x2)
  else
    if (op = 1) then
      x0 ^= (x1 - x2)
    else
      if (op = 2) then
        x0 ^= (x1 * x2)
      else
        x0 ^= (x1 ^ x2)
      fi (op = 2)
    fi (op = 1)
  fi (op = 0)

In contrast to previous approaches, this allows a much easier specification of
(complex) reversible circuits. Having this, the next section describes how circuits
can be synthesized from this representation.

3.3.2 Synthesis of the Circuits

Using the language introduced above it is possible to specify reversible circuits at
a higher level. As demonstrated by Example 3.3, this particularly makes it possible to design
complex circuits in an easier way than e.g. by truth tables or decision diagrams.
Nevertheless, the specified circuits still need to be synthesized. To this end, a hierarchical
synthesis method is proposed that uses existing realizations of the individual
operations (i.e. building blocks) and combines them so that the desired circuit results.
More precisely, the approach (1) traverses the whole program and (2) adds
cascades of reversible gates to the circuit to be synthesized for each statement or
expression, respectively. In the following, the individual mappings of the operations
and expressions to the respective reversible cascades are described.
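As a rough illustration of this traversal, consider the following Python sketch (a simplification under assumed data structures: a statement is an (operation, target, source) triple, a gate is a pair of a control-line set and a target line, and only a single building block is shown; none of these names stem from the actual C++ implementation):

    def xor_assign_block(target_lines, source_lines):
        """Building block for x ^= y (previewing Sect. 3.3.2.1): one CNOT per bit."""
        return [({src}, tgt) for src, tgt in zip(source_lines, target_lines)]

    BUILDING_BLOCKS = {"^=": xor_assign_block}   # further operations are added analogously

    def synthesize(statements, variable_lines):
        """Traverse the program and append, per statement, the cascade of the
        corresponding building block to the circuit (a list of gates)."""
        circuit = []
        for op, target, source in statements:
            block = BUILDING_BLOCKS[op]
            circuit += block(variable_lines[target], variable_lines[source])
        return circuit

    # two 2-bit variables a (lines 0, 1) and b (lines 2, 3); program: a ^= b
    print(synthesize([("^=", "a", "b")], {"a": [0, 1], "b": [2, 3]}))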

3.3.2.1 Reversible Assignment Operations

As introduced in Sect. 3.3.1, reversible assignment operations include those which
are reversible even if they assign a new value to the variable on the left-hand side of
a statement. In the following, the notation as depicted in Fig. 3.8(a) is used to denote
such an operation in a circuit structure.6 Solid lines represent the variable(s) on the
right-hand side of the operation, i.e. the variable(s) whose values are preserved.
The simplest reversible assignment operation is the bit-wise XOR (e.g. x1 ^= x2).
For Boolean variables, this operation can be synthesized by a single Toffoli gate as
shown in Fig. 3.8(b). If variables with a bit-width greater than 1 are applied, then a
Toffoli gate has to be analogously applied for each bit.
To synthesize the increase operation (e.g. a += b), a modified addition network
is added. In the past, several realizations of addition in reversible logic have been
investigated. In particular, it is well known that the minimal realization of a one-bit
adder consists of four Toffoli gates (see e.g. [WGT+08]). Thus, cascading the
required number of one-bit adders is a possible realization. But since every one-bit
adder also requires one constant input, this is a very poor solution with respect
to circuit lines. In contrast, heuristic realizations exist that require fewer
additional lines (see e.g. [TK05]). Here, a realization with only one additional
line (which additionally can be reused for any further addition operation) is used.
A cascade showing this realization for a 3-bit addition is depicted in Fig. 3.8(c).
Nevertheless, any other adder realization can be applied as well.

6 Figure 3.8(a) shows the notation for a single bit operation. For larger bit-widths the notation is
extended accordingly.

Fig. 3.8 Realization of reversible assignment operations

Fig. 3.9 Realization of binary operations

Finally, the mapping of the decrease operation (e.g. a −= b) remains. Here, the
realization from Fig. 3.8(c) is applied as well, fed with a negated variable value.

3.3.2.2 Binary Operations

Binary operations include operations that are not necessarily reversible, so that their
inputs have to be preserved to allow a (reversible) computation in both directions.
To denote such operations, in the following the notation depicted in Fig. 3.9(a) is
used. Again, solid lines represent the variable(s) whose values are preserved (i.e. in
this case the input variables).
Synthesis of irreversible functions in reversible logic is not new, so that a reversible
circuit realization already exists for most of the respective operations. Additional
lines with constant inputs are thereby applied to make an irreversible function re-
versible (see e.g. Sect. 3.1.1). As an example, Fig. 3.9(b) shows a reversible cascade
that realizes an AND operation. As can be seen, this requires one additional circuit
line with a constant input 0. Similar mappings exist for all other operations.
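The AND mapping of Fig. 3.9(b) can be stated compactly: a single Toffoli gate with the two operands as controls and the additional constant-0 line as target. The following small Python check (a sketch; the gate representation is chosen here only for illustration) verifies the truth table:

    def apply(gates, bits):
        """Simulate a Toffoli cascade: invert the target iff all controls are 1."""
        for controls, target in gates:
            if all(bits[c] for c in controls):
                bits[target] ^= 1
        return bits

    # lines: a = 0, b = 1, c = 2 (constant input 0); c becomes a AND b
    and_cascade = [({0, 1}, 2)]
    results = [apply(and_cascade, [a, b, 0])[2] for a in (0, 1) for b in (0, 1)]
    assert results == [0, 0, 0, 1]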
However, since binary operations can be applied together with reversible assignment
operations (e.g. c ^= a&b), sometimes a more compact realization is possible.
More precisely, additional (constant) circuit lines can be saved (at least for some
operations), if the result of a binary operation is applied to a reversible assignment
operation. As an example, Fig. 3.9(c) shows the realization for c ^= a&b where no
constant input is needed but the circuit line representing c is used instead. However,
such a “combination” is not possible for all operations. As an example, Fig. 3.9(d)
shows a two-bit addition whose result is applied to a bit-wise XOR, i.e. c ^= a + b.
Here, removing the constant lines and directly applying the XOR operation on the
lines representing c would lead to a wrong result. This is because intermediate re-
sults are stored at the lines representing the sum. Since these values are reused later,
performing the XOR operation “in parallel” would destroy the result. Thus, to have
a combined realization of a bit-wise XOR and an addition, a concrete embedding
for this case must be generated. Since finding and synthesizing respective embed-
dings for all affected operations and combinations, respectively, is a non-trivial task,
a more detailed consideration of this aspect is left for future work. So far, constant
lines are applied to realize the desired functionality.
In this sense, most of the binary operations (in particular the bit-wise, logical,
and relational operations as well as the addition) are synthesized. Besides that, the
realization of the multiplication is of interest. A couple of possible ways are de-
scribed in [OWDD10]. Figure 3.9(e) briefly shows how multiplication is realized by
the proposed synthesis method. As can be seen, partial products are applied. Considering
one of the factors a, each time a respective bit of this factor (denoted by a_i)
is equal to 1, the respective partial product is added to the product. This allows reusing
the increase realization introduced in the previous section.
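The idea behind Fig. 3.9(e) corresponds to the usual shift-and-add scheme: whenever bit a_i of the factor a is 1, the partial product b shifted by i positions is added to the (initially constant 0) product, so that only the already available increase realization is needed. A behavioral Python sketch (not the gate-level mapping; a fixed register width is assumed):

    def multiply_partial_products(a, b, width):
        """c ^= a * b realized as controlled additions of partial products."""
        mask = (1 << width) - 1          # model registers of the given bit-width
        product = 0                      # the result lines start with constant 0
        for i in range(width):
            if (a >> i) & 1:             # bit a_i decides whether to add
                product = (product + ((b << i) & mask)) & mask   # reuse of 'increase'
        return product

    assert multiply_partial_products(5, 7, 8) == (5 * 7) & 0xFF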

3.3.2.3 Conditional Statements, Loops, Call/Uncall

Finally, the realization of control operations as reversible cascades is considered.
Loops and procedure calls/uncalls can be realized in a straightforward way. More
precisely, loops are realized by simply cascading (i.e. unrolling) the respective statements
within a loop block for each iteration. Since the number of iterations must be
available before synthesis (see Sect. 3.3.1), this results in a finite number of statements
which is subsequently processed. Call and uncall of procedures are handled
similarly. Here, the respective statements of the procedures are cascaded together.
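Expressed over gate lists in a few lines of Python (a sketch; the reversal used for uncall relies on the fact that Toffoli gates are self-inverse, which is not spelled out in the paragraph above), these control constructs reduce to simple cascading:

    def unroll_loop(body_gates, iterations):
        """Loops: cascade the loop body once per iteration
        (the iteration count is known before synthesis)."""
        return body_gates * iterations

    def call(procedure_gates):
        """call: cascade the gates of the procedure as they are."""
        return list(procedure_gates)

    def uncall(procedure_gates):
        """uncall: apply the procedure in reverse; since each Toffoli gate is
        self-inverse, reversing the cascade realizes the inverse function."""
        return list(reversed(procedure_gates))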
To realize conditional statements (e.g. the one shown in Fig. 3.10(a)), two vari-
ants are proposed. Figure 3.10(b) shows the first one, which is realized in three
steps:
Fig. 3.10 Realization of an if-statement

1. All variables in the then- or else-block, respectively, which potentially are assigned
a new value (i.e. that are on the left-hand side of a reversible assignment
operation) are duplicated. This respectively requires an additional circuit
line with constant input 0.
2. The statements in the respective blocks are mapped to reversible cascades. The
duplications introduced in the last step are thereby applied to intermediately store
the results of the then-block and the original values of the variables in the else-
block, respectively.
3. Depending on the result of the if-statement e, the respective values of the dupli-
cated lines and the original lines are swapped. More precisely, in the example of
Fig. 3.10(a) the value of a is swapped with its (newly assigned) duplication iff e
evaluates to 1. Analogously, iff e evaluates to 0 the (newly assigned) value of c
is passed through.
The second realization of a conditional statement is depicted in Fig. 3.10(c). In
contrast to the previous one, here no duplications (and therewith no additional circuit
lines) are required. Instead, control lines are added to all gates in the realization
of the respective then- and else-block. Thus, the operations are computed iff the
expression e is assigned to 1 or 0, respectively. A NOT gate (i.e. a Toffoli without
control lines) is thereby used to flip the value of e so that the gates of the else block
can be “controlled” as well.
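For the second realization, the transformation of the gate lists can be sketched as follows (Python, with gates as (control-set, target) pairs; restoring e with a second NOT gate at the end is an added assumption to leave the expression line unchanged, the text above only mentions the first NOT gate):

    def if_without_extra_lines(e_line, then_gates, else_gates):
        """Add the line holding e as control to every gate of the then-block,
        flip e with a NOT gate, and control the else-block the same way."""
        cascade = [(controls | {e_line}, target) for controls, target in then_gates]
        cascade.append((set(), e_line))                       # NOT gate on e
        cascade += [(controls | {e_line}, target) for controls, target in else_gates]
        cascade.append((set(), e_line))                       # restore e (assumption)
        return cascade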
Having both realizations, it is up to the designer which one should be used during
synthesis. Using the first realization leads to additional circuit lines (a restricted resource,
particularly in quantum logic). This is not the case in the second realization;
however, here, due to the additional control lines, both the quantum cost and the
transistor cost of the circuit significantly increase. Besides other aspects, this is also
evaluated in the experiments in the next section.

3.3.3 Experimental Results

The proposed synthesis approach for the programming language SyReC has been
implemented in C++. In this section, experimental results obtained by this ap-
proach are provided. In particular, the different realizations of conditional state-
ments are evaluated in more detail. Furthermore, the results obtained by the pro-
posed approach are compared to the BDD-based synthesis method introduced in


Sect. 3.2.
As benchmarks a couple of programs are used including a simple arithmetic logic
unit (denoted by alu; see also Fig. 3.7), a program computing the average of 8 or 16
values (denoted by avg8 and avg16), a logic unit applying bit-wise operations in-
stead of arithmetic (denoted by lu), as well as an arbiter with 8 clients (denoted
by arb8). Thus, results obtained by programs including arithmetic (alu, avg8, and
avg16) as well as bit-wise operations (lu and arb8) have been evaluated. All exper-
iments have been carried out on an AMD DualCore Athlon 3 GHz machine with
32 GB of main memory. The time-out was set to 500 CPU seconds.
In a first evaluation the effect of different if-statement realizations was consid-
ered in detail. The results are presented in Table 3.11(a). The first column gives the
name of the benchmark followed by the applied bit-width of data variables (denoted
by BW) and the resulting number of primary inputs (denoted by PI). The following
columns give the number of constant input lines (CI), the number of gates (d), the
quantum cost (QC), and the transistor cost (TC) of the circuits obtained using the
if-realization with additional circuit lines (denoted by IF-STM. W/ ADD. LINES) or
without additional circuit lines (denoted by IF-STM. W/O ADD. LINES), respectively.
Finally, the run-time is reported for both approaches in column TIME.
The results confirm the discussion from the last section. If additional circuit lines
are applied, the respective cost can be significantly reduced. In comparison to the re-
alization without additional circuit lines for if-statements, approx. 80% of the quan-
tum costs and at least for alu and lu more than 50% of the transistor costs can
be saved (this does not hold for the avg benchmarks since they do not include if-
statements). In contrast, this leads to a significant increase in the number of constant
inputs.
Finally, the results are compared to the BDD-based synthesis approach. Here, a
function given as binary decision diagram is used as input. Thus, the circuits ob-
tained by SyReC are extracted as BDDs and re-synthesized based on the concepts
introduced in Sect. 3.2.7 The results are given in Table 3.11(b) using the same notation
as described above. As can be clearly seen, the proposed approach outperforms
the BDD-based synthesis in all objectives: circuits with a significantly smaller number
of gates, lower quantum cost, and lower transistor cost, respectively, are synthesized in
much less run-time (only the small arb8 is an exception). Moreover, in particular for
the benchmarks including arithmetic (i.e. alu and avg) for large bit-widths no circuit
can be synthesized within the given time-out. This can be explained by the fact
that, in particular for the multiplication, no efficient representation as a BDD exists.
Thus, for these examples the BDD-based approach suffers from memory explosion.
Altogether, SyReC allows the specification of complex circuits that are hard to
describe in terms of a decision diagram or truth table, respectively. Afterwards, the
specified circuits can efficiently be synthesized.

7 A similar comparison to further work (e.g. [GAJ06, MDM07]) was not possible since, due to
memory limitations, the respective benchmarks cannot be represented in terms of truth tables, which
is required by these approaches.

Table 3.11 Experimental results


(a) Applying the programming language SyReC

BENCH   BW   PI     IF-STM. W/ ADD. LINES                  IF-STM. W/O ADD. LINES
                    CI     d      QC      TC      TIME     CI     d      QC       TC      TIME

alu 8 26 65 453 2069 5840 0.03 s 41 408 11125 13208 0.02 s


alu 16 50 121 1345 7377 19856 0.03 s 73 1252 40749 45240 0.03 s
alu 32 98 233 4473 27785 72464 0.03 s 137 4284 155677 166136 0.03 s

avg8 8 72 11 405 885 4200 0.01 s 11 405 885 4200 0.01 s


avg8 16 144 19 861 1853 8872 0.01 s 19 861 1853 8872 0.01 s
avg8 32 288 35 1773 3789 18216 0.01 s 35 1773 3789 18216 0.01 s

avg16 8 136 12 754 1654 7832 0.01 s 12 754 1654 7832 0.01 s
avg16 16 272 20 1602 3462 16536 0.01 s 20 1602 3462 16536 0.01 s
avg16 32 544 36 3298 7078 33944 0.01 s 36 3298 7078 33944 0.01 s

lu 8 26 64 164 392 1328 0.02 s 40 119 1960 2960 0.02 s


lu 16 50 120 308 744 2544 0.02 s 72 215 3768 5584 0.02 s
lu 32 98 232 596 1448 4976 0.03 s 136 407 7384 10832 0.02 s

arb8 1 16 37 80 296 640 0.45 s 1 24 746 800 0.44 s

(b) BDD-based synthesis

BENCH   BW   PI     CI        d         QC         TC         TIME

alu 8 26 768 3560 11196 70792 0.06 s


alu 16 50 541099 2842702 9380494 57530752 283.63 s
alu 32 98 – – – – >500.00 s

avg8 8 72 2933 10449 36581 217240 4.61 s


avg8 16 144 – – – – >500.00 s
avg8 32 288 – – – – >500.00 s

avg16 8 136 7410 25454 89938 532424 9.61 s


avg16 16 272 – – – – >500.00 s
avg16 32 544 – – – – >500.00 s

lu 8 26 111 331 823 5928 0.01 s


lu 16 50 215 651 1623 11688 0.03 s
lu 32 98 423 1291 3223 23208 0.08 s

arb8 1 16 15 49 101 824 0.01 s



3.4 Summary and Future Work

Having automated synthesis methods is crucial in the design of reversible or quantum
circuits. In this chapter, the current synthesis steps (including embedding and
actual synthesis) have been described. Even if only a selected approach has been
considered in detail, it was illustrated and discussed that synthesis of reversible logic
currently is still at the beginning. Most of the existing methods are not able to syn-
thesize large Boolean functions—not to mention complex reversible systems.
One contribution towards synthesis of significantly larger circuits was made in
the second part of this chapter. Here, BDDs representing the function to be syn-
thesized are constructed whose nodes afterwards are substituted with a cascade of
Toffoli or quantum gates, respectively. While previous approaches are only able to
handle functions with up to 30 variables at high run-time, the BDD-based approach
can synthesize circuits for functions with more than hundred variables in just a few
CPU seconds. Furthermore, with respect to quantum cost (i.e. number of quantum
gates), significantly smaller realizations are obtained.
Due to these promising results, BDD-based synthesis should be subject to further
research. In particular, a detailed analysis of the theoretical results that can be obtained
by the BDD method remains open. Section 3.2.3 gave a first sketch. However, BDDs
are very well understood, so that certainly many more results can be transferred. Fur-
thermore, it would be of interest to evaluate the effect of the proposed approach if
adjusted cost functions for reordering as well as other decompositions (e.g. positive
Davio or negative Davio) are applied.
Besides that, synthesis of reversible logic should reach the “next” level, i.e. the
system level. Therefore, reversible hardware programming languages are needed.
SyReC, as introduced in the third part of this chapter, provides a first approach in
this direction. Using this language in combination with the proposed hierarchical
synthesis approach enables synthesis of more complex reversible circuits for the
first time.
Nevertheless, the circuits resulting from both BDD-based and SyReC-based
synthesis still require a notable amount of additional circuit lines. Depending
on the technology this might be a drawback (e.g. for quantum systems where
the number of lines or qubits, respectively, is limited). Thus, how to reduce the
number of lines in a circuit is an important question. Section 6.2 introduces a post-
process approach which addresses this issue. Besides that, finding embeddings lead-
ing to fewer additional circuit lines (e.g. for the binary operations in SyReC) is an
important task for future work.
Chapter 4
Exact Synthesis of Reversible Logic

In contrast to the heuristic approaches introduced in the last chapter for synthesis
of reversible logic, exact methods determine a minimal solution, i.e. a circuit with a
minimal number of gates or quantum cost, respectively. Ensuring minimality often
causes an enormous computational overhead and thus exact approaches are only
applicable to relatively small functions. Nevertheless, it is worthwhile to consider exact
methods, since they
• allow finding smaller circuits than the currently best known realizations,
• allow the evaluation of the quality of heuristic approaches, and
• allow the computation of minimal circuits as building blocks for larger circuits.
For example, improving heuristic results by 10% is significant, if this leads to
optimal results, but marginal if the generated results are still factors away from
the optimum. Conclusions like this are only possible, if the optimum is available.
Another aspect is the computation of building blocks that can be reused to synthesize
larger designs. For example, the substitutions used in the last chapter for the BDD-
based synthesis have been generated using exact approaches.
However, only very little research has been done in exact synthesis of reversible
logic so far. A method based on a depth-first traversal with iterative deepening
that uses circuit equivalences to rewrite a limited set of gates has been presented
in [SPMH03]. The authors of [YSHP05] introduce an exact algorithm based on
group theory. But for both approaches, only results for functions with up to three
variables are reported. Furthermore, in [HSY+06] another exact synthesis method
based on a reachability analysis has been proposed which is geared towards quan-
tum gates. However, also here only functions with three and a couple of functions
with four variables can be handled in a significant amount of run-time.
This chapter proposes methods based on Boolean satisfiability (SAT) that allow
a faster exact synthesis and that are applicable to functions with up to six variables.
The general idea is as follows: The synthesis problem is formulated as a sequence
of decision problems. Then, each decision problem is encoded as a SAT instance
and checked for satisfiability using an off-the-shelf SAT solver. If the instance is
unsatisfiable, then no realization with d gates exists and a check for another d value
is performed. Otherwise, the circuit can be obtained from the satisfying assignment.
Minimality is ensured by iteratively increasing d starting with d = 1.
In the following, the main flow and the respective SAT encodings for Tof-
foli circuit synthesis [GCDD07, GWDD09a] as well as for quantum logic synthe-
sis [GWDD08, GWDD09b] are introduced in detail in Sects. 4.1 and 4.2, respec-
tively. Since nowadays very powerful techniques for solving SAT instances exist
(see Sect. 2.3), already this enables efficient exact synthesis of reversible functions.
However, further improvements are possible, if (1) the problem is formulated and
solved on the SMT level [WGSD08], (2) additional knowledge provided by the ded-
icated solving engine SWORD is exploited [WG07, GWDD09a], or (3) quantified
Boolean satisfiability is used [WLDG08]. The respective encodings and methods are
described in Sect. 4.3. The last (and most efficient) method also has been applied
and evaluated to synthesize reversible circuits including Fredkin and Peres gates.
Finally, the chapter is concluded and future work is sketched in Sect. 4.4.

4.1 Main Flow


In this section, the main concepts of the exact synthesis algorithm for reversible
and quantum logic are presented. The basic idea is as follows: given a reversible
function f to be synthesized, the exact synthesis of f is formulated as a
sequence of decision problems. In each iteration, it is checked whether for the reversible
function f and a natural number d a circuit with exactly d gates exists. Here, options
for specifying and solving the decision problem as well as finding the optimal value
for d exist. The decision problem is encoded and solved by using SAT techniques
which is described in the next section in detail.
To find the optimal value for d, i.e. to determine a d where the resulting circuit
has the minimal number of gates, a possible approach is to start searching for a solution
with d = 1. If there is no solution, i.e. the decision problem returns false, the
number of gates (i.e. d) is incremented until one of the remaining decision problems
becomes true. Following this procedure, minimality is ensured. Obviously, it is also
possible to choose another technique to reach the optimal d. For example, upper or
lower bounds can be exploited. However, for the exact synthesis problem at hand the
following observation holds, which is first illustrated with an example and afterwards
formulated as a lemma.

Example 4.1 Consider the reversible function in Fig. 4.1(a). For this function two
Toffoli circuits are shown in Fig. 4.1(b). By exhaustive enumeration it has been
proven that, even though there are realizations including d = 2 gates and d = 4 gates, no
realization with d = 3 gates exists.

Hence, if a realization with d gates has been found, minimality cannot be shown
by only proving that there is no realization with d − 1 gates. However, for Toffoli
circuits it is sufficient to prove that there are no realizations with d − 1 and d − 2
gates as the following lemma shows:

Fig. 4.1 Function with circuits including d = 2/4, but not d = 3 gates

Fig. 4.2 Extension of a circuit with d gates to a circuit with d + 2 gates

Fig. 4.3 Gate equivalences

Lemma 4.1 Let f : B^n → B^n be a reversible function to be synthesized. A Toffoli
circuit including d gates is minimal with respect to the number of gates, if no
realization with d − 1 gates and no realization with d − 2 gates exists.

Proof Assume that for a reversible function f a realization with d gates and a
smaller realization with d − r gates exist (r > 0). Then, as shown in Fig. 4.2, the
smaller realization can be extended by two additional NOT gates so that the resulting
circuit still realizes f. By cascading the respective NOT gates, it follows that there
are realizations with d − r + 2·s gates as well (s > 0). Thus, if there is a realization
with d − r gates, there has to be at least one realization with d − 1 or with d − 2
gates. □
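The argument can also be checked mechanically: appending two adjacent NOT gates to any Toffoli circuit leaves the computed function unchanged, as the following small Python simulation illustrates (purely illustrative; gates are (control-set, target) pairs):

    from itertools import product

    def simulate(circuit, n):
        """Return the function computed by a Toffoli circuit as an output list."""
        outputs = []
        for pattern in product((0, 1), repeat=n):
            bits = list(pattern)
            for controls, target in circuit:
                if all(bits[c] for c in controls):
                    bits[target] ^= 1
            outputs.append(tuple(bits))
        return outputs

    circuit = [({0}, 1), ({1, 2}, 0)]                       # an arbitrary 3-line circuit
    extended = circuit + [(set(), 2), (set(), 2)]           # plus two NOT gates on line 2
    assert simulate(circuit, 3) == simulate(extended, 3)    # same function, d + 2 gates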

If quantum gate circuits are considered this observation can be applied as well.
Moreover, if at least a CNOT gate, a V gate, or a V+ gate occurs, a somewhat tighter
extension is possible. Then, each of these gates can be “extended” as depicted in
Fig. 4.3. This leads to valid realizations with costs d + r for any r ∈ N. As a result,
it is sufficient to check for a realization with d − 1 gates to prove that d is minimal.
Thus, the minimal d can be approached by two methods: (1) Start with d = 1
and iteratively increment d until a realization is found or (2) determine a value
for d (e.g. by heuristics or bounds) and non-iteratively modify d until a minimal
realization (approved by Lemma 4.1) is found. In this context, non-iteratively means
that if there exists a circuit with d gates, then it is tried to find a better realization
with only d′ < d gates; otherwise, it is tried to find a circuit with d′ > d gates.
However, for the considered exact synthesis problem an iterative approach is cho-
sen due to the complexity of solving the respective problem instances for large values
of d. To illustrate this, Table 4.1 shows the results (RES) as well as the run-time

Table 4.1 Iterative approach vs. non-iterative approach for mod5d1

d        ITERATIVE                 BEST CASE NON-IT.
         RES       TIME            RES       TIME

1 UNSAT 0.23 – –
2 UNSAT 1.92 – –
3 UNSAT 16.68 – –
4 UNSAT 36.62 – –
5 UNSAT 194.24 UNSAT 194.24
6 UNSAT 1625.88 UNSAT 1625.88
7 SAT 218.56 SAT 218.56

total 2094.13 2038.68

Fig. 4.4 Main flow of exact synthesis algorithm

(1)  exactSynthesis(f : B^n → B^n)
(2)    // f is given in terms of a truth table
(3)    found = false;
(4)    d = 1;
(5)    while (found == false) do
(6)      inst = encodeProblem(f, d);
(7)      res = callSolver(inst);
(8)      if (res == satisfiable)
(9)        // f is synthesizable with costs d
(10)       A = getAssignment();
(11)       extractCircuitFromAssignment(A);
(12)       found = true;
(13)     else
(14)       // f is not synthesizable with costs d
(15)       d = d + 1;

(TIME, in CPU seconds) of the respective checks that have been performed to synthesize
an optimal Toffoli circuit for the function mod5d1.1 The minimal circuit for
this function includes d = 7 gates. Thus, using the iterative approach seven checks
are performed in total. In contrast, assuming the best case for the non-iterative ap-
proach (i.e. the minimal depth d = 7 is determined just at the beginning and addi-
tionally the two checks for d = 6 and d = 5 are performed) only three checks are
necessary. However, since the run-times needed for the first checks of the iterative
approach are small, the total run-time for both approaches differs only slightly (less
than 3%). Hence, the non-iterative approach for reaching the minimal d is not feasi-
ble in general. This particularly holds, since this approach naturally requires checks
with a d greater than the minimal d (that obviously are harder).
As a result, an iterative approach as shown in Fig. 4.4 is used for exact synthe-
sis of reversible logic. The input is the truth table of the reversible function f to

1 Therefore, the SAT-based encoding described in the next section has been used. However, the
same behavior was observed for other encodings and functions, respectively.
be synthesized. The algorithm tries to find a circuit representation for f with one
gate only, i.e. d is initialized to 1 and a respective SAT instance is created. If no
realization with d gates exists, d is incremented. This procedure is repeated until a
realization is found.
The respective checks are thereby performed by
1. encoding the synthesis problem as an instance of Boolean satisfiability inst
(line 6) and
2. checking the instance for satisfiability using an off-the-shelf solver (line 7).
If there exists a satisfying assignment for inst a circuit representing f has been
found. This circuit is extracted from the assignment of the encoding given by the
solver. If inst is unsatisfiable, it has been proven that no realization for f with d
gates exists. By increasing d iteratively from d = 1 minimality is ensured.
Using this as the main flow, the next sections introduce concrete encodings for
Toffoli and quantum circuit synthesis, respectively.

4.2 SAT-based Exact Synthesis

Having the main flow as a basis, the open question still is how to encode the decision
problem “Is there a circuit with exactly d gates that realizes the given reversible
function f ?” as a SAT instance. In this section, the concrete SAT formulation as well
as first results obtained with it are presented. Section 4.2.1 addresses Toffoli circuit
synthesis, while Sect. 4.2.2 is about quantum circuit synthesis, respectively. These
encodings allow an efficient handling of the embedding problem for irreversible
functions (see Sect. 3.1.1) which is considered in Sect. 4.2.3 in more detail. Finally,
experimental results are given in Sect. 4.2.4.

4.2.1 Encoding for Toffoli Circuits

The synthesis problem for Toffoli circuits is encoded so that the resulting instance
is satisfiable iff a circuit with d gates realizing a given function f exists; other-
wise the instance must be unsatisfiable. To this end, Boolean variables (for brevity
denoted by vectors in the following) and constraints are used as described in the
following.
First, the vectors defining the type of a Toffoli gate at an arbitrary depth k are
introduced:2

2 The Toffoli gates in a circuit are enumerated from left to right (starting from 0). Furthermore, the
term depth is used to refer to the respective position of a Toffoli gate in this enumeration.
Fig. 4.5 Representation of Toffoli gates by assignments to t^k and c^k

Definition 4.1 Let f : B^n → B^n be a reversible function to be synthesized as a
circuit with d gates. Then,
• t^k = (t^k_⌈log2 n⌉ ... t^k_1) with 0 ≤ k < d is a Boolean vector defining the position of
the target line for the Toffoli gate at depth k. More precisely, t^k is a binary encoding
of a natural number [t^k]_2 ∈ {0, ..., n − 1} that indicates this target line.
• c^k = (c^k_(n−1) c^k_(n−2) ... c^k_1) with 0 ≤ k < d is a Boolean vector defining the control
lines of the Toffoli gate at depth k. More precisely, assigning c^k_l = 1 with (1 ≤
l < n) means that line ([t^k]_2 + l) mod n becomes a control line of the Toffoli gate
at depth k.

Remark 4.1 In total there are n · 2^(n−1) different types of Toffoli gates for a reversible
function with n variables. This holds, since a Toffoli gate has exactly one target
line, resulting in n − 1 lines that are left as possible control lines. Thus, there are
n lines possible for placing a target line and 2^(n−1) combinations for control lines,
respectively.

Example 4.2 Figure 4.5 shows all 3 · 2^(3−1) = 12 possible types of Toffoli gates for
a circuit with n = 3 lines. For each gate its assignments to the vectors t^k and c^k
are also given. For example, the assignments t^k = (01) and c^k = (01) state that
line [01]_2 = 1 is the target line. Furthermore, because c^k_1 is assigned to 1, line (1 + 1)
mod 3 = 2 becomes a control line. In contrast, because c^k_2 is assigned to 0, line
(1 + 2) mod 3 = 0 does not become a control line.
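The decoding used in Example 4.2 can be written down directly as a small Python helper (purely illustrative; the bit vectors are given in the written order, i.e. most significant component first):

    def decode_toffoli(t_bits, c_bits, n):
        """Map an assignment to t^k and c^k (Definition 4.1) to the target line
        and the set of control lines of the corresponding Toffoli gate."""
        target = int("".join(map(str, t_bits)), 2)        # [t^k]_2
        controls = set()
        for l in range(1, n):                             # c^k_l for 1 <= l < n
            if c_bits[n - 1 - l] == 1:                    # c_bits = (c^k_{n-1} ... c^k_1)
                controls.add((target + l) % n)
        return target, controls

    # the assignment from Example 4.2: t^k = (01), c^k = (01), n = 3 lines
    assert decode_toffoli((0, 1), (0, 1), 3) == (1, {2})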

Furthermore, variables representing the inputs and outputs as well as the internal
signals of the circuit to be synthesized are defined:

Fig. 4.6 SAT formulation for Toffoli circuit synthesis with n = 4 and d = 4

Definition 4.2 Let f : B^n → B^n be a reversible function to be synthesized as a
circuit with d gates. Then, x^k_i = (x^k_i(n−1) ... x^k_i0) with 0 ≤ i < 2^n and 0 ≤ k ≤ d is
a Boolean vector representing the input (for k = 0), the output (for k = d), or the
internal variables (for 1 ≤ k ≤ d − 1), respectively, of the circuit to be synthesized
for each truth table line i of f. So, the left side of a truth table line i corresponds to
the vector x^0_i, while the right side corresponds to the vector x^d_i, respectively.

Example 4.3 Figure 4.6 shows the variables needed to formulate the synthesis problem
for an (embedded) adder function3 with n = 4 variables and depth d = 4. The
first row gives the variables for the first truth table line, the second row the variables
for the second truth table line, and so on. Thus, for each of the 2^4 = 16 lines in
the truth table, n = 4 circuit lines with the respective vectors for input, output, and
internal variables are considered (i.e. overall 4 · 16 = 64 lines are considered). The
positions for the Toffoli gates to be synthesized are marked by dashed rectangles.
For each depth, all possible types of Toffoli gates can be defined by assigning the
respective values to t^k and c^k.

3 In the example, the adder from Table 3.2 on p. 29 is used.



Using these variables, the synthesis problem for a reversible function f with d
Toffoli gates can be formulated as follows: Is there an assignment to all variables
of the vectors t^k and c^k such that for each line i, x^0_i is equal to the left side of the
truth table, while x^d_i is equal to the corresponding right side? This is encoded by the
conjunction of the following three constraints:
1. The input/output constraints set the input and output of the truth table given by
the function f to the respective variables x^0_i and x^d_i (see also left-hand and right-hand
side of Fig. 4.6), i.e.

       ⋀_{i=0}^{2^n − 1} ( [x^0_i]_2 = i  ∧  [x^d_i]_2 = f(i) ).

2. For each gate to be synthesized at depth k, functional constraints are added so
that—depending on the assignments to t^k and c^k as well as on the input x^k_i of the
kth gate for truth table line i—the respective gate output x^(k+1)_i is computed, i.e.

       ⋀_{i=0}^{2^n − 1} ⋀_{k=0}^{d−1}  x^(k+1)_i = t(x^k_i, t^k, c^k).

The function t(x^k_i, t^k, c^k) covers the functionality of a Toffoli gate with target line
[t^k]_2 and the control lines defined by c^k. As an example, consider t^k = (01)
and c^k = (100), i.e. with c^k_3 = 1. This assignment states that the Toffoli gate at
depth k has line [t^k]_2 = [01]_2 = 1 as target line and line

       ([t^k]_2 + l) mod n = (1 + 3) mod 4 = 0

as single control line. For this case, the constraints

       t^k = (01) ∧ c^k = (100)  ⇒    x^(k+1)_i0 = x^k_i0
                                    ∧ x^(k+1)_i1 = x^k_i1 ⊕ x^k_i0
                                    ∧ x^(k+1)_i2 = x^k_i2
                                    ∧ x^(k+1)_i3 = x^k_i3

are added for each truth table line i of a function with n = 4 variables. That
means, the values of the circuit lines 0, 2, and 3 are passed through, while the
output value of line 1 becomes inverted if line 0 is assigned to 1. Similar constraints
are added for all remaining cases.

3. Finally, exclusion constraints ensure that illegal assignments to t^k are excluded,
since not all values of t^k are necessary to enumerate all possible target lines, i.e.

       ⋀_{k=0}^{d−1}  [t^k]_2 < n.

For example, for a circuit consisting of n = 3 lines the target line is represented
by two variables t^k = (t^k_2 t^k_1) as shown in Fig. 4.5. Here, the assignment t^k = (11)
has to be excluded, since line [11]_2 = 3 does not exist.
As a result, a formulation has been constructed which is satisfiable if there is
a valid assignment to t^k and c^k so that for all truth table lines the desired input-output
mapping is achieved. Then, the concrete Toffoli gates can be obtained from
the assignments to t^k and c^k as depicted in Fig. 4.5. If there is no such assignment
(i.e. the instance is unsatisfiable), then it has been proven that no circuit representing
the function with d gates exists.
As a last step, the proposed encoding has to be transformed from bit-vector logic
into a Conjunctive Normal Form (CNF)—the standard input format for SAT solvers
(see Sect. 2.3.1). This is a well understood process that can be done in time and
space linear in the size of the original formulation [Tse68]. A possible way is to de-
fine methods for clause generation of simple logic functions like AND, OR, etc. and
extending this scheme for more complex logic like implications or comparisons.
Then, in particular the functional constraints can be mapped to CNF. The assign-
ments of the input/output constraints can be applied by using unit clauses. Finally,
the exclusion constraints can be expressed by explicitly enumerating all values that
are not allowed in terms of a blocking clause [McM02].
Having the formulation in CNF, the satisfiability of the instance (as well as the
satisfying assignments) can be efficiently determined.
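To give an impression of this clause-generation step, the following Python helpers (a sketch of Tseitin-style clause generation, not the actual implementation) return clauses as lists of integer literals, where a negative number denotes negation, for relations that occur in the functional constraints:

    def clauses_and(a, b, out):
        """CNF for out <-> (a AND b)."""
        return [[-a, -b, out], [a, -out], [b, -out]]

    def clauses_xor(a, b, out):
        """CNF for out <-> (a XOR b), e.g. the target-line update of a Toffoli gate."""
        return [[-a, -b, -out], [a, b, -out], [-a, b, out], [a, -b, out]]

    def guarded(selector, clause):
        """selector -> clause: activates a functional constraint only for the
        gate type selected by the assignment to t^k and c^k."""
        return [-selector] + clause

    # example: variable 3 must equal (variable 1 XOR variable 2) whenever variable 7 is true
    cnf = [guarded(7, clause) for clause in clauses_xor(1, 2, 3)]
    print(cnf)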

4.2.2 Encoding for Quantum Circuits

To synthesize quantum circuits, an encoding similar to the one introduced in the
last section is used. However, since quantum circuits may have V gates and V+
gates, circuit lines (or qubits, respectively) may not only be assigned to Boolean 0
and Boolean 1, but also to V0 and V1 (see Sect. 2.1.3). Thus, for assignments to the
input, output, and internal variables x^k_i a multi-valued encoding is applied. To this
end, each x^k_ij is replaced by new variables y^k_ij and z^k_ij that represent the respective
values as follows:

       y^k_ij   z^k_ij   value
       0        0        0
       0        1        V0
       1        0        1
       1        1        V1

That is, if z^k_ij is assigned to 0 the Boolean domain is considered, otherwise the
non-Boolean quantum states V0 and V1 are selected.
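In code, this two-bit encoding is just a small lookup (a Python sketch for illustration only):

    # (y, z) -> value of the circuit line: z = 0 selects the Boolean domain
    VALUE = {(0, 0): "0", (0, 1): "V0", (1, 0): "1", (1, 1): "V1"}

    def decode_line(y, z):
        return VALUE[(y, z)]

    assert decode_line(1, 0) == "1" and decode_line(0, 1) == "V0"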
Furthermore, since another gate library is applied, new variables to represent the
respective types of a gate at depth k are required. Thus, the variables t^k and c^k used
for Toffoli gates are replaced by q^k, defined as follows:

Definition 4.3 Let f : B^n → B^n be a reversible function to be synthesized as a
quantum circuit with d gates. Then, q^k = (q^k_⌈log2 g⌉ ... q^k_1) with (0 ≤ k < d) is a
Boolean vector defining the type of the quantum gate at depth k. The number g of
gate types possible with n circuit lines is thereby defined by g = 3n(n − 1) + n.

The number g of all gate types is determined as follows: Each CNOT gate,
V gate, and V+ gate has exactly one target line and one control line leading to
3n(n−1) possible gate types for a circuit with n lines. Additionally, n NOT gates are
possible (one at each line). Thus, in total 3n(n − 1) + n different types of quantum
gates exist.

Remark 4.2 If additionally double gates are considered (see Sect. 2.1.3), for a circuit
with n lines in total g = 7n(n − 1) + n different types of quantum gates have to be
considered. This holds, since in total four double gates exist (namely the ones shown
in Fig. 2.7 on p. 16), leading to 4n(n − 1) additional types.

Example 4.4 Figure 4.7 shows the variables needed to formulate the constraints for
an (embedded) adder function. In comparison to the variables needed for Toffoli
synthesis (see Example 4.3 or Fig. 4.6, respectively) the variables defining the type
of a gate at depth k and the variables representing the circuit line values have been
changed.

Having these variables (enabling a multi-valued encoding considering the quantum
gate library), in comparison to the Toffoli synthesis formulation the constraints
are modified as follows:

1. The input/output constraints now argue over y^0_ij, z^0_ij and y^d_ij, z^d_ij, leading to

       ⋀_{i=0}^{2^n − 1} ⋀_{j=0}^{n−1} ( y^0_ij = i[j] ∧ z^0_ij = 0 ∧ y^d_ij = f(i)[j] ∧ z^d_ij = 0 ).

That is, each y^0_ij (y^d_ij) is assigned to 1 or 0 according to the jth position in the
truth table line i of f. Furthermore, each z^0_ij (z^d_ij) is assigned to 0, since Boolean
functions are synthesized. As an example, consider the left-hand and the right-hand
side of Fig. 4.7.

Fig. 4.7 SAT formulation for quantum circuit synthesis with n = 4 and d = 4

2. The functional constraints are modified so that the functionality of the new gate
library is represented by a new function q(y^k_ij, z^k_ij, q^k), i.e.

       ⋀_{k=0}^{d−1} ⋀_{i=0}^{2^n − 1} ⋀_{j=0}^{n−1}  y^(k+1)_ij z^(k+1)_ij = q(y^k_ij, z^k_ij, q^k).

Therefore, a similar formulation as described in the last section for the Toffoli
gate library is possible.
3. And finally, illegal assignments to q^k are now excluded by

       ⋀_{k=0}^{d−1}  [q^k]_2 < g,

where g is given by 7n(n − 1) + n (including double gates) or 3n(n − 1) + n
(without double gates), respectively.
As for Toffoli circuit synthesis, in a last step this formulation has to be trans-
formed into a CNF and passed to a SAT solver. If the solver returns satisfiable, then
the quantum circuit can be obtained from the assignments to q^k. Even for the multi-valued
encoding, this can be done efficiently for many practically relevant functions.
However, before the performance of the encodings (for both Toffoli circuit
synthesis and quantum circuit synthesis) is considered in detail in Sect. 4.2.4,
a beneficial modification for exact synthesis of (embedded) irreversible functions is
introduced in the following section.

4.2.3 Handling Irreversible Functions

As described in Sect. 3.1.1, an irreversible function often must be embedded into


a reversible one before synthesis can be applied. This also holds for the exact syn-
thesis approach proposed in the last sections. As a result, constant inputs, garbage
outputs, and don’t cares, respectively, may occur in the embedded functions. Thus,
to determine the minimal reversible circuit for an irreversible function all possible
embeddings (and therewith all possible assignments to constant inputs, garbage out-
puts, and don’t cares) have to be checked separately.4 But, if some slight modifica-
tions in the encoding are applied, several embeddings can be considered in parallel.
It is thereby distinguished between don't cares at the outputs (e.g. because of
garbage outputs) and concrete values to be assigned to constant inputs. In the former
case, only the input/output constraints are relaxed. Instead of forcing all output
variables x^d_ij (y^d_ij z^d_ij) to have concrete values, only constraints for the specified ones
are added. Then, the variables for don't care conditions are left unspecified and
are—if the instance is satisfiable—assigned by the SAT solver.
The same is done for all constant inputs. But, since a constant input must have
the same assignment in all truth table lines, an additional constraint

       x^0_0c = x^0_1c = ··· = x^0_(2^(n−|c|)−1)c    or    y^0_0c z^0_0c = y^0_1c z^0_1c = ··· = y^0_(2^(n−|c|)−1)c z^0_(2^(n−|c|)−1)c

is added for each constant input c, respectively. This restricts the SAT solver to
assign all input variables x^0_ic (y^0_ic z^0_ic) with the same value for each truth table line.
Furthermore, since the constant inputs are now modeled symbolically (the value of
each constant input is not fixed to 0 or 1) only 2^(n−|c|) truth table lines have to be
considered (where |c| is the number of constants).
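Such an equality constraint can be generated, for instance, as pairwise equivalences between the copies of the constant input in consecutive truth table lines (a Python sketch with hypothetical SAT variable identifiers; each equivalence becomes two binary clauses):

    def constant_input_clauses(copies):
        """Force all SAT variables in 'copies' (the constant input x^0_ic for
        i = 0, 1, ...) to take the same value."""
        clauses = []
        for a, b in zip(copies, copies[1:]):
            clauses += [[-a, b], [a, -b]]      # a -> b and b -> a
        return clauses

    # e.g. the constant input of the adder embedding occurs in 8 truth table lines
    print(constant_input_clauses([101, 102, 103, 104, 105, 106, 107, 108]))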

Example 4.5 Consider the incompletely embedded adder function shown in Ta-
ble 4.2. The adder needs one additional variable to become reversible leading to
a function with n = 4 variables, one constant input c, and two garbage outputs g1

4 In principle, also embeddings with an arbitrary number of garbage outputs and different output
permutations are possible. However, in the following only embeddings with minimal garbage and a
fix output order are considered. Chapter 5 provides a further consideration of different embeddings.

Table 4.2 Incomplete embedding of an adder

c    cin   x   y      cout   sum    g1   g2

– 0 0 0 0 0 – –
– 0 0 1 0 1 – –
– 0 1 0 0 1 – –
– 0 1 1 1 0 – –
– 1 0 0 0 1 – –
– 1 0 1 1 0 – –
– 1 1 0 1 0 – –
– 1 1 1 1 1 – –

and g2, respectively.5 To handle this incompletely specified function, four modifications
in the proposed SAT encoding are performed:

• Constraints for only 2^(n−|c|) = 2^3 = 8 (instead of 2^n = 2^4 = 16) truth table lines
are created, i.e. 0 ≤ i < 8.
• For all output variables x^d_ig1 (y^d_ig1 z^d_ig1) and x^d_ig2 (y^d_ig2 z^d_ig2) no output constraints are
added, i.e. they are left unspecified.
• For all input variables x^0_ic (y^0_ic z^0_ic) no input constraints are added, i.e. they are left
unspecified.
• An additional constraint x^0_0c = x^0_1c = ··· = x^0_7c (y^0_0c z^0_0c = y^0_1c z^0_1c = ··· = y^0_7c z^0_7c)
is added, relating the constant input c over all truth table lines i.

In summary, these modifications not only simplify the SAT encoding (since
a smaller number of truth table lines is considered), but also reduce the number
of checks that have to be performed to find a minimal circuit. Normally, to en-
sure minimality both values for each constant input have to be considered. Thus,
for a function with one constant input two checks (one with c = 0 and one with
c = 1) have to be performed. Moreover, for functions with more than one constant
input an exponential number of combinations has to be checked (e.g. values from
{00, 01, 10, 11} for a function with two constant inputs). For all these combinations,
a single instance must be encoded and separately solved by the solver. In contrast,
using the proposed modifications a single instance to be checked is sufficient to syn-
thesize a minimal result. This leads to significant speed-ups as the experiments in
the next section show.

5 Note that only a partial truth table is shown. Depending on the assignment to the constant input,
2^(n−1) = 8 truth table lines with don't care outputs are added either above or below the shown truth
table lines. For more details see Sect. 3.1.1.

4.2.4 Experimental Results

The proposed approaches have been implemented in C++. To solve the resulting
instances, the SAT solver MiniSAT [ES04] has been used. This section provides ex-
perimental results for both, exact synthesis of quantum circuits and exact synthesis
of Toffoli circuits. More precisely, it is shown that exact synthesis can be applied
with up to six variables. Improvements in the run-time can be obtained if irreversible
functions containing constant inputs are considered. Furthermore, a comparison to
heuristic approaches confirm the need of exact synthesis methods for both, finding
smaller circuits than the currently best known realizations and evaluating the quality
of heuristic methods.
As benchmarks a wide range of functions from different domains has been used.
This includes reversible functions as well as embedded irreversible functions. All
benchmarks have been taken from RevLib [WGT+08]. The experiments have been
carried out on an AMD Athlon 3500+ with 1 GB of memory. This section starts with
an evaluation of the quantum circuit synthesis, followed by an evaluation
of exact Toffoli circuit synthesis.

4.2.4.1 Synthesis of Quantum Circuits

For exact synthesis of quantum circuits the results of three evaluations are presented.
First the modifications for irreversible functions as introduced in Sect. 4.2.3 are
studied in detail. Next, the effect of the application of double gates on the synthesis
results is observed. Finally, the presented approach is compared to a previously
introduced method for quantum circuit synthesis.

Handling of Irreversible Functions To synthesize quantum circuits for irreversible
functions, appropriate embeddings including constant inputs, garbage outputs,
and don't care conditions are used. Then, circuits are synthesized (1) by assigning
the respective constants before creating and solving the SAT instance (denoted
by ORIG. SAT ENCODING) or (2) by applying the modifications proposed in
Sect. 4.2.3 (denoted by IMPR. SAT ENCODING). The respective results are shown in
Table 4.3. Note that if the ORIG. SAT ENCODING is applied, a single instance is
encoded and solved for each possible combination of constant input assignments.
Thus, for each function there are 2^|c| entries, where |c| is the number of constant
inputs. Each line below the function name corresponds to one assignment to the
constant inputs. In contrast, if the IMPR. SAT ENCODING is applied, only a single
instance has to be solved and therewith only a single result is reported. Besides that,
columns labeled d show the number of gates of the resulting (minimal) circuit and
columns labeled TIME list the corresponding run-time in CPU seconds (the number
of variables in the function is given by n).
As can be seen, the modified encoding offers a significant speed-up for all examples.
The reduction of run-times is between 70% and 95%. Reductions are more
substantial if at least one of the assignments has a solution with more gates than

Table 4.3 Handling of irreversible functions

                     ORIG. SAT ENCODING          IMPR. SAT ENCODING
                     d         TIME              d         TIME

Half-adder (n = 3)
4 1.47 4 0.85
5 2.65 – –

total 4 4.12 4 0.85


Half-adder2 (n = 3)
4 1.28 4 0.76
5 3.53 – –

total 4 4.81 4 0.76


Full-adder (n = 4)
6 297.569 6 209.95
7 1123.55 – –

total 6 1421.11 6 209.95


low-high (n = 4)
7 10636.95 6 2180.66
6 1444.26 – –

total 6 12081.21 6 2180.66


zero-one-two (n = 4)
7 430.30 6 132.46
6 91.23 – –
6 204.46 – –
6 477.84 – –

total 6 1203.83 6 132.46


decod24 (n = 4)
8 2162.51 8 5110.26
8 4391.96 – –
8 6158.05 – –
8 6063.92 – –

total 8 18776.44 8 5110.26



Table 4.4 Effect of double gates

FUNCTION             n      QUA. GATES              DBL. GATES
                            d         TIME          d         TIME

REVERSIBLE FUNCTIONS
3_17 3 10 1641.49 8 280.98
miller 3 8 15.49 6 11.60
fredkin 3 7 7.04 5 3.28
peres 3 4 0.33 4 1.21
toffoli 3 5 0.71 5 2.38
peres-double 3 6 11.32 6 175.86
toffoli-double 3 7 86.75 7 1121.68
graycode6 6 5 66.50 5 608.11
q4example 4 6 9.08 5 24.83

EMBEDDED IRREVERSIBLE FUNCTIONS


Half-adder 3 5 0.40 4 0.85
Half-adder2 3 4 0.19 4 0.76
Full-adder 4 7 145.07 6 209.95
rd32 4 6 37.75 6 436.11
low-high 4 7 2245.47 7 2180.60
zero-one-two 4 7 32.05 6 132.46
decod24 4 9 5660.77 8 5110.26

the optimal assignment. This can be observed for all functions in Table 4.3 ex-
cept decod24. It should be noted that the constraining of constant input variables
requires some computation time (i.e. run-times may be higher than those for solv-
ing the function with a fixed constant input assignment). However, this overhead
is easily compensated by the fact that only one instance needs to be solved. Since
the proposed modifications often lead to better results (with respect to run-time and
resulting circuit size), in the following, they are also applied in the remaining exper-
iments.

Effect of Double Gates In [HSY+06], double gates (as introduced in Sect. 2.1.3)
are assumed to have unit cost. However, other synthesis methods (e.g. [BBC+95,
MYDM05]) consider the cost of a double gate to be two, since they are composed
of two quantum gates. Hence, there are compelling reasons to consider synthesis
that relies on (single) quantum gates only. As described above, the proposed
SAT-based formulation supports both: synthesis with quantum gates only (denoted by
QUA. GATES) or additionally with double gates (denoted by DBL. GATES). In
one evaluation, circuits with double gates enabled and with double gates disabled
have been considered. Disabling double gates reduces the number of possible gates
at each depth from 7n(n − 1) + n to 3n(n − 1) + n (making the instance more
compact). The results are summarized in Table 4.4. In the first two columns the name

Table 4.5 Comparison to exact synthesis in [HSY+06]

FUNCTION             RA [HSY+06]       SAT (PIII)       IMPR
                     TIME              TIME

REVERSIBLE FUNCTIONS
miller 318.29 34.53 >9.2
fredkin 78.02 10.96 >7.1
peres 35.18 4.43 >7.9
toffoli 122.52 8.45 >14.5

EMBEDDED IRREVERSIBLE FUNCTIONS


Half-adder 6.77 2.99 >2.3
Half-adder2 26.25 2.70 >9.7
Full-adder 25200.00 551.92 >45.7

of the function and its number of variables are given. The next columns provide
the number of gates (d) and the run-time in CPU seconds (T IME) for both cases
(quantum gates only and additionally with double gates).
In general, it is expected that more choices of possible gates at each level will
increase the time to find a correct solution. This can clearly be seen for the bench-
mark functions where the inclusion of double gates offers no advantage (i.e. both
values for d are the same). For example, the run-time for graycode6 increases by
one order of magnitude when double gates are considered—even though the results
are identical with respect to the costs. On the other hand for some functions where
the inclusion of double gates leads to smaller circuits (e.g. 3_17), the run-time can
be reduced, since fewer instances have to be solved.

Comparison with Previous Work To compare the SAT-based synthesis to the
exact approach introduced in [HSY+06], a 733 MHz Pentium III with 512 MB of
main memory has been used (which is significantly slower than the 850 MHz Pentium III
processor used in [HSY+06]). The outcome is shown in Table 4.5. RA
denotes the run-time of the reachability analysis from [HSY+06] and SAT denotes
the run-time of the proposed approach (using both quantum gates and double gates).
IMPR gives the run-time improvement, i.e. the run-time of RA divided by the run-time
of SAT. The table clearly shows that all functions from [HSY+06] considered
for exact synthesis can be synthesized in significantly shorter run-time, even on a
slower processor. Improvements of at least a factor of 2 are achieved. In the best
case an improvement of a factor of 45 is observed.
Besides that, also synthesized results for the functions q4-example, Peres-double,
and Toffoli-double are compared. For these functions, the authors of [HSY+06] con-
strained the search space, i.e. they restrict the target-line of the V and V+ gates to
be a fix line. Therefore, they cannot guarantee optimal solutions. The comparison
in Table 4.6 shows that the proposed approach is able to find the optimal results for
these functions with a low run-time increase (the results of the heuristic approach
of [HSY+06] are denoted by RAheu ). It is thereby proven that for Peres-double and

Table 4.6 Comparison to heuristic synthesis in [HSY+06]

FUNCTION            RAheu [HSY+06]         SAT (PIII)
                    d       TIME           TIME       d

peres-double        6       171.27         481.71     6
toffoli-double      7       853.78         2985.88    7
q4-example          6       34.78          78.09      5

Toffoli-double the minimal quantum gate circuits have been found in [HSY+06].
Additionally, in the case of q4-example the SAT-based approach synthesizes an optimal
quantum gate representation with cost 5 instead of the non-optimal circuit of size
6 obtained in [HSY+06].
Note again that all these benchmarks have been carried out on a slower system than
the one used in [HSY+06]. For absolute run-times on a faster machine see the rightmost
column of Table 4.4.

4.2.4.2 Synthesis of Toffoli Circuits


In the next series of experiments, the SAT-based synthesis of Toffoli circuits was con-
sidered. Since the effect of the improved handling of irreversible functions has already
been evaluated above for quantum circuits, a separate discussion of the results for
Toffoli circuits is omitted (similar results have been obtained in all cases). Instead,
the respective handling is directly applied where appropriate. In the following, the re-
sults of SAT-based Toffoli circuit synthesis are compared to previously introduced
exact and heuristic approaches, respectively.
Comparison to Exact Approaches In the past, [SPMH03] and [YSHP05] inves-
tigated exact synthesis of reversible functions for Toffoli circuits. However, both
approaches give results only for reversible functions with up to n = 3 variables.
More precisely, the overall synthesis time for all possible $(2^3)! = 40320$ reversible
functions with 3 variables is reported (at least 40 CPU seconds for [SPMH03] and
12 CPU seconds for [YSHP05]). Applying the SAT-based approach to each of these
40320 functions takes less than 0.01 CPU seconds for most of the instances and 0.65
CPU seconds in the worst case. Thus, the synthesis time for any 3-variable function
is negligible; adding up the run-times for all 40320 functions would mainly accumu-
late measurement errors, since most individual times lie below the measurement
resolution. One can therefore conclude that the overall synthesis time
for a function with n = 3 is not crucial, i.e. exact synthesis for such functions can
be performed efficiently. Furthermore, in contrast to [SPMH03, YSHP05] it is also
possible to synthesize minimal circuits for functions with more than 3 variables as
the next evaluation shows.
Comparison to Heuristic Approaches Since several heuristic approaches for
Toffoli circuit synthesis have been proposed so far, circuits obtained by SAT-based
synthesis are not compared to the results obtained by a single previous method, but
to the currently best known results. The results are shown in Table 4.7. For each
function, the number of gates (d) as well as the source (SRC.) of the currently best

Table 4.7 Comparison of synthesis results

FUNCTION                 BEST KNOWN                 EXACT
                 n       d     QC    SRC.           d     QC    TIME         Δd    ΔQC

R EVERSIBLE FUNCTIONS
mod5mils 5 5 13 [MDM05] 5 13 48.28 0 0
ham3 3 5 9 [GAJ06] 5 9 0.60 0 0
ex-1 3 4 8 [MDM05] 4 8 0.12 0 0
graycode3 3 2 2 [MDM05] 2 2 0.01 0 0
graycode4 4 3 3 [MDM05] 3 3 0.64 0 0
graycode5 5 4 4 [MDM05] 4 4 22.08 0 0
graycode6 6 5 5 [GAJ06] 5 5 583.14 0 0
3_17 3 6 14 [GAJ06] 6 14 0.43 0 0
mod5d1 5 8 24 [WGT+08] 7 11 2094.13 1 13
mod5d2 5 8 16 [MDM05] 8 20 1616.07 0 −4

E MBEDDED IRREVERSIBLE FUNCTIONS


rd32 4 4 12 [GAJ06] 4 12 3.03 0 0
decod24 4 11 31 [GAJ06] 6 18 6.33 5 13
4gt4 5 17 89 [MDM05] 5 54 412.03 12 35
4gt5 5 13 29 [MDM05] 4 28 48.75 9 1
4gt10 5 13 53 [MDM05] 5 37 245.60 8 16
4gt11 5 12 16 [MDM05] 3 7 7.32 9 9
4gt12 5 14 58 [MDM05] 5 41 440.30 9 17
4gt13 5 14 34 [MDM05] 3 15 7.23 11 19
4mod5 5 5 9 [WGT+08] 5 9 125.74 0 0
4mod7 5 6 38 [MDM05] 6 38 653.82 0 0
one-two-three 5 11 71 [MDM05] 8 24 2186.71 3 47
alu 4 18 114 [GAJ06] 6 22 2001.32 12 92

known realization is given in column BEST KNOWN.6 In contrast, column EXACT
shows the number of gates obtained by the SAT-based synthesis. The quantum costs
for the respective circuits are denoted by QC. The difference with respect to the num-
ber of gates and quantum costs is given in the last two columns, respectively. Since
the results from previous work have been obtained by different approaches on dif-
ferent machines, no run-times for these are reported. The run-time of the SAT-based
synthesis is given in column TIME.
Using the exact synthesis, it can be proven that for many functions minimal Tof-
foli circuits (with respect to gate count) have already been found (all rows with
Δd equal to 0). This shows that today's synthesis approaches achieve very good

6 For some functions no results have been reported before. In this case, the approach of [MDM05]
has been applied to generate a heuristic result.

results for these benchmarks. However, exact synthesis additionally enables the re-
alization of significantly smaller circuits. For example, the circuit for 4gt5 can be
reduced by more than two thirds. In absolute numbers, up to 12 gates can be saved
for some functions. Moreover, the proposed approach also improves the quantum
costs for many functions; only in one case (for mod5d2) do the quantum costs increase.7
In the best case the quantum costs are reduced by 92.
It can be concluded that using SAT-based synthesis as proposed, exact results
can be produced for functions with up to six variables. The comparison to heuristic
approaches confirms the need for exact methods, both for finding smaller circuits
than the currently best known realizations and for evaluating the quality of heuristic
methods. However, synthesizing exact results still requires high computing times.
Thus, in the next section improvements are proposed that accelerate the synthesis
process.

4.3 Improved Exact Synthesis


In the last section, SAT-based synthesis for quantum gate and Toffoli gate circuits
has been demonstrated as a promising alternative to achieve minimal results for
functions with up to six variables. But, the proposed approach still works on a prob-
lem description in CNF. In this section, two improvements are described exploiting
higher levels of abstraction and leading to an increase in both efficiency and quality
of the obtained results. Therefore, from now on the focus is on circuits composed of
reversible gates only. But, the described techniques can also be applied to synthesis
of quantum circuits.
In the first part of this section, the SAT encoding presented in the last section is
used as basis but lifted to a higher level of abstraction. More precisely, the synthesis
problem is encoded in bit-vector logic which can be solved by Satisfiability Mod-
ulo Theories (SMT) solvers instead of Boolean SAT solvers. Experiments show that
this leads to significant speed-ups. However, after a detailed analysis of the funda-
mental limits of these solving paradigms, another approach is proposed that utilizes
problem-specific knowledge. Consequently, the general solver framework SWORD
(see Sect. 2.3.2.3) is applied for which dedicated modules have been specified. Be-
sides a high-level and very compact problem representation, these modules allow
more efficient decision and propagation strategies.
But even with higher levels of abstractions, the size of the synthesis encoding is
still exponential. That is, constraints are built for each truth table line of the func-
tion f to be synthesized. As an alternative, in the second part of this section an
approach for reversible logic synthesis which leads to a polynomial size encoding
is proposed. This encoding takes advantage of Quantified Boolean Formula (QBF)

7 This is because circuits are synthesized optimally with respect to the number of gates, not quantum
costs. In some (few) cases, circuits with a larger number of gates but lower quantum costs are
possible. For results with respect to quantum gates (and therewith with respect to quantum costs)
see the discussion above.

satisfiability—a generalization of Boolean satisfiability (see Sect. 2.3.1). More pre-


cisely, the exact synthesis problem of a reversible function f is formulated as a
QBF problem by encoding the cascade structure of a reversible circuit as a func-
tional composition of universal gates and enforcing to meet the specification of f
by quantification. In this sense, complexity is moved from the problem formulation
to the solving engine. Then, the quantified Boolean formula is solved by applying
QBF solvers and Binary Decision Diagrams (BDDs). This leads to three major im-
provements: (1) the circuits are synthesized faster, (2) all minimal circuits are found
in a single step, which allows choosing the best one with respect to the quantum costs,
and (3) different reversible gate libraries are easily supported by a simple extension
of the problem formulation.
In the remainder of this section, both approaches are described in detail in
Sect. 4.3.1 (for SMT and SWORD) and Sect. 4.3.2 (for QBF), respectively. Af-
terwards, experimental results for both are presented in Sect. 4.3.3.

4.3.1 Exploiting Higher Levels of Abstractions

Recalling the proposed SAT-based encoding, for a function $f: \mathbb{B}^n \rightarrow \mathbb{B}^n$ to be syn-
thesized as a circuit consisting of d gates, the variables
• $t^k = (t^k_{\log_2 n} \ldots t^k_1)$ (to define the target line of a gate at depth k),
• $c^k = (c^k_{n-1} c^k_{n-2} \ldots c^k_1)$ (to define the control lines of a gate at depth k), and
• $x^k_i = (x^k_{i(n-1)} \ldots x^k_{i0})$ (to represent the input, output, and internal variables)

as well as the constraints
• $\bigwedge_{i=0}^{2^n-1} \left( [x^0_i]_2 = i \wedge [x^d_i]_2 = f(i) \right)$ (input/output),
• $\bigwedge_{i=0}^{2^n-1} \bigwedge_{k=0}^{d-1} x^{k+1}_i = t(x^k_i, t^k, c^k)$ (functional), and
• $\bigwedge_{k=0}^{d-1} [t^k]_2 < n$ (exclusion)

have been introduced. As can be seen, a large part of the problem formulation con-
sists of bit-vector variables and bit-vector constraints, respectively. However, most
of this high level of abstraction is lost, when the formulation is encoded as a pure
Boolean formula and afterwards solved by a Boolean SAT solver. Furthermore, this
transformation requires a large number of auxiliary variables, leading to additional
overhead. Thus, it is worth considering alternative encodings.
The emerging area of Satisfiability Modulo Theories (SMT) (see Sect. 2.3.2) provides new
solving engines that directly support bit-vector logic and thus allow an encoding
that avoids the conversion to the Boolean level. As a result, all bit-vector variables
and most of the bit-vector operations are preserved; hardly any auxiliary variables
are needed. Furthermore, the formulation at this higher level of abstraction allows
stronger implications. As the experiments in Sect. 4.3.3 show, already this simple
“replacement” allows significant improvements in the resulting synthesis times.
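To make the shape of such a bit-vector encoding concrete, the following fragment states the functional and input/output constraints of a single gate for every truth-table line, using the z3 Python API merely as an illustrative stand-in for an SMT solver (the book uses MathSAT, and its encoding represents the target line as a binary-encoded bit-vector t^k; the one-hot target mask and the omitted one-hot constraint below are simplifications for this sketch):

from z3 import BitVec, BitVecVal, Solver, If

n = 3
f = [0, 1, 2, 3, 4, 5, 7, 6]     # truth table of the 3-line Toffoli function

s = Solver()
tgt  = BitVec('t0', n)           # one-hot mask selecting the target line of the gate
ctrl = BitVec('c0', n)           # mask of its control lines (a real encoding would also
                                 # force tgt to be one-hot and disjoint from ctrl)

for i in range(2 ** n):          # constraints are duplicated for every truth-table line
    x_in  = BitVecVal(i, n)
    x_out = BitVecVal(f[i], n)
    # the gate flips the target bit iff all control bits are set on the input pattern
    s.add(x_out == (x_in ^ If((x_in & ctrl) == ctrl, tgt, BitVecVal(0, n))))

print(s.check())                 # sat: a single gate realizing f exists
print(s.model())                 # e.g. t0 = 1 (line 0) and c0 = 6 (lines 1 and 2)

Even in this toy instance the constraints are duplicated once per truth-table line, which is exactly the exponential growth addressed by the quantified encoding in Sect. 4.3.2.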

However, further accelerations can be achieved if more dedicated solving en-


gines are exploited. One adjusted solving technique, based on the framework
SWORD [WFG+07], is described in this section. The general limits of the SAT- and
SMT-based approaches are thereby discussed first. Then, the dedicated implication
procedures and decision heuristics are introduced.

4.3.1.1 Limits of Common SAT and SMT Solvers

The input of a SAT solver is a Boolean function in terms of clauses. The input of
an SMT solver is a description in bit-vector logic. Both solvers are optimized for
their particular problem representation. For example, common SAT solvers utilize
the two literal watching scheme to carry out implications, which exploits the special
structure of clauses [MMZ+01]. SMT solvers, on the other hand, use e.g. canoniz-
ing [BDL98] and term-rewriting [BB09] to efficiently handle bit-vector constraints.
Furthermore, highly optimized heuristics have been developed to decide the assign-
ment of variables if no more implications are possible. Strategies employed are
based on statistical information, for example occurrences or activities of variables
[Mar99].
All these techniques work very well if CNF formulas or bit-vector logic are con-
sidered in general. But, the respective solvers are not able to take specific properties
of the problem into account. For example, promising problem-specific strategies for
the exact Toffoli circuit synthesis would be:
• The type of the Toffoli gates (represented by tk and ck ) near to the inputs should
be defined first, because the corresponding input variables are already assigned
by the truth table. This allows for early implications and helps to determine the
types of the remaining gates or to detect conflicts faster. Thus, tk and ck with
small k should be preferred in the decision procedure. Similarly, this observation
also holds for modules near to the outputs.
• If the assignment to an input line of a Toffoli gate is not equal to the assignment
to the corresponding output line of the same gate, this line has to be the target
line. This observation allows to imply the assignment to variables in tk .
• If the target line of a Toffoli gate is known, the values of all remaining lines can
be implied if there is an assignment at the corresponding input or output.
These specific strategies cannot be provided by a standard SAT or SMT solver.
Moreover, extensions of standard solvers in this direction (e.g. by modifications of
the heuristics) are not possible in general, because most of the problem-specific
information is lost when encoding the instance. SAT and SMT solvers just have a
clause database or constraint database, respectively. Thus, strategies like the ones
described above can only be exploited with a solver that is based on a problem-
specific representation.8

8 In principle, this problem can be prevented by introducing additional constraints to the problem

instance. But then, the encoding becomes inefficient due to a very large number of constraints.

4.3.1.2 Dedicated Solve Techniques for Toffoli Circuit Synthesis

To overcome the limitations discussed above, the solver framework SWORD is ap-
plied. While SAT solvers provide strategies optimized for clauses and SMT solvers
for bit-vector constraints, respectively, SWORD makes problem-specific informa-
tion available by using so called modules. These modules enable the implemen-
tation of dedicated heuristic as well as implication strategies, while still utilizing
sophisticated SAT techniques such as conflict analysis or learning. In the following,
the application of SWORD to Toffoli circuit synthesis is described. Section 2.3.2.3
gives a brief overview of the underlying solving techniques (starting on p. 24).
For Toffoli circuit synthesis, dedicated modules have been developed that incor-
porate the problem-specific strategies described above. More precisely, a concrete
Toffoli synthesis instance for the reversible function f to be synthesized with d
gates includes d modules in a cascade structure—one module for each depth k.
Each module has access to its related variables t^k, c^k, x^k_i, and x^{k+1}_i. The functional-
ity of a Toffoli gate is defined by methods of the module, i.e. a concrete Toffoli gate
function is selected by assigning tk and ck . Then, each module realizes the decision
and implication strategies as described in the following.

Decision Strategies The decision heuristic chooses a variable to be assigned if no


further implication is possible. Therefore, decisions that cause many implications
are preferred. Both the global decision strategy (deciding which module should
make the next decision) and the local decision strategy (deciding which variable
of the chosen module should be assigned next) are motivated by the following two
observations:
• A module can imply many other assignments if the target line t^k of the repre-
sented gate is known (i.e. if t^k is completely assigned). In this case, the input x^k_i
and the output x^{k+1}_i of all lines except the target line have to be equal.
• The assignments of nearly all gate inputs x^k_i are either given by the truth table (if
k = 0) or can be implied once the types of the previous gates are defined.
These observations lead to the following decision heuristics:
• Global decision heuristic
Modules whose target lines are still undefined are selected for a decision. If the
target lines of all modules are defined, the module that still has other unassigned
variables (e.g. from the vector c^k) is selected. During this process the
modules are considered in ascending order starting with depth k = 0.
• Local decision heuristic
Variables representing the target line of the respective Toffoli gate are decided
first. If all target line variables are assigned, variables corresponding to the control
lines are decided.
Since the overall decision strategy (global and local) ensures that the gates become
completely defined from the first gate to the last gate, there is no need for deciding
the variables representing the inputs or outputs. These variables are implied after
the corresponding types of the previous gates are decided.
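The resulting decision order can be sketched in a few lines of Python (an illustration only, not SWORD's actual interface; variables are modeled here as plain dictionaries with an 'assigned' flag):

def next_decision(modules):
    """modules: one entry per depth k in ascending order; each entry holds the
    target-line variables and the control-line variables of the corresponding gate."""
    # global heuristic: the first module (smallest k) whose target line is still open;
    # local heuristic: within a module, target-line variables are decided first
    for mod in modules:
        for var in mod['target']:
            if not var['assigned']:
                return var
    # all target lines defined: take the first module with remaining control variables
    for mod in modules:
        for var in mod['control']:
            if not var['assigned']:
                return var
    return None                      # everything assigned, no decision needed

mods = [{'target': [{'assigned': True}],  'control': [{'assigned': False}]},
        {'target': [{'assigned': False}], 'control': [{'assigned': False}]}]
print(next_decision(mods) is mods[1]['target'][0])   # True: the open target line wins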

Fig. 4.8 Propagate routine for module at depth k

(1)  for each (truth table line i)
(2)    for each (circuit line j)
(3)      if (x^k_{ij} != x^{k+1}_{ij})              // input != output
(4)        imply(t^k);                              // use value of j
(5)
(6)    for each (circuit line j)
(7)      if (j == [t^k]_2) continue;
(8)      imply(x^k_{ij} or x^{k+1}_{ij});
(9)
(10)   flipTargetLine = true;
(11)   for each (c^k_l ∈ c^k)
(12)     if (c^k_l == 1 ∧ x^k_{i,(t^k+l) mod n} == 0)
(13)       flipTargetLine = false;
(14)       break;
(15)   if (!flipTargetLine)
(16)     imply(x^{k+1}_{i,t^k});                    // use value of x^k_{i,t^k}
(17)   else
(18)     if (c^k completely defined)
(19)       imply(x^{k+1}_{i,t^k});                  // use value of x^k_{i,t^k} ⊕ 1

Propagation Strategies The propagation procedure of a module considers the
connected variables in order to imply values. The pseudo-code of the propagation
routine is shown in Fig. 4.8. It consists of three parts (rendered as a runnable sketch after the list):
1. Propagate the position of the target line (lines 2–4)
If the assignment to an input is not equal to the assignment to the corresponding
output of the same circuit line, then this line has to be the target line of the gate.
In this case, the position j of this target line is assigned to tk .
2. Propagate non-target lines (lines 6–8)
If the target line is known, all outputs are implied whose corresponding inputs
are assigned (except the ones at the target line). This also holds vice versa. Thus,
the output (input) is assigned to the value of the corresponding input (output).
3. Propagate the target line (lines 10–19)
In the last step the output of the target line is assigned. To do so, the assignment to
the control lines ck and to the corresponding input variables are considered. If a
circuit line is a control line and the input of this line is assigned to 0 (line 12), then
the assignment to the output of the target line has to be equal to its corresponding
input assignment (line 16). Otherwise, if additionally ck is completely defined
(i.e. no other control line with input value 0 can occur), the assignment to the
output of the target line has to be equal to the inverted assignment to the input of
this line (line 19).
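For concreteness, the three rules can also be written as a small self-contained Python function that operates on the partial values of one truth-table line of one module (my own rendition, not SWORD's code; None marks an unassigned value, the returned list stands in for the imply() calls of Fig. 4.8, and a conservative guard is added in the third rule, requiring all control inputs to be assigned before the inversion is implied):

def propagate(x_in, x_out, target, controls):
    """x_in, x_out: lists of 0/1/None for the n circuit lines before/after the gate;
    target: index of the target line or None; controls[j] in (0, 1, None) states
    whether line j is a control line of the gate."""
    n = len(x_in)
    implied = []

    # (1) propagate the position of the target line
    if target is None:
        for j in range(n):
            if None not in (x_in[j], x_out[j]) and x_in[j] != x_out[j]:
                target = j
                implied.append(('target', j))
                break

    if target is not None:
        # (2) propagate non-target lines: input and output must be equal
        for j in range(n):
            if j == target:
                continue
            if x_in[j] is not None and x_out[j] is None:
                x_out[j] = x_in[j]
                implied.append(('out', j, x_out[j]))
            elif x_out[j] is not None and x_in[j] is None:
                x_in[j] = x_out[j]
                implied.append(('in', j, x_in[j]))

        # (3) propagate the output of the target line
        if x_in[target] is not None and x_out[target] is None:
            ctrl_inputs = [x_in[j] for j in range(n) if controls[j] == 1]
            if any(v == 0 for v in ctrl_inputs):
                x_out[target] = x_in[target]          # a control carries 0: no flip
                implied.append(('out', target, x_out[target]))
            elif None not in controls and all(v == 1 for v in ctrl_inputs):
                x_out[target] = x_in[target] ^ 1      # all controls known and set: flip
                implied.append(('out', target, x_out[target]))
    return implied

# a 3-line gate whose line 0 changes from 0 to 1 must have line 0 as target
print(propagate([0, 1, None], [1, None, None], None, [None, 1, 1]))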
The presented decision and propagation strategies incorporated in the modules
replace the respective functional and exclusion constraints. Furthermore, due to
the problem-specific checks and heuristics, the overall formulation is lifted to a
higher level of abstraction leading to faster run-times as shown in the evaluation

in Sect. 4.3.3. Nevertheless, the applied formulation still has to consider all truth
table lines of f , i.e. the formulation is still of exponential size. How to overcome
this drawback is described in the next section.

4.3.2 Quantified Exact Synthesis

So far, several encodings for the check “Is there a circuit with exactly d gates that realizes
the given reversible function f?” have been introduced on different levels of abstraction,
i.e. on the Boolean level, the bit-vector level, and a problem-specific level. In all
variations the problem is encoded for each truth table line separately. That is, the
respective constraints representing the circuit to be synthesized are not built only for
one truth table line, but they are duplicated for the remaining 2^n − 1 truth table lines.
Thus, the instances grow exponentially with respect to the
number n of variables.
In this section, an alternative problem formulation based on Quantified Boolean
Formulas (QBF) (see Sect. 2.3.2) is introduced. QBF allows encoding the synthesis
problem in polynomial size, i.e. the circuit to be synthesized is encoded only once
and the specification of the considered function f is enforced by quantification. In
doing so, complexity is moved from the problem description to the solving engine.
In the following, the concrete method is described using a new formulation based
on a universal gate type definition. This does not only enable to synthesize Toffoli
circuits, but also reversible circuits consisting of Fredkin and Peres gates, respec-
tively. Finally, it is shown how the resulting formulation can be solved using QBF
solvers and Binary Decision Diagrams (BDDs).

4.3.2.1 Quantified Problem Formulation

For the synthesis of a function f with n inputs/outputs into a reversible circuit, a


set GT = {g_0, . . . , g_{q−1}} of q ∈ N different gate types is considered. The set GT
is used to distinguish between all possible gate types in n variables. According to
the chosen gate library (i.e. Toffoli gates, Fredkin gates, and/or Peres gates), the
cardinality of GT varies. More precisely, let f : B^n → B^n be a reversible function
to be synthesized. Then, there are
• n · 2^{n−1} different multiple control Toffoli gate types,
• n · (n − 1) · 2^{n−2} different multiple control Fredkin gate types, and
• n · (n − 1) · (n − 2) different Peres gate types.
If the gate library used for synthesis consists of more than one gate type, then
the numbers above have to be added. For example, in the case of a gate library
containing multiple control Toffoli gates and multiple control Fredkin gates for
the synthesis of a 3 variable function, GT contains 3 · 2^{3−1} + 3 · (3 − 1) · 2^{3−2} =
12 + 12 = 24 different gates in total.
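These counts are easy to retrace in code (a trivial sketch that merely reproduces the formulas above):

def gate_library_size(n, toffoli=True, fredkin=False, peres=False):
    """Number of gate types in GT for the chosen library on n circuit lines."""
    q = 0
    if toffoli:
        q += n * 2 ** (n - 1)             # multiple control Toffoli gates
    if fredkin:
        q += n * (n - 1) * 2 ** (n - 2)   # multiple control Fredkin gates
    if peres:
        q += n * (n - 1) * (n - 2)        # Peres gates
    return q

print(gate_library_size(3, toffoli=True, fredkin=True))   # 24, as in the example above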
Before the synthesis problem is formulated as a QBF instance, the universal gate
is defined that covers the functionality of all gates given in the set GT.

Definition 4.4 Let $f: \mathbb{B}^n \rightarrow \mathbb{B}^n$ be a reversible function to be synthesized. Then, a
universal gate represents the function

$U^{GT}(X, Y): \mathbb{B}^n \times \mathbb{B}^{\log q} \rightarrow \mathbb{B}^n$ with

$U^{GT}(X, Y) = \begin{cases} g_t(X), & \text{if } t = [y_1 \ldots y_{\log q}]_2 < q, \\ X, & \text{otherwise} \end{cases}$

where
• X = {x_1, . . . , x_n} is the set of the inputs of the gate and
• Y = {y_1, . . . , y_{log q}} is the set of variables representing a binary encoding of a
natural number t, which defines the type g_t of the gate (in the following called
gate select inputs).

According to the assignments to the gate select inputs Y, a universal gate U^{GT} acts
either as a gate from the given set GT or as the identity gate.

Remark 4.3 The variables Y = {y_1, . . . , y_{log q}} are comparable to the variables t^k
and c^k used in the previous sections to define the type of a Toffoli gate. However,
since now Fredkin and Peres gates are additionally considered, t^k and c^k cannot be
applied any longer and thus are replaced by Y. Furthermore, the identity gate has
been added to the definition of a universal gate to handle the case where the set GT
does not exactly contain a power of two gate types. In this case, GT is extended by
identity gates to fill the gap. In doing so, exclusion constraints are not needed any
longer.

Having a universal gate as a basis, a cascade of universal gates is defined as


follows:

Definition 4.5 Let f be a reversible function to be synthesized with at most d gates
from the set GT. Then, a function F^d is built representing the cascade structure of
d universal gates U^{GT}(X_1, Y_1), . . . , U^{GT}(X_d, Y_d). The output of the i-th universal
gate (0 < i ≤ d) is equal to the input of the next gate, i.e. U^{GT}(X_i, Y_i) = X_{i+1}.

Figure 4.9 shows the resulting cascade structure of the function F^d for d uni-
versal gates. Using this structure, any reversible circuit containing d gates can
be obtained by assigning the respective values to each of the gate select input
variables y_{ij} ∈ Y_i (0 < j ≤ log q). In other words, if a circuit realization with
at most d gates for the reversible function f exists, there has to be at least
one assignment to all variables y_{ij} ∈ Y_i such that F^d is equal to f. More for-
mally, if f is synthesizable with at most d gates, the quantified Boolean formula
$\exists y_{11} \ldots \exists y_{d\,\log q}\ \forall x_1 \ldots \forall x_n\ (F^d = f)$ holds. This represents the new encoding of
the synthesis problem which can be solved either by a QBF solver or by BDDs as
described in the following.
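For very small n, the structure of this quantified formulation can be made tangible by checking it with plain enumeration (a brute-force sketch only; the book solves the formula with QBF solvers or BDDs, and the bit-level representation as well as the restriction of GT to Toffoli gates are choices made here purely for illustration):

from itertools import combinations, product

n = 3
# GT: all multiple control Toffoli gate types on n lines (target t, control set c)
GT = [(t, c) for t in range(n)
             for r in range(n)
             for c in combinations([j for j in range(n) if j != t], r)]
assert len(GT) == n * 2 ** (n - 1)               # 12 gate types for n = 3

def u_gate(x, y):
    """U^GT: apply gate type y from GT to the n-bit pattern x (an integer);
    an out-of-range y acts as the identity, as in Definition 4.4."""
    if y >= len(GT):
        return x
    t, ctrl = GT[y]
    if all((x >> j) & 1 for j in ctrl):          # all control bits set: flip the target
        x ^= 1 << t
    return x

def cascade(x, ys):                               # F^d: d universal gates in sequence
    for y in ys:
        x = u_gate(x, y)
    return x

def exists_forall(f, d):
    """Brute-force check of  exists Y forall X (F^d = f)  for a truth table f."""
    for ys in product(range(len(GT)), repeat=d):                  # existential part
        if all(cascade(x, ys) == f[x] for x in range(2 ** n)):    # universal part
            return ys                             # gate-select assignment realizing f
    return None

# the Toffoli function: flip bit 0 iff bits 1 and 2 are set; realizable with d = 1
toffoli = [x ^ 1 if (x & 0b110) == 0b110 else x for x in range(2 ** n)]
print(exists_forall(toffoli, d=1))                # a one-element gate-select tuple

The outer loop plays the role of the existential quantifier over the gate select inputs, the inner all(...) the role of the universal quantifier over the primary inputs.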

Fig. 4.9 Problem formulation

4.3.2.2 Implementations

Based on the proposed QBF formulation for the synthesis problem, two approaches
can be applied to solve the formula: First, the problem is encoded as an instance
of quantified Boolean satisfiability, which is given to a QBF solver. Second, the
function F d = f is constructed as a BDD and thereafter the quantification is carried
out on the BDD. A solution exists if the final BDD is not the constant 0-function.
Moreover, all solutions can be extracted by traversing all paths to the 1-terminal.
For both approaches, the incremental nature of F^d is exploited during the con-
struction of the formula. That is, first the formula F^0 = (x_1, . . . , x_n) is built for
depth d = 0. Then, for each iteration, the function F^d is incrementally built by ap-
plying F^d = U^{GT}(U^{GT}(. . . (U^{GT}(F^0, Y_1), Y_2) . . . , Y_{d−1}), Y_d). Finally, the equation
to f is constrained.
The next two paragraphs describe the respective steps for both approaches in
more detail.

Using QBF Solvers To use a common QBF solver, the formula F^d = f is trans-
formed into CNF, i.e. a representation that consists of Boolean variables and clauses.
Then, the resulting set of clauses represents a cascade of d universal gates which
has to meet the specification of f. The complete QBF instance is formed by adding
the respective existential and universal quantifiers followed by an existential quan-
tifier for the auxiliary variables added during the transformation into CNF (de-
noted as A in the following). Overall, this leads to the following quantification:
$\exists y_{11} \ldots \exists y_{d\,\log q}\ \forall x_1 \ldots \forall x_n\ \exists A$. Together with the CNF, this is afterwards passed to
a QBF solver. If the instance is satisfiable, a circuit realization of the
function can be obtained from the assignments to the variables y_{ij} ∈ Y_i. Otherwise
it has been proven that no circuit realizing f with d gates exists.

Using BDDs As shown later in the experiments, the performance of the QBF
solver approach is poor. Therefore, BDDs are used as an alternative. That is, in-
stead of building a quantified CNF and solving this instance with a QBF solver, the
synthesis is carried out on a BDD representation.

To this end, the BDD for the formula F^d = f is built. This can be done ef-
ficiently using a state-of-the-art BDD package (e.g. CUDD [Som01]). The fixed
variable order X, Y has thereby been applied. The alternative order Y, X leads to
a blow-up of the BDD representation, since in this case the BDD for F^d would
already represent all possible functions in n variables which are synthesizable with
at most d gates. During the construction, isomorphic functions that result from the
n output functions of F^d are shared.
After the computation of the equality, the resulting BDD is a single output func-
tion. For this BDD, the universal quantification of all xi variables is carried out.
This is a standard operation available in a BDD package. The idea is to compute the
product of the positive co-factor and the negative co-factor for a universally quan-
tified variable, i.e. ∀x h(. . . , x, . . .) = h(. . . , 0, . . .) · h(. . . , 1, . . .). If the final BDD
consists of the 0-terminal, then no reversible circuit with the given depth d exists
for the function f. Otherwise, there is at least one path to the 1-terminal. Each of
those paths represents an assignment to all variables y_{ij} ∈ Y_i and thus can be con-
verted into a concrete circuit realization. Since the BDD contains not only one
but all 1-paths, in fact all realizations with the given depth are found in a single
step. All solutions are of interest, since one can choose the best result with respect
to quantum costs, which is discussed later in the experiments.
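The co-factor identity used for the universal quantification can be spelled out on a plain Python representation of a Boolean function (an illustration only; the book performs this operation directly on the BDD, e.g. with CUDD):

def forall_var(h, j):
    """Universal quantification of variable x_j:  forall x_j h = h|x_j=0 AND h|x_j=1.
    h maps assignment tuples to 0/1; the result is again such a function."""
    def quantified(assignment):
        a0 = assignment[:j] + (0,) + assignment[j + 1:]
        a1 = assignment[:j] + (1,) + assignment[j + 1:]
        return h(a0) & h(a1)
    return quantified

h = lambda a: a[0] | a[1]       # h(x0, x1) = x0 OR x1
g = forall_var(h, 0)            # forall x0 (x0 OR x1) = x1
print(g((0, 0)), g((0, 1)))     # 0 1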

4.3.3 Experimental Results

The proposed improvements for exact synthesis have been implemented in C++.
According to the respective encoding, the SMT solver MathSAT [BBC+05], the
solver framework SWORD [WFG+07] (see also Sect. 2.3.2.3), the QBF solver
sKizzo [Ben05], and the BDD package CUDD [Som01] have been used as solv-
ing engine, respectively. In this section, the described encodings are compared to
each other as well as to the SAT-based encoding with MiniSAT [ES04] as solver.
It is shown that higher levels of abstractions significantly improve the run-time
when performing exact synthesis. Moreover, using the quantified formulation, fur-
ther speed-ups can be documented and additionally the quality of the results can be
strengthened. As benchmarks, again functions from RevLib [WGT+08] have been
applied. All experiments have been carried out on an AMD Athlon 3500+ with
1 GB of memory. The timeout was set to 3000 CPU seconds.

4.3.3.1 Exploiting Higher Levels of Abstractions

The original SAT encoding for exact synthesis (denoted by SAT) has been lifted to
two higher levels of abstractions. First, instead of a Boolean formulation in CNF, the
problem has been encoded in bit-vector logic that can be handled by SMT solvers
(denoted by SMT). Second, problem-specific strategies developed within the solver
framework SWORD provide an alternative (denoted by SWORD).

Results obtained by both approaches are summarized in Table 4.8. The first col-
umn provides the name of the function. Column n denotes the number of variables
for each function, while in Column d the minimal number of Toffoli gates necessary
to synthesize the function is given. The following columns provide the run-time of
the respective synthesis approaches in CPU seconds (denoted by TIME). Further-
more, the improvements of the SMT approach and the SWORD approach are given
in the last two columns, i.e. the run-time of MiniSAT divided by the run-time of
MathSAT/SWORD (denoted by IMPR_SAT) and the run-time of MathSAT divided by
the run-time of the SWORD approach (denoted by IMPR_SMT), respectively.
The results clearly show that the chosen encoding is crucial for the resulting run-
times. For most of the functions a corresponding Toffoli circuit can be synthesized
faster by the SMT approach than by using the SAT encoding. Only in some cases
the SAT-based approach is slightly better. However, this only holds for functions
that can be synthesized in less than one second, e.g. peres or fredkin. Overall, im-
provements of up to three orders of magnitude are achieved.
Moreover, it is evident that the problem-specific approach outperforms the other
methods. In many cases, the run-times are further reduced by a factor of ap-
prox. 30—in the best case by a factor of over 170. Furthermore, using the problem-
specific approach, Toffoli circuits for the functions alu-v2 and alu-v3 are synthesized
within the given timeout. In comparison to the SAT synthesis approach, speed-ups
of up to four orders of magnitude are reported.

4.3.3.2 Quantified Exact Synthesis

To evaluate the quantified exact synthesis encodings, three series of experiments


have been carried out. First, the performance of the QBF solver and the BDD ap-
proach, respectively, is compared to each other as well as to previous encodings.
Second, the fact that (using the BDD approach) all circuits for a given depth are
available in parallel is evaluated. And finally, synthesis results for different gate
libraries are considered.

Run-time Comparison The run-times of both the proposed QBF solver encoding
(denoted by QBF SOLVER) and the BDD formulation (denoted by BDDS) are
compared to the SAT-based and the SWORD-based approach (denoted by SAT
and SWORD, respectively). A similar set of functions as in the previous sections
was applied. Only some trivial functions (e.g. peres, fredkin) have been omitted. In
contrast, two further functions, i.e. hwb4 and 4_49 (also taken from [WGT+08]), are
additionally considered. The results are given in Table 4.9. The first columns show
the name of the function as well as the number of lines (n) and the minimal number
of Toffoli gates (d) of the resulting circuit, respectively. In the remaining columns,
the run-times in CPU seconds (denoted by TIME) and the improvements of the new
approaches with respect to the SAT solver (denoted by IMPR_SAT) and with respect to
SWORD (denoted by IMPR_SW.) are given, respectively. The improvement is thereby
obtained by the run-time of the SAT/SWORD approach divided by the run-time of
the QBF solver/the BDD approach.

Table 4.8 Experimental results if higher levels of abstractions are exploited


FUNCTION                SAT         SMT                        SWORD
            n    d      TIME        TIME      IMPR_SAT         TIME      IMPR_SAT    IMPR_SMT

R EVERSIBLE FUNCTIONS
peres 3 2 0.01 0.03 0.33 <0.01 >1.00 >33.00
fredkin 3 3 0.03 0.12 0.25 <0.01 >3.00 >24.00
peres-double 3 4 2.35 0.36 6.53 0.01 235.00 36.00
miller 3 5 0.23 0.22 1.05 <0.01 >23.00 >22.00
mod5mils 5 5 48.28 3.81 12.67 0.08 603.50 47.63
ham3 3 5 0.60 0.29 2.07 0.01 60.00 29.00
ex-1 3 4 0.12 0.20 0.60 <0.01 >20.00 >12.00
graycode3 3 2 0.01 0.06 0.17 <0.01 >1.00 >6.00
graycode4 4 3 0.64 0.24 2.27 <0.01 >64.00 >24.00
graycode5 5 4 22.08 1.00 22.08 <0.01 >2208.00 >100
graycode6 6 5 583.14 3.25 179.43 0.12 4859.50 27.08
3_17 3 6 0.43 0.72 0.59 0.03 14.33 24.00
mod5d1 5 7 2094.13 135.36 15.47 11.21 186.80 12.07
mod5d2 5 8 1616.07 56.72 28.49 9.06 178.37 6.26
mini_alu 4 5 27.60 3.85 7.17 0.03 920.00 123.33

E MBEDDED IRREVERSIBLE FUNCTIONS


rd32-v0 4 4 2.97 0.54 5.50 <0.01 >297.00 >54.00
rd32-v1 4 5 13.51 1.84 7.34 0.04 337.75 46.00

decod24-v0 4 6 6.54 1.33 4.92 0.02 327.00 66.50


decod24-v1 4 6 6.22 1.44 4.32 0.09 69.11 16.00
decod24-v2 4 6 7.25 1.35 5.37 0.02 362.50 67.50
decod24-v3 4 7 28.88 3.31 8.73 0.18 160.44 18.39

4gt4-v0 5 6 >3000.00 697.37 >4.30 21.19 >141.58 32.91


4gt4-v1 5 5 395.36 33.31 11.87 0.62 637.67 53.73

4gt5-v0 5 5 321.27 36.92 8.70 0.41 783.59 90.05


4gt5-v1 5 4 51.51 10.35 4.98 0.06 858.50 172.50

4gt10-v0 5 5 229.01 47.97 4.77 5.31 43.13 9.03


4gt10-v1 5 6 2417.04 234.85 10.29 14.69 164.54 15.99

4gt11-v0 5 3 8.54 1.32 6.47 0.01 854.00 132.00


4gt11-v1 5 4 33.39 3.30 10.12 0.24 139.13 13.75

4gt12-v0 5 5 441.79 25.86 17.08 2.39 184.85 10.80


4gt12-v1 5 5 470.96 41.17 11.44 9.53 49.42 4.32

Table 4.8 (Continued)


FUNCTION                SAT         SMT                        SWORD
            n    d      TIME        TIME      IMPR_SAT         TIME      IMPR_SAT    IMPR_SMT

4gt13-v0 5 3 6.75 0.74 9.12 0.01 675.00 74.00


4gt13-v1 5 4 32.99 4.35 7.58 0.21 157.10 20.71

4mod5-v0 5 5 122.54 12.70 9.65 0.69 177.59 18.40


4mod5-v1 5 5 413.21 43.86 9.42 0.48 860.85 91.38

4mod7-v0 5 6 665.66 34.68 19.19 2.99 222.63 11.60


4mod7-v1 5 7 2055.80 100.07 20.54 133.97 15.34 0.75

one-two-three-v0 5 8 2292.26 443.21 5.17 55.66 41.18 7.96


one-two-three-v1 5 8 2094.26 481.53 4.35 71.72 29.20 6.71
one-two-three-v2 5 8 >3000.00 609.96 >4.91 78.05 >38.44 7.81
one-two-three-v3 5 8 >3000.00 250.36 >11.98 136.00 >22.06 1.80

alu-v0 5 6 1998.83 223.48 8.94 8.76 228.18 25.51


alu-v1 5 7 >3000.00 1692.29 >2.95 369.14 >13.54 4.58
alu-v2 5 7 >3000.00 >3000.00 – 840.25 >3.57 >3.57
alu-v3 5 7 >3000.00 >3000.00 – 764.04 >3.93 >3.93

From the results, it is easy to see that utilizing QBF leads to significant improve-
ments for both the QBF solver and the BDD approach in comparison to the com-
mon SAT solving techniques. Only if additional knowledge is utilized, as done by
SWORD, the QBF solver method is outperformed. However, the BDD approach
for QBF leads to the smallest overall synthesis time for non-trivial functions. That
is, for some functions the run-time, indeed, is higher than for SWORD, but this
only holds for functions with an overall synthesis time of less than one second
(e.g. graycode6 and decod24-v0). For all other functions, better run-times are doc-
umented. In the best case (hwb4), an improvement of more than a factor of 100 is
achieved.

Quantum Costs of Resulting Circuits After the efficiency of the BDD approach
has been shown with respect to run-time, further experiments demonstrate the qual-
ity of the obtained results. As described in the preliminaries, quantum costs provide
a good measurement of the complexity of the resulting circuits. The quantum costs
thereby depend on the used Toffoli gates. Thus, it may be an advantage to determine
not only one but several Toffoli circuits for a given function. Then, by checking the
resulting quantum costs for each of the obtained realizations, the cheapest one with
respect to quantum costs can be selected.
Previous approaches for minimal Toffoli circuit synthesis determine only one
circuit in each run. In contrast, using BDDs as described in Sect. 4.3.2 leads to

Table 4.9 Comparison of quantified encodings

FUNCTION             SAT-BASED                QBF-BASED
                     SAT        SWORD         QBF SOLVER                          BDDS
         n    d      TIME       TIME          TIME    IMPR_SAT    IMPR_SW.        TIME    IMPR_SAT    IMPR_SW.

R EVERSIBLE FUNCTIONS

mod5mils 5 5 48.28 0.08 32.22 1.50 <0.01 0.15 321.87 0.53


graycode6 6 5 583.14 0.12 145.02 4.02 <0.01 0.46 1267.69 0.33
3_17 3 6 0.43 0.03 0.19 2.26 0.16 0.01 43.00 3.00
mod5d1 5 7 2094.13 11.21 405.96 5.16 0.03 1.68 1246.50 6.67
mod5d2 5 8 1616.17 9.06 337.49 4.79 0.03 3.84 420.88 2.36
hwb4 4 11 >3000.00 >3000.00 >3000.00 – – 20.38 >147.20 >147.20
4_49 4 12 >3000.00 >3000.00 >3000.00 – – 837.92 >3.58 >3.58

E MBEDDED IRREVERSIBLE FUNCTIONS

rd32-v0 4 4 2.97 <0.01 0.22 13.50 <0.05 0.01 297.00 <1.00


rd32-v1 4 5 13.51 0.04 0.35 38.60 0.11 0.03 450.33 1.33
4mod5-v0 5 5 122.54 0.69 40.01 3.06 0.02 0.20 612.70 3.45
4mod5-v1 5 5 413.21 0.48 44.63 9.25 0.01 0.16 2582.56 3.00
decod24-v0 4 6 6.54 0.02 0.97 6.74 0.02 0.04 163.50 0.50
decod24-v1 4 6 6.22 0.09 1.28 4.86 0.07 0.04 155.50 2.25
decod24-v2 4 6 7.25 0.02 1.03 7.04 0.02 0.03 241.66 0.66
decod24-v3 4 7 28.88 0.18 2.00 14.44 0.09 0.05 577.60 3.60
alu-v0 5 6 1998.83 8.76 181.99 10.98 0.05 2.73 732.17 3.21
alu-v1 5 7 >3000.00 369.14 >3000.00 – – 30.42 >98.62 12.13
alu-v2 5 7 >3000.00 840.25 >3000.00 – – 34.72 >86.41 24.20
alu-v3 5 7 >3000.00 764.04 >3000.00 – – 45.69 >65.66 16.72

all possible circuits in parallel. The differences in the resulting quantum costs are
documented in Table 4.10. Column #SOL denotes the number of solutions found by
the BDD approach, while QC denotes the minimal as well as the maximal quantum
costs for the determined realizations.
Considering the quantum costs of the obtained Toffoli circuits leads to further sig-
nificant improvements. For example, circuits representing function 4_49 have quan-
tum costs of 32 in the best case, while in the worst case quantum costs of more than
70 are required. Thus, in contrast to previous algorithms, the BDD-based synthesis
is not only faster, but another quality criterion—the resulting quantum costs—becomes
applicable as well.

Synthesis with Extended Libraries Finally, the application of further gate types
to the BDD-based synthesis is shown. This is done by extending the universal gate
formula with further gates, i.e. Fredkin and Peres gates.

Table 4.10 Quantum costs of resulting circuits

FUNCTION            d       #SOL        QC

R EVERSIBLE FUNCTIONS
mod5mils 5 12 13–13
graycode6 5 1 5–5
3_17 6 7 14–14
mod5d1 7 1208 11–15
mod5d2 8 135 12–20
hwb4 11 264 23–39
4_49 12 374 32–72

E MBEDDED IRREVERSIBLE FUNCTIONS


rd32-v0 4 4 12–12
rd32-v1 5 20 13–13
4mod5-v0 5 1176 9–21
4mod5-v1 5 592 9–25
decod24-v0 6 75 10–34
decod24-v1 6 3 14–22
decod24-v2 6 23 14–26
decod24-v3 7 1950 11–43
alu-v0 6 824 14–38
alu-v1 7 850 15–27
alu-v2 7 16296 15–55
alu-v3 7 132 15–39

The results are shown in Table 4.11. The respective depth (d), the run-time of the
synthesis (TIME), the number of solutions (#SOL), and the quantum costs (QC) are
listed. MCT+MCF denotes the results for a set of gates including multiple control
Toffoli and multiple control Fredkin gates, MCT+P denotes the results for the set
of gates including multiple control Toffoli and Peres gates, and MCT+MCF+P
denotes the results for the set of all three gate types.
As expected, extending the gate library leads to smaller realizations, as for ex-
ample the results for hwb4 show. While the minimal MCT circuit for this function
consists of eleven gates, it can be reduced by three gates if Peres
gates are additionally used. Furthermore, improvements with respect to the number of gates can
be achieved for alu, 3_17, mod5d2, 4_49, rd32 and decod24, respectively.
However, with an increasing number of gates to be considered, the run-times
increase as well. This can be seen e.g. for function 4_49 or 4mod5. Only for the
functions where the extension of the gate library leads to smaller circuits, the run-
times sometimes decrease (e.g. for function alu with the MCT+MCF library), since
fewer iterations of the main flow have to be performed (see Sect. 4.1).
Table 4.11 Synthesis results using other gate libraries

FUNCTION        MCT+MCF                       MCT+P                         MCT+MCF+P
        n       d    TIME     #SOL    QC      d    TIME     #SOL    QC      d    TIME     #SOL    QC

COMPLETELY SPECIFIED FUNCTIONS


mod5mils 5 5 0.52 12 13–13 5 0.90 24 12–13 5 1.50 24 12–13
graycode6 6 5 2.77 1 5–5 5 4.22 1 5–5 5 7.19 1 5–5
3_17 3 5 0.02 2 15–15 5 0.02 9 11–12 5 0.03 43 11–20
mod5d1 5 7 15.08 1352 11–19 7 63.62 5632 10–23 7 250.81 5856 10–25
mod5d2 5 8 63.96 135 12–20 6 4.61 8 12–19 6 9.60 8 12–19
hwb4 4 9 37.36 774720 31–51 8 10.03 164 23–29 8 64.83 1084 23–33
4_49 4 10 1193.90 32 54–58 – >3000.00 – – – >3000.00 – –

INCOMPLETELY SPECIFIED FUNCTIONS


rd32-v0 4 4 0.02 4 12–12 2 0.01 4 8–8 2 0.02 4 8–8
rd32-v1 4 5 0.10 780 11–41 3 0.02 12 9–9 3 0.04 12 9–9
4mod5-v0 5 5 4.09 3672 9–23 5 2.96 35088 8–27 5 36.32 58176 8–39
4mod5-v1 5 5 3.52 2792 9–29 4 0.25 768 7–18 4 0.89 768 7–18
decod24-v0 4 5 0.05 13 11–25 4 0.03 5 13–14 4 0.04 8 13–16
decod24-v1 4 5 0.04 12 15–23 5 0.05 268 14–27 5 0.09 913 14–29
decod24-v2 4 6 0.09 1435 12–38 5 0.06 180 11–23 5 0.09 911 11–31
decod24-v3 4 6 0.07 292 12–32 5 0.06 300 11–29 5 0.09 513 11–31
alu-v0 5 4 0.43 22 16–30 6 181.43 29900 12–64 4 1.65 38 16–30
alu-v1 5 5 6.73 114 17–33 6 202.87 638 18–32 5 97.95 198 17–33
alu-v2 5 5 8.66 224 17–39 6 189.73 280 22–40 5 108.39 402 17–39
alu-v3 5 5 10.00 126 17–33 – >3000.00 – – 5 124.04 431 17–34

4.4 Summary and Future Work

Even if they are only applicable to small functions, exact synthesis methods are im-
portant in the context of evaluating heuristic methods, determining minimal building
blocks (e.g. for the BDD-based synthesis as introduced in Sect. 3.2), and other as-
pects as well. In this chapter, several approaches based on satisfiability techniques
have been introduced enabling exact synthesis for functions with up to six vari-
ables and leading to circuits with up to twelve gates. A comparison with the results
obtained by previous approaches showed that smaller circuits than the currently best
known ones have been synthesized or, for the first time, the minimality of existing
circuits has been confirmed.
Furthermore, it was shown that choosing an encoding for exact Toffoli cir-
cuit synthesis is crucial to the resulting run-times. Lifting the originally proposed
Boolean SAT encoding to the SMT- or a problem-specific level accelerates the syn-
thesis time by three or four orders of magnitude, respectively. Complementarily,
applying quantifiers together with BDDs, a further speed-up by more than a factor
of 100 can be observed in the best case.
In the approaches proposed in this chapter, the number of reversible gates as well
as the number of quantum gates (and therewith also quantum costs) have been consid-
ered. Depending on the addressed physical realization, further constraints (e.g. con-
current gates, adjacent gates, etc.) are important as well. Thus, exact synthesis with
respect to further cost criteria might be a topic for future work. Chapter 6 discusses
this aspect in detail and also proposes one respective exact approach for this aim.
The results of this chapter build the basis for future investigations. The QBF
encoding leads to the best results (in particular, with respect to run-time), since
it does not need the exponential duplication of the instance for each truth table
line. However, this encoding is still on the Boolean level, since a BDD is used as
underlying solving engine. Thus, lifting the quantified encoding to higher levels of
abstraction (as done for the Boolean SAT encoding) would be a promising task for
future work. Therefore, respective solving engines efficiently supporting quantifiers
must be available first.
Additionally, the encoding itself can be improved. So far, all possible gate type com-
binations are tried by the solving engine. But, many combinations are redundant
and can be ignored. Identifying easy-to-detect redundancies and excluding those from
the search space may accelerate the solving process. Also, special function classes
(e.g. symmetric functions) probably can be synthesized faster if the respective prop-
erties are fully exploited in the encoding. Furthermore, the application of advanced
solving techniques like incremental SAT (see e.g. [WKS01]) seems to be promis-
ing, since several iterations of very similar instances are sequentially solved in the
proposed synthesis approach.
Chapter 5
Embedding of Irreversible Functions

Quite often reversible logic should be synthesized for irreversible functions. Thus,
the problem of embedding is an important aspect. Partially, how to handle irre-
versible functions during synthesis has already been discussed in the previous chap-
ters (see e.g. Sects. 3.1.1 and 4.2.3). Additional lines are thereby introduced and the
resulting constant inputs, garbage outputs, and don’t care conditions are arbitrar-
ily assigned to concrete values. Further options exist how (i.e. in which order) to
arrange the outputs in the circuit to be synthesized. Overall, functions can be em-
bedded in different ways whereby the concrete don’t care assignments as well as
the chosen output arrangement may have a significant impact on the resulting cir-
cuit size. As an example, for BDD-based synthesis introduced in Sect. 3.2 different
output orders are applied to the building block-functions as they lead to better sub-
stitutions of the respective nodes. Since in the last chapters synthesis approaches (in
particular the transformation-based approach and the exact synthesis method) have
been described, now they can be used to evaluate the effect of different embeddings.
In this chapter, the different aspects of embedding mentioned above are investi-
gated in detail. First, strategies for the don’t care assignment [MDW09, MWD09]
are proposed. More precisely, a greedy approach, a method based on the Hungar-
ian algorithm, and an XOR-based strategy are introduced. Even if these strategies
address don’t care assignments of the outputs only, it can be shown that the chosen
method is crucial to the synthesis results.
Afterwards, the order of outputs in the function to be synthesized is considered.
Usually, each output is set to a fixed position. But since, in general, the output order is
irrelevant for a given reversible function f , a new synthesis paradigm [WGDD09]
is proposed that determines an equivalent circuit realization for f modulo the out-
put permutation. That is, the result of the synthesis is a circuit whose outputs have
been permuted. Therefore, distinct methods to efficiently determine “good” output
permutations are introduced. As a result, significantly smaller circuits (even smaller
than the ones previously obtained by the exact approaches) can be synthesized if
this new synthesis paradigm is applied.
In the following, the embedding problem together with the number of possibili-
ties is described in Sect. 5.1, which therewith builds the motivation for the remain-
ing sections. Afterwards, the approaches for don’t care determination (Sect. 5.2)

Table 5.1 Embedding of an adder


(a) Original adder           (b) Incomplete embedding              (c) Complete embedding
cin x y | cout sum           0 cin x y | cout sum g1 g2            0 cin x y | cout sum g1 g2

0 00 0 0 0 0 0 0 0 0 – – 0 0 0 0 0 0 0 0
0 01 0 1 0 0 0 1 0 1 – – 0 0 0 1 0 1 0 0
0 10 0 1 0 0 1 0 0 1 – – 0 0 1 0 0 1 0 1
0 11 1 0 0 0 1 1 1 0 – – 0 0 1 1 1 0 0 0
1 00 0 1 0 1 0 0 0 1 – – 0 1 0 0 0 1 1 0
1 01 1 0 0 1 0 1 1 0 – – 0 1 0 1 1 0 0 1
1 10 1 0 0 1 1 0 1 0 – – 0 1 1 0 1 0 1 0
1 11 1 1 0 1 1 1 1 1 – – 0 1 1 1 1 1 0 0
1 0 0 0 – – – – 1 0 0 0 0 0 0 1
1 0 0 1 – – – – 1 0 0 1 0 0 1 0
1 0 1 0 – – – – 1 0 1 0 0 0 1 1
1 0 1 1 – – – – 1 0 1 1 0 1 1 1
1 1 0 0 – – – – 1 1 0 0 1 0 1 1
1 1 0 1 – – – – 1 1 0 1 1 1 0 1
1 1 1 0 – – – – 1 1 1 0 1 1 1 0
1 1 1 1 – – – – 1 1 1 1 1 1 1 1

and output permutation (Sect. 5.3) are proposed and evaluated. At the end of this
chapter, all results are summarized and future work is sketched.

5.1 The Embedding Problem


As already described in Sect. 3.1.1, at least g = ⌈log_2(μ)⌉ additional outputs are re-
quired to embed a completely specified irreversible function into a reversible func-
tion, where μ is the maximum number of times an output pattern is repeated in the
irreversible function. For an irreversible function f : B^n → B^m, this means that the
reversible embedding has m + g outputs. Furthermore, c constant inputs must be
added such that n + c = m + g.
Once the garbage outputs and constant inputs are added, an open issue is how
to assign the don’t care conditions in the expanded truth table as shown by the
following example.

Example 5.1 Consider the adder function shown in Table 5.1(a). This function has
three inputs (the carry-in cin as well as the two summands x and y) and two outputs
(the carry-out cout and the sum). The function is irreversible, because the number
of inputs differs from the number of outputs. Since the output pattern 01 appears
three times (as does the output pattern 10), adding one additional output (leading
to the same number of inputs and outputs) cannot make the function reversible. In

Fig. 5.1 Circuits obtained with different embeddings

fact, ⌈log_2(3)⌉ = 2 additional outputs (and therewith one constant input) must be
added. This is shown in Table 5.1(b). But since this incompletely specified function
is not applicable to many synthesis approaches, the don't cares must afterwards
be assigned. One possible, albeit naive, embedding is shown in Table 5.1(c). This
embedding was found by assigning the garbage outputs to the patterns 00, 01, and
10 in order for each of the output patterns in the top half of the table and then
completing the bottom half of the table using the remaining available output patterns
in numerical order.

Remark 5.1 Not every synthesis approach requires a completely specified reversible
function. For example, the SAT-based approach introduced in the last chapter can
also handle don’t cares (see Sect. 4.2.3). However, most of the other synthesis ap-
proaches (as e.g. [Ker04, GAJ06, MDM07] and the transformation-based method
described in Sect. 3.1.2) need a completely specified function. For these approaches,
a completely specified embedding is required.

To appreciate the complexity of choosing a don’t care assignment, consider Ta-


ble 5.1(b). There are 4, 4, 3, 4, 2, 3, 2, and 4 choices for completing the don't
cares in the top eight rows of the table, respectively, for a total of 9216 choices. The
bottom eight rows of the table can then be completed in 8! ways. Lastly, the out-
puts can be permuted in 4! = 24 ways. Combining these yields 9216 · 40320 · 24 =
8,918,138,880 possible embeddings for this small example. Each respective em-
bedding may have an effect on the synthesis results, i.e. on the size of the resulting
circuits. For example, the embedding from Table 5.1(c) leads to the circuit depicted
in Fig. 5.1(a) (obtained by the transformation-based approach from Sect. 3.1.2). In
contrast, using the embedding introduced in Sect. 3.1.1 (see Table 3.2 on p. 29),
a significantly smaller circuit results, as shown in Fig. 5.1(b). Thus, in the next two
sections strategies for determining “good” don't care assignments and output per-
mutations, respectively, are introduced and evaluated.
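The count above can be retraced with a few lines of Python (simply spelling out the arithmetic of the paragraph):

import math

choices_top    = 4 * 4 * 3 * 4 * 2 * 3 * 2 * 4   # completions of the top eight rows
choices_bottom = math.factorial(8)                # orderings of the bottom eight rows
output_perms   = math.factorial(4)                # permutations of the four outputs
print(choices_top, choices_top * choices_bottom * output_perms)   # 9216 8918138880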

5.2 Don’t Care Assignment

In this section, methods for don’t care assignment are presented and evaluated, re-
spectively, that complete a reversible embedding of an irreversible function. It is
assumed that always the minimal number of outputs (i.e. log2 (μ)) is added. Fur-
thermore, all constant inputs are assigned to the value 0 and are always added as the
most significant inputs in the truth table. This leads to a significant computational
advantage as shown below and results in a circuit overhead of at most one NOT gate
per constant input.

5.2.1 Methods

In total, three approaches are introduced: A greedy algorithm, a method based on


the Hungarian algorithm, and an XOR-based procedure. Afterwards, an incomplete
application of the don’t care assignment methods is discussed that can be applied to
many existing synthesis approaches and leads to a significant simplification of the
synthesis process.

5.2.1.1 Greedy Method

The first method for assigning don’t cares is motivated by the basic operation of the
transformation-based synthesis algorithms (see Sect. 3.1.2). Here, gates are chosen
so that each input value of the truth table matches its respective output value (i.e. so
that the identity is achieved). Each line of the truth table is thereby sequentially
traversed. It is thus reasonable to conjecture that assigning the don't cares so that
the Hamming distance of the output patterns to the corresponding input patterns is
as small as possible should help to reduce the number of gates required. This first
leads to a simple greedy approach (a code sketch is given after the following two steps).
The truth table is traversed downwards starting at the first row. In each row, the
following two steps are performed:
1. For each distinct output assignment in the embedding, identify the target set of
rows of the table containing that pattern. Then, determine the set of candidate output
assignments obtained by assigning the don't cares in all possible ways. The
candidates are arranged in ascending numerical order.
2. For each row in the target set in turn, choose the first remaining candidate assign-
ment with minimal Hamming distance to the input assignment for that row.
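A minimal sketch of these two steps (my own rendition, not the authors' implementation; rows are grouped by their shared output pattern, which is equivalent for this purpose, and all patterns are handled as bit strings of equal width):

from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def greedy_assign(rows, g):
    """rows maps each distinct irreversible output pattern to the input patterns
    (in truth-table order) that produce it; g is the number of garbage bits."""
    assignment = {}                              # input pattern -> completed output
    for out_pattern, inputs in rows.items():
        # candidate completions in ascending numerical order (step 1)
        candidates = [out_pattern + ''.join(bits)
                      for bits in product('01', repeat=g)]
        for inp in inputs:
            # first remaining candidate with minimal Hamming distance (step 2)
            best = min(candidates, key=lambda c: hamming(c, inp))
            candidates.remove(best)
            assignment[inp] = best
    return assignment

# the three adder rows sharing the output pattern 01 (cf. Table 5.2(a))
print(greedy_assign({'01': ['0001', '0010', '0100']}, g=2))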

Example 5.2 Table 5.2(a) shows the embedding obtained by the greedy method for
the full adder. The circuits synthesized from this assignment as well as from the
naive assignment given in Table 5.1(c) are shown in Fig. 5.2(a) and Fig. 5.2(c),
5.2 Don’t Care Assignment 97

Table 5.2 Resulting embeddings of an adder after don’t care assignment


(a) Greedy/Hungarian method                   (b) XOR-based method
0 cin x y | cout sum g1 g2                    0 cin x y | cout sum g1 g2

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 1 0 0 0 1 0 1 1 1
0 0 1 0 0 1 1 0 0 0 1 0 0 1 1 0
0 0 1 1 1 0 1 1 0 0 1 1 1 0 0 1
0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0
0 1 0 1 1 0 0 1 0 1 0 1 1 0 1 1
0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0
0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1
1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
1 0 0 1 0 0 0 1 1 0 0 1 1 1 1 1
1 0 1 0 0 0 1 0 1 0 1 0 1 1 1 0
1 0 1 1 0 0 1 1 1 0 1 1 0 0 0 1
1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
1 1 0 1 1 1 0 1 1 1 0 1 0 0 1 1
1 1 1 0 1 1 1 0 1 1 1 0 0 0 1 0
1 1 1 1 0 1 1 1 1 1 1 1 0 1 0 1

Fig. 5.2 Circuits for the embeddings



respectively.1 The greedy assignment method leads to a circuit with 7 gates and
quantum cost of 27, while with the naive embedding a circuit with 20 gates and
quantum cost of 44 results.

5.2.1.2 Applying the Hungarian Algorithm

Furthermore, the Hamming distance can be applied to formulate the don't care
assignment problem as an instance of the assignment problem, which is solved by the
Hungarian algorithm [HL05]. To this end, let S be the set of truth table rows sharing
a common output pattern in the irreversible function and let T be the set of possible
assignments to the don't cares to complete those rows. |T| is equal to 2^g, where g
is the number of garbage lines added to permit the embedding of the irreversible
function into a reversible one. Then, the don't care assignment problem is to associate
each element of S with a unique element from T. Let K(Si, Tj) be the "cost" of
associating the don't care assignment Tj with Si, for which the Hamming distance is applied. More precisely,
K(Si , Tj ) is the Hamming distance between the completely specified truth table
output pattern and the corresponding input pattern when Si is completed using Tj .
This formulation can be expressed in tabular form with a row for each Si and a col-
umn for each Tj with each K(Si , Tj ) in the corresponding table entry. Assigning the
don’t cares to minimize the total Hamming distance is then a matter of choosing one
entry in each row such that those entries appear in unique columns and such that the
sum of the chosen entries is minimal. This is a standard assignment problem. The
Hungarian algorithm is a well-known method [HL05] for solving the assignment
problem in polynomial time and thus has been applied to solve this instance. The
only issue of note here is that storing the potentially very large assignment matrix
is avoided, since Hamming distance is easily computed as needed—in fact more
quickly than a matrix access.
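As an illustration only (not the implementation used here, which avoids storing the matrix), the same formulation can be set up with SciPy's linear_sum_assignment, which solves the assignment problem with a Hungarian-style algorithm; the names below are hypothetical and SciPy is an assumed dependency:

import numpy as np
from scipy.optimize import linear_sum_assignment

def hamming(a, b):
    return bin(a ^ b).count("1")

def hungarian_assignment(S, base, g):
    # S: input patterns (integers) of the rows sharing one irreversible output
    # pattern; base: that pattern with its g don't care bits set to 0
    T = [base | t for t in range(1 << g)]                    # all completions
    K = np.array([[hamming(t, s) for t in T] for s in S])    # K(S_i, T_j)
    rows, cols = linear_sum_assignment(K)                    # min. total distance
    return {S[i]: T[j] for i, j in zip(rows, cols)}

The total Hamming distance of the returned assignment is minimal, which is exactly the optimization criterion described above.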

Example 5.3 Applying the Hungarian algorithm to the considered adder function,
the same assignment as for the greedy method results. This may happen since both
approaches use the Hamming distance as cost metric. Nevertheless, the experiments
in Sect. 5.2.2 show that both assignment methods lead to notable differences for
other (mainly larger) functions.

5.2.1.3 XOR-based Method

The third method proposed for don’t care assignments is based on the observation
that for many functions (in particular for arithmetic) a good embedding of an irre-
versible function into a reversible one is based upon setting the don’t care outputs
to XOR combinations of the primary inputs. More precisely, the following steps are
performed:

1 The transformation-based synthesis method from Sect. 3.1.2 has been used to synthesize these
circuits.

1. For each truth table row i of the embedding f : B^n → B^n (i.e. 0 ≤ i < 2^n):
   a. Set k = i so that k represents the current input vector as a natural number.
   b. Set p = 0, q = 0.
   c. For each output f_j of the embedding (i.e. 0 ≤ j < n):
      i. If f_j is a garbage output, set q = q ⊕ k_j and p_j = q, with k_j (p_j) denoting the j-th bit-value of the binary encoding of k (p).
      ii. Otherwise, set p_j to the j-th bit-value of f_i.
      In doing so, an output assignment is created (represented by the natural number p), where the don't cares are assigned to a value achieved by an XOR-combination with the respective input values.
   d. If p represents an already assigned output pattern, increment k and repeat from Step 1b.
   e. Set the output of the i-th truth table line to p.
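A hedged Python sketch of these steps is given below; it is restricted to the rows in which the irreversible function is specified, takes bit position 0 as the least significant bit, and uses illustrative names rather than the authors' implementation:

def xor_based_assignment(num_rows, n, garbage_pos, spec):
    # garbage_pos: output bit positions that are garbage; spec[i]: the
    # specified output bits of row i with the garbage positions set to 0
    used, outputs = set(), []
    for i in range(num_rows):
        k = i
        while True:
            p, q = 0, 0
            for j in range(n):                 # step 1c: traverse output bits
                if j in garbage_pos:
                    q ^= (k >> j) & 1          # q = q XOR k_j
                    p |= q << j                # buffer the running XOR on p_j
                else:
                    p |= spec[i] & (1 << j)    # keep the specified output bit
            if p not in used:                  # step 1d: collision check
                break
            k += 1                             # retry with the next k
        used.add(p)                            # step 1e
        outputs.append(p)
    return outputs

# full adder embedding: outputs (cout, sum, g1, g2), garbage at positions 0, 1;
# this reproduces the first eight rows of Table 5.2(b)
spec = [((a + b + c) // 2) << 3 | ((a + b + c) % 2) << 2
        for c in (0, 1) for a in (0, 1) for b in (0, 1)]
print([format(p, "04b") for p in xor_based_assignment(8, 4, {0, 1}, spec)])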

Example 5.4 Table 5.2(b) shows the embedding obtained by the XOR-based
method for the full adder. The circuit obtained from this assignment is shown in
Fig. 5.2(b) (also synthesized using the transformation-based synthesis method). The
XOR-based method yields a circuit with five gates and quantum costs of 13, which
is significantly smaller than the circuits obtained with the greedy/Hungarian and the
naive embedding, respectively. Overall, these circuits clearly show the importance
of a good don’t care assignment.

5.2.1.4 Incomplete Don’t Care Assignment

As noted above, all constant inputs take the value 0 and are always the most sig-
nificant inputs in the truth table. This means that the irreversible function is always
embedded in the first rows of the reversible truth table, while the remaining rows are
completely don't care (see e.g. Table 5.1(c)). In particular, if the original function
has n primary inputs and, furthermore, c constant inputs are added, only the first 2^n
truth table rows of the embedding are of interest. The remaining (2^c − 1) · 2^n rows
can be ignored.
Given this construction, a synthesis method that works row by row from the top of
the truth table (as e.g. the transformation-based synthesis approach from Sect. 3.1.2
and its derivatives) can stop after transforming 2^n rows. Because of this, it is not
necessary to complete a don't care assignment beyond the (2^n)-th row.2
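As a small worked instance of this row count argument (values chosen to match the full adder embedding of Table 5.2, purely for illustration):

# full adder: n = 3 primary inputs, embedded with c = 1 constant input
n, c = 3, 1
relevant = 2 ** n                    # 8 rows that must be transformed
ignorable = (2 ** c - 1) * 2 ** n    # 8 remaining rows, completely don't care
assert relevant + ignorable == 2 ** (n + c) == 16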

5.2.2 Experimental Results

In this section, experimental results obtained with the described don't care as-
signment approaches are documented. To this end, (irreversible) functions from

2 Note that when this simplification is employed, bidirectional synthesis methods cannot be applied, because don't cares occur in the later truth table lines so that no definition of the inverse function is possible.

RevLib [WGT+08] have been embedded using the proposed methods. Afterwards,
the resulting embeddings have been passed to (1) the transformation-based syn-
thesis from Sect. 3.1.2 (denoted by TRANSFORMATION-BASED ALGORITHM) or
(2) an extended version combining transformation-based synthesis with a search-
based method as proposed in [MWD09] (denoted by COMBINED SYNTHESIS
ALGORITHM), respectively. An AMD Athlon 3500+ with 1 GB of memory was used
for the experiments.
Table 5.3 presents the results for each of the synthesis approaches and don’t care
assignment methods, respectively. For the resulting circuits, the gate count (denoted
by d) and the quantum cost (denoted by QC) are shown. Furthermore, for each func-
tion, the best result with respect to quantum cost is highlighted in bold. Run-times
are not documented, since every circuit in Table 5.3 was found in less than one CPU
second.
The results show that the chosen embedding is crucial to the synthesis results.
For example, the quantum costs of the circuits representing function rd73_69 range
from 1112 down to 184 using the combined synthesis approach. Thus, an improvement of
nearly one order of magnitude can be achieved only by modifying the assignment
of the don’t cares.
In the next section, the second aspect of embedding, namely permutation of out-
puts, is considered in detail, where similar results have been achieved.

5.3 Synthesis with Output Permutation


Usually, the outputs of a reversible function (or embedding, respectively) to be syn-
thesized are set to a fixed position. Since in general the output order is irrelevant for a
given reversible function f, in this section a synthesis methodology—denoted
as Synthesis with Output Permutation (SWOP)—is proposed. SWOP determines a
circuit for the function f modulo output permutation. That is, the result is a circuit
representing the desired function, but possibly with an adjusted order of outputs. This
enables the determination of smaller realizations.
In a naive way, synthesis with output permutation can easily be applied to existing
approaches just by enumerating all permutations, synthesizing each in turn, and keeping
the best one. Since each respective output order has to be considered, this results
in an increase by a factor of n! (where n is the number of variables of the reversible
function). If garbage outputs occur, this complexity can be reduced to n!/g! (where g
is the number of garbage outputs), which might still be a large number. Thus, in
this section two approaches are introduced that efficiently determine "good" output
permutations for both exact and heuristic synthesis approaches. Moreover, when the
proposed exact synthesis with output permutation is applied, minimality with
respect to all possible permutations is also guaranteed.
In the remainder of this section, both approaches are described in detail. First,
the general idea, the best case benefits, as well as the complexity of the proposed
synthesis paradigm are introduced and discussed in Sect. 5.3.1. Afterwards, exact
SWOP and heuristic SWOP are described in Sects. 5.3.2 and 5.3.3, respectively.
Finally, experimental results are given in Sect. 5.3.4.
Table 5.3 Comparison of don’t care assignment methods
FUNCTION             COMBINED SYNTHESIS ALGORITHM            TRANSFORMATION-BASED ALGORITHM
                     GREEDY     HUNGARIAN     XOR-BASED      GREEDY     HUNGARIAN     XOR-BASED
n  c  g              d  QC      d  QC         d  QC          d  QC      d  QC         d  QC

decod24_10 4 2 0 7 11 7 11 7 11 7 11 7 11 7 11
rd32_19 4 1 2 5 13 5 13 5 13 5 17 5 17 5 17
4gt10_22 5 1 4 3 47 3 47 4 40 3 47 3 47 3 47
4gt11_23 5 1 4 1 5 1 5 4 12 1 5 1 5 1 5

4gt12_24 5 1 4 3 55 3 55 6 62 3 55 3 55 3 55
4gt13_25 5 1 4 1 13 1 13 3 27 1 13 1 13 1 13
4gt4_20 5 1 4 6 58 6 58 9 65 7 79 7 79 7 79
4gt5_21 5 1 4 3 19 3 19 6 30 3 19 3 19 3 19
4mod5_8 5 1 4 7 19 7 19 5 9 9 25 9 25 9 25
4mod7_26 5 1 2 21 65 21 65 21 65 15 55 15 55 15 55
alu_9 5 0 4 9 65 16 72 28 224 12 64 16 68 32 252
mini-alu_84 5 1 3 22 110 26 114 25 125 22 98 28 108 22 110
one-two-three_27 5 2 2 9 33 9 33 9 33 9 33 9 33 9 33
decod24-enable_32 6 3 2 15 39 11 35 14 42 15 39 13 37 15 39
rd53_68 7 2 4 27 228 27 228 22 137 22 187 22 187 22 187
sym6_63 7 1 6 36 485 36 485 17 133 36 777 36 777 36 777
rd73_69 9 2 6 80 1112 80 1112 40 184 100 2187 100 2187 100 2187
sym9_71 10 1 9 76 1047 76 1047 51 573 210 4368 210 4368 210 4368
rd84_70 11 3 7 104 1823 104 1823 47 446 111 2100 111 2100 111 2100

Table 5.4 Function specification

x1 x2 x3   f1 f2 f3

0 0 0 0 0 0
0 0 1 0 1 0
0 1 0 1 0 0
0 1 1 1 1 1
1 0 0 0 0 1
1 0 1 0 1 1
1 1 0 1 0 1
1 1 1 1 1 0

Fig. 5.3 Minimal Toffoli circuits

5.3.1 General Idea

Many synthesis approaches get a reversible function (or a reversible embedding,
respectively) f : B^n → B^n, where each specified output has a fixed position. Thus,
often only a fixed order of outputs is considered during the synthesis.

Example 5.5 Consider the function specification shown in Table 5.4. The reversible
function maps (x1 , x2 , x3 ) to (x2 , x3 , x2 x3 ⊕ x1 ) = (f1 , f2 , f3 ). A minimal Toffoli
circuit for this function is shown in Fig. 5.3(a). This circuit consists of 6 gates.

Usually, the order of the outputs is irrelevant and can be swapped. As shown in
the following example, this can lead to a much more compact circuit.

Example 5.6 In Fig. 5.3(b) a Toffoli circuit is depicted which computes the same
reversible function as the Toffoli circuit shown in Fig. 5.3(a). But in contrast, the
three output functions have been reordered within the output vector.
More precisely, the Toffoli circuit shown in Fig. 5.3(b) maps the input (x1, x2, x3)
to the output (x2 x3 ⊕ x1, x2, x3) = (f3, f1, f2). This reduces the overall number of
gates from 6 to 1, i.e. 5 gates have been saved.
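This equivalence can be checked exhaustively; the following small Python snippet (an illustration, not part of the original text) confirms that the circuit of Example 5.6 computes the same output functions as the one of Example 5.5, merely in a permuted order:

def f(x1, x2, x3):                    # output order of Fig. 5.3(a)
    return (x2, x3, (x2 & x3) ^ x1)   # (f1, f2, f3)

def f_perm(x1, x2, x3):               # output order of Fig. 5.3(b)
    return ((x2 & x3) ^ x1, x2, x3)   # (f3, f1, f2)

for x in range(8):
    x1, x2, x3 = (x >> 2) & 1, (x >> 1) & 1, x & 1
    f1, f2, f3 = f(x1, x2, x3)
    assert f_perm(x1, x2, x3) == (f3, f1, f2)   # same functions, reordered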

Motivated by this example, a new synthesis paradigm denoted as Synthesis with


Output Permutation (SWOP) is introduced. To this end, synthesis approaches are ex-

Fig. 5.4 Permutations with garbage outputs

Fig. 5.5 Realization of a permutation

tended in such a way that different (or all) output permutations are considered. This
causes a significant increase in complexity, since in general all possible permuta-
tions have to be checked (resulting in n! different synthesis calls in total). This can
be slightly reduced if a function containing garbage outputs is synthesized.
Then, only n!/g! different permutations have to be considered, since permutations of
the garbage outputs can be ignored.

Example 5.7 Figure 5.4 shows all n! possible permutations for a function with n = 3
variables and g = 2 garbage outputs (denoted by g1 and g2). Since the garbage
outputs are left unspecified, the permutations that only swap garbage outputs can be
skipped (i.e. the last three permutations of Fig. 5.4). Thus, only 3!/2! = 3 permutations
instead of all 3! = 6 permutations are considered.

Nevertheless, the number of additional checks is quite high. In contrast, synthesis
with output permutation may lead to significant reductions in the resulting circuit
sizes. To illustrate this, consider Fig. 5.5 depicting the gates needed to permute two
signals in a reversible circuit with Toffoli gates (in total three gates are required).
Since the best position of the outputs is unknown at the beginning of the synthesis
process, outputs may be placed arbitrarily in the function specification. Then, the
three gates of Fig. 5.5 may be needed to permute the value of a signal to the position
given by the function. If in contrast output permutation is considered during the
synthesis, the number of gates of the resulting circuit may be significantly lower.
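The three-gate realization of Fig. 5.5 can be verified exhaustively; the following snippet (an illustration, not part of the original text) checks that three CNOT gates, i.e. Toffoli gates with a single control, swap the values of two lines:

def cnot(control, target):
    return control, target ^ control   # target flips iff the control is 1

def swap_via_cnots(a, b):
    a, b = cnot(a, b)                  # CNOT with control a, target b
    b, a = cnot(b, a)                  # CNOT with control b, target a
    a, b = cnot(a, b)                  # CNOT with control a, target b
    return a, b

assert all(swap_via_cnots(a, b) == (b, a) for a in (0, 1) for b in (0, 1))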

Lemma 5.1 The number of gates in a reversible circuit obtained by common syn-
thesis approaches may be up to 3 · (n − 1) higher than the number of gates in a
circuit where synthesis with output permutation is applied (with n being the number
of variables).

Proof Let d be the minimal number of gates of a circuit obtained by enabling output
permutation during synthesis. To move one output line to the position given by the
function three Toffoli gates are required (see Fig. 5.5). At most n − 1 lines need to be
moved. It follows that the cost of the minimal circuit, where no output permutation
is allowed, is less than or equal to d + 3 · (n − 1). □

Remark 5.2 Lemma 5.1 gives a best case improvement. Because of the heuristic
nature of most synthesis approaches, circuits with a larger number of gates may of
course also result.

This motivates the investigation of methods that exploit output permutation dur-
ing the synthesis. The next two sections show how this is realized using an exact
synthesis approach as well as a heuristic synthesis approach, respectively.

5.3.2 Exact Approach

As considered in detail in Chap. 4, exact synthesis approaches determine minimal
circuits for a given function, i.e. circuits with the minimal number of gates.
The methods introduced in Chap. 4 exploit Boolean satisfiability (SAT) techniques
where the basic idea is to check whether there exists a Toffoli circuit for a reversible
function with d gates (starting with d = 1, where d is increased in each iteration
if no realization is found). The respective checks are performed by representing
the problem as an instance of SAT which is afterwards solved by a SAT solver or
similar (specialized) solving engines (for more details see the respective sections in
Chap. 4). To describe how synthesis with output permutation is applied to this exact
synthesis method, the concrete SAT encoding is sketched as follows:

Definition 5.1 Let f : B^n → B^n be a reversible function to be synthesized. Then,
the SAT instance of the respective synthesis problem is given as

Φ ∧ ⋀_{i=0}^{2^n−1} ([inp_i]_2 = i ∧ [out_i]_2 = f(i)),

where
• inp_i is a Boolean vector representing the inputs of the circuit to be synthesized for truth table line i,
• out_i is a Boolean vector representing the outputs of the circuit to be synthesized for truth table line i, and
• Φ is a set of constraints representing the synthesis problem as described in Sects. 4.2.1 and 4.3.1, respectively.

As an example, Fig. 5.6(a) shows the simplified representation of the synthesis
problem for the function specified in Table 5.4 (where the values of the truth table
are given as integers).
Applying SWOP to the exact approach and still ensuring minimality, all permutations
have to be considered. This can be done—as mentioned above—by n!/g!
separate synthesis calls. However, exploiting the advanced techniques of the used
SAT solvers leads to faster synthesis. Therefore, an adjusted encoding is proposed
which requires one additional Boolean vector.

Fig. 5.6 Encoding for exact synthesis

Definition 5.2 Let f : B^n → B^n be a reversible function to be synthesized. Then,
p = (p_⌈log2(n!/g!)⌉, . . . , p_1) is a Boolean vector representing the binary encoding of a
natural number p ∈ {1, . . . , n!/g!} which indicates the chosen output permutation of
the circuit.

Using this vector, the SAT encoding is slightly extended: According to the assignment
of the vector p (set by the SAT solver), a value for the number p is determined, which selects
the current output permutation. Depending on this permutation, the respective output
order is set during the search. More formally, the encoding of Definition 5.1 is
extended as follows:

Φ ∧ ⋀_{i=0}^{2^n−1} ([inp_i]_2 = i ∧ [out_i]_2 = π_p(f(i))).

The extended encoding of the synthesis problem for the function specified in
Table 5.4 is illustrated in Fig. 5.6(b).
If the solver finds a satisfying assignment for this SWOP instance, one can ob-
tain the circuit from the result as described in Chap. 4 and the best permutation is
provided by the assignment to p.
Overall, this extension allows exact SWOP with only one synthesis call in contrast
to n!/g! separate ones. Furthermore, since the variables of p are an integral part of
the search space, the permutations are checked much more efficiently. Because of
modern SAT techniques (in particular conflict analysis [MS99]), during the search
process reasons for conflicts are learned. This learned information prevents the
solver from reentering non-solution search space, i.e. large parts of the search space
are pruned. In contrast, this information is not available when each permutation is
checked by separate calls of the solver. Thus, exact synthesis with output permuta-
tion is possible in feasible run-time when learning is exploited.
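The selector idea behind this extended encoding can be sketched as follows. The snippet uses the Z3 Python API purely as an assumed stand-in for illustration—the approach described here relies on SWORD and dedicated SAT techniques—and it omits the gate constraints Φ; all names are hypothetical:

from itertools import permutations
from z3 import And, BitVec, Int, Or, Solver

def swop_output_constraints(out_vars, f, n):
    # out_vars[i]: bit-vector variable for the circuit output of truth table
    # row i; f: the reversible function as a list of integers (one per row)
    p = Int("p")                                  # permutation selector
    perms = list(permutations(range(n)))          # all n! candidate orders
    cons = [p >= 0, p < len(perms)]
    for i, val in enumerate(f):
        bits = [(val >> (n - 1 - k)) & 1 for k in range(n)]
        choices = []
        for idx, perm in enumerate(perms):
            permuted = 0
            for k in range(n):                    # apply pi_p to f(i)
                permuted = (permuted << 1) | bits[perm[k]]
            choices.append(And(p == idx, out_vars[i] == permuted))
        cons.append(Or(choices))                  # [out_i]_2 = pi_p(f(i))
    return cons

# usage with the function of Table 5.4 (n = 3); Phi would be added separately
n = 3
f = [0b000, 0b010, 0b100, 0b111, 0b001, 0b011, 0b101, 0b110]
outs = [BitVec("out_%d" % i, n) for i in range(2 ** n)]
solver = Solver()
solver.add(swop_output_constraints(outs, f, n))

Since p is part of the search space, a single solver call explores all output permutations at once.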

5.3.3 Heuristic Approach

To apply SWOP to a heuristic approach, the transformation-based algorithm described
in Sect. 3.1.2 is considered. To avoid the construction of all possible permu-
tations (which would lead to a complexity increase of n!, since the transformation-
based approach does not support garbage outputs), a SWOP-based synthesis heuris-
tic using a sifting algorithm is proposed. This algorithm is inspired by [Rud93] and

Fig. 5.7 Heuristic SWOP

(1)  HeuristicSWOP(f : B^n → B^n)
(2)    // f is given in terms of a truth table
(3)    perm = (1, 2, . . . , n);
(4)    dbest = synthesize(perm);
(5)    best_perm = perm;
(6)    for i = 0 to n − 2 do
(7)      for j = i + 1 to n − 1 do
(8)        tmp_perm = swap(perm, i, j);
(9)        dtmp = synthesize(tmp_perm);
(10)       if (dtmp < dbest)
(11)         dbest = dtmp; best_perm = tmp_perm;
(12)     perm = best_perm;

reduces the above mentioned complexity to n^2. Because of the heuristic behavior of
sifting, the best permutation may not always be determined. However, as the experiments
in Sect. 5.3.4 show, significant improvements can be achieved in feasible run-times.
The pseudo-code for the sifting algorithm is given in Fig. 5.7. First, an initial
permutation (given by the function) is chosen and the circuit for this function is
synthesized (line 3 and line 4). The gate count of this first realization is stored. Af-
ter this, for each output the best position within the current permutation is searched.
This is done by swapping the position of the current output with each of the other po-
sitions leading to new permutations (line 8). For each of these new permutations, the
respective circuit is synthesized (line 9). If the gate count of such a circuit is smaller
than the current best known gate count (line 10), the current permutation is stored as
being the best one (line 11). When each position for one output has been checked,
the best permutation of these checks is used for the remaining outputs (line 12).
In summary, for each of the first n − 1 outputs, the algorithm will find a new
position that will result in a realization with the fewest gates—when synthesized
with the transformation-based approach. Therewith the complexity of SWOP can
be reduced while still improving the obtained results as the next section will show.

5.3.4 Experimental Results

This section provides experimental results for SWOP. In total, four different as-
pects are studied: (1) the reduction of the complexity of SWOP when garbage out-
puts are considered, (2) the results of exact SWOP in comparison to the common
exact synthesis, (3) the results of heuristic SWOP in comparison to the common
transformation-based approach, and (4) the quality (with respect to the number of
gates) of the circuits synthesized by SWOP in comparison to the currently best
known realizations.
For exact synthesis, the SWORD approach introduced in Sect. 4.3.1 has been
used. The SWOP extension was implemented on top of this approach. As heuristic
approach, the transformation-based approach (including template matching as de-
scribed in [MDM05]) was applied. The respective benchmark functions have been

Table 5.5 SWOP considering garbage outputs


FUNCTION            SWOP              OPT. SWOP
n  g  d             n!      TIME      n!/g!     TIME      IMPR.

4mod5 5 4 5 120 233.18 5 7.37 31.6


decod24 4 0 5 24 0.10 24 0.10 1.0
gt4 4 3 3 24 <0.01 4 <0.01 1.0
gt5 4 3 1 24 0.01 4 <0.01 >1.0
low-high 4 3 4 24 3.71 4 0.39 9.51
zero-one-two 4 1 4 24 0.03 24 0.02 1.5
maj4_1 5 4 6 120 3500.90 5 2125.62 1.6
maj4_2 5 4 5 120 191.92 5 4.19 45.8
alu 5 4 6 120 2013.72 5 61.24 32.9
mini_alu_1 4 2 5 24 0.28 12 0.19 1.5
mini_alu_2 5 3 7 120 930.60 20 474.42 1.9
mini_alu_3 5 3 5 120 9.60 20 2.07 4.6

taken from RevLib [WGT+08]. All experiments have been carried out on an AMD
Athlon 3500+ with 1 GB of main memory. All run-times are given in CPU seconds.
The timeout was set to 3600 CPU seconds.

5.3.4.1 SWOP with Garbage Outputs

In a first series of experiments, the different complexities are compared which occur
if Toffoli circuits for functions containing garbage outputs are synthesized. Here,
instead of n! permutations, only n!/g! permutations are considered.
Table 5.5 shows the results of the exact SWOP approach with n! permutations
and with n!/g! permutations for functions containing garbage outputs. The first three
columns provide the name of the function, the number of circuit lines n, and the
number of garbage outputs g, respectively. The minimal number of gates of the ob-
tained Toffoli circuits is given in column d. Then, the run-times of SWOP with n!
and with n!/g! permutations are given, respectively (denoted by TIME). Furthermore,
the improvement of the optimized SWOP (i.e. the synthesis with only n!/g! permuta-
tions) in comparison to SWOP with all n! permutations is provided (i.e. run-time of
SWOP divided by run-time of OPT. SWOP).
As expected, the reduction of permutations leads to better run-times for all func-
tions. Improvements of up to a factor of 45 can be achieved in the best case.

5.3.4.2 Exact SWOP

In this section, exact SWOP is compared to the exact approach from Sect. 4.3.1.
The results are shown in Table 5.6. Here again, the first column provides the name

Table 5.6 Comparison of exact synthesis to exact SWOP


FUNCTION            EXACT SYNTHESIS     EXACT SWOP      SWOP-TIME/SYN-TIME vs. n!/g!
n  g                d     TIME          d     TIME

4mod5 5 4 5 0.9 5 7.4 8.2 > 5


decod24 4 0 6 0.1 5 0.1 1.0 < 24
gt4 4 3 4 <0.1 3 <0.1 1.0 < 4
gt5 4 3 3 <0.1 1 0.1 1.0 < 4
low-high 4 3 5 0.2 4 0.4 2.0 < 4
zero-one-two 4 1 5 <0.1 4 <0.1 1.0 < 24
maj4_1 5 4 6 438.0 6 2125.6 4.9 < 5
maj4_2 5 4 6 13.6 5 4.2 0.3 < 5
alu 5 4 7 423.3 6 61.2 0.1 < 5
mini_alu_1 4 2 5 <0.1 5 0.2 2.0 < 12
mini_alu_2 5 3 8 2460.0 7 474.4 0.2 < 20
mini_alu_3 5 3 5 0.2 5 2.1 10.5 < 20
3_17 3 0 6 <0.1 5 <0.1 1.0 > 6
graycode6 6 0 5 <0.1 5 13.5 13.5 < 720
mod5d1 5 0 7 11.8 7 184.1 15.6 < 120
mod5d2 5 0 8 9.9 8 1097.6 110.8 < 120
mod5mils 5 0 5 0.1 5 1.7 17.0 < 120

of the function, while n and g denote the number of variables and the number of
garbage outputs, respectively. The next columns give the minimal number d of gates
determined by the two approaches and the corresponding run-times. The last column
shows information relating to the complexity, i.e. the run-time overhead when output
permutation is considered (SWOP-TIME divided by SYN-TIME) compared to the factor n!/g!.
It can be seen that for many functions, SWOP found smaller circuits than the ones
generated by the previous exact synthesis approach. Thus, removing the restriction
for the output order leads to smaller circuits for many of the well known benchmark
functions.
As expected, the run-time for SWOP is higher in comparison to the run-time of
the pure exact synthesis. This is because the search space is obviously larger due
to the number of output permutations that can be chosen. However, the increase is
not as high as the worst case complexity (n!/g!). This can be seen in the last column
of Table 5.6. For all benchmarks (except 4mod5 and 3_17) the run-time of SWOP
divided by the run-time of the previous synthesis approach is significantly smaller
than n!/g!. As explained, this is due to search space pruning, possible if the encoding
is extended so that all permutations are checked in parallel. Moreover, for some
benchmarks (e.g. maj4_2 or alu) the run-time of SWOP is even smaller than for a
single exact solution. This reduction is caused by the fact that smaller circuits are
found and thus the synthesis terminates earlier.

5.3.4.3 Heuristic SWOP

The results obtained by common heuristic synthesis (i.e. by the transformation-
based approach including templates [MDM05]) are given in Table 5.7. More pre-
cisely, the gate counts of the resulting circuits as well as the run-times needed for
their synthesis are given for (1) the original algorithm (denoted by HEURISTIC
SYNTHESIS), (2) the SWOP-based synthesis where all permutations are considered
(denoted by ALL PERMS), and (3) the SWOP-based synthesis where the sifting
algorithm introduced in Sect. 5.3.3 is used (denoted by SIFTING).
As can clearly be seen, the effect of output permutation is significant for most
of the functions. For example, for the function aj-e13 the realization is reduced by
30 percent from 40 gates to 28 gates. The best absolute reduction of gates can be ob-

Table 5.7 Comparison of heuristic synthesis to heuristic SWOP


FUNCTION        HEURISTIC SYNTHESIS      HEURISTIC SWOP
                                         ALL PERMS         SIFTING
n               d       TIME             d        TIME     d      TIME

3_17 3 6 0.03 6–7 0.32 6 0.25


4_49 4 17 0.40 14–22 4.09 16 1.09
4mod5 5 9 0.03 9–21 10.02 9 0.75
5mod5 6 18 0.13 14–37 254.14 18 3.59
aj-e10 5 33 0.63 22–51 107.03 30 8.21
aj-e11 4 12 0.09 11–22 2.46 11 0.55
aj-e12 5 26 0.35 25–57 103.37 25 8.11
aj-e13 5 40 0.97 28–51 112.70 34 12.31
ex1 3 4 <0.1 4–8 0.08 4 0.06
graycode3 3 2 <0.1 2–5 0.01 2 0.01
graycode4 4 3 0.01 3–9 0.32 3 0.07
graycode5 5 4 0.03 4–13 4.72 4 0.31
graycode6 6 5 0.08 5–18 67.25 5 1.08
hwb3 3 7 0.06 6–11 0.32 7 0.29
hwb4 4 15 0.35 10–21 3.70 10 0.69
hwb5 5 55 1.66 38–62 153.71 44 16.49
prime5 6 15 0.20 13–40 227.05 13 3.09
prime5a 6 16 0.10 14–41 291.58 14 3.92
ham3 3 5 0.01 3–5 0.02 4 0.03
hwb6 6 125 7.08 – >3600.00 91 89.20
hwb7 7 283 33.26 – >3600.00 259 656.82
hwb8 8 676 152.13 – >3600.00 641 4525.22
ham7 7 23 0.34 – >3600.00 23 49.47
rd53 7 16 0.26 – >3600.00 13 10.04

Table 5.8 Best results obtained by SWOP

FUNCTION      BEST KNOWN d      SWOP d      Δd

decod24 6 5 1
gt5 3 1 2
3_17 6 5 1
4_49 16 14 2
aj-e13 40 28 12
hwb4 11 10 1

served for function hwb8. Here, 35 gates are saved in total when output permutation
is applied.
However, not only the improvements are of interest. Even a comparison of the best and
the worst permutation (shown in column d for ALL PERMS) gives some interesting
insight. For example, consider the function hwb5. One output permutation results
in a circuit with 38 gates, while another permutation results in 62 gates. Since a
heuristic synthesis procedure is used, the results will most likely not be optimal. In
fact, according to Lemma 5.1 the best case improvement (i.e. the difference between
the best and the worst permutation) for hwb5 cannot be greater than 3 · (5 − 1) = 12
for minimal realizations—yet it is 24. This can be explained by the heuristic nature
of the approach that does not guarantee minimality.
Finally it is shown that sifting provides good results in a fraction of the run-time.
For most functions with more than six variables it is not feasible to minimize the cir-
cuit considering all permutations. However, sifting offers significant improvements
for these cases.

5.3.4.4 Reductions Achieved by SWOP

Finally, the quality (with respect to the number of gates) of some circuits synthesized
by SWOP is compared to the currently best known realizations obtained by com-
mon synthesis approaches. Table 5.8 shows a selection of functions with the gate
count of the currently best known circuit realization (denoted by B EST KNOWN).
The obtained gate count when output permutation is considered is given in col-
umn SWOP.
Synthesis with output permutation enables the realization of smaller circuits than
the currently best known ones. As an interesting example, the realizations of the
hwb4 function are observed in more detail. For the original function, a minimal
realization with 11 gates has been synthesized by the exact approach. Now, using
output permutation it is possible to synthesize a smaller realization with only 10
gates using a heuristic approach.

5.4 Summary and Future Work

Embedding irreversible functions is a new synthesis step which is not needed in
traditional circuit design. Nevertheless, it is no less important, since many
practically relevant functions are irreversible and therewith need an embedding
to become synthesizable in reversible logic. Usually, the embedding is done in a
straightforward manner—Sect. 3.1.1 shows a possible approach. However, the way
the embedding is done is crucial to the resulting circuit sizes. This chapter gave
examples showing this correlation.
More precisely, two aspects have been investigated in detail. First, how to assign
don’t cares was addressed. Don’t cares result if additional lines are added to make
an irreversible function reversible. Second, the effect of different output permuta-
tions on the resulting circuits has been observed. Approaches have been proposed
that exploit the respective possibilities so that significantly smaller circuits result.
In doing so, it was not only possible to synthesize many important functions with
smaller size; the described techniques (in particular the exact SWOP formulation
from Sect. 5.3.2) also have been used to determine the respective BDD node substi-
tutions for the BDD-based synthesis introduced in Sect. 3.2. In this sense, the
contributions made are crucial in particular for the synthesis of irreversible (sub-)functions
that often occur in hardware design.
In future work, the determination of “good” embeddings should be lifted from
the truth table level to higher function representations. As an example, a 1-bit adder
can efficiently be synthesized using the introduced techniques with minimal num-
ber of garbage lines, proper don’t care assignments, and the best output permutation.
But, this cannot be ensured for a 32-bit adder with 2^32 = 4.3 · 10^9 truth table lines.
Therefore, other synthesis techniques such as the BDD-based synthesis approach
from Sect. 3.2 must be applied. However, the embeddings generated by the BDD-
based synthesis are not optimal with respect to the number of additional lines. Even
simply cascading 32 optimally embedded 1-bit adders is not satisfying, since each
1-bit adder includes at least one constant input so that the final circuit would con-
tain 32 additional lines. In fact, a 32-bit adder can be represented with at most one
additional line only [CDKM05]. While for special functions like the adder “good”
embeddings already have been manually found, developing new techniques that au-
tomatically generate embeddings for large functions are left for future work.
Therewith, this chapter concludes the consideration of reversible logic synthesis
in this book. In the next chapters the remaining aspects of an upcoming design flow,
namely optimization as well as verification and debugging, are considered in detail.
Chapter 6
Optimization

The primary task of synthesis approaches is to generate circuits that realize the
desired functions. Secondarily, it should be ensured that the resulting circuits are as
compact as possible. However, the results obtained by synthesis approaches often
are sub-optimal. For example, the transformation-based synthesis method described
in Sect. 3.1.2 tends to produce circuits with very costly Toffoli gates (i.e. Toffoli
gates with a large number of control lines). Hierarchical synthesis approaches like
the BDD-based method from Sect. 3.2 or the SyReC-based approach from Sect. 3.3
lead to circuits that are not optimal with respect to the number of lines. Besides that,
technology specific constraints are often not considered by synthesis approaches.
Consequently, in common design flows optimization approaches are applied after
synthesis.
For reversible logic, only first attempts at optimization have been made in recent
years. In particular, reducing the quantum cost of given circuits has been con-
sidered. For example, template matching [IKY02, MDM05, MYDM05] is a search
method which looks for gate sequences that can be replaced by alternative cascades
of lower cost. For many circuits, substantial improvements are achieved using this
method. However, for large circuits or a high number of applied templates,
this approach suffers from high run-times. As a second example, the work in [ZM06]
showed how analyzing cross-point faults can identify redundant control connections
in reversible circuits. Removing such control lines reduces the cost of the circuit.
However, the computational effort needed to determine redundant control connections is
extremely high.
In this chapter, three new optimization approaches are introduced—each with
its own focus on a particular cost metric. The first one considers the reduction of
the well-established quantum cost (used in quantum circuits) and the transistor cost
(used in CMOS implementations), respectively. Therefore, a (small) number of ad-
ditional signal lines are added to the circuit that are used to “buffer” factors of con-
trol lines [MWD10]. Then, these factors can be reused by other gates in the circuit
which reduces the size of the gates and thus decreases the cost of the circuit. A fast
algorithm is presented along with results showing that even for a small number of
additional lines (even 1) a significant amount of cost can be saved.

The second approach considers the line count of a circuit. While adding a small
number of additional lines may be worthwhile to reduce e.g. the quantum cost of a cir-
cuit, usually this number should be kept small (in particular for quantum circuits
where circuit lines or qubits, respectively, are a limited resource). But as already
mentioned above, in particular hierarchical approaches lead to a significant amount
of additional circuit lines. To reduce these lines, a post-synthesis approach is intro-
duced which re-synthesizes parts of the circuits so that lines with constant inputs
can be merged [WSD10]. Therewith, notable line reductions can be achieved.
Finally, an optimization method is introduced which takes a cost metric be-
yond the established quantum cost, transistor cost, and line count into account.
This is motivated by new physical realizations of reversible and quantum circuits
(see e.g. [DV02, Mas07, RO08]) leading to further limitations and restrictions. By
means of so called Nearest Neighbor Cost (NNC), it is shown how reversible cir-
cuits can be optimized with respect to the resulting new cost metrics [WSD09].
NNC is important if Linear Nearest Neighbor (LNN) architectures [FDH04, Kut06,
Mas07] are addressed as target technology. Here, only adjacent gates are allowed
(i.e. gates where control line and target line are on adjacent circuit lines). Since en-
suring adjacent gates in a naive way increases the quantum cost by about one order
of magnitude, optimization approaches are introduced that significantly reduce this
increase.
At the end of this chapter, all results are summarized and future work is sketched.

6.1 Adding Lines to Reduce Circuit Cost

This section shows how circuit cost can be significantly reduced if additional lines
are added to the circuit. To this end, the general idea is first introduced in Sect. 6.1.1
before an algorithm exploiting this observation is proposed in Sect. 6.1.2. Fi-
nally, the experimental results in Sect. 6.1.3 demonstrate the effect of the proposed
optimization approach.

6.1.1 General Idea

Optimization approaches, such as the two noted above, preserve the number of
lines in the circuit to be optimized. In contrast, this section shows how extending
the circuit by additional signal lines can improve the cost of a reversible circuit. The
additionally added lines are thereby denoted as helper lines in the following.

Definition 6.1 Let G be a reversible circuit. Then, a helper line is an additionally added circuit line
• whose input is set to a constant value 0 and
• whose output is used as a garbage output.

Fig. 6.1 Illustrating the general idea of factoring

Having a helper line available, values can be “buffered” on this line so that they
can be later reused by other gates. In doing so, control lines can be saved as shown
by the following definition.

Definition 6.2 Let G be a reversible circuit and h be a helper line. Then, a gate
MCT(C, t) of G can be replaced by the sequence MCT(F, h), MCT(h ∪ Ĉ, t),
MCT(F, h), where C = F ∪ Ĉ, F ∩ Ĉ = ∅, and F ≠ ∅. In the following, this re-
placement is referred to as factoring the initial gate, where F is a factor of MCT(C, t).

Remark 6.1 The terms "factoring" and "factor" are natural, since partitioning
the control set C into F and Ĉ is essentially factoring the AND function for the
control lines. This factoring relies on the fact that 0 ⊕ x1 x2 · · · xk = x1 x2 · · · xk,
i.e. that the result of a factor can be "buffered" by a constant line assigned to 0.
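A minimal sketch of this replacement, with a gate modeled as a pair of a control set and a target line (illustrative only, not the data structures of the actual implementation):

def factor_gate(gate, F, h):
    # replace MCT(C, t) by MCT(F, h), MCT(h u C_hat, t), MCT(F, h)
    C, t = gate
    F, C = frozenset(F), frozenset(C)
    assert F and F <= C and h not in C     # F must be a non-empty subset of C
    C_hat = C - F                          # C = F u C_hat with F n C_hat = {}
    return [(F, h), (C_hat | frozenset([h]), t), (F, h)]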

Applying Definition 6.2 to the gates of a circuit, control lines can be removed. Since
the number of control lines determines the circuit cost, this may lead
to less costly circuits. However, this is only the case if the total cost of the added
gates is less than the cost saved by factoring the control lines. When only a
single gate is substituted, a reduction cannot be achieved for the transistor cost model, but it can for the
quantum cost model. If more than one gate can be substituted, higher cost savings
are achieved (then reductions for the transistor cost model are observed as well).
These ideas are illustrated in the following example.

Example 6.1 Consider the cascade of Toffoli gates depicted in Fig. 6.1(a). The gates
in this cascade have a common control factor F = {x0 , x1 }. Hence, the cost of this
circuit can be reduced as shown in Fig. 6.1(b) by adding an additional line h (at
the top of the circuit) as well as the Toffoli gates MCT(F, h) before and after the
cascade. This leads to additional quantum cost of 2 · 5 = 10. However, the factored
gates reuse the result of F leading to a reduction of one control line per gate (dashed

rectangle in Fig. 6.1(b)). The removed control lines are shown as white circles. In
total this reduces the quantum cost from 104 to 59 and the transistor cost from 144
to 136, respectively.
Note that the added line is set to the constant input 0. Furthermore, the rightmost
Toffoli gate operating on the added line is only needed if the line is to be used for
another factor.

Remark 6.2 In previous work, it has already been observed that more circuit lines
usually lead to lower (quantum) cost (see e.g. [BBC+95] or also the results of the
BDD-based approach discussed in Sect. 3.2.4). Moreover, the authors of [SPMH03]
even showed that some functions cannot be synthesized for certain gate libraries
unless one additional line is added. However, here these observations are exploited
for the first time by proposing a constructive post-synthesis optimization approach
for reversible logic.

In the following section, the algorithm is presented in detail.

6.1.2 Algorithm

Based on the ideas presented in the last section, an algorithm is now proposed that
adds one helper line and then employs a straightforward search procedure to use
that line for optimizing the circuit. More precisely, it is shown how to extract factors
from Toffoli and Fredkin gates in the circuit (the circuit may contain other types
of gates). The algorithm can be applied repeatedly to add more than one helper
line. It can also be iterated to add lines until adding a further line results in no cost
reduction. The transistor cost model or the quantum cost model can be used and in
fact the algorithm is readily adapted to any other gate-based cost model.
Consider a reversible circuit G consisting of the cascade of gates G = g0 g1 . . .
gd−1. Let Ci denote the set of control lines for gi and let Ti denote the set of target
lines for gi. Then, four steps are performed in total.
1. Add a single helper line h.
2. Find the highest cost reducing factor across the circuit.
Therefore, the whole circuit is traversed (i.e. every gate gi with 0 ≤ i < d is
considered). If gi is a reversible gate gi (Ci , Ti ) and the helper line h is available
(i.e. it is not used by a previously applied factor at this point in the circuit), then
for every partitioning of Ci into {F, Ĉ} with F not empty:
a. Find the lowest j ≥ i so that j = d − 1 or (F ∩ (Tj+1 ∪ h)) ≠ ∅, i.e. find
the next gate gj that manipulates one of the factors in F so that the value of
the helper line cannot be reused any longer. If the outputs of the circuit are
reached use gd−1 instead.
b. Determine the cost reduction that would result from applying this factor to
all applicable gates between gi and gj , including the cost of introducing two
instances of the factor gate MCT(F, h).

c. Keep a record of the factor and the gate range that leads to the largest cost
reduction.
3. If no cost reducing factor is found in Step 2, then terminate.
4. Otherwise, apply the best factor found and repeat from Step 2 on the revised
circuit.
Note that as already mentioned above, the rightmost MCT(F, h) gate operating
on the helper line is only added if the helper line is going to be used for another
factor.
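A simplified Python sketch of Step 2 is given below. It restricts itself to Toffoli gates, assumes the helper line is always available, uses an assumed quantum cost function for a Toffoli gate with k controls, and omits the reversed-order pass; it is meant as an illustration of the search, not as the actual implementation:

from itertools import chain, combinations

def toffoli_cost(k):
    # assumed quantum cost of a Toffoli gate with k control lines
    return 1 if k <= 1 else 2 ** (k + 1) - 3

def non_empty_subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))

def best_factor(gates):
    # gates: list of (controls, target) with the controls given as frozensets;
    # returns (saving, factor, i, j) for the best cost reducing factor, or None
    best = None
    for i, (c_i, _) in enumerate(gates):
        for f in map(frozenset, non_empty_subsets(c_i)):
            saving = -2 * toffoli_cost(len(f))       # two factor gates MCT(F, h)
            j = i
            while j < len(gates):
                c_j, t_j = gates[j]
                if t_j in f:                         # a factor line is modified
                    break
                if f <= c_j:                         # gate can reuse the factor
                    saving += (toffoli_cost(len(c_j))
                               - toffoli_cost(len(c_j) - len(f) + 1))
                j += 1
            if saving > 0 and (best is None or saving > best[0]):
                best = (saving, f, i, j)
    return best

Applying the best factor found (cf. Definition 6.2) and repeating the search until no positive saving remains corresponds to Steps 3 and 4.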

Example 6.2 Figure 6.2 shows the result of applying the algorithm to the circuit rep-
resenting the function rd53 (depicted in Fig. 6.2(a)) using the quantum cost metric.
The applied factors are highlighted by brackets at the bottom of Fig. 6.2(b) (with one
helper line) and Fig. 6.2(c) (with two helper lines), respectively. While the original
circuit has quantum cost of 128, that can be reduced with one helper line to 83 or
with two helper lines to 66. Adding a third helper line does not reduce the quantum
cost of this circuit further.

The order in which factors are considered typically has an effect. As a result, the
algorithm is applied to the circuit as given and then to the circuit found by reversing
the order of the original circuit. The better of the two final circuits is taken as the
result. Thus, the presented algorithm is a heuristic. But as the experiments in the
next section show, this already leads to good results.

6.1.3 Experimental Results

This section provides experimental results for the proposed approach. To this end,
the method described above has been implemented in C and was applied to all
benchmarks from RevLib [WGT+08]. All experiments have been carried out on
an AMD Athlon 3500+ with 1 GB of memory.
Since some of the circuits in RevLib already have been optimized using various
approaches (e.g. extensive template post-synthesis optimization, output permutation
optimization, and other techniques), to provide an even basis all circuits have been
previously optimized. To this end, the approach described in [MDM05] together
with a basic set of 14 templates has been used.1 Afterwards, the proposed optimiza-
tion method has been applied to the resulting circuits. In doing so, all considered
circuits already went through an optimization and it can be shown that, indepen-
dently from this, further significant reductions can be achieved if helper lines and
the algorithm introduced above are used.2

1 This took over 10 hours of computation time. Furthermore, the application to the urf series of circuits (which are quite large) has been aborted because they required too much run-time.
2 Of course, similar results are also achieved if the proposed approach is directly applied to non-optimized circuits.

Fig. 6.2 Reversible circuits for rd53 with one helper line and two helper lines

Table 6.1 summarizes the obtained results for one and two helper lines, respec-
tively. The first three columns give the name of the circuit (including the unique
identifier of the circuit realization as used in RevLib), the number of circuit lines (n),
as well as the number of gates of the initial (already optimized) circuit (d), re-
spectively. In the following columns, the obtained results for quantum cost and
transistor cost models are presented. To this end, the proposed approach has been
applied with one and with two helper lines to both the circuit as given and the circuit
in reversed order. Afterwards, the better result has been chosen. The re-
Table 6.1 Experimental results for RevLib circuits
CIRCUIT   QUANTUM COST MODEL   TRANSISTOR COST MODEL   MAX
          INITIAL   ADD 1 LINE   ADD 2 LINES   INITIAL   ADD 1 LINE   ADD 2 LINES   TIME
          COST   OPTIMIZED   % IMPR.   OPTIMIZED   % IMPR.   COST   OPTIMIZED   % IMPR.   OPTIMIZED   % IMPR.
n  d             COST                  COST                         COST                  COST

cycle17_3_112 20 48 6063 1877 69.04 1233 79.66 3272 2040 37.65 1768 45.97 1.63
cycle10_2_110 12 19 1202 420 65.06 290 75.87 800 568 29.00 512 36.00 0.17
plus63mod8192_164 13 492 38462 16902 56.06 11533 70.01 21840 18576 14.95 17560 19.60 4.19
plus63mod4096_163 12 429 25843 11804 54.32 9016 65.11 16200 13896 14.22 13336 17.68 3.30
urf2_153 8 638 17027 9570 43.80 7324 56.99 18240 16352 10.35 15712 13.86 3.28

hwb8_118 8 633 16510 9333 43.47 7334 55.58 17648 15920 9.79 15384 12.83 3.20
hwb8_115 8 610 14679 8302 43.44 6478 55.87 15304 13912 9.10 13424 12.28 2.83
urf5_159 9 499 24523 13979 43.00 8894 63.73 18640 15632 16.14 13704 26.48 3.16
urf2_154 8 620 16152 9298 42.43 7164 55.65 17432 15768 9.55 15184 12.90 3.16
urf3_156 10 2732 128172 75644 40.98 53584 58.19 103680 91600 11.65 83160 19.79 20.16
hwb8_114 8 614 11941 7066 40.83 5869 50.85 14488 13640 5.85 13376 7.68 2.83
urf1_150 9 1517 48952 29120 40.51 21511 56.06 48616 43040 11.47 40824 16.03 9.03
urf3_157 10 2674 121716 72562 40.38 51833 57.41 100544 88928 11.55 81296 19.14 19.64
urf1_151 9 1487 45855 27740 39.50 20765 54.72 47024 41784 11.14 39760 15.45 8.73
hwb9_123 9 1959 22482 13665 39.22 10996 51.09 28696 27760 3.26 27376 4.60 6.30
hwb8_113 8 637 13460 8251 38.70 6763 49.75 16896 15912 5.82 15648 7.39 3.19
hwb9_120 9 1538 44684 27425 38.62 20574 53.96 46400 41496 10.57 39584 14.69 8.68
hwb9_122 9 1535 44635 27402 38.61 20548 53.96 46336 41456 10.53 39544 14.66 8.62
hwb8_117 8 748 7010 4346 38.00 3776 46.13 10520 10192 3.12 10112 3.88 2.28
hwb8_116 8 749 6976 4338 37.82 3781 45.80 10536 10216 3.04 10144 3.72 2.31

plus127mod8192_162 13 910 61425 38710 36.98 28856 53.02 39984 35304 11.70 32648 18.35 8.36
hwb9_119 9 1544 35967 23269 35.30 18939 47.34 44344 41448 6.53 40376 8.95 8.70
hwb9_121 9 1541 35973 23280 35.28 18940 47.35 44304 41408 6.54 40336 8.96 8.65
mod8-10_177 5 14 84 55 34.52 44 47.62 144 144 0.00 144 0.00 0.05
hwb7_60 7 166 1754 1153 34.26 1010 42.42 3168 2960 6.57 2912 8.08 0.82
4gt4-v0_73 5 17 57 38 33.33 38 33.33 144 144 0.00 144 0.00 0.07
hwb7_59 7 289 3939 2632 33.18 2310 41.36 6800 6480 4.71 6400 5.88 1.33
hwb7_62 7 331 2608 1775 31.94 1624 37.73 4632 4512 2.59 4496 2.94 1.01
4gt12-v0_86 5 14 47 32 31.91 32 31.91 136 104 23.53 104 23.53 0.05
alu-v2_30 5 18 111 76 31.53 62 44.14 240 208 13.33 176 26.67 0.08
ham7_104 7 23 83 58 30.12 58 30.12 272 272 0.00 272 0.00 0.07
hwb6_56 6 126 1329 932 29.87 871 34.46 2456 2392 2.61 2392 2.61 0.51
alu-v2_31 5 13 45 32 28.89 32 28.89 144 128 11.11 128 11.11 0.06
hwb7_61 7 236 3261 2319 28.89 2105 35.45 5592 5432 2.86 5408 3.29 1.04
4gt4-v0_78 5 13 53 38 28.30 38 28.30 144 112 22.22 112 22.22 0.05
rd53_136 7 15 72 52 27.78 45 37.50 200 200 0.00 200 0.00 0.05
ham15_108 15 70 403 294 27.05 257 36.23 992 968 2.42 968 2.42 0.22
4gt12-v0_87 5 10 43 32 25.58 32 25.58 104 104 0.00 104 0.00 0.05
rd53_135 7 16 68 51 25.00 51 25.00 224 216 3.57 216 3.57 0.05
hwb6_57 6 65 433 326 24.71 299 30.95 976 928 4.92 928 4.92 0.28

rd53_137 7 16 65 49 24.62 49 24.62 176 176 0.00 176 0.00 0.05


rd53_131 7 28 106 81 23.58 81 23.58 224 224 0.00 224 0.00 0.07
ham15_107 15 132 1174 916 21.98 823 29.90 2456 2360 3.91 2336 4.89 0.67
sym6_145 7 36 348 279 19.83 265 23.85 744 728 2.15 728 2.15 0.18
urf6_160 15 10740 53700 43893 18.26 42232 21.36 171840 159120 7.40 157448 8.38 227.23

rd53_130 7 30 230 190 17.39 174 24.35 344 344 0.00 344 0.00 0.08
mod5adder_127 6 21 121 100 17.36 100 17.36 216 216 0.00 216 0.00 0.06
one-two-three-v0_97 5 11 65 54 16.92 54 16.92 200 200 0.00 200 0.00 0.04
urf2_161 8 3250 20465 17235 15.78 16594 18.92 54416 53664 1.38 53664 1.38 12.03
ham15_109 15 109 206 176 14.56 169 17.96 1008 1008 0.00 1008 0.00 0.27
hwb5_53 5 55 286 247 13.64 242 15.38 824 824 0.00 824 0.00 0.18
rd53_132 7 27 114 99 13.16 99 13.16 176 176 0.00 176 0.00 0.06
4gt13_91 5 10 31 27 12.90 27 12.90 128 96 25.00 96 25.00 0.05
urf3_155 10 26468 132340 115731 12.55 113918 13.92 423488 414016 2.24 413656 2.32 182.19
hwb5_54 5 24 72 63 12.50 63 12.50 240 240 0.00 240 0.00 0.09
cnt3-5_180 16 20 120 105 12.50 105 12.50 320 320 0.00 320 0.00 0.09
rd53_133 7 12 73 64 12.33 64 12.33 240 232 3.33 232 3.33 0.05
urf1_149 9 11554 57770 51125 11.50 50424 12.72 184864 181816 1.65 181696 1.71 51.47
hwb5_55 5 24 98 87 11.22 87 11.22 296 296 0.00 296 0.00 0.07
sym9_148 10 210 1250 1126 9.92 1074 14.08 3448 3432 0.46 3416 0.93 1.11

urf2_152 8 5030 25150 22693 9.77 22518 10.47 80480 79880 0.75 79840 0.80 18.30
urf5_158 9 10276 51380 46513 9.47 45662 11.13 164416 162496 1.17 162408 1.22 44.47
4gt5_76 5 13 26 24 7.69 24 7.69 112 104 7.14 104 7.14 0.04
urf4_187 11 32004 160020 148333 7.30 146890 8.21 512064 501824 2.00 501280 2.11 2135.51
sys6-v0_111 10 20 71 69 2.82 68 4.23 280 264 5.71 256 8.57 0.06
rd53_138 8 12 43 42 2.33 42 2.33 176 168 4.55 168 4.55 0.04
rd73_140 10 20 75 74 1.33 74 1.33 288 280 2.78 280 2.78 0.06
sym9_146 12 28 108 107 0.93 107 0.93 384 376 2.08 376 2.08 0.09
rd84_142 15 28 111 110 0.90 110 0.90 408 400 1.96 400 1.96 0.08

sulting cost and the percentage improvement are shown for each case relative to
the initial circuit cost (i.e. the cost after template application). Finally, the last
column gives the maximum CPU time (in seconds) measured for a single run
for each benchmark. Results for small circuits with less than five lines and less
than ten gates are omitted. Furthermore, the circuits 4gt11_82, 4gt13_90, decod24-
enable_126, mod5adder_128, mod5adder_129, hwb6_58, ham7_105, ham7_106,
rd73_141, sys6-v0_144, sym9_147, 0410184_169, 0410184_170, rd84_143, cnt3-
5_179, and add8_172 gave no improvement and thus are not listed in Table 6.1.
Considering quantum cost, for most of the circuits significant cost reductions
can be observed—even if only a single line is added. Over all circuits (including the
ones that gave no improvement), adding a single line reduces the quantum cost by
22.51% on average—in the best case (cycle17_3_112) by just over 69%. This can be
further improved if another line is added leading to reductions of additional 5.10%
on average. If transistor cost is considered, the reductions are somewhat smaller but
still significant. When adding a single line the transistor cost is reduced by 5.83% on
average—in the best case (cycle17_3_112) by 37%. Adding a second line reduces
the transistor cost by further 1.65%. Since the number of lines is negligible in CMOS
technologies, this is a notable reduction as well. In addition, these optimizations
can be achieved in very short run-time. Even for circuits including thousands of
gates, the approach terminates after some minutes—in most of the cases after some
seconds.
Besides that, the effect of adding a certain number of helper lines on the resulting
improvement has been evaluated in detail. More precisely, the proposed method has
been applied with one to five helper lines to all the circuits from RevLib (including
the small ones that have been omitted in Table 6.1). Again, all these circuits already
have been optimized using templates as noted above. A total of 95 of the 177 circuits
show an improvement in quantum cost when a single helper line is added. Of the
other 82 circuits, 64 have a very small number of lines (less than or equal to 5)
and are already highly optimized due to their relatively small size.
Figure 6.3 shows the improvement in quantum cost of the remaining circuits
(both for the respective benchmarks in the plot diagram and on average in the
table). As already discussed above, a significant improvement can be observed if a
single helper line is added. This is further increased if more lines are applied. How-
ever, the improvements diminish with an increasing number of helper lines. Finally, no
further improvement has been observed if a sixth line is applied. This is the ex-
pected behavior, since multiple helper lines are only useful when multiple factors
sharing common gates are present.
Altogether, by applying the proposed approach, significant cost reductions can be
achieved if a single line is added to the circuit (even for already optimized realiza-
tions). Further (diminishing) improvements result if more than one helper line is
applied. The most critical issue is the fact that additional lines must be added to
enable these optimizations. While this is negligible for reversible CMOS technolo-
gies, for quantum circuits the designer must weigh whether these additional
expenses are worth the additional qubit(s). Since up to 70% of the quantum cost can
be saved, this may be the case for many circuits.

Fig. 6.3 Improvement for up to five helper lines

6.2 Reducing the Number of Circuit Lines

While adding a small number of additional lines may be worthwhile to reduce e.g. the
quantum cost of a circuit (as shown in the previous section), usually circuit lines are
a highly limited resource (caused by the fact that the number of circuit lines cor-
responds to the number of qubits). Furthermore, a high number of lines (or qubits,
respectively) may decrease the reliability of the resulting system. Thus, this number
should be kept as small as possible. In the best case, only the minimal number of
circuit lines should be used. However, to ensure minimality of circuit lines, the un-
derlying function must be given in terms of a truth table or similar descriptions (see
Sect. 3.1.1). But, if larger functions should be synthesized, only hierarchical meth-
ods (like the BDD-based method from Sect. 3.2 or the SyReC-based approach from
ods (like the BDD-based method from Sect. 3.2 or the SyReC-based approach from
Sect. 3.3) are available so far. These often require a significant number of additional
circuit lines (with constant inputs) and, thus, lead to circuits with a large line count.
As an example, consider the reversible realization of the AND function and
the OR function as shown in Fig. 6.4(a) and (b), respectively. Composing these cir-
cuits (as done by hierarchical approaches), a realization with two additional circuit
lines (including constant inputs) results (see Fig. 6.4(c)). But, both functions com-
bined can be realized with one additional circuit line only (see Fig. 6.4(d)). Thus, the
question is how the number of additional lines in reversible circuits can be reduced.
In this section, a post-process optimization method is proposed that addresses
this problem. Garbage outputs (i.e. circuit lines whose output value is don’t care) are
thereby exploited. A multi-stage approach is introduced that (1) identifies garbage
outputs producing don’t cares, (2) re-synthesizes parts of the circuit so that instead
of these don’t cares concrete constant values are computed, and (3) connects the
resulting outputs with appropriate constant inputs. In other words, circuit structures
are modified so that they can be merged with constant inputs resulting in a line
reduction. For the respective re-synthesis step, existing synthesis methods are used.

Fig. 6.4 Composition of circuits

Experimental results show that, applying this approach, the number of circuit lines
can be reduced by 17% on average and, in the best case, by more than 40%. Furthermore,
depending on the synthesis approach used, these line reductions are possible
with only a small increase in the number of gates and the quantum costs, respectively.
In some cases, the costs can even be reduced. In this sense, the drawbacks of
scalable but line-costly synthesis approaches are minimized.
The remainder of this section is structured as follows. Section 6.2.1 illustrates
the general idea of the proposed approach. Afterwards, the concrete algorithm
exploiting these observations is described in Sect. 6.2.2. Finally, Sect. 6.2.3 reports
experimental results.

6.2.1 General Idea

In this section, the idea of how to reduce the number of lines in large reversible circuits
is presented. As discussed above, ensuring minimality of circuit lines is only possible
for small functions from which a truth table description can be derived. Thus,
line reduction is considered as a post-optimization problem. The proposed approach
thereby exploits a structure that often occurs in circuits generated by scalable
synthesis approaches or composed from reversible sub-circuits. This is illustrated by the
following running example.

Example 6.3 Consider the circuit G = g1 . . . g12 depicted in Fig. 6.5(a), representing
a 3-bit adder that has been created by composing three single (minimal) 1-bit adders.
This circuit contains three additional circuit lines (with constant inputs). Not all
of them are necessarily required. Furthermore, there are several garbage outputs
whose values are don't cares.

Fig. 6.5 Reducing the number of lines in a 3-bit adder circuit

Of particular interest in this circuit are the first usage of a line with a constant
input and the last usage of a line with a garbage output. For example, the constant
input at line 4 is firstly used by the fifth gate, while at the same time the value of the
last line is not needed anymore after the second gate. Since the value of the garbage
output doesn’t matter (because it is a don’t care), this might offer the possibility
to merge the line including the constant input with the line including the garbage
output. More precisely, if it is possible to modify the circuit so that a garbage output
returns a constant value (instead of an arbitrary value), then this constant value can
be used in the rest of this circuit. At the same time, a constant input line can be
removed. More formally:

Proposition 6.1 Without loss of generality, let G = g1 . . . gd be a reversible circuit
with a constant input at line lc and a garbage output at line lg (lc ≠ lg). Furthermore,
let gi be the first gate connected with line lc (including the constant input) and let gj
with j < i be the last gate connected with line lg (including the garbage output). If it
is possible to modify the sub-circuit G_1^j = g1 . . . gj so that line lg becomes assigned
to a constant value, then line lc can be removed from G. For all gates formerly
connected with line lc , line lg can be used instead.

Note that the constant value of the selected line lc is thereby of no importance. If
necessary, the needed value can easily be generated by an additional NOT gate (i.e. a
Toffoli gate without any control lines). Furthermore, constant outputs can only be
produced if the considered circuit includes additional lines with constant inputs.

Example 6.4 Reconsider the adder circuit G = g1 . . . g12 in the running example.
The constant input at line 1 is firstly used by gate g9 , while the values of the garbage
outputs at line 5, line 6, line 9, and line 10, respectively, are not needed anymore
after gate g8 . Since the sub-circuit G_1^8 = g1 . . . g8 can be modified so that e.g. the
garbage output at line 5 becomes assigned to the constant value 0 (see dashed rect-
angle in Fig. 6.5(b)), line 1 can be removed and the newly created constant value
from line 5 can be used instead. The resulting circuit is depicted in Fig. 6.5(b). Now,
this circuit consists of 9 instead of 10 lines.

Note that the respective modification of a sub-circuit is not always possible. For
example, consider the constant input at line 4 (firstly used by gate g5 ) and the
garbage outputs at line 9 and line 10 (not needed anymore after gate g4 ). This
might offer the possibility to remove one more circuit line. But, the sub-circuit
G_1^4 = g1 . . . g4 cannot be modified accordingly since a realization of the 1-bit
addition together with an additional constant output requires more garbage outputs.
Using these observations an algorithm for reducing the number of lines in re-
versible circuits can be formulated. The next section describes the respective steps
in detail. Afterwards, the experiments in Sect. 6.2.3 show that significant reductions
can be obtained with this approach.

6.2.2 Algorithm

Based on the ideas presented in the last section, an algorithm for circuit line
reduction is now proposed. The respective steps are illustrated by means of an example in
Fig. 6.6. At first, an appropriate sub-circuit is determined (a). Afterwards, an attempt is made
to re-synthesize the sub-circuit so that one of the (garbage) outputs returns a constant
value (b). If this is successful, the re-synthesized sub-circuit is inserted into the
original circuit (c). Finally, the newly created constant output is merged with a line
including a constant input (d). The algorithm terminates if no appropriate sub-circuit
can be determined anymore. In the following, the respective steps are described in
detail.

6.2.2.1 Determine an Appropriate Sub-circuit

In the considered context, appropriate sub-circuits are characterized by the fact that
they include at least one garbage output which can be later used to replace a constant
input. Therefore, it is important to know when lines of a circuit are used for the first
time and when they are not needed anymore, respectively. This is formalized by the
following two functions:

Fig. 6.6 Reducing the number of circuit lines in four steps

Definition 6.3 Let G = g1 . . . gd be a reversible circuit. Furthermore, let l ∈
{1, . . . , n} be a line of this circuit. Then, the function firstly_used(l) returns i ∈
{1, . . . , d} iff gi is the first gate connected with line l. Accordingly, the function
lastly_used(l) returns i ∈ {1, . . . , d} iff gi is the last gate connected with line l.

Using these functions, the flow to determine appropriate sub-circuits can be de-
scribed as follows:
1. Traverse all circuit lines lg of the circuit G = g1 . . . gd that include a garbage
output.
2. Check if line lg can be merged with another line lc including a constant input,
i.e. if there is a constant input line lc so that firstly_used(lc )> lastly_used(lg ). If
this check fails, continue with the next garbage output line lg .
3. Check if the sub-circuit G_1^k = g1 . . . gk with k = lastly_used(lg) can be modified so that
line lg outputs a constant value. If this check fails, continue with the next line lg
in Step 2. Otherwise, G_1^k is an appropriate sub-circuit.

Example 6.5 Consider the circuit G = g1 . . . g6 depicted in Fig. 6.6(a). Applying
the steps introduced above, the sub-circuit G_1^3 = g1 g2 g3 (marked by the dashed
rectangle) is determined.

Note that the order in which the garbage output lines lg are considered typically
has an effect. Here, the line with the smallest value of lastly_used(lg ) is considered
first. This is motivated by the fact that firstly_used(lc ) > lastly_used(lg ) is a necessary
condition which, in particular, becomes true for small values of lastly_used(lg ).
Besides that, the check in Step 3 is strongly related to the re-synthesis of the sub-
circuit which is described next.
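To make Steps 1 and 2 of this flow concrete, the following C++ sketch computes firstly_used and lastly_used and searches for a garbage line that could be merged with a constant line. The Gate and Circuit structures (and the sets of constant and garbage lines) are hypothetical placeholders rather than the data structures of the actual implementation, and the reversibility check of Step 3 is not shown.

#include <algorithm>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical circuit model: multiple-control Toffoli gates only.
struct Gate { std::set<int> controls; int target; };

struct Circuit {
    std::vector<Gate> gates;
    std::set<int> constant_inputs;   // lines lc with a constant input
    std::set<int> garbage_outputs;   // lines lg with a garbage output
};

// Index (1-based) of the first gate connected with line l, or 0 if the line is unused.
int firstly_used(const Circuit& c, int l) {
    for (std::size_t i = 0; i < c.gates.size(); ++i)
        if (c.gates[i].target == l || c.gates[i].controls.count(l)) return (int)i + 1;
    return 0;
}

// Index (1-based) of the last gate connected with line l, or 0 if the line is unused.
int lastly_used(const Circuit& c, int l) {
    for (std::size_t i = c.gates.size(); i > 0; --i)
        if (c.gates[i - 1].target == l || c.gates[i - 1].controls.count(l)) return (int)i;
    return 0;
}

// Steps 1 and 2: find a pair (lg, lc) with firstly_used(lc) > lastly_used(lg).
// Garbage lines with the smallest lastly_used value are tried first.
std::optional<std::pair<int, int>> find_merge_candidate(const Circuit& c) {
    std::vector<int> garbage(c.garbage_outputs.begin(), c.garbage_outputs.end());
    std::sort(garbage.begin(), garbage.end(), [&](int a, int b) {
        return lastly_used(c, a) < lastly_used(c, b);
    });
    for (int lg : garbage)
        for (int lc : c.constant_inputs)
            if (lc != lg && firstly_used(c, lc) > lastly_used(c, lg))
                return std::make_pair(lg, lc);   // candidate sub-circuit: g1 ... g_lastly_used(lg)
    return std::nullopt;                          // Step 3 (re-synthesis check) follows separately
}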

6.2.2.2 Re-synthesize the Sub-circuit

Given an appropriate sub-circuit G_1^k , the next task is to re-synthesize it so that one
garbage output returns a constant value (instead of leaving it a don't care). Generally,
any available synthesis approach can be applied for this purpose. But since the num-
ber of circuit lines should be reduced, approaches that generate additional circuit
lines should be avoided. Thus, synthesis methods that require a truth table descrip-
tion (and therewith ensure minimality with respect to circuit lines) are used. Conse-
quently, only sub-circuits with a limited number of primary inputs are considered.
To address this issue, not the whole sub-circuit G_1^k is re-synthesized. Instead, a
bounded cascade of gates which affects the respective garbage output is considered.
More precisely, starting at the output of line lg , the circuit is traversed towards the
inputs of the circuit. Each passed gate as well as the lines connected with it are
added to the following consideration.3 The traversal stops if the number of considered
lines reaches a given threshold λ (in the experimental evaluations, it turned out
that λ = 6 is a good choice).
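The backward traversal that collects this bounded cascade can be sketched as follows, again using a hypothetical Toffoli-gate model; the exact bookkeeping and stopping condition of the actual implementation may differ slightly.

#include <algorithm>
#include <set>
#include <vector>

struct Gate { std::set<int> controls; int target; };   // hypothetical model

// Collect the gates of the bounded cascade driving the garbage output at line lg:
// start at gate gk (k = lastly_used(lg)) and walk towards the inputs, adding every
// gate that touches one of the lines gathered so far. The traversal stops as soon
// as a gate would push the number of considered lines beyond the threshold lambda.
std::vector<int> bounded_cascade(const std::vector<Gate>& gates,
                                 int k, int lg, std::size_t lambda) {
    std::set<int> lines = { lg };     // lines of the cascade collected so far
    std::vector<int> cascade;         // 0-based indices of the collected gates
    for (int i = k - 1; i >= 0; --i) {                 // g1 ... gk are gates[0 .. k-1]
        const Gate& g = gates[i];
        bool touches = lines.count(g.target) > 0;
        for (int ctl : g.controls) touches = touches || lines.count(ctl) > 0;
        if (!touches) continue;                        // gate does not affect line lg
        std::set<int> extended = lines;
        extended.insert(g.target);
        extended.insert(g.controls.begin(), g.controls.end());
        if (extended.size() > lambda) break;           // threshold lambda reached
        lines = extended;
        cascade.push_back(i);
    }
    std::reverse(cascade.begin(), cascade.end());      // return gates in circuit order
    return cascade;
}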
From the resulting cascade, a truth table description is determined. Afterwards,
the truth table is modified, i.e. the former garbage output at line lg is replaced by
a constant output value. It is thereby important that the modification preserves the
reversibility of the function. If this is not possible, the sub-circuit is skipped and the
next line with a garbage output is considered (see Step 3 from above). Otherwise,
the modified truth table can be passed to a synthesis approach.
Note that the modification of the truth table is only possible, if constant values
at the primary inputs of the whole circuit are incorporated. Constant inputs restrict
the number of possible assignments to the inputs of the considered cascade. This
enables a reversible embedding with a constant output.

Example 6.6 Consider the cascade highlighted by the dashed rectangle in Fig. 6.6(a)
which is considered for re-synthesis. Incorporating the constant values at the pri-
mary inputs of the whole circuit, only the patterns shown in Table 6.2(a) have to be
considered. The outputs for the remaining patterns are not of interest. This function
can be modified so that one of the garbage outputs returns a constant value, while
still reversibility of the overall function is preserved (see Table 6.2(b)). Synthesizing
the modified function, the circuit shown on the right-hand side of Fig. 6.6(b) results.
This circuit can be used to remove a constant line.

As shown by the example, re-synthesizing the respective cascades in the described
manner might lead to an increase in the number of gates as well as in the
quantum costs. This is an expected behavior since circuit lines can be exploited to
buffer temporary values. If such lines are removed, additional gates may be required
to recompute these values.

3 In other words, the cone of influence of the garbage output line lg is considered.

Table 6.2 Truth tables of the sub-circuit

(a) Original              (b) After modification

a b 0 0  |  – b – f1      a b 0 0  |  – b – f1
0 0 0 0  |  – 0 – 0       0 0 0 0  |  0 0 – 0
...      |  – – – –       ...      |  – – – –
0 1 0 0  |  – 1 – 1       0 1 0 0  |  0 1 – 1
...      |  – – – –       ...      |  – – – –
1 0 0 0  |  – 1 – 0       1 0 0 0  |  0 1 – 0
...      |  – – – –       ...      |  – – – –
1 1 0 0  |  – 0 – 1       1 1 0 0  |  0 0 – 1
...      |  – – – –       ...      |  – – – –

6.2.2.3 Insert the Sub-circuit and Merge the Lines

If re-synthesis was successful, the last two steps are straightforward. At first, the
considered sub-circuit is replaced by the newly synthesized one. Afterwards, the
considered garbage output line lg is merged with the respective constant input line lc ,
i.e. the respective gate connections as well as possible primary outputs are adjusted.
Finally, line lc which is not needed anymore is removed.

Example 6.7 Consider the circuit shown in Fig. 6.6. Replacing the highlighted
sub-circuit with the re-synthesized one from Example 6.6, the circuit shown in
Fig. 6.6(c) results. Here, line 1 and line 5 can be merged leading to the circuit
depicted in Fig. 6.6(d) where line 5 has been removed.

6.2.3 Experimental Results

The proposed approach for line reduction has been implemented in C++ and eval-
uated using a set of reversible circuits with a large number of constant inputs. As
synthesis method for step (b) of the optimization (see Sect. 6.2.2.2), two different
approaches have been evaluated, namely
1. an exact synthesis approach (based on the principles described in Chap. 4 and
denoted by exact synthesis in the following) that realizes a circuit with minimal
number of gates but usually requires a significant amount of run-time and
2. a heuristic synthesis approach (namely the transformation-based method intro-
duced in Sect. 3.1.2; in the following denoted by heuristic synthesis) that does
not ensure minimality but is very efficient regarding run-time.
As benchmarks, reversible circuits obtained by the BDD-based synthesis ap-
proach (from Sect. 3.2) were used. These circuits include a significant number of
constant inputs that originated from the synthesis and thus cannot be easily removed.

The experiments have been carried out on an Intel Core 2 Duo 2.26 GHz with 3 GB
of main memory.
The results of the evaluation are presented in Table 6.3. The first four columns
give the name (Benchmark), the number of circuit lines4 (Lines), the gate count (d),
and the quantum cost (QC) of the original circuits. In the following columns, the
respective values after line reduction as well as the run-time needed for optimization
(in CPU seconds) are reported. It is thereby distinguished between results obtained
by applying exact synthesis and results obtained by applying heuristic synthesis in
Step (b).
As can be seen by the results, the number of lines can be significantly reduced for
all considered reversible circuits. On average, the number of lines can be reduced
by 17%—in the best case (spla with exact synthesis) by more than 40%.5 As already
mentioned in Sect. 6.2.2.2, reducing the circuit lines might lead to an increase in the
number of gates as well as in the quantum costs. This is also observable in the
results.
In this sense, the differences between the applied synthesis approaches provide
interesting insights. While the application of exact synthesis leads to larger run-times
(in the worst case, more than 3 CPU hours are required), results from the
heuristic method are available within minutes. But the differences in the resulting
number of gates and quantum costs are significant. If exact
synthesis is applied, the increase in the number of gates and quantum cost can be kept
small; for some circuits (e.g. cordic and spla) even reductions have been achieved.

6.3 Optimizing Circuits for Linear Nearest Neighbor Architectures

So far, circuits have been synthesized or optimized, respectively, mainly with respect
to gate count, quantum cost, or line count. In the last years, these criteria
have been established as quality measures to evaluate the results obtained by synthesis
approaches. However, with new (physical) realizations of reversible logic
also new criteria emerge. As an example, Linear Nearest Neighbor (LNN) architectures
[FDH04, Kut06, Mas07] require adjacent quantum gates (i.e. gates where
control line and target line are on adjacent circuit lines).
In this section, optimization approaches to determine circuits for LNN architectures
are introduced. To this end, in Sect. 6.3.1 a new cost metric, namely Nearest
Neighbor Cost (NNC), is introduced that denotes the effort needed to transform an
arbitrary quantum circuit into one consisting of adjacent gates only. Furthermore, it is reviewed
how NNC optimality can be achieved (i.e. how quantum circuits consisting of adjacent
gates only can be determined) in a straightforward way. Since this approach
significantly increases the quantum cost of the resulting circuits (an important cost
criterion for LNN architectures as well), improvements are suggested in Sect. 6.3.2.
Finally, the effect of this new optimization method is experimentally evaluated in
Sect. 6.3.3.

4 Including both the number of primary inputs/outputs as well as the number of additional circuit lines.
5 Note that thereby still the number of primary inputs/outputs is considered, which cannot be reduced.

Table 6.3 Experimental results

                    Initial             Line reduction with exact synthesis        Line reduction with heuristic synthesis
Benchmark           Lines   d     QC    Lines (Δ)   d (Δ)    QC (Δ)     Time       Lines (Δ)   d (Δ)     QC (Δ)      Time

4mod5 7 8 24 6 (−1) 10 (2) 26 (2) 0.00 6 (−1) 10 (2) 26 (2) 0.00


mini-alu 10 19 59 9 (−1) 19 (0) 88 (29) 1.96 9 (−1) 95 (76) 670 (611) 0.02
rd53 13 34 98 12 (−1) 35 (1) 99 (1) 0.12 12 (−1) 50 (16) 227 (129) 0.01
sym6 14 28 92 11 (−3) 28 (0) 92 (0) 41.11 11 (−3) 177 (149) 1340 (1248) 0.05
9sym 27 61 205 24 (−3) 61 (0) 209 (4) 3489.19 22 (−5) 362 (301) 2845 (2640) 0.51
sym9 27 61 205 24 (−3) 61 (0) 209 (4) 3485.33 22 (−5) 362 (301) 2845 (2640) 0.54
hwb5 28 88 276 26 (−2) 89 (1) 342 (66) 3913.73 25 (−3) 146 (58) 915 (639) 0.19
mod5adder 32 96 292 25 (−7) 96 (0) 381 (89) 1899.34 25 (−7) 223 (127) 1379 (1087) 487.93
rd84 34 102 302 25 (−9) 105 (3) 362 (60) 3056.40 25 (−9) 424 (322) 3416 (3114) 1.84
cycle10_2 39 71 195 31 (−8) 76 (5) 253 (58) 1290.73 30 (−9) 585 (514) 3867 (3672) 1.35
ham15 45 152 308 37 (−8) 164 (12) 365 (57) 2117.42 37 (−8) 677 (525) 4652 (4344) 3.82
hwb6 46 159 507 41 (−5) 172 (13) 572 (65) 2141.84 41 (−5) 485 (326) 3293 (2786) 1.38
cordic 52 100 324 40 (−12) 80 (−20) 264 (−60) 6280.85 39 (−13) 805 (705) 5909 (5585) 5.77
hwb7 73 281 909 66 (−7) 288 (7) 1010 (101) 1204.57 65 (−8) 586 (305) 4385 (3476) 5.12
bw 87 286 922 71 (−16) 292 (6) 1167 (245) 11255.00 72 (−15) 467 (181) 2623 (1701) 1.76
hwb9 170 699 2275 152 (−18) 718 (19) 2545 (270) 4656.58 152 (−18) 1354 (655) 8437 (6162) 254.10
ex5p 206 625 1821 165 (−41) 615 (−10) 2250 (429) 2625.17 171 (−35) 909 (284) 5651 (3830) 108.52
spla 489 1669 5885 267 (−222) 1007 (−662) 3805 (−2080) 342.75 329 (−160) 2945 (1276) 22392 (16507) 894.81

6.3.1 NNC-optimal Decomposition

As described in Sect. 2.1.3, quantum circuits can be obtained using reversible cir-
cuits as a basis which are afterwards mapped to a cascade of quantum gates. Al-
ternatively, quantum circuits can be directly addressed as e.g. done by the BDD-
based synthesis approach described in Sect. 3.2 or by the exact method described
in Sect. 4.2.2. However, the resulting quantum circuits often include non-adjacent
gates and thus are not applicable to LNN architectures. To provide a distinct measure
of this, the following definition introduces the new NNC cost metric.

Definition 6.4 Consider a 2-qubit quantum gate q where its control and target are
placed at the cth and the tth line (0 ≤ c, t < n), respectively. The Nearest Neighbor Cost
(NNC) of q is defined as |c − t| − 1, i.e. the number of lines separating control and target.
The NNC of a single-qubit gate is defined as 0. The NNC of a circuit is defined as
the sum of the NNCs of its gates. The optimal NNC of a circuit is 0, which is reached
when all quantum gates are either 1-qubit gates or 2-qubit gates performed on adjacent qubits.

Example 6.8 Figure 6.7(a) shows the standard decomposition of a Toffoli gate lead-
ing to an NNC value of 1.
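Computing the NNC of a circuit is straightforward. The following sketch assumes a simplified (hypothetical) gate model in which every quantum gate has at most one control line.

#include <cstdlib>
#include <vector>

// Hypothetical quantum gate model: target line plus an optional control line
// (control == -1 for 1-qubit gates).
struct QGate { int control; int target; };

// NNC of a single gate according to Definition 6.4.
int nnc(const QGate& g) {
    if (g.control < 0) return 0;                 // 1-qubit gates have NNC 0
    return std::abs(g.control - g.target) - 1;   // number of lines between control and target
}

// NNC of a circuit: the sum over all gates; a value of 0 means the circuit
// only contains adjacent gates and thus fits an LNN architecture.
int nnc(const std::vector<QGate>& circuit) {
    int total = 0;
    for (const QGate& g : circuit) total += nnc(g);
    return total;
}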

In a naive way, NNC optimality can easily be achieved by applying adjacent
SWAP gates whenever a non-adjacent quantum gate occurs in the standard decom-
position. More precisely, SWAP gates are added in front of each gate q with non-
adjacent control and target lines to “move” a control (target) line of q towards the
target (control) line until they become adjacent. Afterwards, SWAP gates are added
to restore the original order of circuit lines. In total, this leads to additional quantum
costs given by the following lemma:

Lemma 6.1 Consider a quantum gate q where its control and target are placed at
the cth and the tth line, respectively. Using adjacent SWAP gates as proposed, an
additional quantum cost of 6 · (|c − t| − 1) is needed.

Proof In total, |c − t| − 1 adjacent SWAP operations are required to move the control
line towards the target line so that both become adjacent. Another |c − t| − 1 SWAP
operations are needed to restore the original order. Considering a quantum cost of
3 for each SWAP operation, this leads to the additional quantum cost of
6 · (|c − t| − 1). □

By applying this method consecutively to each non-adjacent gate, a quantum
circuit with NNC of 0 can be determined in linear time.
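A minimal sketch of this naive NNC-based decomposition is given below, using the same hypothetical gate model extended by a SWAP flag. Every non-adjacent gate is surrounded by the required adjacent SWAP gates.

#include <cstdlib>
#include <vector>

// Hypothetical model: 1- and 2-qubit gates; SWAP gates are marked explicitly.
struct QGate { int control; int target; bool is_swap = false; };

// Naive NNC-based decomposition: move the control of every non-adjacent gate next
// to its target using adjacent SWAPs and restore the original line order afterwards.
std::vector<QGate> naive_nnc_decomposition(const std::vector<QGate>& circuit) {
    std::vector<QGate> result;
    for (const QGate& g : circuit) {
        if (g.control < 0 || std::abs(g.control - g.target) <= 1) {
            result.push_back(g);                       // 1-qubit or already adjacent
            continue;
        }
        int step = (g.control < g.target) ? 1 : -1;
        std::vector<QGate> swaps;                      // |c - t| - 1 adjacent SWAPs
        for (int c = g.control; std::abs(c - g.target) > 1; c += step)
            swaps.push_back({c, c + step, true});
        result.insert(result.end(), swaps.begin(), swaps.end());
        QGate moved = g;
        moved.control = g.target - step;               // control is now adjacent to the target
        result.push_back(moved);
        result.insert(result.end(), swaps.rbegin(), swaps.rend());   // restore the order
    }
    return result;
}

Since each inserted SWAP has a quantum cost of 3, the sketch reproduces the overhead of 6 · (|c − t| − 1) stated in Lemma 6.1.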

Fig. 6.7 Different decompositions of a Toffoli gate

Example 6.9 Consider the standard decomposition of a Toffoli gate as depicted in
Fig. 6.7(a). As can be seen, the first gate is non-adjacent. Thus, to achieve NNC
optimality, SWAP gates are inserted in front of and after the first gate (see Fig. 6.7(b)).
Since each SWAP gate is decomposed into 3 quantum gates, this increases the total
quantum cost to 11 but leads to an NNC value of 0.

In the rest of this section, this method is denoted as naive NNC-based decomposition.
Similar schemes have also been applied to construct circuits for LNN
architectures so far (see e.g. [CS07, Kha08]). However, this naive method might
lead to a significant increase in quantum cost. Thus, in the next section more elaborate
approaches for synthesizing NNC-optimal circuits are proposed.

6.3.2 Optimizing NNC-optimal Decomposition

Two improved approaches for NNC-optimal quantum circuit generation are intro-
duced. The first one exploits exact synthesis techniques, while the second one ma-
nipulates the circuit and specification, respectively.

6.3.2.1 Exploiting Exact Synthesis

In Chap. 4, exact synthesis approaches have been introduced that ensure minimality
of the resulting circuits. The synthesis problem is thereby expressed as a sequence of
Boolean satisfiability (SAT) instances. For a given function f , it is checked whether

Table 6.4 List of available macros

n   Macro                            Cost             Impr.
                                     Naive    Exact
3   P(a,b,c), P(c,b,a)               12       8       33%
3   P(a,c,b), P(c,a,b)               24       12      50%
4   P(a,b,d), P(d,c,a)               30       11      63%
3   MCT({a,b},c), MCT({c,b},a)       11       9       18%
4   MCT({a,b},d), MCT({d,c},a)       29       12      59%
3   MCT({a,c},b)                     17       13      24%
4   MCT({d,b},a), MCT({a,c},d)       29       13      55%

a circuit with d gates realizing f exists. Furthermore, d is initially assigned to 1 and
increased in each iteration if no realization is found. More formally, for a given d
and a reversible function f : B^n → B^n, the following SAT instance (similar to the
one introduced in Definition 5.1 on p. 104) is created:

$$\Phi \;\wedge\; \bigwedge_{i=0}^{2^n-1} \left( [inp_i]_2 = i \;\wedge\; [out_i]_2 = f(i) \right),$$

where
• inpi is a Boolean vector representing the inputs of the circuit to be synthesized
for truth table line i,
• outi is a Boolean vector representing the outputs of the circuit to be synthesized
for truth table line i, and
• Φ is a set of constraints representing the synthesis problem for quantum circuits
as described in Sect. 4.2.2.
Applying this formulation to the synthesis of quantum circuits, NNC optimality can
be ensured by modifying the constraints in Φ so that they do not represent the whole
set of quantum gates, but only adjacent gates. In doing so, exact synthesis is performed
that determines minimal circuits not only with respect to the number of quantum gates, but
also with respect to NNC. Consequently, significantly better NNC-optimal decompositions
than the one from Fig. 6.7(b) can be synthesized.
However, the applicability of such an exact method is limited to relatively small
functions. In this sense, the proposed method is sufficient to construct minimal de-
compositions for a set of Toffoli and Peres gate configurations as shown in Table 6.4.
But nevertheless, these results can be exploited to improve the naive NNC-based de-
composition: Once an exact NNC-optimal quantum circuit for a reversible gate is
available (denoted by macro in the following), it can be reused as shown by the
following example.

Fig. 6.8 Circuit of Example 6.10

Example 6.10 Reconsider the decomposition of a Toffoli gate as depicted in
Fig. 6.7. Using the proposed exact synthesis approach, a minimal quantum circuit
(with respect to both quantum cost and NNC) as shown in Fig. 6.7(c) is determined.
In comparison to the naive method (see Fig. 6.7(b)), this reduces the quantum cost
from 11 to 9 while still ensuring NNC optimality. Furthermore, the realization can
be reused as a macro while decomposing larger reversible circuits. For example,
consider the circuit shown in Fig. 6.8. Here, for the second gate the naive method
is applied (i.e. standard decomposition is performed and SWAPs are added), while
for the remaining ones the obtained macro is used. This enables a quantum cost
reduction from 96 to 92.

In total, 13 macros have been generated as listed in Table 6.4 together with the
respective costs in comparison to the costs obtained by using the naive method. As
can be seen, exploiting these macros reduces the cost for each gate by up to 63%.
The effect of these macros on the decomposition of reversible circuits is considered
in detail in the experiments.

6.3.2.2 Reordering Circuit Lines

Applying the approaches introduced so far always leads to an increase in the quan-
tum cost for each non-adjacent gate. In contrast, by modifying the order of the circuit
lines (similar to the SWOP approach introduced in Sect. 5.3), some of the additional
costs can be saved. As an example, consider the circuit in Fig. 6.9(a) with quantum
cost of 3 and an NNC value of 6. By reordering the lines as shown in Fig. 6.9(b),
the NNC value can be reduced to 1 without increasing the total quantum cost. To
determine which lines should be reordered, two heuristic methods are proposed in
the following. The former one changes the order of the circuit lines according to a
global view, while the latter one applies a local view to assign the line order.

Global Reordering After applying the standard decomposition, a cascade of
1- and 2-qubit gates is generated. Now, an order of the circuit lines which reduces
the total NNC value is desired. To do that, the “contribution” of each line to the total
NNC value is calculated. More precisely, for each gate q with control line i and
target line j , the NNC value is determined. This value is added to variables impi
and impj which are used to save the impacts of the circuit lines i and j on the total
NNC value, respectively. Next, the line with the highest NNC impact is chosen for
reordering and placed at the middle line (i.e. swapped with the middle line). If the
selected line is the middle line itself, a line with the next highest impact is selected.
This procedure is repeated until no better NNC value is achieved. Finally, SWAP

Fig. 6.9 Reordering circuit lines

Fig. 6.10 Global and local reordering

operations as described in the previous sections are added for each non-adjacent
gate.

Example 6.11 Consider the circuit depicted in Fig. 6.10(a). After calculating the
NNC contributions, impx0 = 1.5, impx1 = 0, impx2 = 0.5, and impx3 = 1 result.
Thus, line x0 (highest impact) and line x2 (middle line) are swapped. Since further
swapping does not improve the NNC value, reordering terminates and SWAP gates
are added for the remaining non-adjacent gates. The resulting circuit is depicted in
Fig. 6.10(b) and has quantum cost of 9 in comparison to 21 that results if the naive
method is applied.
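A possible implementation of the global reordering heuristic is sketched below. Each gate contributes half of its NNC value to the impact of its control line and half to the impact of its target line (matching the fractional impacts of Example 6.11); the gate model and the concrete bookkeeping are assumptions made for the sketch.

#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <utility>
#include <vector>

struct QGate { int control; int target; };   // control == -1 for 1-qubit gates

// Total NNC under a given line order; pos[l] is the current position of line l.
static int total_nnc(const std::vector<QGate>& c, const std::vector<int>& pos) {
    int sum = 0;
    for (const QGate& g : c)
        if (g.control >= 0) sum += std::abs(pos[g.control] - pos[g.target]) - 1;
    return sum;
}

// Global reordering: repeatedly swap the line with the highest NNC impact with the
// line at the middle position until no further improvement of the total NNC is achieved.
std::vector<int> global_reordering(const std::vector<QGate>& c, int n) {
    std::vector<int> pos(n);
    std::iota(pos.begin(), pos.end(), 0);
    int best = total_nnc(c, pos);
    while (true) {
        std::vector<double> impact(n, 0.0);            // contribution of each line
        for (const QGate& g : c)
            if (g.control >= 0) {
                double v = std::abs(pos[g.control] - pos[g.target]) - 1;
                impact[g.control] += v / 2.0;
                impact[g.target]  += v / 2.0;
            }
        int middle_pos = n / 2;
        int middle_line = (int)(std::find(pos.begin(), pos.end(), middle_pos) - pos.begin());
        impact[middle_line] = -1.0;                    // never select the middle line itself
        int chosen = (int)(std::max_element(impact.begin(), impact.end()) - impact.begin());
        std::vector<int> trial = pos;
        std::swap(trial[chosen], trial[middle_line]);  // place the chosen line in the middle
        int cost = total_nnc(c, trial);
        if (cost >= best) break;                       // no better NNC value: stop
        pos = trial;
        best = cost;
    }
    return pos;   // SWAPs for the remaining non-adjacent gates are added afterwards
}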

Local Reordering In order to save SWAP gates, line reordering can also be applied
according to a local scheme as follows. The circuit is traversed from the inputs
to the outputs. As soon as there is a gate q with an NNC value greater than 0, a
SWAP operation is added in front of q to enable an adjacent gate. However, in con-
trast to the naive NNC-based decomposition, no SWAP operation is added after q.
Instead, the resulting order is used for the rest of the circuit (i.e. propagated through
the remaining circuit). This process is repeated until all gates are traversed.

Example 6.12 Reconsider the circuit depicted in Fig. 6.10(a). The first gate is not
modified, since it has an NNC of 0. For the second gate, a SWAP operation is applied
to make it adjacent. Afterwards, the new line order is propagated to all remaining
gates resulting in the circuit shown in Fig. 6.10(c). This procedure is repeated until
the whole circuit has been traversed. Finally, again a circuit with quantum cost of 9
(in contrast to 21) results.
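The local scheme can be sketched as follows: whenever a gate is non-adjacent under the current line order, the required adjacent SWAPs are inserted in front of it and the changed order is simply propagated to the rest of the circuit. As before, the gate model is a hypothetical simplification; if control and target are more than two lines apart, the sketch inserts several SWAPs instead of a single one.

#include <cstdlib>
#include <numeric>
#include <utility>
#include <vector>

struct QGate { int control; int target; bool is_swap = false; };   // hypothetical model

// Local reordering: make every non-adjacent gate adjacent by SWAPs inserted in
// front of it and propagate the resulting line order to all remaining gates.
std::vector<QGate> local_reordering(const std::vector<QGate>& circuit, int n) {
    std::vector<int> pos(n);        // current position of every original line
    std::iota(pos.begin(), pos.end(), 0);
    std::vector<int> line_at(n);    // original line located at every position
    std::iota(line_at.begin(), line_at.end(), 0);
    std::vector<QGate> result;
    for (const QGate& g : circuit) {
        if (g.control < 0) {                           // 1-qubit gate: just remap the target
            result.push_back({-1, pos[g.target]});
            continue;
        }
        int c = pos[g.control], t = pos[g.target];
        int step = (c < t) ? 1 : -1;
        while (std::abs(c - t) > 1) {                  // move the control towards the target
            result.push_back({c, c + step, true});     // adjacent SWAP in front of the gate
            std::swap(line_at[c], line_at[c + step]);  // the two lines exchange positions
            pos[line_at[c]] = c;
            pos[line_at[c + step]] = c + step;
            c += step;                                 // order is kept for the rest of the circuit
        }
        result.push_back({c, t});                      // the gate itself, now adjacent
    }
    return result;
}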

6.3.3 Experimental Results

In this section, experimental results obtained with the introduced approaches are
presented. The methods introduced in Sects. 6.3.1 and 6.3.2, respectively, are evalu-
ated by measuring the overhead needed to synthesize circuits with an optimal NNC
value of 0. The approaches have been implemented in C++ and applied to bench-
mark circuits from RevLib [WGT+08] using an AMD Athlon 3500+ with 1 GB of
main memory.
The results are shown in Table 6.5. The first column gives the names of the
circuits. Then, the number of circuit lines (n), the gate count (d), the quantum
cost (QC), and the NNC value of the original (reversible) circuits are shown. The following
columns denote the quantum cost of the NNC-optimal circuits obtained by
using the naive method (Naive), by additionally exploiting macros (+Macros), and
by applying reordering (Global, Local, or both), respectively. The next column
gives the percentage of the best quantum cost reduction obtained by the improvements
in comparison to the naive method (Best Impr.). The last column shows the
smallest overhead in terms of quantum cost needed to achieve NNC optimality in
comparison to the original circuit (Overhead for NNC optimality). All run-times
are less than one CPU minute and thus are omitted from the table.
As can be seen, decomposing reversible circuits into NNC-optimal quantum
circuits is costly. Using the naive method, the quantum cost increases by one order
of magnitude on average. However, this can be significantly improved if macros
or reordering are applied. Even if reordering may worsen the results in a few
cases (e.g. for local reordering in 0410184_169 or add64_184), in total this leads
to an improvement of 50% on average; in the best case, an improvement of 83% was
observed. If the respective methods are considered separately, it can be concluded
that the combination of global and local reordering (i.e. Glob.+Loc.) leads to the
best improvements over all benchmarks. As a result, NNC-optimal circuits can be
synthesized with a moderate increase of quantum cost.

6.4 Summary and Future Work

Since synthesis results often are not optimal, optimization is an established part of
today’s design flows. In this chapter, three optimization approaches for reversible
logic have been introduced. While the first one reduces the circuit cost in general
(i.e. the sizes of the respective gates and therewith quantum or transistor cost, re-
spectively), the second one reduces the number of lines, and the third one addresses
a more dedicated technology specific cost criterion.
These approaches clearly show that post-synthesis optimization often has to be
done with respect to the desired technology. For example, if quantum circuits are addressed,
the designer has to trade off whether an up to 70% cost reduction justifies adding a
new circuit line (and therewith spending one more qubit). Moreover, the NNC-based
optimization is only needed if quantum circuits for the mentioned LNN architectures
should be designed. Thus, optimization approaches should be available that
can be applied by the designer according to his current needs.
Table 6.5 Results of NNC-optimal synthesis

Circuit                             Decomposed (NNC-optimal) circuits                       Best    Overhead for
                                    Naive   +Macros   Reordering                            Impr.   NNC optimality
                                                      Global    Local    Glob.+Loc.
                 n    d    QC  NNC  QC      QC        QC        QC       QC

0410184_169 14 46 90 24 234 197 234 423 423 16% 2.19


3_17_13 3 6 14 3 32 28 32 32 32 13% 2.00
4_49_17 4 12 32 21 158 120 128 98 98 38% 3.06

4gt10-v1_81 5 6 34 41 282 282 258 150 147 48% 4.32


4gt11_84 5 3 7 7 49 47 25 22 16 67% 2.29
4gt12-v1_89 5 5 42 80 525 525 321 171 168 68% 4.00
4gt13-v1_93 5 4 16 26 173 173 77 56 53 69% 3.31
4gt4-v0_80 5 5 34 55 366 364 168 138 141 62% 4.06
4gt5_75 5 5 21 20 142 138 118 82 79 44% 3.76
4mod5-v1_23 5 8 24 25 174 155 114 78 78 55% 3.25
4mod7-v0_95 5 6 38 36 256 256 352 127 121 53% 3.18
add16_174 49 64 192 95 762 473 762 1104 1104 38% 2.46
add32_183 97 128 384 191 1530 953 1530 3744 3744 38% 2.48
add64_184 193 256 768 383 3066 1913 3066 13632 13632 38% 2.49
add8_172 25 32 96 47 378 233 378 360 360 38% 2.43
aj-e11_165 4 13 45 39 280 260 280 181 181 35% 4.02
alu-v4_36 5 7 31 35 242 238 218 113 104 57% 3.35
cnt3-5_180 16 20 120 416 2621 2591 1457 731 728 72% 6.07
cycle10_2_110 12 19 1126 3368 21420 21420 21420 8046 8046 62% 7.15
decod24-v3_46 4 9 9 9 63 63 39 21 24 67% 2.33
ham15_108 15 70 453 2506 15494 15390 14030 2627 2588 83% 5.71
ham7_104 7 23 83 158 1035 1027 657 342 333 68% 4.01

hwb4_52 4 11 23 14 107 83 107 65 65 39% 2.83


hwb5_55 5 24 104 119 823 817 595 337 340 59% 3.24
hwb6_58 6 42 142 193 1304 1160 1268 614 545 58% 3.84
hwb7_62 7 331 2325 4236 27967 27869 25939 13390 12955 54% 5.57
hwb8_118 8 633 14260 28803 187272 186880 182196 87495 87498 53% 6.14
hwb9_123 9 1959 18124 47373 304659 304540 302481 124068 124041 59% 6.84
mod5adder_128 6 15 83 154 1011 978 675 330 333 67% 3.98
mod8-10_177 5 14 88 147 975 969 621 372 363 63% 4.13
plus127mod8192_162 13 910 57400 165415 1057946 1057804 1057946 503516 503516 52% 8.77
plus63mod4096_163 12 429 25492 63732 407926 407784 407926 210400 210400 48% 8.25
plus63mod8192_164 13 492 32578 99482 633994 633852 633994 279016 279016 56% 8.56
rd32-v0_67 4 2 10 5 38 19 20 32 20 50% 1.90
rd53_135 7 16 77 124 822 750 702 330 303 63% 3.94
rd73_140 10 20 76 119 790 739 646 304 295 63% 3.88
rd84_142 15 28 112 234 1516 1465 1696 556 586 63% 4.96
sym9_148 10 210 4368 12184 77556 77556 67428 20643 25023 73% 4.73
sys6-v0_144 10 15 67 96 638 587 842 263 308 59% 3.93
urf1_149 9 11554 57770 122802 794582 735170 659150 238475 238490 70% 4.13
urf2_152 8 5030 25150 45338 297178 276882 297178 101683 101683 66% 4.04
urf3_155 10 26468 132340 331578 2121808 2038584 1933372 596368 596371 72% 4.51
urf5_158 9 10276 51380 114784 740084 706412 667484 208709 208706 72% 4.06
urf6_160 15 10740 53700 239034 1487904 1478080 1334916 320412 320409 78% 5.97
In future work, optimization approaches for further cost metrics are needed. The
cost metrics considered in this chapter are not exhaustive. Besides quantum cost, transistor
cost, number of lines, and nearest neighbor cost, many more cost metrics exist
(see e.g. [WSD09]). But so far, synthesis and optimization approaches only
take the former metrics into account. Thus, extending these methods so that further
criteria are considered is a promising task for the future.
Chapter 7
Formal Verification and Debugging

This chapter introduces approaches for formal verification and debugging and therewith
completes the proposed approaches towards a design flow for reversible logic.
Verification is an essential step that checks whether the obtained designs in fact realize
the desired functionality or not. This is important as, with increasing complexity,
the risk of errors due to erroneous synthesis and optimization approaches as well as
imprecise specifications also grows. Considering traditional circuits, verification has
become one of the most important steps in the design flow. As a result, in this domain
very powerful approaches have been developed, ranging from simulative verifica-
tion (see e.g. [YSP+99, Ber06, YPA06, WGHD09]) to formal equivalence checking
(see e.g. [Bra93, DS07]) and model checking (see e.g. [CGP99, BCCZ99]), respec-
tively. For a more comprehensive overview, the reader is referred to [Kro99, Dre04].
For reversible logic, verification is still at the beginning. Even if first approaches
in this area exist (e.g. [VMH07, GNP08, WLTK08]), they are often not applicable
(e.g. circuits representing incompletely-specified functions are not supported). Fur-
thermore, with new synthesis approaches (e.g. the BDD-based or the SyReC-based
method introduced in Chap. 3), circuits can be designed that contain 100 and more
circuit lines and tens of thousands of gates, with an upward trend. This increase in
circuit size and complexity cannot be handled manually and thus efficient automated
verification approaches are required. At the same time, existing approaches for tra-
ditional circuits cannot be directly applied, since they do not support the specifics of
reversible logic such as new gate libraries, quantum values, or different embeddings
of the same target function.
In the first part of this chapter, equivalence checkers [WGMD09] are proposed
that fulfill these requirements. More precisely, two approaches are introduced that
check whether two circuits realize the same target function regardless of how the
target function has been embedded or whether the circuit contains quantum logic
or pure Boolean logic. The first approach employs decision diagram techniques,
while the second one uses Boolean satisfiability. Experimental results show that for
both methods, circuits with up to 27,000 gates, as well as adders with more than
100 inputs and outputs, are handled in under three minutes with reasonable memory
requirements.

However, while verification methods can be used to detect the existence of er-
rors, they do not provide any information about the source of the error. Thus, in
the second part of this chapter, it is shown how the debugging process can be ac-
celerated by using an automatic debugging method (first results of which have been
published in [WGF+09]). This method gets an erroneous circuit as well as a set of
counterexamples as input and determines a set of gates (so-called error candidates)
whose replacement with other gates fixes the counterexamples. Having this set of
error candidates, the debugging problem is significantly simplified, since only small
parts of the circuit must be considered to find the error. The proposed debugging
approach thereby also uses Boolean satisfiability and is inspired by traditional circuit
debugging [SVAV05]. Moreover, this approach is further extended so that also
the concrete error locations are determined, i.e. gate replacements that not only
fix the counterexamples but additionally ensure that the specification is preserved.
Experiments show and discuss the effect of these methods.
The following two sections describe both approaches in detail, i.e. equivalence
checking is addressed in Sect. 7.1 and automated debugging in Sect. 7.2, respec-
tively. Finally, the chapter is summarized and future work is sketched in Sect. 7.3.

7.1 Equivalence Checking

In this section, two approaches to the equivalence checking problem are presented.
Realizations of both completely and incompletely specified functions are supported.
The circuits can be composed of reversible gates and quantum gates, and can thus
assume multiple internal values, but the primary inputs and outputs of the circuits
are restricted to pure (non-quantum) logic states.
The proposed approaches build on well-known proof techniques for formal ver-
ification of irreversible circuits, i.e. decision diagrams and satisfiability. The first
approach employs Quantum Multiple-valued Decision Diagrams (QMDDs) (see
Sect. 2.2.2). It involves the manipulation of unitary matrices describing the circuits
and additional matrices specifying the total or partial don’t cares. The second ap-
proach is based on Boolean satisfiability (SAT) (see Sect. 2.3.1). It is shown that
the equivalence checking problem can be transformed to a SAT instance supporting
constant inputs and garbage outputs. Additional constraints are added to deal with
total and partial don’t cares. Experiments on a large set of benchmarks show that
both approaches are very efficient, i.e. circuits with up to 27,000 gates, as well as
adders with more than 100 inputs and outputs, are handled in less than three minutes
with reasonable memory requirements.
The remainder of this section is structured as follows. The circuit equivalence
checking problem is defined in Sect. 7.1.1. Sections 7.1.2 and 7.1.3 present the
QMDD-based and the SAT-based approach, respectively. Finally, experimental re-
sults are given in Sect. 7.1.4.

7.1.1 The Equivalence Checking Problem

The goal of equivalence checking is to prove whether two reversible (or quantum)
circuits designed to realize the same target functionality are equivalent or not. In
the latter case, additionally a counterexample is generated, i.e. an input assignment
showing the different output values of the two circuits. It is thereby assumed that
the two circuits have the same labels for the primary inputs and primary outputs,
respectively.
Since all considered circuits are reversible, circuits representing irreversible
functions (e.g. any n-input, m-output function with n ≠ m) might contain constant
inputs, garbage outputs, and don't care conditions (see Sect. 3.1.1). Thus, five types
of functions must be considered:
• Completely specified: A completely specified reversible function is given.
• Constant input: At least one input is assigned to a fixed logic value. For the other
assignments to these inputs, all respective outputs are don’t cares.
• Garbage output: At least one output is unspecified for all input assignments.
• Total don’t care condition: The values of all outputs are unspecified for a given
assignment to the inputs.
• Partial don’t care conditions: The values of a proper subset of the outputs are
unspecified for a given assignment to the inputs.
Table 7.1 shows truth tables of a completely specified function, a function with
a constant input, a function with a garbage output, a function with total don’t care
conditions, and a function with partial don’t care conditions, respectively. A specifi-
cation with constant inputs, garbage outputs, or any don’t care conditions is denoted
as incompletely specified function. Total and partial don’t cares are inherited from
the irreversible function whereas constant input and garbage output don’t cares usu-
ally arise from embedding the irreversible function in a reversible specification.
In the next sections, both a QMDD-based and a SAT-based approach for checking
the equivalence of two circuits with respect to constant inputs, garbage outputs, and
don't care conditions are proposed.

7.1.2 QMDD-based Equivalence Checking

In this section, the QMDD-based approach for equivalence checking of reversible
circuits is presented. First, the completely specified case is considered followed by
a description of how constant inputs, garbage outputs, and don't care conditions can
be handled.

7.1.2.1 The Completely Specified Case

Given a reversible circuit with gates g0 g1 . . . gd−1 , the matrix describing the cir-
cuit is given by M = Md−1 × · · · × M1 × M0 where Mi is the matrix for gate gi .

Table 7.1 Different function types

(a) Completely spec.       (b) Constant input        (c) Garbage output
x0 x1 x2 | f0 f1 f2        1  x1 x2 | f0 f1 f2       x0 x1 x2 | f0 f1 g
0  0  0  | 1  1  1         0  0  0  | –  –  –        0  0  0  | 1  1  –
0  0  1  | 1  1  0         0  0  1  | –  –  –        0  0  1  | 1  1  –
0  1  0  | 1  0  1         0  1  0  | –  –  –        0  1  0  | 1  0  –
0  1  1  | 1  0  0         0  1  1  | –  –  –        0  1  1  | 1  0  –
1  0  0  | 0  1  1         1  0  0  | 0  1  1        1  0  0  | 0  1  –
1  0  1  | 0  1  0         1  0  1  | 0  1  0        1  0  1  | 0  1  –
1  1  0  | 0  0  1         1  1  0  | 0  0  1        1  1  0  | 0  0  –
1  1  1  | 0  0  0         1  1  1  | 0  0  0        1  1  1  | 0  0  –

(d) Total don't cares      (e) Part. don't cares
x0 x1 x2 | f0 f1 f2        x0 x1 x2 | f0 f1 f2
0  0  0  | 1  1  1         0  0  0  | 1  1  1
0  0  1  | 1  1  0         0  0  1  | 1  1  –
0  1  0  | –  –  –         0  1  0  | 1  0  1
0  1  1  | 1  0  0         0  1  1  | 1  –  0
1  0  0  | 0  1  1         1  0  0  | 0  1  1
1  0  1  | –  –  –         1  0  1  | 0  1  0
1  1  0  | 0  0  1         1  1  0  | –  0  1
1  1  1  | 0  0  0         1  1  1  | 0  0  0

As reviewed in Sect. 2.2.2, QMDDs are a data structure for the representation and
manipulation of r^n × r^n complex-valued matrices with r pure logic states. Moreover,
for a given order the QMDDs of two identical functions are canonical [MT06,
MT08].
Thus, for the completely specified case, two reversible circuits that realize the
same function and adhere to the same variable order have the same matrix descrip-
tion. Because of this uniqueness of QMDDs, to check the equivalence of two circuits
it is sufficient to verify that the top edges of the two QMDDs point to the same node
with the same weight. A traversal of the QMDDs is not required. Note that sorting is
required when the inputs or outputs, respectively, are not aligned in the two circuits.
Therefore, swap gates are added to achieve the same order for both circuits.

7.1.2.2 Constant Inputs, Garbage Outputs

A constant input means that the input space is restricted to those assignments con-
taining that value, all others do not occur. To support this, the matrix is adjusted.
Consider the case when the constant input is the top-level partition variable with

constant value j . The following equations show the transformation of the input
space (denoted by γ ) to the output space (denoted by δ) for the general case and
for the case with a constant input, respectively.1

$$\begin{pmatrix} \delta_0 \\ \delta_1 \end{pmatrix} = \begin{pmatrix} M_0 & M_1 \\ M_2 & M_3 \end{pmatrix} \begin{pmatrix} \gamma_0 \\ \gamma_1 \end{pmatrix}, \qquad \begin{pmatrix} \delta_0 \\ \delta_1 \end{pmatrix} = \begin{pmatrix} M_j & \phi \\ M_{j+2} & \phi \end{pmatrix} \begin{pmatrix} \gamma_j \\ \phi \end{pmatrix}.$$

The variable φ thereby denotes an empty sub-matrix or sub-vector of appropriate
dimension. For QMDDs, an empty sub-matrix is represented by a null edge pointer.
Thus, performing these operations before comparing both QMDDs, the equivalence
can be checked even if constant inputs occur.
In a similar way, garbage outputs are handled. Suppose the top-level partition
variable is a garbage output. In this case, the output of the circuit regardless of the
value of that variable is of interest. To this end, the matrix is reduced to

$$\begin{pmatrix} \hat{\delta} \\ \phi \end{pmatrix} = \begin{pmatrix} M_0 + M_2 & M_1 + M_3 \\ \phi & \phi \end{pmatrix} \begin{pmatrix} \gamma_0 \\ \gamma_1 \end{pmatrix},$$

where δ̂ denotes the output after removal of the garbage output. To explain the ad-
dition of sub-matrices, recall that the circuit inputs and outputs are assumed to be
in pure logic states, i.e. one element of γ is 1 and the others are 0. The same is true
for δ. Further, M is a permutation matrix (a special case of a unitary matrix).
In general, constant inputs and garbage outputs can correspond to any variables
in the circuit’s QMDD. This can be handled by performing a depth-first traversal
of the QMDD applying the above reductions to each node as it is encountered. In
a depth-first traversal, the reductions are applied to a node’s descendants before
applying them to the node itself. Note that a variable can be both a constant input
and a garbage output. The order of applying the two reductions is unimportant. This
traversal reduces sub-matrices as required throughout the full matrix.
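The two reductions can also be illustrated on explicit permutation matrices instead of QMDDs (which is what the actual approach manipulates). In the following sketch, a matrix is stored as a 2^n × 2^n integer array and the partition variable is addressed by a bit position of the row or column index; this representation is an assumption made for illustration only.

#include <vector>

using Matrix = std::vector<std::vector<int>>;   // explicit 2^n x 2^n 0/1 matrix

// Constant input: the input (column) variable at bit position 'bit' is fixed to
// value j, so all columns whose index carries the other value of that bit are
// emptied; this mirrors the block structure with the empty sub-matrices phi.
void reduce_constant_input(Matrix& m, int bit, int j) {
    for (auto& row : m)
        for (std::size_t col = 0; col < row.size(); ++col)
            if (((col >> bit) & 1u) != (unsigned)j) row[col] = 0;
}

// Garbage output: the output (row) variable at bit position 'bit' is ignored, so
// the two rows that differ only in this bit are added (stored in the row with bit
// value 0, the partner row is emptied); this mirrors the sums M0 + M2 and M1 + M3.
void reduce_garbage_output(Matrix& m, int bit) {
    for (std::size_t row = 0; row < m.size(); ++row) {
        if ((row >> bit) & 1u) continue;               // handle each pair of rows once
        std::size_t partner = row | (std::size_t(1) << bit);
        for (std::size_t col = 0; col < m[row].size(); ++col) {
            m[row][col] += m[partner][col];
            m[partner][col] = 0;
        }
    }
}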

7.1.2.3 Don’t Care Conditions

Let M̂ denote the matrix for a circuit after the constant input and garbage output re-
ductions are applied. To deal with total don’t cares in the target function, a diagonal
matrix D is constructed such that Di,i = 0 if the corresponding output position is a
total don’t care, and Di,i = 1 otherwise. Then M̂ × D is computed. The effect is to
force all total don’t care situations to 0 by ignoring the input states corresponding
to don’t care output assignments. This ensures that when the reduced matrices are
compared for two circuits, differences cannot arise in total don’t care positions. Note
that the easiest way to construct a QMDD for D is to start from a diagonal matrix
and then use a depth-first traversal to zero the diagonal elements corresponding to
total don’t cares.

1 Note that both matrices already are partitioned to correspond to the QMDD partitioning.
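On explicit matrices, the multiplication M̂ × D simply empties the columns that correspond to total don't care input assignments, as in the following continuation of the explicit-matrix illustration above (again, not the QMDD implementation).

#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Apply M_hat x D for a diagonal matrix D with D[i][i] = 0 for every input
// assignment i whose outputs are total don't cares: the corresponding columns
// are forced to 0, so differences cannot arise in don't care positions.
void apply_total_dont_cares(Matrix& m_hat, const std::vector<bool>& is_total_dc) {
    for (auto& row : m_hat)
        for (std::size_t col = 0; col < row.size(); ++col)
            if (is_total_dc[col]) row[col] = 0;        // column col is multiplied by D[col][col] = 0
}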

Partial don’t care conditions can be handled in a similar fashion. The difference
is that partial don’t care conditions apply only to a subset and not to all outputs. The
simplest approach is to treat the outputs for which a set of partial don’t cares does
not apply as pseudo-garbage outputs and construct a new matrix for this situation by
reducing the pseudo-garbage. A diagonal matrix is then constructed for those don’t
cares and the equivalence check proceeds as above. This is repeated for each subset
of the outputs that have shared partial don’t cares.

7.1.3 SAT-based Equivalence Checking

In this section, the SAT-based equivalence checker for reversible logic is described.
The general idea is to encode the problem as an instance of Boolean satisfiability
to be solved by a SAT solver (see Sect. 2.3.1). If the SAT solver returns unsatisfi-
able, then the checked circuits are equivalent. Otherwise, a counterexample can be
extracted from the satisfying assignment of the instance.

7.1.3.1 The Completely Specified Case

To formulate the problem, a so-called miter structure as proposed in [Bra93] for
traditional (irreversible) circuits is built. By applying the input assignments to both
circuits G1 and G2 , differences at corresponding outputs are observed by XOR op-
erations. If at least one XOR evaluates to 1 (determined by an additional OR opera-
tion), the two circuits are not equivalent.

Example 7.1 The miter structure for two circuits containing three lines is shown in
Fig. 7.1. Note that the added XOR and OR operations are only used in formulating
circuit equivalence checking as a SAT instance. They are not actually added to the
circuits.

To encode this miter as a SAT instance, a new free variable is introduced for each
signal in the circuit. Furthermore, each reversible gate as well as the additional XOR
and OR operations of the miter structure are represented by a set of clauses. Finally,
the output of the OR gate is constrained to the value 1. These transformations can be
performed in linear time and space with respect to the given circuit sizes, since only
local operations (adding a few number of clauses per gate) are required. In doing
so, the SAT instance becomes satisfiable iff there exists an input assignment to the
circuits where at least one pair of the corresponding outputs assumes different val-
ues. In this case, from the satisfying assignment a counterexample can be extracted
just by obtaining the assignments of all SAT variables representing circuit lines. If
both circuits are equal, then no such assignment exists and thus the SAT solver must
return unsatisfiable.
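How such a miter can be generated is sketched below for two purely Boolean circuits built from multiple-control Toffoli gates. Constant inputs, garbage outputs, don't cares, and the multi-valued encoding of quantum values discussed next are not covered; the Gate, Circuit, and Cnf structures are hypothetical placeholders rather than the actual implementation.

#include <utility>
#include <vector>

// Hypothetical purely Boolean model: multiple-control Toffoli gates only.
struct Gate { std::vector<int> controls; int target; };
struct Circuit { int lines; std::vector<Gate> gates; };

using Clause = std::vector<int>;                 // DIMACS-style literals: +v / -v

struct Cnf {
    int vars = 0;
    std::vector<Clause> clauses;
    int new_var() { return ++vars; }
    void add(Clause c) { clauses.push_back(std::move(c)); }
};

// Clauses for y <-> (x XOR a).
static void encode_xor(Cnf& f, int y, int x, int a) {
    f.add({-y,  x,  a}); f.add({-y, -x, -a});
    f.add({ y, -x,  a}); f.add({ y,  x, -a});
}

// Unroll one circuit gate by gate; state[l] is the current variable of line l.
static void encode_circuit(Cnf& f, const Circuit& c, std::vector<int>& state) {
    for (const Gate& g : c.gates) {
        int a = f.new_var();                     // a <-> AND of all control values
        Clause back{a};
        for (int ctl : g.controls) {
            f.add({-a, state[ctl]});
            back.push_back(-state[ctl]);
        }
        f.add(back);
        int y = f.new_var();                     // new value of the target line
        encode_xor(f, y, state[g.target], a);
        state[g.target] = y;                     // control lines keep their variables
    }
}

// Build the miter: shared inputs, one XOR per output pair, one OR constrained to 1.
// The instance is satisfiable iff the circuits differ for some input assignment
// (assumes both circuits have the same number of identically ordered lines).
Cnf build_miter(const Circuit& c1, const Circuit& c2) {
    Cnf f;
    std::vector<int> in(c1.lines);
    for (int& v : in) v = f.new_var();           // shared primary input variables
    std::vector<int> s1 = in, s2 = in;
    encode_circuit(f, c1, s1);
    encode_circuit(f, c2, s2);
    Clause any_difference;                       // the final OR, forced to value 1
    for (int l = 0; l < c1.lines; ++l) {
        int d = f.new_var();
        encode_xor(f, d, s1[l], s2[l]);          // d <-> (output of G1 != output of G2)
        any_difference.push_back(d);
    }
    f.add(any_difference);
    return f;
}

A satisfying assignment of such an instance directly yields a counterexample via the values of the shared input variables; if the solver reports unsatisfiability, both circuits realize the same function.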
Fig. 7.1 SAT formulation for completely specified functions

If quantum circuits are considered, V and V+ gates may produce non-Boolean
values. Thus, the variables for the associated signals employ a multiple-valued rather
than a Boolean encoding. More precisely, each signal of the circuit is represented
by two Boolean variables y and z, while
• 00 represents the Boolean value 0,
• 01 represents the non-Boolean value V0 ,
• 10 represents the Boolean value 1, and
• 11 represents the non-Boolean value V1 .
So, both reversible as well as quantum circuits can be checked.

7.1.3.2 Constant Inputs, Garbage Outputs, and Don’t Care Conditions

If circuits realizing an incompletely specified function are checked for equivalence,
constant inputs, garbage outputs, as well as total and partial don't care conditions
must be considered. To this end, the SAT formulation introduced in the last section
is extended as follows:
• Constant inputs: The associated SAT variables are restricted to the appropriate
constant values. This can be applied by using unit clauses.
• Garbage outputs: Garbage outputs are by definition don’t cares and can be ig-
nored in the SAT miter structure.
• Total and partial don’t care conditions: In these cases, new variables t, pf0 , . . . ,
pfn−1 and additional AND operations are added to the miter structure. The vari-
able t evaluates to 0, iff an input assignment leading to a total don’t care condition
is applied. The variables pfi (0 ≤ i < n) evaluate to 0, iff an input assignment
leading to a partial don’t care condition at output fi is applied. The AND opera-
tions ensure that if t (pfi ) is assigned to 0, all outputs (the respective output) are
ignored by the miter. Hence, only differences in output values without don’t care
conditions are detected.

Example 7.2 Figure 7.2 shows the extended miter for two circuits realizing an in-
completely specified function. A truth table showing the garbage output, the don’t
care conditions, as well as the resulting values for t and pfi is given in the left part
of Fig. 7.2. Note that the first half of the truth table includes don’t cares due to the
constant input.

Fig. 7.2 SAT formulation for incompletely specified functions

7.1.4 Experimental Results

This section provides experimental results. The QMDD package from [MT08] has
been used with the default sizes for the computational tables, a garbage collection
limit of 250,000, and a maximum of 200 circuit lines. For the SAT-based approach
the SAT solver MiniSAT [ES04] has been applied. In total, two kinds of experi-
ments have been conducted (typically using different gate libraries): experiments
with equivalent circuits and with non-equivalent circuits, respectively. The experi-
ments have been carried out on an AMD Athlon 3500+ with 1 GB of memory with
a timeout of 500 CPU seconds. All considered benchmark circuits were taken from
RevLib [WGT+08].
Table 7.2 shows the results obtained by comparing equivalent and non-equivalent
circuits, respectively. The first column gives the name of the circuit. For equivalent
circuits, two numbers following the name give the unique identifier of the circuit
realizations as used in RevLib. For non-equivalent circuits, only one number is given
which identifies the circuit from the corresponding equivalent test with the larger
number of gates. That circuit is used as given in RevLib and in a modified form by
arbitrarily altering, adding or deleting gates. Column DC shows the types of don’t
cares (see table note a for the coding). In column GT the gate types used in each
circuit are provided (see table note b for the coding). Column n presents the number
of inputs and column d gives the number of gates for the first and second circuit,
respectively. In the next three columns, the data for the QMDD-based approach is
shown, i.e. peak number of QMDD nodes, run-time in CPU seconds, and memory
in MByte. The peak number of nodes is the maximum number of active nodes at
any point in building the circuit QMDDs and checking for equivalence. Finally, the
last four columns provide the data for the SAT-based method. First, the number of
variables and the number of clauses of the SAT instance are shown. Then, the run-
time as well as the required memory are given.
Both approaches prove or disprove the equivalence for all considered bench-
marks (except one for the SAT-based approach) very quickly. The maximum run-
time observed was less than three minutes. Several experiments with more than
10,000 gates are included. The largest has nearly 27,000 gates. Even for these cases,
the proof times are very fast. The largest circuit in terms of the number of inputs
Table 7.2 Experimental results
CIRCUITS                                      QMDD-BASED                      SAT-BASED
NAME                  DC^a   GT^b   n   d     NODES    TIME    MEM.           VARS    CLSES    TIME    MEM.

EQUIVALENT CIRCUITS
0410184 (170, 169) none NCV/NCT 14 74/46 2924 0.01 12.54 2843 4640 0.05 3.70
add16 (175, 174) CG CV/CT 49 96/64 7841 0.02 12.72 12788 18356 0.08 6.31

add32 (185, 183) CG CV/CT 97 192/128 28761 0.03 13.82 50148 70500 0.29 15.14
add64 (186, 184) CG CV/CT 193 384/256 109769 0.12 17.53 198596 276164 1.12 48.10
alu-v0 (26, 27) G NCT/NCT 5 6/6 171 0.01 12.53 72 159 <0.01 2.70
alu-v1 (28, 29) G NCT/NCT 5 7/7 188 <0.01 12.53 82 180 <0.01 2.70
alu-v2 (30, 33) G NCT/NCT 5 18/7 295 <0.01 12.53 145 340 <0.01 2.70
alu-v3 (34, 35) G NCT/NCT 5 7/7 177 <0.01 12.53 83 185 <0.01 2.70
alu-v4 (36, 37) G NCT/NCT 5 7/7 223 <0.01 12.53 84 189 <0.01 2.70
c2 (182, 181) none NCV/NCT 35 305/116 40767 0.08 14.36 26018 40774 0.52 10.67
urf1 (149, 151) none T/CT 9 11554/1487 250311 5.05 23.91 130419 302871 32.10 55.24
urf2 (152, 154) none T/CT 8 5030/620 250020 1.09 23.89 50849 119592 3.82 23.86
urf3 (155, 157) none T/CT 10 26468/2674 250441 22.71 23.91 320578 735809 86.81 134.67
urf5 (158, 159) none T/CT 9 10276/499 250061 3.32 23.91 107764 249191 19.15 48.18
urf6 (160, 160) none T/T 15 10740/10740 258140 97.95 24.25 343712 751876 >500.00 –
cnt3-5 (179, 180) CGT CT/CT 16 25/20 204874 0.64 21.95 2338 21976 0.43 9.56
decod24-v2 (43, 44) CT NCT/NCV 4 6/9 192 <0.01 12.52 115 214 <0.01 2.70
hwb4 (52, 49) none CT/CT 4 11/17 208 <0.01 12.52 133 336 0.01 2.82
hwb5 (55, 53) none NCT/CT 5 24/55 972 <0.01 12.52 448 1111 0.02 2.94
hwb6 (56, 58) none CT/CT 6 126/42 3588 0.01 12.53 1146 2846 0.03 3.21
hwb7 (62, 60) none NCT/F 7 331/166 22010 0.04 13.44 3730 9249 0.14 4.49
hwb8 (116, 115) none NCT/CTP 8 749/614 82942 0.26 16.19 11599 27786 1.02 7.42
hwb9 (123, 122) none NCT/CTP 9 1959/1541 250162 1.89 23.88 33454 79798 6.28 16.73
mod10 (171, 176) T NCT/NCT 4 10/7 122 <0.01 12.53 97 265 <0.01 2.70
mod8-10 (178, 177) GTP NCT/NCT 5 9/14 235 <0.01 12.53 161 491 0.01 2.70
rd53 (135, 134) CGT CT/TP 7 16/12 593 <0.01 12.53 224 523 0.01 2.83
sym9 (146, 147) CGT CT/CTP 12 28/28 8825 0.02 12.89 727 1582 0.02 3.37

NON-EQUIVALENT CIRCUITS
0410184 (170) none NCV/NCV 14 74/73 2550 <0.01 12.59 4318 6385 0.04 4.16
add16 (175) CG CV/NCV 49 96/97 9516 0.02 13.22 19270 23732 0.10 7.23
add32 (185) CG CV/NCV 97 192/193 21892 0.04 14.11 75398 90522 0.38 19.31
add64 (186) CG CV/NCV 193 384/386 91417 0.15 17.93 298632 353478 1.53 63.91
alu-v0 (26) G NCT/NCT 5 6/5 116 <0.01 12.52 67 148 <0.01 2.70
alu-v1 (28) G NCT/CT 5 7/6 119 <0.01 12.52 77 172 <0.01 2.70
alu-v2 (30) G NCT/NCT 5 18/16 346 <0.01 12.52 198 480 0.01 2.70
alu-v3 (34) G NCT/CT 5 7/6 202 <0.01 12.52 79 178 <0.01 2.70
alu-v4 (36) G NCT/CT 5 7/6 204 0.01 12.52 81 186 <0.01 2.70
c2 (182) none NCV/NCV 35 305/304 34107 0.15 14.21 43648 64262 0.32 15.84
urf1 (149) none T/CT 9 11554/11554 250311 9.53 23.91 231099 531527 4.58 97.09
urf2 (152) none T/T 8 5030/5029 229154 2.00 22.97 90549 211280 2.12 39.08
urf3 (155) none T/T 10 26468/26464 250441 44.14 23.91 582274 1323351 13.72 243.19
urf5 (158) none T/CT 9 10276/10276 250163 6.91 23.90 205539 472739 4.85 84.53
urf6 (160) none T/CT 15 10740/10740 260196 144.93 24.50 343711 751873 16.48 143.23
cnt3-5 (179) CGT CT/NCT 16 25/26 204745 0.63 21.94 2429 22158 0.43 9.74
decod24-v2 (43) CT NCT/NCT 4 6/5 89 <0.01 12.52 60 143 <0.01 2.70
hwb (4) none CT/NCT 4 11/18 216 <0.01 12.52 137 344 0.01 2.83
hwb5 (55) none NCT/CT 5 24/54 961 <0.01 12.53 442 1094 0.01 2.94
hwb6 (56) none CT/CT 6 126/125 3922 0.01 12.53 1733 4357 0.03 3.50
hwb7 (62) none NCT/NCT 7 331/331 22541 0.06 13.45 4865 11553 0.08 4.94
hwb8 (116) none NCT/NCT 8 749/750 61990 0.31 15.28 12438 28977 0.16 8.05
hwb9 (123) none NCT/NCT 9 1959/1958 250162 2.55 23.91 36244 83568 0.57 17.39
mod (10) T NCT/NCT 4 10/10 107 <0.01 12.53 109 296 <0.01 2.70
mod8 (10) GTP NCT/NCT 5 9/15 269 <0.01 12.52 166 501 0.01 2.70
rd53 (135) CGT CT/CT 7 16/16 611 <0.01 12.54 252 582 0.01 2.82
sym9 (146) CGT CT/CT 12 28/27 10045 0.02 12.93 714 1553 0.01 3.36
a Don’t-care: none = completely-specified, C = constant input, G = garbage output, T = total don’t-care, P = partial don’t-care
b Gate type: N = NOT, C = controlled-NOT, F = multiple control Fredkin, P = Peres, T = multiple control Toffoli, V = V or V+

and outputs, add64 with n = 193, is a 64 bit ripple carry adder which includes 64
constant inputs and 128 garbage outputs to achieve reversibility. Comparing the run-
times of the two approaches, the QMDD method is faster in the case of equivalent
circuits, while in the non-equivalent case the SAT-based approach often is faster.
Regarding memory usage it can be seen that both approaches do not blow up even
for instances containing tens of thousands of gates.

7.2 Automated Debugging and Fixing


After a circuit has been shown erroneous, the designer must locate the concrete error
in order to determine its source. Without any additional support, this
is a manual and therewith time-consuming task. Thus, for traditional circuits, au-
tomated debugging approaches have been introduced that support the designer in
this task (see e.g. [LCC+95, VH99, HK00, SVAV05, ASV+05, FSVD06]). They
use the counterexamples obtained by verification to reduce the set of gates that still
have to be considered during debugging. In particular the approach from [SVAV05],
which uses Boolean satisfiability, has been shown to be efficient and robust. How-
ever, applying this method to reversible circuits does not lead to the desired results,
i.e. already for single errors no gate reduction can be achieved. Consequently, as for
the approaches introduced in the former chapters and sections, respectively, a newly
developed approach for reversible logic is needed.
In this section, a first approach to automatically determine error candidates ex-
plaining the erroneous behavior of a reversible circuit is introduced. More precisely,
given an erroneous circuit and a set of counterexamples describing the error(s), the
approach returns sets of gates, whose replacements with other gates fix the coun-
terexamples. Besides the automatic debugging approach, also theoretical results are
presented. For a restricted error model, i.e. assuming single missing control line er-
rors, already the given number of counterexamples allows to exclude a significant
number of gates from being an error candidate. Thus, the size of the SAT instance
is reduced or in the best case no SAT call is needed leading to a speed-up of the
overall debugging process.
However, as in irreversible debugging, error candidates only give an approxima-
tion of the real error location. For single errors this is not a big problem, since the
reduction achieved by the error candidates often is significant. In contrast, for mul-
tiple errors a second problem besides approximation arises: the determined error
candidates often are misleading, i.e. an error candidate pinpoints to parts of the cir-
cuit which cannot be used for repair. Hence, an improved debugging approach is
introduced that determines the concrete error locations. That is, strengthened error
candidates are generated ensuring that for each gate of the error candidate, a gate re-
placement is possible such that not only the counterexamples are fixed, but also the
specification is preserved. Experiments demonstrate the behavior of this approach.
While concrete error locations obviously give the best possible debugging quality
(pinpointing to the exact error and offering a fix), the run-times increase for larger
circuits. Hence, there is a quality vs. time trade-off.

Finally, a theoretical result is presented that can be applied to automatically fix


an erroneous circuit. Therefore, a single gate must be replaced by a fixing cascade
which—due to reversibility—can be computed in time linear in the size of the cir-
cuit. Altogether, the approaches presented in this section help designers to determine
error candidates, to improve these candidates so that the concrete error location re-
sults, and to fix the erroneous circuit.
The remainder of this section is structured as follows: The next section intro-
duces the debugging problem in detail and briefly reviews SAT-based debugging
of traditional circuits. Having this as a basis, determining error candidates for re-
versible circuits is proposed and further improvements are described in Sect. 7.2.2.
How to determine error locations instead of the approximating error candidates is
explained in Sect. 7.2.3, while Sect. 7.2.4 introduces the automatic fixing approach.
Finally, Sect. 7.2.5 provides experimental results for all proposed approaches.

7.2.1 The Debugging Problem

To keep the following descriptions self-contained, this section defines the debugging
problem and briefly reviews the SAT-based method for debugging of traditional cir-
cuits.

7.2.1.1 Problem Description

Like their classic counterparts, reversible circuits may contain errors, e.g. because of
bugs in synthesis and optimization tools or imprecise specifications. These errors can
be detected e.g. by verification as introduced in the last section. However, to find the
source of an error, the circuit must be debugged—often a manual and time-consuming
process.
Thus, automatic approaches are desired that support the designer in reducing the
possible error locations. Therefore, error models have been defined that represent
frequently occurring errors. Possible error models for reversible logic include:

Definition 7.1 Let g = MCT(C, t) be a Toffoli gate of a circuit G. Then,

1. a missing control line error appears if g is replaced by TOF(C′, t), where C′ =
   C \ {xi} with xi ∈ C (i.e. a control line is removed),
2. an additional control line error appears if gate g is replaced by TOF(C′, t),
   where C′ = C ∪ {xi} with xi ∉ C ∪ {t} (i.e. a control line is added),
3. a wrong target line error appears if g is replaced by TOF(C′, t′), where t′ ≠ t
   and C′ may be different from C (i.e. g is replaced by a gate with another target
   line), and
4. a wrong gate error appears if g is replaced by TOF(C′, t′), where t′ ≠ t and/or
   C′ is different from C (i.e. g is replaced by another gate).

Fig. 7.3 Circuit with missing control error

Remark 7.1 Note that some error models are supersets of other models. For exam-
ple, all control line errors are also wrong gate errors. As shown later, the automatic
debugging approaches can be improved, if the designer is able to restrict the error
to a particular model.

Given an error model together with an erroneous circuit G as well as a set of


counterexamples, the goal of automatic debugging approaches is to determine a
set of error candidates that may explain the erroneous behavior of G. An error
candidate is a set of gates gi that can be replaced by other gates (according to the
error model) such that for each counterexample the correct output values result. The
size of an error candidate is given by the number of gates (denoted by k).

Example 7.3 Figure 7.3 shows an erroneous circuit G together with a counterexam-
ple (applied to the inputs of G). At the outputs, the wrong values (determined by
the counterexamples) as well as the expected (i.e. the correct) values are annotated.
For this example, {g5 } is an error candidate, since replacing g5 with another gate
(namely a gate with one more control line) would correct the output values. In this
case, the counterexample detects a missing control error.

Having error candidates, the debugging process can be significantly accelerated,


since only a small part of the circuit (highlighted by the error candidates) must be
inspected. Moreover, in many cases determining error candidates directly leads to
the desired error location.

7.2.1.2 Debugging of Traditional Circuits

To determine error candidates, automatic approaches have been introduced for tradi-
tional circuits (see e.g. [LCC+95, VH99, HK00, SVAV05, ASV+05, FSVD06]). In
particular, methods based on Boolean satisfiability (SAT) have been demonstrated
to be very effective for debugging irreversible logic [SVAV05]. Here, the erroneous
circuit and a set of counterexamples are used to create a SAT instance. Solving this
instance using well engineered SAT solvers (see e.g. [ES04]) returns solutions from
which the desired set of error candidates can be determined.
The general structure for the debugging problem that is encoded as a SAT in-
stance is shown in Fig. 7.4. For each counterexample, a copy of the circuit is cre-
ated, whose inputs are assigned to values provided by the given counterexamples
(denoted by cex0 , . . . , cex|CEX| ). The outputs are assigned to the correct values. Fur-
thermore, each gate gi is extended by additional logic, i.e. a multiplexor with select

Fig. 7.4 SAT-based debugging approach

line si is added. If si is assigned to 0, then the output value of gate gi is passed


through, i.e. the gate works as usual. Otherwise (if si = 1), an unrestricted value is
used (available via a new free variable w). Therefore, if gi is an erroneous gate, the
SAT solver can assign si = 1 and choose the correct gate value to enable correct
values at the output of the circuit.
As depicted in Fig. 7.4, the same select value si is used for a gate gi with respect
to all duplications. This ensures that free values of the respective output signals are
only used, if circuit outputs are corrected for all counterexamples. Furthermore, the
total number of selects si set to 1 is limited to k. Starting with k = 1, k is iteratively
increased until the SAT instance becomes satisfiable. Then, each satisfying assign-
ment yields an error candidate of size k. All gates with si set to 1 are contained in
the error candidate. By performing all-solution SAT (i.e. determining all solutions
of the instance), all error candidates for a given k are calculated.
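For illustration, the multiplexor added per gate in this formulation could be encoded as in the following hedged C++ sketch (again DIMACS-style clauses; all names are illustrative assumptions). The select variable s chooses between the original gate output g and the free variable w; the same s is shared over all counterexample copies, and the constraint that at most k select variables are set to 1 can be added with any standard cardinality encoding.

#include <vector>

using Clause = std::vector<int>;   // DIMACS-style literals
using CNF    = std::vector<Clause>;

// out <-> (s ? w : g):
// if the select line s is 0, the original gate output g is passed through;
// if s is 1, the unrestricted value w is used instead.
void encodeSelectMux(CNF& cnf, int s, int g, int w, int out) {
  cnf.push_back({ -s, -w,  out });   // s = 1, w = 1  =>  out = 1
  cnf.push_back({ -s,  w, -out });   // s = 1, w = 0  =>  out = 0
  cnf.push_back({  s, -g,  out });   // s = 0, g = 1  =>  out = 1
  cnf.push_back({  s,  g, -out });   // s = 0, g = 0  =>  out = 0
}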

7.2.2 Determining Error Candidates

In this section, the SAT-based debugging formulation for reversible circuits is pre-
sented. It is shown that the formulation for traditional circuits cannot be directly
applied. Nevertheless, a similar concept is exploited. Furthermore, for specific error
models (namely all control line errors) improvements are introduced.

7.2.2.1 SAT Formulation

The debugging approach described above has been demonstrated to be very effective
for determining error candidates of irreversible circuits. One-output gates for AND,

Fig. 7.5 Applying traditional debugging to reversible logic

OR, XOR, etc. are thereby considered. Thus, only a single multiplexor as shown
in Fig. 7.5(a) is added to express whether a gate gi can become part of an error
candidate or not.
In contrast, reversible logic builds on a different gate library where each gate
always has n outputs. Indeed, it is possible to convert Toffoli gates into respective
AND gate and OR gate combinations (see Fig. 7.5(b) for a Toffoli gate with two
control lines) and apply traditional debugging methods afterwards. But, this would
lead to several drawbacks. First, more than one multiplexor with different select
lines is needed to express whether a Toffoli gate gi can become part of an error
candidate. Thus, the value k does not correspond to the number of gates in an error
candidate any longer. For example, a single missing control line error may show up on two multi-
plexors and hence complicate debugging. Furthermore, only one single line of the
Toffoli gate is considered with this formulation. Thus, errors like misplaced target
lines cannot be detected.
Alternatively, n multiplexors with the same select si might be added to the de-
bugging formulation as shown in Fig. 7.5(c). However, also this leads to meaningless
results, as the following lemma shows.

Lemma 7.1 Let G be an erroneous circuit. Using the traditional debugging ap-
proach with the additional logic formulation depicted in Fig. 7.5(c) and an arbitrary
set of counterexamples, for each gate gi (0 ≤ i < d) of G a satisfying solution with
si = 1 exists.

Proof Let G = G1 gi G2 be an erroneous circuit with the set of counterexamples


CEX. A gate gi is determined as error candidate, if a satisfying assignment with
si = 1 exists such that the correct output value for each counterexample cex ∈ CEX
can be calculated. Using the additional logic formulation depicted in Fig. 7.5(c),
assigning si = 1 enables unrestricted values for all n outputs of the gate gi . To
obtain the values leading to the correct circuit output, just G2^-1 has to be applied to
the correct output values. This can be performed for each gate gi of G. 

Fig. 7.6 Proposed debugging formulations

As a result, each gate will be identified as an error candidate. This lemma shows
that the existing SAT-based debugging formulation for irreversible circuits is too
general for reversible circuits. In fact, assigning si to 1 should imply that the output
values of gate gi cannot be chosen arbitrarily, but with respect to the functionality
of Toffoli gates. The two main properties of Toffoli gates are:
• At most one line (the target line) is inverted if the respective control lines are
assigned to 1 and
• all remaining lines are passed through.
A new formulation respecting these properties is given in Fig. 7.6(a). For each
output of a gate gi, a second multiplexor with a new select si^b is added (0 ≤ b < n).
By restricting the sum si^0 + · · · + si^(n−1) to 1 it is ensured that the value of at most
one line is modified if si is set to 1. All remaining values are passed through.
Therewith, the multi-output behavior including the reversibility is reflected in the
debugging formulation.2 A minimal sketch of this restriction is given below; afterwards,
the application of the debugging formulation is demonstrated by an example.
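The restriction on the sum of the per-line selects can be realized, for example, by a simple pairwise at-most-one encoding as in the following sketch (illustrative names; more compact encodings such as sequential counters work equally well). How the per-line selects are coupled to the gate select si follows the structure of Fig. 7.6(a) and is omitted here.

#include <cstddef>
#include <vector>

using Clause = std::vector<int>;
using CNF    = std::vector<Clause>;

// Pairwise at-most-one constraint over the per-line selects s_i^0, ..., s_i^{n-1}:
// for every pair of selects, at least one of the two must be 0.
void encodeAtMostOne(CNF& cnf, const std::vector<int>& lineSelects) {
  for (std::size_t a = 0; a < lineSelects.size(); ++a)
    for (std::size_t b = a + 1; b < lineSelects.size(); ++b)
      cnf.push_back({ -lineSelects[a], -lineSelects[b] });
}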

Example 7.4 Consider the circuit realization of function 3_17 with an injected miss-
ing control error at gate g5 depicted in Fig. 7.7. The missing control error leads to
four counterexamples shown in the first four rows below the circuit in Fig. 7.7. For
k = 1, besides {g5 } the proposed debugging formulation also returns {g4 } as an error
candidate (marked by a dashed rectangle). This is because replacing g4 with a NOT
gate at line c leads to correct output values for all counterexamples as shown in the
first four rows: The bold values will be inverted such that they match the propagated
values from the output of the circuit.
However, as in traditional debugging, an error candidate always is an approxima-
tion and thus may not necessarily be the error location. In fact, g4 is not an error
location, since for the NOT gate replacement an incorrect output (with respect to

2 Note that using multiplexors obviously makes the considered circuits non-reversible. However, this formulation is only used as a logic encoding of the debugging problem.



Fig. 7.7 Circuit with single error

the specification of 3_17) using 011 as input is computed as can be seen in the fifth
line of the figure. Nevertheless, the number of gates that have to be considered to
detect the error is reduced from 6 to 2.

As shown by the experiments in Sect. 7.2.5, a significant amount of gates can be


automatically identified as non-relevant using the proposed approach, i.e. to fix the
error only a very small fraction of all gates has to be considered. Besides that, some
further improvements are possible as described in the following.

7.2.2.2 Improvements for Control Line Errors

The proposed debugging formulation needs a substantial amount of additional logic.


This can be reduced, if a restricted error-model is assumed. In this section, a simpler
debugging formulation for control line errors is described. This simplification leads
to a faster calculation of error candidates and thus should be applied, if the source
of an error can be limited to this type of errors.
Control line errors include both missing and additional control lines in a
Toffoli gate. They may occur when e.g. optimization approaches or the designer
himself manipulate control lines of Toffoli gates. In particular, deleting control
lines is used by optimization approaches (see e.g. [ZM06] or the approach described
in Sect. 6.1), since this reduces the quantum cost of the considered circuit. Errors
caused by deleting control lines can be seen as missing control errors, too.
Since missing control errors (as well as additional control errors) only affect the
target line of a Toffoli gate, the debugging formulation can be simplified to the one
shown in Fig. 7.6(b). Here, multiplexors are only added for the target line of each
gate. If a gate gi includes a control line error, only the value of the target line can be
erroneous. By assigning si to 1, the SAT solver can choose the correct value and thus
enable correct outputs. In this case, gi becomes an element of an error candidate.
Furthermore, if a single missing control error is assumed, the following holds:

Lemma 7.2 Let G be a reversible circuit with a single missing control error and
|CEX| the total number of counterexamples for this error. Then, the erroneous gate
includes c = n − 1 − log2 |CEX| control lines.

Proof Let G be a reversible circuit with a missing control error in gate gi containing
c control lines. To detect the erroneous behavior, (1) all control lines of gi have to be
assigned to 1 and (2) another line of gi (the missing control line) has to be assigned
to 0. Due to the reversibility, these values can be propagated to the inputs of the
circuit, leading to |CEX| = 2^(n−c−1) different counterexamples in total. From this,
one can conclude

|CEX| = 2^(n−c−1),
log2 |CEX| = log2 2^(n−c−1),
log2 |CEX| = n − c − 1,
c = n − 1 − log2 |CEX|.

Remark 7.2 If the total number of counterexamples is not available, Lemma 7.2
can still be used as an upper bound. This is because the value of |CEX| can only
increase, leading to a smaller value of c. Thus, all gates including more than
n − 1 − log2 |CEX| control lines do not have to be considered during debugging
(where in this case |CEX| is the rounded-up number of available counterexamples).

Exploiting Lemma 7.2, the number of gates that have to be considered can be
reduced significantly. In some cases, this reduction already leads to a single gate
and therewith to the desired error location (see experiments in Sect. 7.2.5). But, even
if additionally the SAT-based part to determine error candidates has to be invoked,
improvements can be observed, since additional logic as depicted in Fig. 7.6(b) has
to be added only to gates containing exactly c = n − 1 − log2 |CEX| control lines.
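Under the single missing control assumption, this pre-filtering amounts to a one-line check per gate. The following hedged C++ sketch illustrates it; the Gate type and all names are assumptions made for the example only.

#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative gate type: only the number of control lines is needed here.
struct Gate { int numControls; };

// Pre-filter gates according to Lemma 7.2 / Remark 7.2: a single missing
// control error can only be located at a gate with c = n - 1 - log2(|CEX|)
// control lines; if only a subset of the counterexamples is known, this
// value serves as an upper bound instead.
std::vector<std::size_t> candidateGates(const std::vector<Gate>& circuit,
                                        int n, std::size_t numCex,
                                        bool allCexKnown) {
  const double bound = n - 1 - std::log2(static_cast<double>(numCex));
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < circuit.size(); ++i) {
    const int c = circuit[i].numControls;
    if (allCexKnown ? (c == static_cast<int>(std::lround(bound)))
                    : (c <= bound))
      result.push_back(i);
  }
  return result;
}

For instance, for the missing control error in ham7 (n = 7, |CEX| = 32, cf. Table 7.4(b)), the erroneous gate must have exactly 7 − 1 − log2 32 = 1 control line.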

7.2.3 Determining Error Locations

The debugging approach proposed in the last section basically uses a modified mul-
tiplexor formulation compared to debugging for irreversible circuits. However, tra-
ditional debugging approaches suffer from the problem that the obtained error can-
didates only approximate the error location(s), i.e. the verification engineer may be
pinpointed to misleading parts of the circuit which cannot be used for repair. In con-
trast, for reversible logic the debugging formulation can be extended to overcome
these limitations. As a result, the real source of an error can be calculated—the error
location.
In this section, first the limits of traditional debugging and error candidates are
discussed in more detail, followed by a description of error locations.
Then, the new debugging algorithm for computing error locations is presented.

Fig. 7.8 Circuit with multiple errors

7.2.3.1 Limits of Error Candidates

As in irreversible debugging, error candidates are only an approximation of the real


source of the error (as shown in Example 7.4). In case of single errors (i.e. for
k = 1), this can be accepted, since the number of gates that have to be considered
is significantly reduced. However, if multiple errors occur in the design, the set of
error candidates may be completely misleading as the following example shows.

Example 7.5 In Fig. 7.8, a circuit realization of the function alu is depicted. In this
circuit two missing control line errors have been injected: at gate g2 and at gate g3 ,
respectively. If the proposed debugging approach is applied, already for k = 1 a so-
lution (namely {g2 }) is returned. However, by exhaustive enumeration it has been
checked that no replacement for gate g2 exists such that the circuit realizes the func-
tion specification. In fact, an appropriate replacement of gate g2 only fixes the coun-
terexamples (similar to Example 7.4), while the correct behavior according to the
specification of the circuit is not preserved. Thus, this error candidate is misleading.

The example clearly demonstrates the need for strengthened error candidates.
This results in the formalization of error locations. An error location is an error
candidate where for each gate of the error candidate there is a single gate replace-
ment which not only fixes all counterexamples, but also preserves the overall spec-
ification. Having a single error location available, the real error is automatically
highlighted in the circuit and no further manual inspection is necessary. Since error
locations are strengthened error candidates, this concept guarantees to give better re-
sults than just determining error candidates. In the following, an automatic approach
for determining error locations is described.

7.2.3.2 Approach

The general idea of the debugging approach for calculating error locations is as
follows. For increasing sizes k of error candidates, it is checked whether an error
candidate is an error location or not. To determine the error candidates, first the
debugging formulation of Sect. 7.2.2 is applied. Then, a second SAT instance is
created that checks whether there are gate replacements such that the specification

Fig. 7.9 SAT-based formulation to determine error locations of size k = 2

is fulfilled or not. In the following, first the SAT formulation for this check is de-
scribed. Afterwards, the overall algorithm (that uses this formulation) is introduced
and illustrated by means of an example.
Usually, in the debugging process a reference circuit F (used to obtain the coun-
terexamples) is available. Having this, a method is applied which is inspired by
SAT-based equivalence checking (as introduced in Sect. 7.1.3) and exact synthesis
of reversible logic (as introduced in Chap. 4). To check the existence of appropriate
gates which replace each gate of a given error candidate of size k, a miter structure
as depicted in Fig. 7.9 is built. Note that this figure illustrates the structure only for
a concrete example, i.e. a circuit with three inputs/outputs and an error candidate of
size k = 2 containing the gates gp and gq .
By applying the inputs to both the reference circuit F and the erroneous circuit G,
the identity of corresponding outputs must result, which is enforced by XNOR gates.
An additional AND gate, whose output is set to the value 1, constrains both circuits
to produce the same outputs for the same input assignment.
Then, it is allowed that each gate gi of the current error candidate can be of any
arbitrary type. To this end, free variables t^i_{log2 n}, t^i_{log2 n − 1}, . . . , t^i_1 and
c^i_1, c^i_2, . . . , c^i_{n−1} are introduced (for brevity denoted by ti and ci in the following).
According to the assignment to ti and ci, the gate gi is modified. Thereby, ti is used
as a binary encoding of a natural number t^i ∈ {0, . . . , n − 1} which defines the chosen
target line. In contrast, ci denotes the control lines. More precisely, assigning c^i_l = 1
(with 1 ≤ l ≤ n − 1) means that line (t^i + l) mod n becomes a control line of the Toffoli
gate gi. The same encoding has already been used in Chap. 4 for exact synthesis.
Figure 4.5 on p. 62 gives some examples for assignments to ti and ci with their
respective Toffoli gate representation.
Finally, this formulation is duplicated for each possible input of the circuit, i.e. 2^n
times. The same variables ti and ci are thereby used for each duplication. In doing
so, a functional description is constructed which is satisfiable, iff there is a valid
assignment to ti and ci (i.e. iff there is a gate replacement of all gates gi ) such that

Fig. 7.10 Main flow of error location determination

(1) determineErrorLocation(F, G, CEX)
(2)   EC = ∅; // stores error candidate
(3)   k = 1;
(4)   while (true) do
(5)     inst = formulateErrCandInstance(G, CEX, k);
(6)     for each (EC = solution(inst))
(7)       inst = formulateErrLocInstance(F, G, EC);
(8)       if (solution(inst) == SAT)
(9)         return EC;
(10)    k = k + 1;

for all inputs the same input-output mapping results. Then, a fix can be extracted
from the assignments to ti and ci . If there is no such assignment (i.e. the instance is
unsatisfiable), it has been proven that the considered error candidate is not an error
location.
Note that, instead of this SAT formulation, it would also be possible to exhaustively enu-
merate all gate combinations for an error candidate. However, using modern SAT
solvers, sophisticated techniques (in particular search space pruning by means of
conflict analysis [MS99] as well as efficient implication techniques [MMZ+01]) are
exploited that significantly accelerate the solving process. In doing so, the worst
case complexity still remains exponential but as the experiments in Sect. 7.2.5 show,
error locations can be determined for many reversible circuits.
Having this SAT formulation as a basis, the overall approach to determine error
locations is summarized in Fig. 7.10. First, it is aimed to find an error location
including one gate only, i.e. error candidates with k = 1 are determined, while EC
contains the current error candidate (lines 5 and 6). Then, it is checked whether EC
is an error location using the SAT formulation described above. If this is the case
(line 8), then EC is an error location and thus is returned (line 9). Otherwise, the
remaining error candidates of the same size are checked. If no error location of size k
has been found, then k is increased and the respective steps are repeated (line 10).

Example 7.6 Consider again the circuit shown in Fig. 7.8 with two injected errors.
The debugging approach described in Sect. 7.2.2 first identifies EC = {g2 } as an
error candidate. However, using the SAT formulation introduced above it can be
verified that there is no gate replacement for EC that preserves the original circuit
specification. Thus, further error candidates are generated. Since this is not possible
for k = 1, k is increased leading to EC = {g2 , g3 }. Because an appropriate gate
replacement can be found for this candidate, EC is the desired error location.

Note that sometimes more than one error location is possible. As an example,
consider the circuit given in Fig. 7.11. Here, a single missing control error has been
injected at gate g1 . Nevertheless, in total there are two repairs for the erroneous
circuit: g0 and g1 , respectively. Thus, if the designer wants to know whether more than a
single error location exists, the algorithm in Fig. 7.10 must not terminate at line 9
but be iterated until the desired number of checks has been performed.

Fig. 7.11 Erroneous circuit with two error locations

In summary, using the proposed formulation, error locations can be automati-


cally determined. This does not only pinpoint the designer to the concrete source of
an error, but also allows an automatic repair, since the concrete gate replacements
that fix the erroneous behavior can be obtained from the assignments to ti and ci,
respectively.

7.2.4 Fixing Erroneous Circuits

So far, the goal was to determine error candidates or locations that explain the erro-
neous behavior in a circuit, respectively. However, the reversibility of the considered
circuits additionally allows an easy computation of fixes (even easier than creating
a fixing formulation and solving the corresponding SAT instance as described in
the previous section). Therefore, a single gate must be replaced by a fixing cascade
which—due to reversibility—can be computed in time linear in the size of the cir-
cuit. More precisely, fixes can be automatically generated by applying the following
lemma.

Lemma 7.3 Let F be an error-free reference circuit and G = G1 gi G2 be an erro-
neous realization/optimization of F. Then, G can be fixed by replacing an arbitrary
gate gi of G with a cascade of gates Gi^fix = G1^-1 F G2^-1.

Proof Since G^-1 G realizes the identity function, it holds:

G1 Gi^fix G2 = F ⇔
G1^-1 G1 Gi^fix G2 G2^-1 = G1^-1 F G2^-1 ⇔
Gi^fix = G1^-1 F G2^-1.

Thus, replacing gi with Gi^fix fixes the erroneous circuit G.
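A minimal sketch of this construction is given below. It assumes that circuits are plain lists of self-inverse Toffoli gates, so that the inverse of a sub-cascade is simply the reversed gate list; for gate libraries containing gates that are not self-inverse, each gate would additionally have to be replaced by its inverse. All type and function names are illustrative.

#include <vector>

// Illustrative circuit representation: a cascade of Toffoli gates.
struct Gate { int target; std::vector<int> controls; };
using Circuit = std::vector<Gate>;

// Toffoli gates are self-inverse, so the inverse of a pure Toffoli cascade
// is simply the gate list in reverse order.
Circuit inverseOf(const Circuit& c) {
  return Circuit(c.rbegin(), c.rend());
}

// Lemma 7.3: for the erroneous circuit G = G1 g_i G2 and the reference F,
// replacing g_i by G_fix = G1^-1 F G2^-1 fixes G. The cascade is assembled
// in time linear in the circuit sizes.
Circuit fixingCascade(const Circuit& g1, const Circuit& f, const Circuit& g2) {
  Circuit fix = inverseOf(g1);
  fix.insert(fix.end(), f.begin(), f.end());
  const Circuit g2inv = inverseOf(g2);
  fix.insert(fix.end(), g2inv.begin(), g2inv.end());
  return fix;   // usually reducible further by re-synthesis or optimization
}

For the ham3 example discussed below, this construction yields a 13-gate cascade for every position gi (cf. Table 7.3), which is then reduced by simplification.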

At a first glance, applying this lemma for fixing the erroneous circuit G, i.e. re-
fix
placing some gi by Gi (which includes the circuit F ), leads to a larger circuit than
fix
F itself. However, in many cases Gi can be reduced to a few gates. In particular,
fix
if the chosen gate gi is a location of a single error, Gi can be simplified to a single

Fig. 7.12 Fixing an erroneous circuit

Table 7.3 Sizes of Gi^fix
GATE     |Gi^fix|     Simplified |Gi^fix|

g0 13 6
g1 13 3
g2 13 1
g3 13 3
g4 13 3

gate. As a consequence, Lemma 7.3 can also be applied to determine error locations
of single errors.
The application of Lemma 7.3 is illustrated by the following example.

Example 7.7 Consider the circuits F and G for function ham3 as depicted in
Figs. 7.12(a) and (b), respectively. While F realizes the desired function, G is an er-
roneous optimization. Applying Lemma 7.3 for each gate gi gives the results shown
in Table 7.3. The first column gives the considered gate. In the second column the
number of gates of Gi^fix after applying the lemma is shown, and the last column pro-
vides the same information after simplification of Gi^fix. As can be seen, nearly all
fixes can be significantly reduced. For gate g2 even a reduction to a single gate can
be observed.

For simplification any synthesis or optimization approach can be used. Thereby,
to determine the smallest possible fix, in the worst case the simplification of Gi^fix
has to be executed for each possible gate of the circuit, since the best position is
not known upfront. However, in many cases even the exact synthesis approach de-
scribed in Chap. 4 leads to good results in feasible time (in particular, if it is tried
to simplify Gi^fix to a single gate only). In other cases, the transformation-based ap-
proach described in Sect. 3.1.2 has been shown to be quite effective. So, different
positions can be tried until a satisfying one is found. Recall that in case of multiple
errors, this technique repairs the circuit by substituting one gate by the simplified
fix. However, this gives no information where the errors are located.

7.2.5 Experimental Results

The proposed methods have been implemented in C++ and were evaluated on a set
of reversible circuits taken from RevLib [WGT+08]. In this section, the results of
the experimental studies are presented. First, the behavior of the various debugging
methods applied to single errors is discussed, followed by a consideration for mul-
tiple errors. Afterwards, the results of the automatic fixing approach are presented.
For all benchmark circuits, single and multiple errors have been randomly in-
jected to circuits taken from [WGT+08]. More precisely, a gate has been replaced
with another gate (leading to a wrong gate error) or a control line has been re-
moved (leading to a missing control error), respectively. Counterexamples showing
the errors were generated using the SAT-based equivalence checker introduced in
Sect. 7.1.3.
For solving the respective instances, the SAT solver MiniSAT [ES04] has been
used. The documented run-times include the times for instance generation and solv-
ing. All experiments have been carried out on an AMD Athlon 3500+ with 1 GB of
main memory. The timeout was set to 5000 CPU seconds.

7.2.5.1 Single Errors

In a first series of experiments, the debugging approaches are considered for deter-
mining single error candidates. For debugging wrong gate errors, the approach pro-
posed in Sect. 7.2.2 has been used (denoted by DBG EC). For missing control errors
additionally the improvements are applicable, namely the consideration of target
lines only (denoted by TARGET LINES ONLY) and the application of Lemma 7.2
(denoted by |CEX|-BASED REDUCTION).
The results are summarized in Table 7.4. Column CIRCUIT gives the circuit
name. Column d, column n, and column |CEX| give the number of gates, the num-
ber of lines, and the number of counterexamples (the number within the brackets de-
notes the number of counterexamples for which the circuit has been duplicated), re-
spectively. Furthermore, for each approach the number of obtained error candidates
(denoted by CAND.) and the overall run-time in CPU seconds (denoted by TIME)
are provided. Column CAND. for |CEX|-BASED REDUCTION includes two values.
The first denotes the remaining gates after applying Lemma 7.2, the second gives
the final number of error candidates after running the SAT-based debugging ap-
proach. Finally, column RED. lists the best reduction obtained by the approaches,
i.e. the percentage of gates that are identified as non-relevant (meaning the error is
not located at these gates).
As shown in the table, a significant amount of gates can be automatically iden-
tified as non-relevant for debugging the error. Reductions of at least two thirds—for
larger circuits of more than 90%—are achieved. As an example, for the wrong gate
error in circuit hwb9 with 1544 gates, two error candidates are obtained in less
than 100 CPU seconds. The quality of the resulting set of error candidates often
depends on the used strategy. For example, to identify the missing control error

Table 7.4 Determining error candidates for single errors


(a) Wrong gate errors
CIRCUIT            d     n     |CEX|     DBG EC               RED.
                                         CAND.      TIME

3_17 6 3 4(2) 2 <0.01 66.7 %


4_49 16 4 8(2) 1 <0.01 87.5 %
4gt4 6 5 4(2) 1 <0.01 83.3 %
ham3 5 3 4(2) 1 <0.01 60.0 %
ham7 23 7 16(1) 2 <0.01 95.7 %
hwb4 17 4 8(2) 4 <0.01 94.1 %
hwb5 55 5 8(2) 8 0.01 92.7 %
hwb6 126 6 16(1) 16 0.03 87.3 %
hwb7 289 7 16(1) 3 0.05 95.2 %
hwb8 637 8 4(2) 1 8.23 99.7 %
hwb9 1544 9 30(3) 2 89.75 99.9 %
plus63mod4096 429 12 32(3) 1 14.60 96.0 %
plus63mod8192 492 13 8(2) 2 12.17 99.2 %
plus127mod8192 910 13 16(1) 12 35.81 89.2 %
urf1 11554 9 128(12) – >5000.00 –
urf2 5030 8 64(6) 2 1563.64 99.9 %
urf3 26468 10 256(25) – >5000.00 –
(b) Missing control errors
CIRCUIT            d     n     |CEX|     DBG EC            TARGET LINES ONLY     |CEX|-BASED REDUCTION     RED.
                                         CAND.   TIME      CAND.   TIME          CAND.   TIME

3_17 6 3 2(2) 2 <0.01 2 <0.01 4/2 <0.01 66.7 %


4_49 16 4 8(2) 10 <0.01 2 <0.01 2/1 <0.01 93.8 %
4gt4 6 5 2(2) 3 <0.01 3 <0.01 2/2 <0.01 66.7 %
ham3 5 3 4(2) 3 <0.01 1 <0.01 1/1 <0.01 80.0 %
ham7 23 7 32(3) 5 <0.01 2 0.01 17/1 <0.01 95.7 %
hwb4 17 4 8(2) 4 <0.01 2 <0.01 1/1 <0.01 94.1 %
hwb5 55 5 8(2) 6 0.02 8 0.01 25/1 <0.01 98.2 %
hwb6 126 6 16(1) 7 0.07 6 0.03 20/1 0.01 99.2 %
hwb7 289 7 32(3) 250 0.71 13 0.05 28/5 0.02 98.3 %
hwb8 637 8 8(2) 34 5.57 2 0.18 167/1 0.04 99.8 %
hwb9 1544 9 16(1) 68 221.42 3 0.56 443/1 0.87 99.9 %
plus63mod4096 429 12 16(1) 296 78.81 26 1.00 84/12 0.19 97.2 %
plus63mod8192 492 13 512(51) 101 426.74 92 25.13 43/4 0.99 99.2 %
plus127mod8192 910 13 64(6) 237 198.14 206 69.70 128/8 0.31 99.1 %
urf1 11554 9 128(12) – >5000.00 1 63.28 1/1 <0.01 99.9 %
urf2 5030 8 64(6) 12 2143.22 6 6.40 1/1 <0.01 99.9 %
urf3 26468 10 256(25) – >5000.00 4 358.74 1/1 <0.01 99.9 %

in circuit hwb7 still 250 (out of 289) error candidates have to be considered af-
ter applying the DBG EC approach. Here, restricting the error model and using the
improvements not only leads to a speed-up, but also to a smaller number of error
candidates. This is also effective for other circuits (e.g. hwb4 and urf1). Here, just
by applying Lemma 7.2 the set of error candidates is reduced to the single erroneous
gate and no further SAT call is required.
For determining error locations (instead of error candidates), under the single
error assumption two approaches can be used: The SAT-based formulation intro-
duced in Sect. 7.2.3 (using the extended miter formulation) and the approach based
on Lemma 7.3 (i.e. generate Gi^fix for an error candidate determined by the debug-
ging approach and try to simplify it to a single gate). Table 7.5 summarizes the
results of both approaches where the former one is denoted by DBG EL and the lat-
ter one by DBG EC+FIX. Again, it is distinguished between wrong gate errors and
missing control errors (where |CEX|-BASED REDUCTION can be applied). Columns
CIRCUIT, d, n, and CAND. denote the name, the number of gates, the number of
lines, and the number of obtained error candidates of each benchmark, respectively.
The documented overall run-times of the respective approaches are given in col-
umn TIME.3
As can be clearly seen, error locations for single errors can be determined
even for circuits consisting of more than 25,000 gates. Applying Lemma 7.3
(i.e. DBG EC+FIX) is thereby more efficient than using the miter formulation
(i.e. DBG EL). In contrast, the DBG EL approach is also applicable to multiple errors,
which is considered in the next section.

7.2.5.2 Multiple Errors

If multiple errors occur, error candidates may be misleading. This has been observed
in a second series of experiments, where the method for error candidate determination
described in Sect. 7.2.2 (denoted by DBG EC) and the method for error location
determination described in Sect. 7.2.3 (denoted by DBG EL) have been applied to
circuits including multiple errors. Results for multiple missing control errors injected
into a set of circuits are given in Table 7.6. Results are thereby presented for the
GENERAL CASE (without an error assumption) and for the case where a control line
error is assumed, so that TARGET LINES ONLY can be considered. The denotation of
the respective columns is analogous to the ones in the previous tables.
First of all, it can be seen that determining error candidates only (DBG EC) in fact
is misleading in many cases. As examples, for the benchmarks 3_17, ham7, hwb5,
3_17-3, hwb5-3, and hwb7-3, error candidates with a lower cardinality k than that of the
approved error location result. Thus, replacing these gates would fix the counterex-
amples but does not preserve the correct behavior according to the specification of

3 Note that in both cases this also includes the debugging run-time needed to obtain the error candidates.

Table 7.5 Determining error locations for single errors


CIRCUIT              d     n     DBG EC+FIX              DBG EL
                                 CAND.      TIME         TIME

WRONG GATE ERRORS


3_17 6 3 2 <0.01 <0.01
4_49 16 4 1 <0.01 <0.01
4gt4 6 5 1 <0.01 <0.01
ham3 5 3 1 <0.01 <0.01
ham7 23 7 2 <0.01 0.02
hwb4 17 4 4 <0.01 <0.01
hwb5 55 5 8 0.01 0.06
hwb6 126 6 16 0.03 0.03
hwb7 289 7 3 0.05 0.80
hwb8 637 8 1 8.23 15.22
hwb9 1544 9 2 89.75 133.18
plus63mod4096 429 12 1 14.60 >5000.00
plus63mod8192 492 13 2 12.17 >5000.00
plus127mod8192 910 13 12 35.81 >5000.00
urf1 11554 9 – >5000.00 >5000.00
urf2 5030 8 2 1563.64 1570.01
urf3 26468 10 – >5000.00 >5000.00

MISSING CONTROL ERRORS


3_17 6 3 2 <0.01 <0.01
4_49 16 4 2 <0.01 <0.01
4gt4 6 5 3 <0.01 <0.01
ham3 5 3 1 <0.01 <0.01
ham7 23 7 2 0.01 0.02
hwb4 17 4 2 <0.01 <0.01
hwb5 55 5 8 0.01 0.06
hwb6 126 6 6 0.03 0.33
hwb7 289 7 13 0.05 0.80
hwb8 637 8 2 0.18 14.22
hwb9 1544 9 3 0.56 44.51
plus63mod4096 429 12 26 1.00 >5000.00
plus63mod8192 492 13 92 25.13 >5000.00
plus127mod8192 910 13 206 69.70 >5000.00
urf1 11554 9 1 63.28 >5000.00
urf2 5030 8 6 6.40 37.45
urf3 26468 10 4 358.74 >5000.00

Table 7.6 Determining error locations and error candidates for multiple errors
CIRCUIT                            DBG EL                                            DBG EC
                                   GENERAL CASE         TARGET LINES ONLY            TARGET LINES ONLY
            d     n     |CEX|      POS.   k   TIME      POS.   k   TIME              CAND.   k   TIME

2 injected errors
3_17 6 3 4(2) 1 2 0.00 1 2 0.00 2 1 <0.01
4_49 16 4 10(4) 1 2 0.03 1 2 0.00 3 2 <0.01
4gt4 6 5 8(4) 1 2 0.04 1 2 0.00 3 2 <0.01
ham3 5 3 4(2) 1 2 0.00 1 2 0.00 1 2 <0.01
ham7 23 7 30(12) 1 2 0.68 1 2 0.08 4 1 <0.01
hwb4 17 4 12(5) 1 2 0.04 1 2 <0.01 2 2 <0.01
hwb5 55 5 3(2) 1 2 0.52 1 2 0.04 2 1 <0.01
hwb6 126 6 22(9) 1 2 27.77 1 2 0.37 2 2 0.29
hwb7 289 7 80(32) 1 2 1204.28 1 2 5.49 13 2 0.87
hwb8 637 8 42(17) – – >5000.00 1 2 59.18 4 2 45.39
hwb9 1544 9 36(15) – – >5000.00 1 2 348.51 2 2 332.51
urf1 1517 9 30(12) – – >5000.00 – – >5000.00 4 2 1007.09
urf3 2674 10 141(57) – – >5000.00 – – >5000.00 4 2 2932.35

3 injected errors
3_17-3 6 3 4(2) 1 3 0.00 1 3 <0.01 2 1 <0.01
4_49-3 16 4 10(4) 1 3 1147 1 3 0.01 79 3 <0.01
4gt4-3 6 5 18(8) 1 3 0.64 1 3 0.00 6 3 <0.01
ham7-3 23 7 46(19) 1 3 8.94 1 3 0.09 4 3 <0.01
hwb4-3 17 4 12(5) 1 3 0.48 1 3 0.05 65 3 <0.01
hwb5-3 55 5 24(10) 1 3 4241.11 1 3 1.03 2 2 0.18
hwb6-3 126 6 41(17) – – >5000.00 1 3 7.38 4 3 6.82
hwb7-3 289 7 94(38) – – >5000.00 1 3 311.04 7 2 1.10
hwb8-3 637 8 97(39) – – >5000.00 1 3 2575.31 12 3 2606.05

the circuit (as also discussed in Example 7.5). These examples confirm the need for
determining error locations.
However, as also shown in Table 7.6, determining error locations is expensive.
For some benchmarks (urf1 and urf3) it was not possible to obtain the error locations
within the given timeout. For other benchmarks (hwb8, hwb9, hwb6-3, hwb7-3, and
hwb8-3), this was only possible if the TARGET LINES ONLY improvement has ad-
ditionally been applied. Nevertheless, for all other benchmarks error locations can
be determined. In comparison to traditional debugging (where only approaches that
determine error candidates exist), this is a significantly stronger result.

Table 7.7 Fixing for single and double errors

CIRCUIT        |G|     |F|     n     |G^fix|               AVG. TIME
                                     min       max

Single errors
4_49 11 16 4 1 24 0.01
4gt4 6 17 5 1 3 0.01
hwb5 23 55 5 1 75 0.01
hwb6 41 126 6 1 157 0.02
hwb7 235 331 7 1 384 0.05
hwb8 613 749 8 1 678 0.14
hwb9 1543 1959 9 1 2645 0.84
urf1 1516 11554 9 1 2229 0.95
urf2 3249 5030 8 1 1176 0.41

Double errors
4_49 11 16 4 3 22 0.01
4gt4 5 17 5 16 22 0.01
hwb5 23 55 5 3 77 0.01
hwb6 41 126 6 24 190 0.02
hwb7 235 331 7 69 431 0.05
hwb8 613 749 8 3 1202 0.21
hwb9 1543 1959 9 489 2689 0.83
urf1 1516 11554 9 361 2655 1.03
urf2 3249 5030 8 904 1251 0.40

7.2.5.3 Automatic Fixing

Finally, fixing of erroneous circuits without determining error locations is con-


sidered. For this purpose, Lemma 7.3 was applied to all gates gi of an erro-
neous circuit G using an additional circuit F as reference. Afterwards, the respec-
tive Gi^fix has been simplified by extracting the function and synthesizing it using the
transformation-based approach of Sect. 3.1.2. In doing so, the resulting sizes of the
fixing cascade have been evaluated. Table 7.7 presents the results. Again, the first
column gives the name of the circuit. The next two columns provide the sizes of
the erroneous circuit G denoted by |G| and the reference circuit F denoted by |F|.
Then, the minimal size (min) and the maximal size (max), as well as the average
run-time to determine one fix (AVG. TIME) are shown, respectively.
Obviously, the size of the respective fixes depends on both the size of the er-
roneous circuit G and the reference circuit F (since G^fix = G1^-1 F G2^-1).
Nevertheless, very often the resulting G^fix can be simplified to a significantly smaller
cascade. For single errors, always a single gate fix (identical to the error location)
cascade. For single errors, always a single gate fix (identical to the error location)
is found. For double errors, the fixes become larger. Nevertheless, also for this case
very compact fixes can be efficiently computed. In fact, for most of the considered

benchmarks a smaller (fixed) circuit than originally given by F can be obtained


(sometimes even if the worst case fixes given in column max are applied). Since ad-
ditionally never more than approx. one CPU second is needed to determine a fix, the
application of Lemma 7.3 is an efficient alternative to simply fix erroneous circuits.

7.3 Summary and Future Work


Verification is crucial to ensure the correctness (and therewith the quality) of the
circuits. Two aspects are particularly covered in the traditional design flow: Model
checking verifies if the initial description of a circuit (e.g. given in an HDL) matches
the specification while equivalence checking verifies the correctness of the follow-
ing (e.g. optimization) steps. In this chapter, approaches to close the gap for the lat-
ter aspect have been proposed. A typical scenario results from the application of the
approaches introduced in the former chapters: Using synthesis approaches (e.g. the
ones introduced in Chap. 3) a circuit for the desired function is synthesized. Since
this circuit does not satisfy the cost requirements it is optimized (e.g. by one of the
approaches introduced in Chap. 6 or by some manual optimization). Then, equiv-
alence checking as proposed in this chapter is applied to verify that the optimized
circuit still represents the desired function.
Therefore, approaches based on decision diagrams and based on Boolean satisfi-
ability have been introduced. Both check the equivalence with respect to the target
functionality, i.e. regardless of the used embedding, the applied output permutation,
the number of additional circuit lines, etc. Using these approaches, circuits con-
sisting of thousands of gates or more than a hundred variables have been efficiently
verified.
In contrast, model checking (in particular property checking) still is an open is-
sue. In the traditional design flow, this aspect is often related to a hardware de-
scription language which is used to implement a given specification. The approach
introduced in Sect. 3.3 is a first step towards such a language for reversible logic. To
check the correctness of the respective implementations appropriate model checkers
as well as property specification languages are needed. Thus, research in these ar-
eas is the next step. The methods described in this chapter for equivalence checking
may provide the basis for that.
However, while verification is used to detect the existence of an error only, it
gives no support on what to do if a circuit has been shown erroneous. In this case, the
designer only has a set of counterexamples which can be used to manually debug
the design. In particular for large circuits, this can be a time-consuming task. For
this reason, in traditional circuit design automatic debugging approaches have been
introduced that support the designer by identifying “non-relevant” gates and instead
highlighting so-called error candidates. However, in this chapter it was shown that a
one-to-one adaption of such approaches does not lead to the desired results. Hence,
a new formulation has been devised that integrates the properties of reversible gates.
As a result, methods for error candidate determination now are also available for re-
versible circuits. The experiments showed that applying these methods, the number

of gates which have to be considered during debugging can be reduced by 66.7% to


99.9%.
Moreover, also approaches that determine concrete error locations have been in-
troduced. In particular, for multiple errors this avoids misleading results and directly
pinpoints the designer to the source of an error. Finally, for the case that an erroneous
circuit only has to be fixed, an efficient fixing approach has been proposed as well.
This approach exploits properties of reversible logic and thus allows an easy fix
which only has to be simplified afterwards.
Future work is mainly driven by the current performance of the proposed
approaches—in particular if multiple errors are considered. Here, improvements
with respect to the run-time are needed. For particular error models, first promis-
ing results have already been proposed in [FWD10, JFWD10]. But besides that, in
particular the formal methods need improvement. Thus, more sophisticated solving
engines and formulations are required. The exploitation of unsatisfiable cores, as
e.g. done in [SFBD08] for multiple error debugging of irreversible circuits, might
be a promising direction. Furthermore, also the consideration of more specialized
error models (similar to the ones identified in [PFBH05] for reversible testing) can
be studied in future work.
Chapter 8
Summary and Conclusions

Traditional technologies more and more start to suffer from the increasing miniatur-
ization and the exponential growth of the number of transistors in integrated circuits.
To face the upcoming challenges, alternatives are needed. Reversible logic provides
such an alternative that may replace or at least enhance traditional computer chips.
In the area of quantum computation and low-power design, first very promising re-
sults have been obtained already today. Nevertheless, research in reversible logic is
still at the beginning. No continuous design flow exists so far. Instead, approaches
only for individual steps (e.g. for synthesis) have been proposed. But, most of these
methods are applicable to very small functions or circuits, respectively. This is not
sufficient to design complex reversible systems.
In this book, first steps towards a design flow for reversible logic have been pro-
posed. That is, methods ranging from synthesis and embedding over optimization to verifica-
tion and debugging have been introduced and experimentally evaluated. By BDD-
based synthesis, it is possible to synthesize functions with more than 100 variables.
More complex reversible systems can be realized using the SyReC language. There-
fore, also techniques for exact synthesis as well as for embedding have been utilized
to determine the respective building blocks. Three optimization approaches have
been proposed that lead to more compact circuits with respect to different criteria
(i.e. quantum cost, transistor cost, number of lines, or nearest neighbor cost) and
therewith with respect to different technologies. To prove that e.g. the optimization
was correct, equivalence checking with the help of decision diagrams or Boolean
satisfiability, respectively, has been introduced. In case of a failed verification, ap-
proaches have been proposed that help the designer to find the error or to fix the
circuit. Altogether, using the approaches introduced in this book, complex functions
can be synthesized and circuits with thousands of gates can be optimized, verified,
and debugged—everything in a very efficient automated way.
Combining the respective approaches, a first design flow results that already
can handle functions and circuits of notable size. The uniform RevLib-format
for reversible functions and circuits (see www.revlib.org) thereby builds the ba-
sis to link the respective steps together. The resulting tools can be obtained under
www.revkit.org. So, designers of reversible circuits have a first continuous and con-
sistent flow to create their circuits.

Besides that, the methods proposed in this book build the basis for further exten-
sions towards a design flow that covers more elaborate design needs. In particular,
extensions “on the top” and “on the bottom” of the flow are promising.
More precisely, synthesis of reversible logic should reach the system level. For this,
it is vital to have appropriate hardware description languages as well as corresponding
synthesis approaches. Only then does the design of complex reversible circuits become
possible. The SyReC language proposed in Sect. 3.3 is a first promising step in this
direction. Along with this, new verification issues will emerge. In particular for
complex circuits specified e.g. in hardware description languages, it often cannot be
guaranteed that the design has been implemented as intended. Thus, developing methods
for property checking is a promising next step.
Furthermore, questions related to the testing of reversible circuits will emerge in the
future. Already today, first models and approaches in this area exist (see e.g. [PHM04,
PBL05, PFBH05]). But due to the absence of large physical realizations, it is hard to
evaluate their suitability. Additionally, the existing approaches cover only some of the
possible technologies. With ongoing progress in the development of further (and larger)
physical quantum computing or reversible CMOS realizations, new models and approaches
will be needed to test them efficiently. At the latest by then, the design flow for
reversible logic will also need to address testing issues comprehensively.
Besides this "global view" on upcoming challenges in this domain, further ideas for
future work have already been discussed in the respective chapters. Overall, the
development of an elaborate design flow comparable to the one for traditional circuit
design (which has been developed over the last 25 years) will require further years of
research. In this context, the contributions of this book provide a good starting point.
References

[Abr05] S. Abramsky, A structural approach to reversible computation. Theor. Comput. Sci.
347(3), 441–464 (2005)
[ASV+05] M.F. Ali, S. Safarpour, A. Veneris, M.S. Abadir, R. Drechsler, Post-verification
debugging of hierarchical designs, in Int’l Conf. on CAD (2005), pp. 871–876
[BB09] R. Brummayer, A. Biere, Boolector: An efficient SMT solver for bit-vectors and ar-
rays, in Tools and Algorithms for the Construction and Analysis of Systems (2009),
pp. 174–177
[BBC+95] A. Barenco, C.H. Bennett, R. Cleve, D.P. DiVincenzo, N. Margolus, P. Shor,
T. Sleator, J.A. Smolin, H. Weinfurter, Elementary gates for quantum computation.
Phys. Rev. A 52, 3457–3467 (1995)
[BBC+05] M. Bozzano, R. Bruttomesso, A. Cimatti, T. Junttila, P. Rossum, S. Schulz, R. Se-
bastiani, The MathSAT 3 system, in Int. Conf. on Automated Deduction (2005),
pp. 315–321
[BCCZ99] A. Biere, A. Cimatti, E. Clarke, Y. Zhu, Symbolic model checking without BDDs,
in Tools and Algorithms for the Construction and Analysis of Systems. LNCS, vol.
1579 (Springer, Berlin, 1999), pp. 193–207
[BDL98] C.W. Barrett, D.L. Dill, J.R. Levitt, A decision procedure for bit-vector arithmetic,
in Design Automation Conf. (1998), pp. 522–527
[Ben73] C.H. Bennett, Logical reversibility of computation. IBM J. Res. Dev. 17(6), 525–
532 (1973)
[Ben05] M. Benedetti, sKizzo: a suite to evaluate and certify QBFs, in Int’l Conf. on Auto-
mated Deduction (2005), pp. 369–376
[Ber06] J. Bergeron, Writing Testbenches Using SystemVerilog (Springer, Berlin, 2006)
[Bie05] A. Biere, Resolve and expand, in Theory and Applications of Satisfiability Testing.
LNCS, vol. 3542 (Springer, Berlin, 2005), pp. 59–70
[Bra93] D. Brand, Verification of large synthesized designs, in Int’l Conf. on CAD (1993),
pp. 534–537
[BRB90] K.S. Brace, R.L. Rudell, R.E. Bryant, Efficient implementation of a BDD package,
in Design Automation Conf. (1990), pp. 40–45
[Bry86] R.E. Bryant, Graph-based algorithms for Boolean function manipulation. IEEE
Trans. Comput. 35(8), 677–691 (1986)
[BW96] B. Bollig, I. Wegener, Improving the variable ordering of OBDDs is NP-complete.
IEEE Trans. Comput. 45(9), 993–1002 (1996)
[CA87] R. Cuykendall, D.R. Andersen, Reversible optical computing circuits. Opt. Lett.
12(7), 542–544 (1987)
[CBRZ01] E.M. Clarke, A. Biere, R. Raimi, Y. Zhu, Bounded model checking using satisfia-
bility solving. Form. Methods Syst. Des. 19(1), 7–34 (2001)
[CDKM05] S.A. Cuccaro, T.G. Draper, S.A. Kutin, D.P. Moulton, A new quantum ripple-carry
addition circuit, in Workshop on Quantum Information Processing (2005)
[CGP99] E.M. Clarke, O. Grumberg, D. Peled, Model Checking (MIT Press, Cambridge,
1999)
[Coo71] S.A. Cook, The complexity of theorem proving procedures, in Symposium on The-
ory of Computing (1971), pp. 151–158
[CS07] A. Chakrabarti, S. Sur-Kolay, Nearest neighbour based synthesis of quantum
Boolean circuits. Eng. Lett. 15, 356–361 (2007)
[DB98] R. Drechsler, B. Becker, Binary Decision Diagrams—Theory and Implementation
(Kluwer Academic, Dordrecht, 1998)
[DBW+07] S. Deng, J. Bian, W. Wu, X. Yang, Y. Zhao, EHSAT: An efficient RTL satisfiability
solver using an extended DPLL procedure, in Design Automation Conf. (2007),
pp. 588–593
[DDG00] R. Drechsler, N. Drechsler, W. Günther, Fast exact minimization of BDDs. IEEE
Trans. CAD 19(3), 384–389 (2000)
[DEF+08] R. Drechsler, S. Eggersglüß, G. Fey, A. Glowatz, F. Hapke, J. Schloeffel, D. Tille,
On acceleration of SAT-based ATPG for industrial designs. IEEE Trans. CAD 27,
1329–1333 (2008)
[DLL62] M. Davis, G. Logemann, D. Loveland, A machine program for theorem proving.
Commun. ACM 5, 394–397 (1962)
[DM06a] B. Dutertre, L. Moura, A fast linear-arithmetic solver for DPLL(T), in Computer
Aided Verification. LNCS, vol. 4114 (Springer, Berlin, 2006), pp. 81–94
[DM06b] B. Dutertre, L. Moura, The YICES SMT solver (2006). Available at http://
yices.csl.sri.com/
[DP60] M. Davis, H. Putnam, A computing procedure for quantification theory. J. ACM 7,
506–521 (1960)
[Dre04] R. Drechsler, Advanced Formal Verification (Kluwer Academic, Dordrecht, 2004)
[DS07] S. Disch, C. Scholl, Combinational equivalence checking using incremental SAT
solving, output ordering, and resets, in ASP Design Automation Conf. (2007),
pp. 938–943
[DV02] B. Desoete, A. De Vos, A reversible carry-look-ahead adder using control gates.
Integr. VLSI J. 33(1–2), 89–104 (2002)
[EFD05] R. Ebendt, G. Fey, R. Drechsler, Advanced BDD Optimization (Springer, Berlin,
2005)
[ES04] N. Eén, N. Sörensson, An extensible SAT solver, in SAT 2003. LNCS, vol. 2919
(Springer, Berlin, 2004), pp. 502–518
[FDH04] A.G. Fowler, S.J. Devitt, L.C.L. Hollenberg, Implementation of Shor’s algorithm on
a linear nearest neighbour qubit array. Quantum Inf. Comput. 4, 237–245 (2004)
[FSVD06] G. Fey, S. Safarpour, A. Veneris, R. Drechsler, On the relation between simulation-
based and SAT-based diagnosis, in Design, Automation and Test in Europe (2006),
pp. 1139–1144
[FT82] E.F. Fredkin, T. Toffoli, Conservative logic. Int. J. Theor. Phys. 21(3/4), 219–253
(1982)
[FWD10] S. Frehse, R. Wille, R. Drechsler, Efficient simulation-based debugging of re-
versible logic, in Int’l Symp. on Multi-Valued Logic (2010), pp. 156–161
[GAJ06] P. Gupta, A. Agrawal, N.K. Jha, An algorithm for synthesis of reversible logic cir-
cuits. IEEE Trans. CAD 25(11), 2317–2330 (2006)
[GCDD07] D. Große, X. Chen, G.W. Dueck, R. Drechsler, Exact SAT-based Toffoli network
synthesis, in ACM Great Lakes Symposium on VLSI (2007), pp. 96–101
[GD07] V. Ganesh, D.L. Dill, A decision procedure for bit-vectors and arrays, in Computer
Aided Verification (2007), pp. 519–531
[GLMS02] T. Grötker, S. Liao, G. Martin, S. Swan, System Design with SystemC (Kluwer
Academic, Dordrecht, 2002)
[GN02] E. Goldberg, Y. Novikov, BerkMin: a fast and robust SAT-solver, in Design, Au-
tomation and Test in Europe (2002), pp. 142–149
[GNP08] S. Gay, R. Nagarajan, N. Papanikolaou, QMC: A model checker for quantum sys-
tems, in Computer Aided Verification (2008), pp. 543–547
[GWDD08] D. Große, R. Wille, G.W. Dueck, R. Drechsler, Exact synthesis of elementary quan-
tum gate circuits for reversible functions with don’t cares, in Int’l Symp. on Multi-
Valued Logic (2008), pp. 214–219
[GWDD09a] D. Große, R. Wille, G.W. Dueck, R. Drechsler, Exact multiple control Toffoli net-
work synthesis with SAT techniques. IEEE Trans. CAD 28(5), 703–715 (2009)
[GWDD09b] D. Große, R. Wille, G.W. Dueck, R. Drechsler, Exact synthesis of elementary quan-
tum gate circuits. J. Mult.-Valued Log. Soft Comput. 15(4), 270–275 (2009)
[HK00] D.W. Hoffmann, T. Kropf, Efficient design error correction of digital circuits, in
Int’l Conf. on Comp. Design (2000), pp. 465–472
[HL05] F.S. Hillier, G.J. Lieberman, Introduction to Operations Research (McGraw-Hill,
New York, 2005)
[HSY+06] W.N.N. Hung, X. Song, G. Yang, J. Yang, M. Perkowski, Optimal synthesis of
multiple output Boolean functions using a set of quantum gates by symbolic reach-
ability analysis. IEEE Trans. CAD 25(9), 1652–1663 (2006)
[IKY02] K. Iwama, Y. Kambayashi, S. Yamashita, Transformation rules for designing
CNOT-based quantum circuits, in Design Automation Conf. (2002), pp. 419–424
[JFWD10] J.C. Jung, S. Frehse, R. Wille, R. Drechsler, Enhancing debugging of multiple miss-
ing control errors in reversible logic, in Great Lakes Symp. VLSI (2010)
[JSWD09] J.C. Jung, A. Sülflow, R. Wille, R. Drechsler, SWORD v1.0, Satisfiability Modulo
Theories Competition (2009)
[Ker04] P. Kerntopf, A new heuristic algorithm for reversible logic synthesis, in Design
Automation Conf. (2004), pp. 834–837
[Kha08] M.H.A. Khan, Cost reduction in nearest neighbour based synthesis of quantum
Boolean circuits. Eng. Lett. 16, 1–5 (2008)
[Kro99] T. Kropf, Introduction to Formal Hardware Verification (Springer, Berlin, 1999)
[Kut06] S.A. Kutin, Shor’s algorithm on a nearest-neighbor machine, in Asian Conference
on Quantum Information Science (2006). arXiv:quant-ph/0609001v1
[Lan61] R. Landauer, Irreversibility and heat generation in the computing process. IBM J.
Res. Dev. 5, 183 (1961)
[Lar92] T. Larrabee, Test pattern generation using Boolean satisfiability. IEEE Trans. CAD
11, 4–15 (1992)
[LCC+95] C.-C. Lin, K.-C. Chen, S.-C. Chang, M. Marek-Sadowska, K.-T. Cheng, Logic syn-
thesis for engineering change, in Design Automation Conf. (1995), pp. 647–651
[LL92] H. Liaw, C. Lin, On the OBDD-representation of general Boolean functions. IEEE
Trans. Comput. 41, 661–664 (1992)
[LSU89] R. Lipsett, C. Schaefer, C. Ussery, VHDL: Hardware Description and Design
(Kluwer Academic, Dordrecht, 1989)
[Mar99] J.P. Marques-Silva, The impact of branching heuristics in propositional satisfiability
algorithms, in 9th Portuguese Conference on Artificial Intelligence (EPIA) (1999)
[Mas07] D. Maslov, Linear depth stabilizer and quantum Fourier transformation circuits
with no auxiliary qubits in finite neighbor quantum architectures. Phys. Rev. A 76,
052310 (2007)
[McM02] K.L. McMillan, Applying SAT methods in unbounded symbolic model checking,
in Computer Aided Verification (2002), pp. 250–264
[MD04a] D. Maslov, G.W. Dueck, Improved quantum cost for n-bit Toffoli gates. IEE Elec-
tron. Lett. 39, 1790 (2004)
[MD04b] D. Maslov, G.W. Dueck, Reversible cascades with minimal garbage. IEEE Trans.
CAD 23(11), 1497–1509 (2004)
[MDM05] D. Maslov, G.W. Dueck, D.M. Miller, Toffoli network synthesis with templates.
IEEE Trans. CAD 24(6), 807–817 (2005)
[MDM07] D. Maslov, G.W. Dueck, D.M. Miller, Techniques for the synthesis of reversible
Toffoli networks. ACM Trans. Des. Autom. Electron. Syst. 12(4), 42 (2007)
[MDW09] D.M. Miller, G.W. Dueck, R. Wille, Synthesising reversible circuits from irre-
versible specifications using Reed-Muller spectral techniques, in Int’l Workshop on
Applications of the Reed-Muller Expansion in Circuit Design (2009), pp. 87–96
[Mer93] R.C. Merkle, Reversible electronic logic using switches. Nanotechnology 4, 21–40
(1993)
[Mer07] N.D. Mermin, Quantum Computer Science: An Introduction (Cambridge University
Press, Cambridge, 2007)
[MK04] M.M. Mano, C.R. Kime, Logic and Computer Design Fundamentals (Pearson Ed-
ucation, Upper Saddle River, 2004)
[ML01] J.P. McGregor, R.B. Lee, Architectural enhancements for fast subword permuta-
tions with repetitions in cryptographic applications, in Int’l Conf. on Comp. Design
(2001), pp. 453–461
[MMD03] D.M. Miller, D. Maslov, G.W. Dueck, A transformation based algorithm for re-
versible logic synthesis, in Design Automation Conf. (2003), pp. 318–323
[MMZ+01] M.W. Moskewicz, C.F. Madigan, Y. Zhao, L. Zhang, S. Malik, Chaff: Engineering
an efficient SAT solver, in Design Automation Conf. (2001), pp. 530–535
[Moo65] G.E. Moore, Cramming more components onto integrated circuits. Electronics
38(8), 114–117 (1965)
[MS99] J.P. Marques-Silva, K.A. Sakallah, GRASP: A search algorithm for propositional
satisfiability. IEEE Trans. Comput. 48(5), 506–521 (1999)
[MT06] D.M. Miller, M.A. Thornton, QMDD: A decision diagram structure for reversible
and quantum circuits, in Int’l Symp. on Multi-Valued Logic (2006), p. 6
[MT08] D.M. Miller, M.A. Thornton, Multiple-Valued Logic: Concepts and Representa-
tions (Morgan and Claypool, San Rafael, 2008)
[MWD09] D.M. Miller, R. Wille, G. Dueck, Synthesizing reversible circuits for irreversible
functions, in EUROMICRO Symp. on Digital System Design (2009), pp. 749–756
[MWD10] D.M. Miller, R. Wille, R. Drechsler, Reducing reversible circuit cost by adding
lines, in Int’l Symp. on Multi-Valued Logic (2010)
[MYDM05] D. Maslov, C. Young, G.W. Dueck, D.M. Miller, Quantum circuit simplification
using templates, in Design, Automation and Test in Europe (2005), pp. 1208–1213
[NC00] M. Nielsen, I. Chuang, Quantum Computation and Quantum Information (Cam-
bridge Univ. Press, Cambridge, 2000)
[OWDD10] S. Offermann, R. Wille, G.W. Dueck, R. Drechsler, Synthesizing multiplier in re-
versible logic, in IEEE Symp. on Design and Diagnostics of Electronic Circuits and
Systems (2010), pp. 335–340
[Pap93] C.H. Papadimitriou, Computational Complexity (Addison Wesley, Reading, 1993)
[PBG05] M.R. Prasad, A. Biere, A. Gupta, A survey of recent advances in SAT-based formal
verification. Softw. Tools Technol. Transf. 7(2), 156–173 (2005)
[PBL05] M. Perkowski, J. Biamonte, M. Lukac, Test generation and fault localization for
quantum circuits, in Int’l Symp. on Multi-Valued Logic (2005), pp. 62–68
[Per85] A. Peres, Reversible logic and quantum computers. Phys. Rev. A 32, 3266–3276
(1985)
[PFBH05] I. Polian, T. Fiehn, B. Becker, J.P. Hayes, A family of logical fault models for
reversible circuits, in Asian Test Symp. (2005), pp. 422–427
[PHM04] K.N. Patel, J.P. Hayes, I.L. Markov, Fault testing for reversible circuits. IEEE Trans.
CAD 23(8), 1220–1230 (2004)
[PHW06] A. De Pierro, C. Hankin, H. Wiklicky, Reversible combinatory logic. Math. Struct.
Comput. Sci. 16(4), 621–637 (2006)
[Pit99] A.O. Pittenger, An Introduction to Quantum Computing Algorithms (Birkhauser,
Basel, 1999)
[PMH08] K. Patel, I. Markov, J. Hayes, Optimal synthesis of linear reversible circuits. Quan-
tum Inf. Comput. 8(3–4), 282–294 (2008)
[RO08] M. Ross, M. Oskin, Quantum computing. Commun. ACM 51(7), 12–13 (2008)
[Rud93] R. Rudell, Dynamic variable ordering for ordered binary decision diagrams, in Int’l
Conf. on CAD (1993), pp. 42–47
[SD96] J.A. Smolin, D.P. DiVincenzo, Five two-bit quantum gates are sufficient to imple-
ment the quantum Fredkin gate. Phys. Rev. A 53(4), 2855–2856 (1996)
[SDF04] S. Sutherland, S. Davidmann, P. Flake, System Verilog for Design and Modeling
(Kluwer Academic, Dordrecht, 2004)
[SFBD08] A. Sülflow, G. Fey, R. Bloem, R. Drechsler, Using unsatisfiable cores to debug
multiple design errors, in Great Lakes Symp. VLSI (2008), pp. 77–82
[Sha38] C.E. Shannon, A symbolic analysis of relay and switching circuits. Trans. AIEE 57,
713–723 (1938)
[Sho94] P.W. Shor, Algorithms for quantum computation: discrete logarithms and factoring,
in Foundations of Computer Science (1994), pp. 124–134
[SL00] Z. Shi, R.B. Lee, Bit permutation instructions for accelerating software cryptogra-
phy, in Int’l Conf. on Application-Specific Systems, Architectures, and Processors
(2000), pp. 138–148
[Som01] F. Somenzi, CUDD: CU Decision Diagram Package Release 2.3.1 (University of
Colorado at Boulder, Boulder, 2001)
[SPMH03] V.V. Shende, A.K. Prasad, I.L. Markov, J.P. Hayes, Synthesis of reversible logic
circuits. IEEE Trans. CAD 22(6), 710–722 (2003)
[SSL+92] E. Sentovich, K. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj,
P. Stephan, R. Brayton, A. Sangiovanni-Vincentelli, SIS: A system for sequential
circuit synthesis. Technical Report, University of Berkeley (1992)
[SVAV05] A. Smith, A.G. Veneris, M.F. Ali, A. Viglas, Fault diagnosis and logic debugging
using boolean satisfiability. IEEE Trans. CAD 24(10), 1606–1621 (2005)
[TG08] M.K. Thomson, R. Glück, Optimized reversible binary-coded decimal adders.
J. Syst. Archit. 54, 697–706 (2008)
[TK05] Y. Takahashi, N. Kunihiro, A linear-size quantum circuit for addition with no ancil-
lary qubits. Quantum Inf. Comput. 5, 440–448 (2005)
[Tof80] T. Toffoli, Reversible computing, in Automata, Languages and Programming, ed.
by W. de Bakker, J. van Leeuwen (Springer, Berlin, 1980), p. 632. Technical Memo
MIT/LCS/TM-151, MIT Lab. for Comput. Sci.
[TS05] H. Thapliyal, M.B. Srinivas, The need of DNA computing: reversible designs of
adders and multipliers using Fredkin gate, in SPIE (2005)
[Tse68] G. Tseitin, On the complexity of derivation in propositional calculus, in Studies
in Constructive Mathematics and Mathematical Logic, Part 2 (Nauka, Leningrad,
1968), pp. 115–125. (Reprinted in: J. Siekmann, G. Wrightson (eds.), Automation
of Reasoning, vol. 2, Springer, Berlin, 1983, pp. 466–483)
[VH99] A. Veneris, I.N. Hajj, Design error diagnosis and correction via test vector simula-
tion. IEEE Trans. CAD 18(12), 1803–1816 (1999)
[VMH07] G.F. Viamontes, I.L. Markov, J.P. Hayes, Checking equivalence of quantum circuits
and states, in Int’l Conf. on CAD (2007), pp. 69–74
[VSB+01] L.M.K. Vandersypen, M. Steffen, G. Breyta, C.S. Yannoni, M.H. Sherwood,
I.L. Chuang, Experimental realization of Shor’s quantum factoring algorithm us-
ing nuclear magnetic resonance. Nature 414, 883 (2001)
[WD09] R. Wille, R. Drechsler, BDD-based synthesis of reversible logic for large functions,
in Design Automation Conf. (2009), pp. 270–275
[WD10] R. Wille, R. Drechsler, Effect of BDD optimization on synthesis of reversible and
quantum logic. Electron. Notes Theor. Comput. Sci. 253(6), 57–70 (2010). Pro-
ceedings of the Workshop on Reversible Computation (RC 2009)
[Weg00] I. Wegener, Branching Programs and Binary Decision Diagrams: Theory and Ap-
plications (Society for Industrial and Applied Mathematics, Philadelphia, 2000)
[WFG+07] R. Wille, G. Fey, D. Große, S. Eggersglüß, R. Drechsler, SWORD: A SAT like
prover using word level information, in VLSI of System-on-Chip (2007), pp. 88–93
[WFG+09] R. Wille, G. Fey, D. Große, S. Eggersglüß, R. Drechsler, SWORD: A SAT like
prover using word level information, in VLSI-SoC: Advanced Topics on Systems
on a Chip: A Selection of Extended Versions of the Best Papers of the Fourteenth
International Conference on Very Large Scale Integration of System on Chip, ed.
by R. Reis, V. Mooney, P. Hasler (Springer, Berlin, 2009), pp. 175–192
[WG07] R. Wille, D. Große, Fast exact Toffoli network synthesis of reversible logic, in Int’l
Conf. on CAD (2007), pp. 60–64
[WGDD09] R. Wille, D. Große, G. Dueck, R. Drechsler, Reversible logic synthesis with output
permutation, in VLSI Design (2009), pp. 189–194
[WGF+09] R. Wille, D. Große, S. Frehse, G.W. Dueck, R. Drechsler, Debugging of Toffoli
networks, in Design, Automation and Test in Europe (2009), pp. 1284–1289
[WGHD09] R. Wille, D. Große, F. Haedicke, R. Drechsler, SMT-based stimuli generation in
the SystemC verification library, in Forum on Specification and Design Languages
(2009)
[WGMD09] R. Wille, D. Große, D.M. Miller, R. Drechsler, Equivalence checking of reversible
circuits, in Int’l Symp. on Multi-Valued Logic (2009), pp. 324–330
[WGSD08] R. Wille, D. Große, M. Soeken, R. Drechsler, Using higher levels of abstraction for
solving optimization problems by Boolean satisfiability, in IEEE Annual Sympo-
sium on VLSI (2008), pp. 411–416
[WGT+08] R. Wille, D. Große, L. Teuber, G.W. Dueck, R. Drechsler, RevLib: an online re-
source for reversible functions and reversible circuits, in Int’l Symp. on Multi-
Valued Logic (2008), pp. 220–225. RevLib is available at http://www.revlib.org
[WKS01] J. Whittemore, J. Kim, K. Sakallah, SATIRE: A new incremental satisfiability en-
gine, in Design Automation Conf. (2001), pp. 542–545
[WLDG08] R. Wille, H.M. Le, G.W. Dueck, D. Große, Quantified synthesis of reversible logic,
in Design, Automation and Test in Europe (2008), pp. 1015–1020
[WLTK08] S.A. Wang, C.Y. Lu, I.M. Tsai, S.Y. Kuo, An XQDD-based verification method for
quantum circuits. IEICE Trans. 91-A(2), 584–594 (2008)
[WOD10] R. Wille, S. Offermann, R. Drechsler, SyReC: A programming language for synthe-
sis of reversible circuits, in Forum on Specification and Design Languages (2010)
[WSD08] R. Wille, A. Sülflow, R. Drechsler, SWORD v0.2—Module-based SAT Solving,
Satisfiability Modulo Theories Competition (2008)
[WSD09] R. Wille, M. Saeedi, R. Drechsler, Synthesis of reversible functions beyond gate
count and quantum cost, in Int’l Workshop on Logic Synth. (2009), pp. 43–49
[WSD10] R. Wille, M. Soeken, R. Drechsler, Reducing the number of lines in reversible cir-
cuits, in Design Automation Conf. (2010)
[Yan91] S. Yang, Logic synthesis and optimization benchmarks user guide. Technical Report
1/95, Microelectronic Center of North Carolina (1991)
[YG07] T. Yokoyama, R. Glück, A reversible programming language and its invertible self-
interpreter, in Symp. on Partial Evaluation and Semantics-based Program Manipu-
lation (2007), pp. 144–153
[YPA06] J. Yuan, C. Pixley, A. Aziz, Constraint-based Verification (Springer, Berlin, 2006)
[YSHP05] G. Yang, X. Song, W.N.N. Hung, M.A. Perkowski, Fast synthesis of exact minimal
reversible circuits using group theory, in ASP Design Automation Conf. (2005),
pp. 1002–1005
[YSP+99] J. Yuan, K. Shultz, C. Pixley, H. Miller, A. Aziz, Modeling design constraints and
biasing in simulation using BDDs, in Int’l Conf. on CAD (1999), pp. 584–590
[ZM06] J. Zhong, J.C. Muzio, Using crosspoint faults in simplifying Toffoli networks, in
IEEE North-East Workshop on Circuits and Systems (2006), pp. 129–132
[ZSM+05] J. Zhang, S. Sinha, A. Mishchenko, R. Brayton, M. Chrzanowska-Jeske, Simula-
tion and satisfiability in logic synthesis, in Int’l Workshop on Logic Synth. (2005),
pp. 161–168
Index

A
ALU, 48
Arithmetic Logic Unit, see ALU

B
BDD, 17
Binary Decision Diagram, see BDD
Bit-vector logic, 23
Boolean function, 7
  multi-output, 8
  reversible, 8
  single-output, 7
Boolean satisfiability, 21

C
Circuit
  quantum, 14
  reversible, 10
  traditional, 9
Circuit composition, 124
Circuit cost, 12
CNF, 21
CNOT gate, 10
Complement edge, 18, 36
Conjunctive Normal Form, see CNF
Constant input, 9
Control line, 10
Cost, 12
Counterexample, 145

D
Debugging, 155
  reversible, 157
  traditional, 156
Decomposition
  NNC-optimal (exact), 134
  NNC-optimal (improved), 134
  NNC-optimal (naive), 133
  quantum, 16
  Shannon, 17
Double gates, 16

E
Embedding, 28, 93
Equivalence checking, 145
  QMDD-based, 145
  SAT-based, 148
Error candidate, 155
Error location, 162
Error models, 155
Exact synthesis, 57
  QBF-based, 81
  SAT-based, 58, 61
  SMT-based, 77
  SWORD-based, 79

F
Factoring reversible circuits, 115
Fixing, 165
Fredkin gate, 10

G
Garbage output, 9, 29
Gate
  CNOT, 10, 14
  double, 16
  Fredkin, 10
  NOT, 10, 14
  Peres, 10
  quantum, 14
  reversible, 10
  SWAP, 11
  Toffoli, 10
  traditional, 9
  V, 14
  V+, 15
Gate count, 12

H
Helper line, 114
Heuristic synthesis, see Synthesis

I
Inverter, see NOT gate

J
Janus, 47

L
Linear Nearest Neighbor, see LNN
LNN, 131

M
Modules (SWORD), 24

N
Nearest Neighbor Cost, see NNC
NNC, 131, 133
NOT gate, 10, 14

O
Ordering
  BDD, 18, 37

P
Peres gate, 10

Q
QBF, 24
QF_BV, see Bit-vector logic
QMDD, 19
Quantified Boolean Formulas, see QBF
Quantum computation, 14
Quantum cost, 12
Quantum gate, 14
Quantum Multiple-valued Decision Diagram, see QMDD
Qubit, 14

S
SAT Modulo Theories, see SMT
SAT problem, 21
SAT solver, 22
Select
  line in debugging, 157
Shared node, 18, 34
SMT, 23
SMT solver, 23
Superposition, 13
SWAP gate, 11
SWOP, 100
SWORD solver, 24
Synthesis, 27
  BDD-based, 31
  QBF-based, 81
  SAT-based, 58, 61
  SMT-based, 77
  SWORD-based, 79
  SyReC-based, 46
  transformation-based, 30
  with output permutation, see SWOP

T
Target line, 10
Toffoli gate, 10
Transistor cost, 13

V
V gate, 14
V+ gate, 15
Verification, 143
