
An algorithmic approach

Lecture Notes

Javier Esparza

July 20, 2012

**Please read this!**

Many years ago — I don’t want to say how many, it’s depressing — I taught a course on the automata-theoretic approach to model checking at the Technical University of Munich, basing it on lecture notes for another course on the same topic that Moshe Vardi had recently taught in Israel. Between my lectures I extended and polished the notes, and sent them to Moshe. At that time he and Orna Kupferman were thinking of writing a book, and the idea came up of doing it together. We made some progress, but life and other work got in the way, and the project has been postponed so many times that I don’t dare to predict a completion date. Some of the work that got in the way was the standard course on automata theory in Munich, which I had to teach several times. The syllabus contained both automata on finite and infinite words, and for the latter I used our notes. Each time I had to teach the course again, I took the opportunity to add some new material about automata on finite words, which also required reshaping the chapters on infinite words, and the notes kept growing and evolving. Now they’ve reached the point where they are in sufficiently good shape to be shown not only to my students, but to a larger audience. So, after getting Orna and Moshe’s very kind permission, I’ve decided to make them available here. Despite several attempts I haven’t yet convinced Orna and Moshe to appear as co-authors of the notes. But I don’t give up: apart from the material we wrote together, their influence on the rest is much larger than they think. Actually, my secret hope is that after they see this material on my home page we’ll finally manage to gather some morsels of time here and there and finish our joint project. If you think we should do so, tell us! Send an email to: vardi@cs.rice.edu, orna@cs.huji.ac.il, and esparza@in.tum.de.

Sources

I haven’t yet compiled a careful list of the sources I’ve used, but I’m listing here the main ones. I apologize in advance for any omissions.

• The chapter on automata for fixed-length languages (“Finite Universes”) was very influenced by Henrik Reif Andersen’s beautiful introduction to Binary Decision Diagrams, available at www.itu.dk/courses/AVA/E2005/bdd-eap.pdf.

• The short chapter on pattern matching is influenced by David Eppstein’s lecture notes for his course on Design and Analysis of Algorithms, see http://www.ics.uci.edu/~eppstein/teach.html.

• As mentioned above, the chapters on operations for Büchi automata and applications to verification are heavily based on notes by Orna Kupferman and Moshe Vardi.

• The chapter on the emptiness problem for Büchi automata is based on several research papers:

– Jean-Michel Couvreur: On-the-Fly Verification of Linear Temporal Logic. World Congress on Formal Methods 1999: 253–271.

– Jean-Michel Couvreur, Alexandre Duret-Lutz, Denis Poitrenaud: On-the-Fly Emptiness Checks for Generalized Büchi Automata. SPIN 2005: 169–184.

– Kathi Fisler, Ranan Fraer, Gila Kamhi, Moshe Y. Vardi, Zijiang Yang: Is There a Best Symbolic Cycle-Detection Algorithm? TACAS 2001: 420–434.

– Jaco Geldenhuys, Antti Valmari: More efficient on-the-fly LTL verification with Tarjan’s algorithm. Theor. Comput. Sci. 345(1): 60–82 (2005).

– Stefan Schwoon, Javier Esparza: A Note on On-the-Fly Verification Algorithms. TACAS 2005: 174–190.

• The chapter on Real Arithmetic is heavily based on the work of Bernard Boigelot, Pierre Wolper, and their co-authors, in particular the paper “An effective decision procedure for linear arithmetic over the integers and reals”, published in ACM Trans. Comput. Logic 6(3) in 2005.

Acknowledgments

First of all, thanks to Orna Kupferman and Moshe Vardi for all the reasons explained above (if you haven’t read the section “Please read this!” yet, please do it now!). Many thanks to Jörg Kreiker, Jan Kretinsky, and Michael Luttenberger for many discussions on the topic of these notes, and for their contributions to several chapters. All three of them helped me to teach the automata course on different occasions. In particular, Jan contributed a lot to the chapter on pattern matching. Breno Faria helped to draw many figures. He was funded by a program of the Computer Science Department of the Technical University of Munich. Thanks also to Birgit Engelmann, Moritz Fuchs, Stefan Krusche, Philipp Müller, Martin Perzl, Marcel Ruegenberg, Franz Saller, and Hayk Shoukourian, all of whom attended the automata course and provided very helpful comments.

Contents

1 Introduction and Goal

I Automata on Finite Words

2 Automata Classes and Conversions
  2.1 Regular expressions: a language to describe languages
  2.2 Automata classes
    2.2.1 Using DFAs as data structures
  2.3 Conversion Algorithms between Finite Automata
    2.3.1 From NFA to DFA
    2.3.2 From NFA-ε to NFA
  2.4 Conversion algorithms between regular expressions and automata
    2.4.1 From regular expressions to NFA-ε’s
    2.4.2 From NFA-ε’s to regular expressions
  2.5 A Tour of Conversions

3 Minimization and Reduction
  3.1 Minimal DFAs
  3.2 Minimizing DFAs
    3.2.1 Computing the language partition
    3.2.2 Quotienting
  3.3 Reducing NFAs
    3.3.1 The reduction algorithm
  3.4 A Characterization of the Regular Languages

4 Operations on Sets: Implementations
  4.1 Implementation on DFAs
    4.1.1 Membership
    4.1.2 Complement
    4.1.3 Binary Boolean Operations
    4.1.4 Emptiness and Universality
    4.1.5 Inclusion and Equality
  4.2 Implementation on NFAs
    4.2.1 Membership
    4.2.2 Complement
    4.2.3 Union and intersection
    4.2.4 Emptiness
    4.2.5 Universality
    4.2.6 Inclusion
    4.2.7 Equality

5 Applications I: Pattern matching
  5.1 The general case
  5.2 The word case
    5.2.1 Lazy DFAs

6 Operations on Relations: Implementations
  6.1 Encodings
  6.2 Transducers and Regular Relations
  6.3 Implementing Operations on Relations
    6.3.1 Projection
    6.3.2 Join, Post, and Pre
  6.4 Relations of Higher Arity

7 Finite Universes
  7.1 Fixed-length Languages and the Master Automaton
  7.2 A Data Structure for Fixed-length Languages
  7.3 Operations on fixed-length languages
  7.4 Determinization and Minimization
  7.5 Operations on Fixed-length Relations
  7.6 Decision Diagrams
    7.6.1 Zip-automata and Kernels
    7.6.2 Operations on Kernels

8 Applications II: Verification
  8.1 The Automata-Theoretic Approach to Verification
  8.2 Networks of Automata
    8.2.1 Checking Properties
    8.2.2 Symbolic State-space Exploration
  8.3 The State-Explosion Problem
  8.4 Safety and Liveness Properties

9 Automata and Logic
  9.1 First-Order Logic on Words
    9.1.1 Expressive power of FO(Σ)
  9.2 Monadic Second-Order Logic on Words
    9.2.1 Expressive power of MSO(Σ)

10 Applications III: Presburger Arithmetic
  10.1 Syntax and Semantics
  10.2 An NFA for the Solutions over the Naturals
    10.2.1 Equations
  10.3 An NFA for the Solutions over the Integers
    10.3.1 Equations
    10.3.2 Algorithms
    10.3.3 The size of A

II Automata on Infinite Words

11 Classes of ω-Automata and Conversions
  11.1 ω-languages and ω-regular expressions
  11.2 Büchi automata
    11.2.1 From ω-regular expressions to NBAs and back
    11.2.2 Non-equivalence of NBA and DBA
  11.3 Generalized Büchi automata
  11.4 Other classes of ω-automata
    11.4.1 Co-Büchi Automata
    11.4.2 Muller automata
    11.4.3 Rabin automata

12 Boolean operations: Implementations
  12.1 Union and intersection
  12.2 Complement
    12.2.1 The problems of complement
    12.2.2 Rankings and ranking levels
    12.2.3 A (possibly infinite) complement automaton

13 Emptiness check: Implementations
  13.1 Algorithms based on depth-first search
    13.1.1 The nested-DFS algorithm
    13.1.2 The two-stack algorithm
    13.1.3 Comparing the algorithms
  13.2 Algorithms based on breadth-first search
    13.2.1 Emerson-Lei’s algorithm
    13.2.2 A Modified Emerson-Lei’s algorithm

14 Applications I: Verification and Temporal Logic
  14.1 Automata-Based Verification of Liveness Properties
    14.1.1 Checking Liveness Properties
  14.2 Linear Temporal Logic
  14.3 From LTL formulas to generalized Büchi automata
    14.3.1 Satisfaction sequences and Hintikka sequences
    14.3.2 Constructing the NGA
    14.3.3 Reducing the NGA
    14.3.4 Size of the NGA
  14.4 Automatic Verification of LTL Formulas

15 Applications II: Monadic Second-Order Logic and Linear Arithmetic
  15.1 Monadic Second-Order Logic on ω-Words
    15.1.1 Expressive power of MSO(Σ) on ω-words
  15.2 Linear Arithmetic
    15.2.1 Encoding Real Numbers
    15.2.2 An NBA for the Solutions of a · xF ≤ β
    15.2.3 Constructing an NBA for the Real Solutions

Why this book? There are excellent textbooks on automata theory, ranging from course books for undergraduates to research monographs for specialists. Why another one?

During the late 1960s and early 1970s the main application of automata theory was the development of lexicographic analyzers, parsers, and compilers, and so automata were viewed as abstract machines that process inputs and accept or reject them. This view deeply influenced the textbook presentation of the theory. While results about the expressive power of machines, equivalences between models, and closure properties were devoted much attention, constructions on automata, like the powerset or product construction, often played a subordinate rôle as proof tools. To give a simple example, in many textbooks of the time—and in later textbooks written in the same style—the powerset construction is not introduced as an algorithm that constructs a DFA equivalent to a given NFA; instead, the equivalent DFA is mathematically defined by describing its set of states, transition relation, etc. Sometimes, the simple but computationally important fact that only states reachable from the initial state need be constructed is not mentioned.

Since the middle 1980s program verification has progressively replaced compilers as the driving application of automata theory. Automata started to be used to describe the behaviours—and intended behaviour—of hardware and software systems, not their syntactic descriptions. This shift from syntax to semantics had important consequences. Algorithmic issues gradually received more attention, and research in this direction made great progress; efficient constructions became imperative, since, while automata for lexical or syntactical analysis typically have at most some thousands of states, automata for semantic descriptions can easily be many orders of magnitude larger. At the same time, automata on infinite words and the connection between logic and automata became more relevant. Automata on infinite words can hardly be seen as machines accepting or rejecting an input: they could only do so after infinite time! And the view of automata as abstract machines puts an emphasis on expressibility that no longer corresponds to its relevance for modern applications.

This book intends to reflect the evolution of automata theory. While these research developments certainly influenced textbooks, undergraduate texts still retain many concepts from the past. We think that modern automata theory is better captured by the slogan automata as data structures. Just as hash tables and Fibonacci heaps are both adequate data structures for representing sets, depending on whether the operations one needs are those of a dictionary or a priority queue, automata are the right data structure for representing sets and relations when the required operations are union, intersection, complement, projections and joins. In this view the algorithmic implementation of the operations must get the limelight, and, as a consequence, the implementations constitute the spine of this book.

A second goal of this book is concerned with presentation. Experience tells that automata-theoretic constructions are best explained by means of examples, and that examples are best presented with the help of pictures. Automata on words are blessed with a graphical representation of instantaneous appeal. In this book we have invested much effort into finding illustrative, non-trivial examples whose graphical representation still fits in one page.

Chapter 1

**Introduction and Goal**

Courses on data structures show how to represent sets of objects in a computer so that operations like insertion, deletion, lookup, and many others can be efficiently implemented. Typical representations are hash tables, search trees, or heaps.

These lecture notes also deal with the problem of representing and manipulating sets, but with respect to a different set of operations: the boolean operations of set theory (union, intersection, and complement with respect to some universe set), plus some operations that check basic properties (if a set is empty, if it contains all elements of the universe, or if it is contained in another one), and operations on relations.

Here is a more precise definition of the operations to be supported, where U denotes some universe of objects, x is an element of U, X, Y are subsets of U, and R, S ⊆ U × U are binary relations on U:

Member(x, X) : returns true if x ∈ X, false otherwise.
Complement(X) : returns U \ X.
Intersection(X, Y) : returns X ∩ Y.
Union(X, Y) : returns X ∪ Y.
Empty(X) : returns true if X = ∅, false otherwise.
Universal(X) : returns true if X = U, false otherwise.
Included(X, Y) : returns true if X ⊆ Y, false otherwise.
Equal(X, Y) : returns true if X = Y, false otherwise.
Projection_1(R) : returns the set π1(R) = {x | ∃y (x, y) ∈ R}.
Projection_2(R) : returns the set π2(R) = {y | ∃x (x, y) ∈ R}.
Join(R, S) : returns R ◦ S = {(x, z) | ∃y (x, y) ∈ R ∧ (y, z) ∈ S}.
Post(X, R) : returns postR(X) = {y ∈ U | ∃x ∈ X (x, y) ∈ R}.
Pre(X, R) : returns preR(X) = {y ∈ U | ∃x ∈ X (y, x) ∈ R}.

Observe that many other operations, e.g. set difference, can be reduced to the ones above. Similarly, operations on n-ary relations for n ≥ 3 can be reduced to operations on binary relations. An important point is that we are not only interested in finite sets: we wish to have a data structure able to deal with infinite sets over some infinite universe.
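For finite sets and relations these operations have a direct, if naive, reference implementation. The sketch below (the code and its names are ours, not part of the notes) simply transcribes the definitions of some of the relational operations into Python set comprehensions; the point of the following chapters is to support the same interface when the sets are infinite and represented by automata.

```python
# Naive reference implementation of some operations on *finite*
# sets and relations, transcribing the definitions literally.
# (Illustrative sketch only; the notes implement these on automata.)

def post(X, R):
    """post_R(X) = {y | there is x in X with (x, y) in R}."""
    return {y for (x, y) in R if x in X}

def pre(X, R):
    """pre_R(X) = {y | there is x in X with (y, x) in R}."""
    return {y for (y, x) in R if x in X}

def join(R, S):
    """R o S = {(x, z) | there is y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

def projection_1(R):
    """pi_1(R) = {x | there is y with (x, y) in R}."""
    return {x for (x, _) in R}
```

For instance, with R = {(1, 2), (2, 3)} and S = {(2, 4), (3, 5)}, join(R, S) yields {(1, 4), (2, 5)}.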

Typical examples are real numbers and non-terminating executions of a program. However, a simple cardinality argument shows that no data structure can provide finite representations of all infinite sets: an infinite universe has uncountably many subsets, but every data structure mapping sets to finite representations only has countably many instances. (Loosely speaking, there are more sets to be represented than representations available.) Because of this limitation, every good data structure for infinite sets must find a reasonable compromise between expressibility (how large is the set of representable sets) and manipulability (which operations can be carried out, and at which cost).

These notes present the compromise offered by word automata, which, as shown by 50 years of research on the theory of formal languages, is the best one available for most purposes. Word automata represent and manipulate sets whose elements are encoded as words, i.e., as sequences of letters over an alphabet. Any kind of object can be represented by a word, at least in principle. Natural numbers, for instance, are represented in computer science as sequences of digits, i.e., as words over the alphabet of digits. Vectors and lists can also be represented as words by concatenating the word representations of their elements. As a matter of fact, whenever a computer stores an object in a file, the computer is representing it as a word over some alphabet, like ASCII or Unicode. So word automata are a very general data structure; since they are the only automata considered in these notes, we shorten word automaton to just automaton.

However, while any object can be represented by a word, not every object can be represented by a finite word. When objects cannot be represented by finite words, computers usually only represent some approximation: a float instead of a real number, or a finite prefix instead of a non-terminating computation. In the second part of the notes we show how to represent sets of infinite objects exactly using automata on infinite words. While the theory of automata on finite words is often considered a “gold standard” of theoretical computer science—a powerful and beautiful theory with lots of important applications in many fields—automata on infinite words are harder, and their theory does not achieve the same degree of “perfection”. This gives us a structure for Part II of the notes: we follow the steps of Part I, always comparing the solutions for infinite words with the “gold standard”.

There are also generalizations of word automata in which objects are encoded as trees. The theory of tree automata is very well developed, but it is not the subject of these notes.

Part I

**Automata on Finite Words**

Chapter 2

**Automata Classes and Conversions
**

In Section 2.1 we introduce basic definitions about words and languages, and then introduce regular expressions, a textual notation for defining languages of finite words. Like any other formal notation, regular expressions cannot be used to define every possible language. However, we shall see that they are an adequate notation when dealing with automata, since they define exactly the languages that can be represented by automata on words.

2.1 Regular expressions: a language to describe languages

An alphabet is a finite, nonempty set. The elements of an alphabet are called letters. A finite, possibly empty sequence of letters is a word. A word a1 a2 . . . an has length n. The empty word is the only word of length 0 and it is written ε. The concatenation of two words w1 = a1 . . . an and w2 = b1 . . . bm is the word w1 w2 = a1 . . . an b1 . . . bm, sometimes also denoted by w1 · w2. Notice that ε · w = w = w · ε. For every word w, we define w0 = ε and wk+1 = wk w.

Given an alphabet Σ, we denote by Σ∗ the set of all words over Σ. A set L ⊆ Σ∗ of words is a language over Σ. The complement of a language L is the language Σ∗ \ L, which we often denote by L (notice that this notation implicitly assumes the alphabet Σ is fixed). The concatenation of two languages L1 and L2 is L1 · L2 = {w1 w2 ∈ Σ∗ | w1 ∈ L1, w2 ∈ L2}. The iteration of a language L ⊆ Σ∗ is the language L∗ = ⋃_{i≥0} Li, where L0 = {ε} and Li+1 = Li · L for every i ≥ 0.

Definition 2.1 Regular expressions r over an alphabet Σ are defined by the following grammar, where a ∈ Σ:

r ::= ∅ | ε | a | r1 r2 | r1 + r2 | r∗

The set of all regular expressions over Σ is written RE(Σ). The language L(r) ⊆ Σ∗ of a regular expression r ∈ RE(Σ) is defined inductively by

L(∅) = ∅      L(r1 r2) = L(r1) · L(r2)
L(ε) = {ε}    L(r1 + r2) = L(r1) ∪ L(r2)
L(a) = {a}    L(r∗) = L(r)∗

2.2 Automata classes

We briefly recapitulate the definitions of deterministic and nondeterministic finite automata, as well as nondeterministic automata with ε-transitions and regular expressions.

Definition 2.2 A deterministic automaton (DA) is a tuple A = (Q, Σ, δ, q0, F), where

• Q is a set of states,
• Σ is an alphabet,
• δ : Q × Σ → Q is a transition function,
• q0 ∈ Q is the initial state, and
• F ⊆ Q is the set of final states.

A run of A on input a0 a1 . . . an−1 is a sequence p0 −a0→ p1 −a1→ p2 · · · −an−1→ pn, such that pi ∈ Q for 0 ≤ i ≤ n, p0 = q0, and δ(pi, ai) = pi+1 for 0 ≤ i < n. A run is accepting if pn ∈ F. A accepts a word w ∈ Σ∗ if there is an accepting run on input w. The language recognized by A is the set L(A) = {w ∈ Σ∗ | w is accepted by A}. A deterministic finite automaton (DFA) is a DA with a finite set of states.

Notice that a DA has exactly one run on a given word. Given a DA, we often say “the word w leads from q0 to q”, meaning that the unique run of the DA on the word w ends at the state q.

Graphically, non-final states of a DFA are represented by circles, and final states by double circles (see the example below). The transition function is represented by labeled directed edges: if δ(q, a) = q′ then we draw an edge from q to q′ labeled by a. We also draw an edge into the initial state.

Example 2.3 Figure 2.1 shows the graphical representation of the DFA A = (Q, Σ, δ, q0, F), where Q = {q0, q1, q2, q3}, Σ = {a, b}, F = {q0}, and δ is given by the following table:

δ(q0, a) = q1    δ(q0, b) = q3
δ(q1, a) = q0    δ(q1, b) = q2
δ(q2, a) = q3    δ(q2, b) = q1
δ(q3, a) = q2    δ(q3, b) = q0
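For concreteness, here is a minimal DA runner in Python (a sketch of ours, not code from the notes), instantiated with the transition table of Example 2.3. One can check that this particular DFA accepts exactly the words with an even number of a’s and an even number of b’s, which is why F = {q0}.

```python
# A minimal DA/DFA runner (illustrative sketch; names are ours).

def run(delta, q0, word):
    """Return the unique run p0 p1 ... pn of a DA on the input word."""
    states = [q0]
    for a in word:
        states.append(delta[(states[-1], a)])
    return states

def accepts(delta, q0, F, word):
    """A word is accepted iff its (unique) run ends in a final state."""
    return run(delta, q0, word)[-1] in F

# Transition table of the DFA of Example 2.3 (F = {q0}).
delta = {('q0', 'a'): 'q1', ('q1', 'a'): 'q0',
         ('q2', 'a'): 'q3', ('q3', 'a'): 'q2',
         ('q0', 'b'): 'q3', ('q1', 'b'): 'q2',
         ('q2', 'b'): 'q1', ('q3', 'b'): 'q0'}
```

For example, the run on ab is q0 → q1 → q2, so ab is rejected, while aabb leads back to q0 and is accepted.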

a0 a1 an−1

2.2. AUTOMATA CLASSES

a q0 a b b a q3 a q2 b b q1

17

Figure 2.1: A DFA

2.2.1 Using DFAs as data structures

Here are four examples of how DFAs can be used to represent interesting sets of objects.

Example 2.4 The DFA of Figure 2.2 recognizes the strings over the alphabet {−, · , 0, 1, . . . , 9} that encode real numbers with a finite decimal part. We wish to exclude strings like 000, −0, or 3.10000000, but accept 37, 10.503, or −0.234 as correct encodings. A description of the strings in English is rather long: a string encoding a number consists of an integer part, followed by a possibly empty fractional part; the integer part consists of an optional minus sign, followed by a nonempty sequence of digits; if the first digit of this sequence is 0, then the sequence itself is 0; if the fractional part is nonempty, then it starts with ., followed by a nonempty sequence of digits that does not end with 0; if the integer part is −0, then the fractional part is nonempty.


Figure 2.2: A DFA for decimal numbers


Example 2.5 The DFA of Figure 2.3 recognizes the binary encodings of all the multiples of 3. For instance, it recognizes 11, 110, 1001, and 1100, which are the binary encodings of 3, 6, 9, and 12, respectively, but not, say, 10 or 111.
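This DFA is easy to describe algorithmically: its states track the residue mod 3 of the number read so far. Reading the bits most significant first, a bit b takes residue s to (2s + b) mod 3, and residue 0 is accepting. The function below is our illustration of this idea, not code from the notes, and we only apply it to nonempty bit strings as in the example.

```python
# Residue-tracking sketch of the DFA of Example 2.5 (our illustration).
# State s = value mod 3 of the bits read so far, most significant bit
# first; reading bit b moves s to (2*s + b) mod 3, and s == 0 accepts.

def multiple_of_3(bits):
    s = 0
    for b in bits:
        s = (2 * s + int(b)) % 3
    return s == 0
```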


Figure 2.3: A DFA for the multiples of 3 in binary

Example 2.6 The DFA of Figure 2.4 recognizes all the nonnegative integer solutions of the inequation 2x − y ≤ 2, using the following encoding. The alphabet of the DFA has four letters, namely the two-bit columns [0;0], [0;1], [1;0], and [1;1], where the first bit of a column is its top row and the second its bottom row. A word like [1;0] [0;1] [1;0] [1;0] [0;1] [0;1] encodes a pair of numbers, given by the top and bottom rows: 101100 and 010011. The binary encodings start with the least significant bit, that is, 101100 encodes 2^0 + 2^2 + 2^3 = 13, and 010011 encodes 2^1 + 2^4 + 2^5 = 50.

We see this as an encoding of the valuation (x, y) := (13, 50). This valuation satisfies the inequation, and indeed the word is accepted by the DFA.

Example 2.7 Consider the following program with two boolean variables x, y:

1  while x = 1 do
2    if y = 1 then
3      x ← 0
4    y ← 1 − x
5  end

A configuration of the program is a triple [ℓ, nx, ny], where ℓ ∈ {1, 2, 3, 4, 5} is the current value of the program counter, and nx, ny ∈ {0, 1} are the current values of x and y. The initial configurations are [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1], i.e., all configurations in which control is at line 1. The DFA of Figure 2.5 recognizes all reachable configurations of the program. For instance, the DFA accepts [5, 0, 1], indicating that it is possible to reach the last line of the program with values x = 0, y = 1.
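The LSB-first pair encoding of Example 2.6 is easy to compute. The helper below (function names are ours, not from the notes) builds the word of two-bit columns for a pair of naturals and decodes a row back to a number, reproducing the example's word for (x, y) = (13, 50).

```python
# Sketch of the LSB-first pair encoding of Example 2.6 (names are ours).
# A pair (x, y) becomes a word of two-bit columns: the top row spells x
# and the bottom row spells y, least significant bit first, with the
# shorter binary expansion padded by zeros.

def encode_pair(x, y):
    word = []
    while x > 0 or y > 0:
        word.append((x & 1, y & 1))   # one column: (top bit, bottom bit)
        x, y = x >> 1, y >> 1
    return word

def decode_row(bits):
    """LSB-first decoding: bit i contributes 2**i."""
    return sum(b << i for i, b in enumerate(bits))
```

Applied to (13, 50), the top row reads 101100 and the bottom row 010011, matching the example.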


Figure 2.4: DFA for the solutions of 2x − y ≤ 2.

Figure 2.5: A DFA for the reachable configurations of the program of Example 2.7.

Definition 2.8 A non-deterministic automaton (NA) is a tuple A = (Q, Σ, δ, q0, F), where Q, Σ, q0, and F are as for DAs and

• δ : Q × Σ → P(Q) is a transition relation.

A nondeterministic finite automaton (NFA) is a NA with a finite set of states.

The runs of NAs are defined as for DAs, but substituting pi+1 ∈ δ(pi, ai) for δ(pi, ai) = pi+1. Acceptance and the language recognized by a NA are defined as for DAs. We often identify the transition function δ of a DA with the set of triples (q, a, q′) such that q′ = δ(q, a), and the transition relation δ of a NFA with the set of triples (q, a, q′) such that q′ ∈ δ(q, a). So we often write (q, a, q′) ∈ δ, meaning q′ = δ(q, a) for a DA, or q′ ∈ δ(q, a) for a NA.

Definition 2.9 A non-deterministic automaton with ε-transitions (NA-ε) is a tuple A = (Q, Σ, δ, q0, F), where Q, Σ, q0, and F are as for NAs and

• δ : Q × (Σ ∪ {ε}) → P(Q) is a transition relation.

A nondeterministic finite automaton with ε-transitions (NFA-ε) is a NA-ε with a finite set of states. The runs and accepting runs of NA-ε are defined as for NAs. However, A accepts a word a1 . . . an ∈ Σ∗ if A has an accepting run on εk0 a1 εk1 . . . εkn−1 an εkn ∈ (Σ ∪ {ε})∗ for some k0, k1, . . . , kn ≥ 0.

Definition 2.10 Let A = (Q, Σ, δ, q0, F) be an automaton. A state q ∈ Q is reachable from q′ ∈ Q if q = q′ or if there exists a run q′ −a1→ · · · −an→ q on some input a1 . . . an ∈ Σ∗. A is in normal form if every state is reachable from the initial state.

Convention: Unless otherwise stated, we assume that automata are in normal form. All our algorithms preserve normal forms, i.e., when the output is an automaton, the automaton is in normal form.

We extend NAs to allow regular expressions on transitions. Such automata are called NA-reg, and they are obviously a generalization of both regular expressions and NA-εs. They will be useful to provide a uniform conversion between automata and regular expressions.

Definition 2.11 A non-deterministic automaton with regular expression transitions (NA-reg) is a tuple A = (Q, Σ, δ, q0, F), where Q, Σ, q0, and F are as for NAs, and where

• δ : Q × RE(Σ) → P(Q) is a relation such that δ(q, r) = ∅ for all but a finite number of pairs (q, r) ∈ Q × RE(Σ).

Accepting runs are defined as for NFAs. A accepts a word w ∈ Σ∗ if A has an accepting run on r1 . . . rk such that w ∈ L(r1) · . . . · L(rk). A nondeterministic finite automaton with regular expression transitions (NFA-reg) is a NA-reg with a finite set of states.

2.3 Conversion Algorithms between Finite Automata

We recall that all our data structures can represent exactly the same languages. Since DFAs are a special case of NFAs, which are in turn a special case of NFA-ε, it suffices to show that every language recognized by an NFA-ε can also be recognized by an NFA, and every language recognized by an NFA can also be recognized by a DFA.

2.3.1 From NFA to DFA.

The powerset construction transforms an NFA A into a DFA B recognizing the same language. We first give an informal idea of the construction. Recall that a NFA may have many different runs on a word w, possibly leading to different states, while a DFA has exactly one run on w.

Denote by Qw the set of states q such that some run of A on w leads from its initial state q0 to q. B "keeps track" of the set Qw: its states are sets of states of A, with {q0} as initial state, and its transition function is defined to ensure that the run of B on w leads from {q0} to Qw (see below). It is then easy to ensure that A and B recognize the same language: it suffices to choose the final states of B as the sets of states of A containing at least one final state, because for every word w:

B accepts w
iff Qw is a final state of B
iff Qw contains at least a final state of A
iff some run of A on w leads to a final state of A
iff A accepts w.

Let us now define the transition function of B, say ∆. "Keeping track of the set Qw" amounts to satisfying ∆(Qw, a) = Qwa for every word w. But we have Qwa = ∪q∈Qw δ(q, a), and so we define ∆(Q′, a) = ∪q∈Q′ δ(q, a) for every Q′ ⊆ Q and every a ∈ Σ. Notice that we may have Q′ = ∅; in this case ∅ is a state of B, a "trap" state, since ∆(∅, a) = ∅ for every a ∈ Σ. Summarizing, given A = (Q, Σ, δ, q0, F), we define the DFA B = (Q, Σ, ∆, Q0, F) as follows:

• Q = P(Q);
• ∆(Q′, a) = ∪q∈Q′ δ(q, a) for every Q′ ⊆ Q and every a ∈ Σ;
• Q0 = {q0}; and
• F = {Q′ ∈ Q | Q′ ∩ F ≠ ∅}.

Notice, however, that B may not be in normal form: it may have many states non-reachable from Q0. For instance, assume A happens to be a DFA with states {q0, . . . , qn−1}. Then B has 2^n states, but only the singletons {q0}, . . . , {qn−1} are reachable. The conversion algorithm shown in Table 2.1 constructs only the reachable states. It is written in pseudocode, with abstract sets as data structure. Like nearly all the algorithms presented in the next chapters, it is a worklist algorithm. Worklist algorithms maintain a worklist of objects waiting to be processed. They repeatedly pick an object from the worklist (instruction pick Q′ from W) and process it. Processing an object may generate new objects that are added to the worklist. The algorithm terminates when the worklist is empty. In NFAtoDFA() the worklist is called W, in other algorithms just W (we use a calligraphic font to emphasize that in this case the objects of the worklist are sets). The name "worklist" is actually misleading, because the worklist is in fact a workset: it does not impose any order onto its elements, and it contains at most one copy of an element (i.e., if an element already in the workset is added to it again, the contents of the workset do not change). We however keep the name "worklist" for historical reasons. Notice that picking an object removes it from the worklist.

Since objects removed from the list may generate new objects, worklist algorithms may potentially fail to terminate, even if the set of all objects is finite, because the same object might be added to and removed from the worklist infinitely many times. Termination is guaranteed by making sure that no object that has been removed from the list once is ever added to it again. For this, objects picked from the worklist are stored (in NFAtoDFA() they are stored in Q), and objects are added to the worklist only if they have not been stored yet.

NFAtoDFA(A)
Input: NFA A = (Q, Σ, δ, q0, F)
Output: DFA B = (Q, Σ, ∆, Q0, F) with L(B) = L(A)

1   Q, ∆, F ← ∅; Q0 ← {q0}
2   W = {Q0}
3   while W ≠ ∅ do
4     pick Q′ from W
5     add Q′ to Q
6     if Q′ ∩ F ≠ ∅ then add Q′ to F
7     for all a ∈ Σ do
8       Q′′ ← ∪q∈Q′ δ(q, a)
9       if Q′′ ∉ Q then add Q′′ to W
10      add (Q′, a, Q′′) to ∆

Table 2.1: NFAtoDFA(A)

Figure 2.6 shows an NFA at the top, and some snapshots of the run of NFAtoDFA() on it. The algorithm picks states from the worklist in order {1}, {1, 2}, {1, 3}, {1, 4}, {1, 2, 4}. Snapshots (a)-(d) are taken right after it picks the states {1, 2}, {1, 3}, {1, 4}, and {1, 2, 4}, respectively. Snapshot (e) is taken at the end. Notice that out of the 2^4 = 16 subsets of states of the NFA only 5 are constructed, because the rest are not reachable from {1}.

Complexity. If A has n states, then the output of NFAtoDFA(A) can have up to 2^n states. To show that this bound is essentially reachable, consider the family {Ln}n≥1 of languages over Σ = {a, b} given by Ln = (a + b)∗a(a + b)^(n−1). That is, Ln contains the words of length at least n whose n-th letter starting from the end is an a. The language Ln is accepted by the NFA with n + 1 states shown in Figure 2.7(a): intuitively, the automaton chooses one of the a's in the input word, and checks that it is followed by exactly n − 1 letters before the word ends. Applying the subset construction, however, yields a DFA with 2^n states. The DFA for L3 is shown on the left of Figure 2.7(b). The states of the DFA are labelled with the corresponding sets of states of the NFA.
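The pseudocode of NFAtoDFA() can be rendered directly in Python. In the following sketch the function name and the dictionary encoding of δ are choices made here; a deque serves as the worklist and frozensets represent the states of B:

```python
from collections import deque

def nfa_to_dfa(alphabet, delta, q0, final):
    """Powerset construction, generating only the reachable subsets.
    delta maps (state, letter) -> set of successor states."""
    start = frozenset({q0})
    worklist = deque([start])
    states, trans, fin = {start}, {}, set()   # 'states' doubles as the store
    while worklist:
        S = worklist.popleft()                # pick Q' from W
        if S & final:                         # Q' contains a final state of A
            fin.add(S)
        for a in alphabet:
            T = frozenset(t for q in S for t in delta.get((q, a), ()))
            trans[(S, a)] = T
            if T not in states:               # add only subsets not stored yet
                states.add(T)
                worklist.append(T)
    return states, trans, start, fin

# Example: NFA for (a + b)*a(a + b), i.e. words whose second-to-last letter is an a.
nfa_delta = {(1, 'a'): {1, 2}, (1, 'b'): {1}, (2, 'a'): {3}, (2, 'b'): {3}}
dfa_states, dfa_delta, dfa_start, dfa_final = nfa_to_dfa("ab", nfa_delta, 1, {3})
```

On this example only the four reachable subsets {1}, {1,2}, {1,3}, {1,2,3} are constructed, out of the eight subsets of the NFA's states.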

Figure 2.6: Conversion of a NFA into a DFA.

Figure 2.7: NFA for Ln, and DFA for L3: (a) NFA for Ln; (b) DFA for L3 and interpretation.

The interpreted version of the DFA is shown on the right of Figure 2.7(b). The states of the DFA have a natural interpretation: they "store" the last n letters read by the automaton. If the DFA is in the state storing a1a2 . . . an and it reads the letter an+1, then it moves to the state storing a2 . . . an+1. States are final if the first letter they store is an a.

We can also easily prove that any DFA recognizing Ln must have at least 2^n states. Assume there is a DFA An = (Q, Σ, δ, q0, F) such that |Q| < 2^n and L(An) = Ln. We can extend δ to a mapping δ̂ : Q × {a, b}∗ → Q, where δ̂(q, ε) = q and δ̂(q, w · σ) = δ(δ̂(q, w), σ) for all w ∈ Σ∗ and σ ∈ Σ. Since |Q| < 2^n, there must be two words u · a · v1 and u · b · v2 of length n for which δ̂(q0, u · a · v1) = δ̂(q0, u · b · v2). But then we would have that δ̂(q0, u · a · v1 · u′) = δ̂(q0, u · b · v2 · u′) for all u′ with |a · v1 · u′| = |b · v2 · u′| = n, and so either both u · a · v1 · u′ and u · b · v2 · u′ are accepted by An or neither is. This contradicts the assumption that An recognizes exactly the words with an a at the n-th position from the end.

2.3.2 From NFA-ε to NFA. Let A be a NFA-ε over an alphabet Σ. In this section we use a to denote an element of Σ, and α, β to denote elements of Σ ∪ {ε}. The conversion first adds to A new transitions that make all ε-transitions redundant, without changing the recognized language; it then removes all ε-transitions, delivering an NFA that recognizes the same language as A. The new transitions are shortcuts: if A has transitions (q, α, q′) and (q′, β, q′′) such that α = ε or β = ε, then the shortcut (q, αβ, q′′) is added. (Notice that either αβ = a for some a ∈ Σ, or αβ = ε.) Shortcuts may generate further shortcuts: for instance, if αβ = a and A has a further transition (q′′, ε, q′′′), then a new shortcut (q, a, q′′′) is added. We call the process of adding all possible shortcuts saturation. Obviously, saturation does not change the language of A, and if before saturation A has a run accepting a nonempty word, for example

q0 −ε→ q1 −ε→ q2 −a→ q3 −ε→ q4 −b→ q5 −ε→ q6

then after saturation it has a run accepting the same word and visiting no ε-transitions, namely

q0 −a→ q4 −b→ q6.

However, we cannot yet remove ε-transitions: if A accepts ε, then, in general, removing all ε-transitions yields an NFA that no longer accepts ε. To solve this problem, we check if some state reachable from the initial state by a sequence of ε-transitions is final; if so, A accepts ε, and we mark its initial state as final, which clearly does not change the language. So the naïve procedure runs in three phases: saturation, ε-check, and normalization. Normalization is needed because, after removing ε-transitions, the automaton may not be in normal form: some states may no longer be reachable. For instance, the NFA-ε of Figure 2.8(a) accepts ε. After saturation we get the NFA-ε of Figure 2.8(b), and Figure 2.8(c) shows the final result.

Figure 2.8: Conversion of an NFA-ε into an NFA by shortcutting ε-transitions: (a) NFA-ε accepting L(0∗1∗2∗); (b) after saturation; (c) after marking the initial state as final and removing all ε-transitions.

However, it is possible to carry out all three steps in a single pass. We give a worklist implementation of this procedure in which the ε-check is done while saturating, and only the reachable states are generated (in the pseudocode, α and β stand for either a letter of Σ or ε, and a stands for a letter of Σ). Furthermore, the algorithm avoids constructing some redundant shortcuts. For instance, for the NFA-ε of Figure 2.8(a) the algorithm does not construct the transition labeled by 2 leading from the state in the middle to the state on the right.
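Before the worklist version, it may help to see the effect of ε-removal in executable form. The following sketch is a simpler, closure-based variant of the procedure (not the single-pass algorithm of the text); the empty string '' plays the role of ε in the transition dictionary:

```python
def remove_epsilon(states, alphabet, delta, q0, final):
    """Closure-based removal of epsilon-transitions.
    delta maps (state, letter_or_'') -> set of states; returns an NFA."""
    def closure(q):
        # All states reachable from q by a sequence of epsilon-transitions.
        seen, stack = {q}, [q]
        while stack:
            p = stack.pop()
            for r in delta.get((p, ''), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    new_delta, new_final = {}, set()
    for q in states:
        cl = closure(q)
        if cl & final:                 # q reaches a final state via epsilons
            new_final.add(q)
        for a in alphabet:
            new_delta[(q, a)] = {t for p in cl
                                   for r in delta.get((p, a), ())
                                   for t in closure(r)}
    return new_delta, q0, new_final

# NFA-epsilon for 0*1*2*, as in Figure 2.8(a):
delta = {(0, '0'): {0}, (0, ''): {1},
         (1, '1'): {1}, (1, ''): {2},
         (2, '2'): {2}}
nd, start, nf = remove_epsilon({0, 1, 2}, "012", delta, 0, {2})
```

Like the naive three-phase procedure, the result may still contain unreachable states, so a normalization step would follow.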

NFAεtoNFA(A)
Input: NFA-ε A = (Q, Σ, δ, q0, F)
Output: NFA B = (Q′, Σ, δ′, q0′, F′) with L(B) = L(A)

1   q0′ ← q0
2   Q′ ← {q0}; F′ ← F ∩ {q0}
3   δ′ ← ∅; δ′′ ← ∅; W ← {(q, α, q′) ∈ δ | q = q0}
4   while W ≠ ∅ do
5     pick (q1, α, q2) from W
6     if α ≠ ε then
7       add q2 to Q′; add (q1, α, q2) to δ′; if q2 ∈ F then add q2 to F′
8       for all q3 ∈ δ(q2, ε) do
9         if (q1, α, q3) ∉ δ′ then add (q1, α, q3) to W
10      for all a ∈ Σ, q3 ∈ δ(q2, a) do
11        if (q2, a, q3) ∉ δ′ then add (q2, a, q3) to W
12    else /* α = ε */
13      add (q1, α, q2) to δ′′; if q2 ∈ F then add q0 to F′
14      for all β ∈ Σ ∪ {ε}, q3 ∈ δ(q2, β) do
15        if (q1, β, q3) ∉ δ′ ∪ δ′′ then add (q1, β, q3) to W

The correctness proof is conceptually easy, but the different cases require some care, and so we devote it a proposition.

Proposition 2.12 Let A be a NFA-ε, and let B = NFAεtoNFA(A). Then B is a NFA and L(A) = L(B).

Proof: Notice first that every transition that leaves W is never added to W again: when a transition (q1, α, q2) leaves W it is added to either δ′ or δ′′, and a transition enters W only if it does not belong to either δ′ or δ′′. Since every execution of the while loop removes a transition from the worklist, the algorithm eventually exits the loop and terminates.

To show that B is a NFA we have to prove that it only has non-ε transitions, and that it is in normal form. For the first part, observe that transitions are only added to δ′ in line 7, and none of them is an ε-transition because of the guard in line 6. For the second part, we need the following invariant, which can be easily proved by inspection: for every transition (q1, α, q2) added to W, if α = ε then q1 = q0, and if α ≠ ε, then q2 is reachable in B (after termination). Since new states are added to Q′ only at line 7, applying the invariant we get that every state of Q′ is reachable from q0′ in B.

It remains to prove L(A) = L(B). For the inclusion L(A) ⊆ L(B), we first prove that ε ∈ L(A) implies ε ∈ L(B). Let q0 −ε→ q1 · · · qn−1 −ε→ qn be a run of A such that qn ∈ F. If n = 0 (i.e., qn = q0), then we are done. If n > 0, then an induction on n shows that the transition (q0, ε, qn) is eventually added to W: the transition (q0, ε, q1) is added at line 3, and once (q0, ε, qn−1) has been picked, line 15 adds (q0, ε, qn). So q0 is eventually added to F′ at line 13, and ε ∈ L(B).

We now prove that for every w ∈ Σ+, if w ∈ L(A) then w ∈ L(B). Let w = a1a2 . . . an with n ≥ 1. Then A has a run

q0 −ε→ · · · −ε→ qm1 −a1→ qm1+1 −ε→ · · · −ε→ qm2 −a2→ qm2+1 · · · qmn −an→ qmn+1 −ε→ · · · −ε→ qm

such that qm ∈ F. By the argument above, the transition (q0, a1, qm1+1) is eventually added to W at line 15; then the transitions absorbing the ε-transitions that follow, up to (q0, a1, qm2), are eventually added at line 9; then (qm2, a2, qm2+1) is eventually added at line 11, and so on. Iterating this argument, we obtain that

q0 −a1→ qm2 −a2→ qm3 · · · −an→ qm

is a run of B, and qm is added to F′ at line 7, so w ∈ L(B). The inclusion L(A) ⊇ L(B) follows from the fact that every transition added to δ′ is a shortcut, which can be proved by inspection.

Complexity. The algorithm processes pairs of transitions (q1, α, q2), (q2, β, q3), where (q1, α, q2) comes from W and (q2, β, q3) from δ (lines 8, 10, 14). Since every transition is removed from W at most once, the algorithm processes at most |Q|^2 · |Σ| · |δ| pairs. The runtime is dominated by the processing of the pairs, and so it is O(|Q|^2 · |Σ| · |δ|).

2.4 Conversion algorithms between regular expressions and automata

To convert regular expressions to automata and vice versa we use NFA-regs as introduced in Definition 2.11. Both NFA-εs and regular expressions can be seen as subclasses of NFA-regs: an NFA-ε is an NFA-reg whose transitions are labeled by letters or by ε, and a regular expression r "is" the NFA-reg Ar having two states, the one initial and the other final, and a single transition labeled r leading from the initial to the final state. We present algorithms that, given an NFA-reg belonging to one of these subclasses, produce a sequence of NFA-regs, each one recognizing the same language as its predecessor in the sequence, and ending in an NFA-reg of the other subclass.

2.4.1 From regular expressions to NFA-ε's

Given a regular expression s over alphabet Σ, it is convenient to do some preprocessing by exhaustively applying the following rewrite rules:

∅ · r → ∅    r · ∅ → ∅    ∅ + r → r    r + ∅ → r    ∅∗ → ε

Since the left- and right-hand sides of each rule denote the same language, the result of the preprocessing is a regular expression for the same language as the original one. Moreover, if r is the resulting regular expression, then either r = ∅, or r does not contain any occurrence of the ∅ symbol. In the former case, we can directly produce an NFA-ε for the empty language. In the second, we transform the NFA-reg Ar into an equivalent NFA-ε by exhaustively applying the transformation rules of Figure 2.9. It is easy to see that each rule preserves the recognized language (i.e., the NFA-regs before and after the application of the rule recognize the same language). Since each rule splits a regular expression into its constituents, we eventually reach an NFA-reg to which no rule can be applied. Moreover, since the initial regular expression does not contain any occurrence of the ∅ symbol, this NFA-reg is necessarily an NFA-ε.

Figure 2.9: Rules converting a regular expression given as NFA-reg into an NFA-ε: an automaton for a single symbol a, where a ∈ Σ ∪ {ε}; a rule for concatenation r1r2; a rule for choice r1 + r2; and a rule for Kleene iteration r∗.

Complexity. The conversion runs in linear time. It follows immediately from the rules that the final NFA-ε has the two states of Ar plus one state for each occurrence of the concatenation or the Kleene iteration operators in r. The number of transitions is linear in the number of symbols of r.

Example 2.13 Consider the regular expression (a∗b∗ + c)∗d. The result of applying the transformation rules is shown in Figure 2.10 on page 30.

Figure 2.10: The result of converting (a∗b∗ + c)∗d into an NFA-ε.

2.4.2 From NFA-ε's to regular expressions

Given an NFA-ε A, we transform it into the NFA-reg Ar for some regular expression r. It is again convenient to apply some preprocessing to guarantee that the NFA-ε has one single final state, and that this final state has no outgoing transitions:

• Add a new state qf and ε-transitions leading from each final state to qf, and replace the set of final states by {qf} (Rule 1).

After preprocessing, the algorithm runs in phases, unless the only states left are the initial state and the final state qf. Each phase consists of two steps. The first step yields an automaton with at most one transition between any two given states:

• Repeat exhaustively: replace a pair of transitions (q, r1, q′), (q, r2, q′) by a single transition (q, r1 + r2, q′) (Rule 2).

The second step reduces the number of states by one:

• Pick a non-final and non-initial state q, and shortcut it: if q has a self-loop (q, r, q)¹, replace each pair of transitions (q′, s, q), (q, t, q′′) (where q′ ≠ q ≠ q′′, but possibly q′ = q′′) by a shortcut (q′, sr∗t, q′′); otherwise, replace it by the shortcut (q′, st, q′′). Then remove q (Rule 3).

¹ Notice that q can have at most one self-loop, because otherwise we would have two parallel edges.

At the end of the last phase we have an NFA-reg with exactly two states, the initial state q0 (which is non-final) and the final state qf. Moreover, qf has no outgoing transitions, because it was initially so and the application of the rules cannot change it. After applying Rule 2 exhaustively, we are left with an NFA-reg having at most two transitions: a transition (q0, r1, qf), plus possibly a self-loop (q0, r2, q0). If there is no self-loop, then we are done: the NFA-reg is Ar1. Otherwise we apply some post-processing:

• If the automaton has a self-loop (q0, r2, q0), remove it, and replace the transition (q0, r1, qf) by (q0, r2∗r1, qf) (Rule 4). The resulting automaton is Ar2∗r1.

The complete algorithm is:

NFAtoRE(A)
Input: NFA-ε A = (Q, Σ, δ, q0, F)
Output: regular expression r with L(r) = L(A)

1   apply Rule 1
2   while Q \ (F ∪ {q0}) ≠ ∅ do
3     pick q from Q \ (F ∪ {q0})
4     apply exhaustively Rule 2
5     apply Rule 3 to q
6   apply exhaustively Rule 2
7   if there is a transition from q0 to itself then apply Rule 4
8   return the label of the (unique) transition

Example 2.14 Consider the automaton of Figure 2.11(a) on page 33. It is a DFA for the language of all words over the alphabet {a, b} that contain an even number of a's and an even number of b's. (The DFA is in the states on the left, respectively on the right, if it has read an even, respectively an odd, number of a's, and in the states at the top, respectively at the bottom, if it has read an even, respectively an odd, number of b's.) The rest of the figure shows some snapshots of the run of NFAtoRE() on this automaton. Snapshot (b) is taken right after applying Rule 1. Snapshots (c) to (e) are taken after each execution of the body of the while loop. Snapshot (f) shows the final result.
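Rules 2 and 3 can be sketched in Python over regular expressions represented as strings. The function below is an illustrative sketch (its name and input conventions are assumptions made here); it expects a preprocessed automaton with a single final state, no incoming edges at q0, no outgoing edges at qf, and at most one transition per pair of states:

```python
def eliminate_states(states, trans, q0, qf):
    """State elimination: trans maps (p, q) -> regex string.
    Eliminates every state except q0 and qf, merging parallel edges with '+'."""
    R = dict(trans)
    for q in states:
        if q in (q0, qf):
            continue
        loop = R.pop((q, q), None)                     # at most one self-loop
        star = f"({loop})*" if loop else ""
        ins = [(p, R.pop((p, q))) for p in list(states) if (p, q) in R]
        outs = [(r, R.pop((q, r))) for r in list(states) if (q, r) in R]
        for p, rin in ins:                             # shortcut every in/out pair
            for r, rout in outs:
                new = f"{rin}{star}{rout}"
                R[(p, r)] = f"({R[(p, r)]}+{new})" if (p, r) in R else new
    return R.get((q0, qf), "∅")
```

For the three-state automaton with transitions (0, a, 1), (1, b, 1), (1, c, 2), eliminating state 1 produces the expected expression a(b)*c.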

Figure 2.11: Run of NFA-εtoRE() on a DFA. The final result, snapshot (f), is the regular expression (aa + bb + (ab + ba)(aa + bb)∗(ba + ab))∗.

Complexity. The complexity of this algorithm depends on the data structure used to store regular expressions. If regular expressions are stored as strings or trees (following the syntax tree of the expression), then the complexity can be exponential. To see this, consider for each n ≥ 1 the NFA A = (Q, Σ, δ, q0, F) where Q = {q0, . . . , qn−1}, Σ = {aij | 0 ≤ i, j ≤ n − 1}, δ = {(qi, aij, qj) | 0 ≤ i, j ≤ n − 1}, and F = {q0}. Consider the elimination order q1, q2, . . . , qn−1. It is easy to see that after eliminating the state qi the NFA-reg contains some transitions labeled by regular expressions with 3^i occurrences of letters. By symmetry, the runtime of the algorithm is independent of the order in which states are eliminated. The exponential blowup cannot be avoided: it can be shown that every regular expression recognizing the same language as A contains at least 2^(n−1) occurrences of letters. If regular expressions are stored as acyclic directed graphs (the result of sharing common subexpressions in the syntax tree), then the algorithm works in polynomial time, because the label for a new transition is obtained by concatenating or starring already computed labels.

2.5 A Tour of Conversions

We present an example illustrating all conversions of this chapter. We start with the DFA of Figure 2.11(a) recognizing the words over {a, b} with an even number of a's and an even number of b's, which Figure 2.11 converts into a regular expression. Now we convert this expression back into a NFA-ε: Figure 2.12 on page 35 shows four snapshots of the process. In the next step we convert the NFA-ε into an NFA, i.e., remove the ε-transitions; the result is shown in Figure 2.13 on page 36. Finally, we transform the NFA into a DFA by means of the subset construction, i.e., determinize the automaton; the result is shown in Figure 2.14. Observe that we do not recover the DFA we started with, but another one recognizing the same language. A last step allowing us to close the circle is presented in the next chapter.

Exercises

Exercise 1 Transform this DFA into an equivalent regular expression. Then transform this expression into an NFA (with ε-transitions).
Exercise 2 Prove or disprove: the languages of the regular expressions (1 + 10)∗ and 1∗ (101∗ )∗ are equal. then the complexity can be exponential. The complexity of this algorithm depends on the data structure used to store regular expressions. AUTOMATA CLASSES AND CONVERSIONS Complexity. Finally.5 A Tour of Conversions We present an example illustrating all conversions of this chapter. q j ) | 0 ≤.

Figure 2.12: Constructing a NFA-ε for (aa + bb + (ab + ba)(aa + bb)∗(ab + ba))∗.

Figure 2.13: NFA for the NFA-ε of Figure 2.12(d).

Exercise 3 Give a regular expression for the words over {0, 1} that do not contain 010 as subword.

A language L ⊆ Σ∗ is called star-free if there exists an extended regular expression r (see Exercise 6) without any occurrence of a Kleene star operation such that L = L(r).

Exercise 4 Which regular expressions r satisfy the implication (rr = r ⇒ r = r∗)?

Exercise 5 (R. Majumdar) Consider a deck of cards (with arbitrarily many cards) in which black and red cards alternate, and the top card is black. Cut the deck at any point into two piles, and then perform a riffle (also called a dovetail shuffle) to yield a new deck. For example, we can cut a deck with six cards 123456 into two piles 12 and 3456; depending on the pile we start the riffle with, the riffle yields 132456 or 314256.

1. Give a regular expression for the set of possible decks after the riffle. Hint: The expression (BR)∗(ε + B) represents the possible initial decks.

2. Now take the cards from the new deck two at a time (if the riffle yields 132456, then this exposes the cards 3, 4, and 6; if it yields 314256, then the cards 1, 2, and 6). Prove with the help of regular expressions that all the exposed cards have the same color.

Exercise 6 Extend the syntax and semantics of regular expressions as follows: If r and r′ are regular expressions over Σ, then r̄ and r ∩ r′ are also regular expressions, where L(r̄) = Σ∗ \ L(r) and L(r ∩ r′) = L(r) ∩ L(r′).

Show that L((01 + 10)∗) is star-free. Notice that, for example, Σ∗ is star-free, because Σ∗ = L(∅̄).

Figure 2.14: DFA for the NFA of Figure 2.13.

Exercise 7 Prove or disprove: Every regular language is recognized by a NFA

• whose initial state has no incoming transitions;
• whose states are all final;
• having one single final state;
• whose final state has no outgoing transitions.

Which of the above hold for DFAs? Which ones for NFA-ε?

Exercise 8 Given a language L, let Lpref and Lsuf denote the languages of all prefixes and all suffixes, respectively, of words in L. E.g., for L = {abc, d} we have Lpref = {abc, ab, a, ε, d} and Lsuf = {abc, bc, c, ε, d}.

1. Given an automaton A, construct automata Apref and Asuf so that L(Apref) = L(A)pref and L(Asuf) = L(A)suf.

2. Consider the regular expression r = (ab + b)∗c. Give a regular expression rpref so that L(rpref) = L(r)pref.

3. More generally, give an algorithm that receives a regular expression r as input and returns a regular expression rpref so that L(rpref) = L(r)pref.
Exercise 9 Recall that a nondeterministic automaton A accepts a word w if at least one of the runs of A on w is accepting. This is sometimes called the existential accepting condition. Consider the variant in which A accepts w if all runs of A on w are accepting (in particular, if A has no run on w then it accepts w). This is called the universal accepting condition. Notice that a DFA accepts the same language with both the existential and the universal accepting conditions. Give an algorithm that transforms an automaton with universal accepting condition into a DFA recognizing the same language.

Exercise 10 Consider the family Ln of languages over the alphabet {0, 1} given by Ln = {ww ∈ Σ2n | w ∈ Σn}.

• Prove that every NFA (and so in particular every DFA) recognizing Ln has at least O(2^n) states.
• Give an automaton of size O(n) with universal accepting condition that recognizes Ln.

Exercise 11 The existential and universal accepting conditions can be combined, yielding alternating automata. The states of an alternating automaton are partitioned into existential and universal states. An existential state q accepts a word w (i.e., w ∈ L(q)) if w = ε and q ∈ F, or w = aw′ and there exists a transition (q, a, q′) such that q′ accepts w′. A universal state q accepts a word w if w = ε and q ∈ F, or w = aw′ and for every transition (q, a, q′) the state q′ accepts w′. The language recognized by an alternating automaton is the set of words accepted by its initial state. Give an algorithm that transforms an alternating automaton into a DFA recognizing the same language.

Exercise 12 Let L be an arbitrary language over a 1-letter alphabet. Prove that L∗ is regular.

Exercise 13 In algorithm NFAεtoNFA for the removal of ε-transitions, no transition that has been added to the worklist, processed and removed from the worklist is ever added to the worklist again. However, transitions may be added to the worklist more than once. Give a NFA-ε A and a run of NFAεtoNFA(A) in which this happens.

Exercise 14 We say that u = a1 · · · an is a scattered subword of w, denoted by u ⪯ w, if there are words w0, . . . , wn ∈ Σ∗ such that w = w0 a1 w1 a2 · · · an wn. The upward closure of a language L is the language L↑ = {u ∈ Σ∗ | ∃w ∈ L : w ⪯ u}. The downward closure of L is the language L↓ = {u ∈ Σ∗ | ∃w ∈ L : u ⪯ w}. Give algorithms that take a NFA A as input and return NFAs for L(A)↑ and L(A)↓.

Exercise 15 Algorithm NFAtoRE transforms a finite automaton into a regular expression representing the same language by iteratively eliminating states of the automaton. In this exercise we see that eliminating states corresponds to eliminating variables from a system of language equations. Arden's Lemma states that, given two languages A, B ⊆ Σ∗ with ε ∉ A, the smallest language X ⊆ Σ∗ satisfying X = AX ∪ B (i.e., the smallest language L such that AL ∪ B = L, where AL is the concatenation of A and L) is the language A∗B.

1. Prove Arden's Lemma.

2. Consider the following system of equations, where the variables X, Y represent languages over the alphabet Σ = {a, b, c, d, e, f}, the operator · represents language concatenation, and ∪ represents set union:

   X = {a}X ∪ {b} · Y ∪ {c}
   Y = {d}X ∪ {e} · Y ∪ {f}

   (Here {a}X denotes the concatenation of the language {a}, containing only the word a, and X.) This system has many solutions; for example, X = Y = Σ∗ is a solution. But there is a unique minimal solution, i.e., a solution contained in every other solution. Find the smallest solution with the help of Arden's Lemma. Hint: In a first step, consider X not as a variable, but as a constant language, and solve the equation for Y using Arden's Lemma.

3. We can associate to any NFA A = (Q, Σ, δ, qI, F) a system of linear equations as follows. We take as variables the states of the automaton, which we call here X, Y, Z, . . ., with X as initial state. The system has an equation for each state, of the form

   X = ⋃(X,a,Y)∈δ {a}Y            if X ∉ F, or
   X = ⋃(X,a,Y)∈δ {a} · Y ∪ {ε}   if X ∈ F.

   Consider the DFA of Figure 2.11(a). Let X, Y, Z, W be the states of the automaton, read from top to bottom and from left to right. The associated system of linear equations is

   X = {a}Y ∪ {b}Z ∪ {ε}
   Y = {a}X ∪ {b}W
   Z = {b}X ∪ {a}W
   W = {b}Y ∪ {a}Z

   Calculate the solution of this linear system by iteratively eliminating variables: start with Y, then eliminate Z, and finally W. Compare with the elimination procedure shown in Figure 2.11.
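The eliminations asked for in Exercise 15 can be mechanized. The following sketch manipulates regular expressions as plain strings and applies Arden's Lemma to eliminate one variable at a time; all helper names are made up, and no simplification of the resulting expressions is attempted.

```python
# Solving language equations symbolically with Arden's Lemma:
# the least solution of X = AX ∪ B with ε ∉ A is A*B.

def union(r, s):
    if r is None: return s
    if s is None: return r
    return f"({r}+{s})"

def concat(r, s):
    if r is None or s is None: return None
    return f"{r}{s}"

def eliminate(eqs, v):
    """Solve the equation of variable v with Arden's Lemma and
    substitute the solution into the remaining equations. Each
    equation is stored as (coeffs: var -> regex, const: regex or None)."""
    coeffs, const = eqs.pop(v)
    a = coeffs.pop(v, None)                      # self-coefficient A
    star = f"({a})*" if a is not None else ""
    sol = {u: concat(star, r) if star else r for u, r in coeffs.items()}
    solc = concat(star, const) if star else const
    for u, (cs, c) in eqs.items():
        if v in cs:                              # substitute v away
            av = cs.pop(v)
            for w, r in sol.items():
                cs[w] = union(cs.get(w), concat(av, r))
            c = union(c, concat(av, solc))
        eqs[u] = (cs, c)
    return sol, solc

# X = aX ∪ bY ∪ c,  Y = dX ∪ eY ∪ f
eqs = {"X": ({"X": "a", "Y": "b"}, "c"),
       "Y": ({"X": "d", "Y": "e"}, "f")}
eliminate(eqs, "Y")          # treat X as a constant, solve for Y first
_, x_regex = eliminate(eqs, "X")
print(x_regex)               # a regex for the least solution of X
```

Eliminating Y first and then X yields (a+b(e)*d)*(c+b(e)*f), matching the answer obtained by applying Arden's Lemma by hand.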

Exercise 16 Given n ∈ ℕ0, let msbf(n) be the set of most-significant-bit-first encodings of n, i.e., the words that start with an arbitrary number of leading zeros, followed by n written in binary. For example: msbf(3) = 0∗11, msbf(9) = 0∗1001, and msbf(0) = 0∗. Similarly, let lsbf(n) denote the set of least-significant-bit-first encodings of n, i.e., the set containing for each word w ∈ msbf(n) its reverse. For example: lsbf(6) = L(0110∗) and lsbf(0) = L(0∗).

1. Construct and compare DFAs recognizing the encodings of the even numbers n ∈ ℕ0 w.r.t. the unary encoding, where n is encoded by the word 1^n, the msbf-encoding, and the lsbf-encoding.
2. Same for the set of numbers divisible by 3.
3. Give regular expressions corresponding to the languages in (a) and (b).

Exercise 17
• Give a regular expression of size O(n) such that the smallest DFA equivalent to it has O(2^n) states.
• Give a regular expression of size O(n) without + such that the smallest DFA equivalent to it has O(2^n) states.
• Give a regular expression of size O(n) without + and of star-height 1 such that the smallest DFA equivalent to it has O(2^n) states. (Paper by Shallit at STACS 2008.)
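For part 2 of Exercise 16 with the msbf-encoding, a DFA only needs to track the value of the word read so far modulo 3, since appending a bit b to a word maps the encoded number n to 2n + b. A minimal sketch (the function name is made up):

```python
# A 3-state DFA for {w | msbf^-1(w) is divisible by 3}: the states are
# the remainders modulo 3; reading bit b sends remainder r to (2r + b) mod 3.
def accepts_msbf_div3(w: str) -> bool:
    r = 0                          # remainder so far; also the initial state
    for b in w:
        r = (2 * r + int(b)) % 3
    return r == 0                  # state 0 is the only final state

# every msbf encoding of 6 is accepted, leading zeros included
assert accepts_msbf_div3("110") and accepts_msbf_div3("00110")
assert not accepts_msbf_div3("111")     # 7 is not divisible by 3
```

The same remainder-tracking idea gives the divisibility DFAs for the other encodings, with the update rule adjusted to the encoding direction.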

Chapter 3

Minimization and Reduction

In the previous chapter we showed through a chain of conversions that the two DFAs of Figure 3.1 recognize the same language. Obviously, the automaton on the left of the figure is better as a data structure for this language, since it has smaller size. A DFA (respectively, NFA) is minimal if no other DFA (respectively, NFA) recognizing the same language has fewer states. We show that every regular language has a unique minimal DFA up to isomorphism (i.e., up to renaming of the states), and present an efficient algorithm that "minimizes" a given DFA, i.e., converts it into the unique minimal DFA. In particular, the algorithm converts the DFA on the right of Figure 3.1 into the one on the left.

[Figure 3.1: Two DFAs for the same language]

From a data structure point of view, the existence of a unique minimal DFA has two important consequences. First, the minimal DFA is the one that can be stored with a minimal amount of memory. Second, the uniqueness of the minimal DFA makes it a canonical representation of a regular language. As we shall see, canonicity leads to a fast equality check: in order to decide if two regular languages are equal, we can construct their minimal DFAs and check whether they are isomorphic.

In the second part of the chapter we show that, unfortunately, the minimal NFA is not unique, and computing a minimal NFA is a PSPACE-complete problem, for which no efficient algorithm is likely to exist. However, we show that a generalization of the minimization algorithm for DFAs can be used to at least reduce the size of an NFA while preserving its language.

3.1 Minimal DFAs

We start with a simple but very useful definition.

Definition 3.1 Given a language L ⊆ Σ∗ and w ∈ Σ∗, the w-residual of L is the language L^w = {u ∈ Σ∗ | wu ∈ L}. A language L′ ⊆ Σ∗ is a residual of L if L′ = L^w for at least one w ∈ Σ∗.

A language may be infinite but have a finite number of residuals. An example is the language of the DFAs in Figure 3.1, which we denote in what follows by EE. Recall it is the language of all words over {a, b} with an even number of a's and an even number of b's. The language has four residuals, namely EE, OE, EO, OO, where EO contains the words with an even number of a's and an odd number of b's, etc. For example, we have EE^ε = EE, EE^a = EE^aaa = OE, and EE^ab = OO. Figure 3.2 shows the result of labeling the states of the two DFAs of Figure 3.1 with the languages they recognize, which are residuals of EE.

There is a close connection between the states of a DA (not necessarily finite) and the residuals of its language. In order to formulate it we introduce the following definition:

Definition 3.2 Let A = (Q, Σ, δ, q0, F) be a DA and let q ∈ Q. The language recognized by q, denoted by LA(q), or just L(q) if there is no risk of confusion, is the language recognized by A with q as initial state, i.e., the language recognized by the DA (Q, Σ, δ, q, F).

Lemma 3.3 Let L be a language and let A = (Q, Σ, δ, q0, F) be a DA recognizing L.

(1) Every residual of L is recognized by some state of A. Formally: for every w ∈ Σ∗ there is q ∈ Q such that LA(q) = L^w.

(2) Every state of A recognizes a residual of L. Formally: for every q ∈ Q there is w ∈ Σ∗ such that LA(q) = L^w.

Proof: (1) Let w ∈ Σ∗, and let q be the state reached by the unique run of A on w. Then a word wu ∈ Σ∗ is recognized by A iff u is recognized by A with q as initial state. So LA(q) = L^w. (2) Since A is in normal form, q can be reached from q0 by at least one word w, and the same argument yields LA(q) = L^w.
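The four residuals of EE can be represented concretely by the parities of the a's and b's read so far, which gives a finite deterministic automaton for EE with exactly one state per residual. A minimal sketch (the names are made up):

```python
# The residuals of EE as parity pairs (number of a's mod 2, number of b's mod 2).
EE, OE, EO, OO = (0, 0), (1, 0), (0, 1), (1, 1)

def delta(K, c):
    """The c-residual K^c of the residual K."""
    pa, pb = K
    return (1 - pa, pb) if c == "a" else (pa, 1 - pb)

def accepts(w: str) -> bool:
    K = EE                  # initial state: the residual EE = L itself
    for c in w:
        K = delta(K, c)
    return K == EE          # final states: the residuals containing ε

assert accepts("abab") and accepts("") and not accepts("aab")
assert delta(EE, "a") == OE     # the a-transition from EE leads to OE
```

Running the automaton on w simply computes the residual EE^w, which is exactly the state reached by the unique run on w, in line with Lemma 3.3(1).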

The notion of residuals allows us to define the canonical deterministic automaton for a language L ⊆ Σ∗.

Definition 3.4 Let L ⊆ Σ∗ be a language. The canonical DA for L is the DA CL = (QL, Σ, δL, q0L, FL), where:

• QL is the set of residuals of L, i.e., QL = {L^w | w ∈ Σ∗};
• δL(K, a) = K^a for every K ∈ QL and a ∈ Σ;
• q0L = L; and
• FL = {K ∈ QL | ε ∈ K}.

Notice that the number of states of CL is equal to the number of residuals of L, and both may be infinite.

[Figure 3.2: Languages recognized by the states of the DFAs of Figure 3.1]

Example 3.5 The canonical DA for the language EE is shown on the left of Figure 3.2. It has four states, corresponding to the four residuals of EE. Since, for instance, EE^a = OE, we have a transition labeled by a leading from EE to OE.

It is intuitively clear, and easy to show, that the canonical DA for a language L recognizes L:

Proposition 3.6 For every language L ⊆ Σ∗, the canonical DA for L recognizes L.

Proof: Let CL be the canonical DA for L. We prove L(CL) = L. Let w ∈ Σ∗. We prove by induction on |w| that w ∈ L iff w ∈ L(CL).

If |w| = 0 then w = ε, and we have

ε ∈ L ⇔ L ∈ FL          (definition of FL)
      ⇔ q0L ∈ FL         (q0L = L)
      ⇔ ε ∈ L(CL)        (q0L is the initial state of CL).

If |w| > 0, then w = aw′ for some a ∈ Σ and w′ ∈ Σ∗, and we have

aw′ ∈ L ⇔ w′ ∈ L^a        (definition of L^a)
        ⇔ w′ ∈ L(C_{L^a})  (induction hypothesis)
        ⇔ aw′ ∈ L(CL)      (δL(L, a) = L^a).

We now prove that CL is the unique minimal DFA recognizing a regular language L (up to isomorphism). The informal argument goes as follows. Since every DFA for L has at least one state for each residual and CL has exactly one state for each residual, CL is minimal. Moreover, every other minimal DFA for L also has exactly one state for each residual. But the transitions, initial and final states of a minimal DFA are completely determined by the residuals recognized by its states: if state q recognizes residual R, then q is initial iff R = L, q is final iff ε ∈ R, and the a-transition leaving q necessarily leads to the state recognizing R^a. So all minimal DFAs are isomorphic. A more formal proof looks as follows:

Theorem 3.7 If L is regular, then CL is the unique minimal DFA up to isomorphism recognizing L.

Proof: Let L be a regular language, and let A = (Q, Σ, δ, q0, F) be an arbitrary DFA recognizing L. By Lemma 3.3, the number of states of A is greater than or equal to the number of states of CL, and so CL is a minimal automaton for L. To prove uniqueness of the minimal automaton up to isomorphism, assume A is minimal. By Lemma 3.3(2), LA is a mapping that assigns to each state of A a residual of L, and so LA : Q → QL. We prove that LA is an isomorphism. LA is bijective, because it is surjective (Lemma 3.3(1)) and |Q| = |QL| (A is minimal by assumption). Moreover, if δ(q, a) = q′, then LA(q′) = (LA(q))^a, and so δL(LA(q), a) = LA(q′). Finally, LA maps the initial state of A to the initial state of CL, that is, LA(q0) = L = q0L; and LA maps final to final and non-final to non-final states: q ∈ F iff ε ∈ LA(q) iff LA(q) ∈ FL.

The following simple corollary is often useful to establish that a given DFA is minimal:

Corollary 3.8 A DFA is minimal if and only if L(q) ≠ L(q′) for every two distinct states q and q′.

Proof: (⇒): By Theorem 3.7, the number of states of a minimal DFA is equal to the number of residuals of its language. Since every state recognizes some residual, each state must recognize a different residual.

(⇐): If all states of a DFA A recognize different languages, then, since every state recognizes some residual, the number of states of A is less than or equal to the number of residuals. So A has at most as many states as C_{L(A)}, and so it is minimal.

3.2 Minimizing DFAs

We present an algorithm that converts a given DFA into (a DFA isomorphic to) the unique minimal DFA recognizing the same language. The algorithm first partitions the states of the DFA into blocks, where a block contains all states recognizing the same residual. Then, the algorithm "merges" the states of each block into one single state, an operation usually called quotienting with respect to the partition. Since every state recognizes some residual, this yields a DFA in which every state recognizes a different residual, and which is therefore minimal. These two steps are described in Section 3.2.1 and Section 3.2.2. For the rest of the section we fix a DFA A = (Q, Σ, δ, q0, F) recognizing a regular language L.

3.2.1 Computing the language partition

We need some basic notions on partitions. A partition of Q is a finite set P = {B1, . . . , Bn} of nonempty subsets of Q, called blocks, such that Q = B1 ∪ . . . ∪ Bn, and Bi ∩ Bj = ∅ for every 1 ≤ i ≠ j ≤ n. The block containing a state q is denoted by [q]P. A partition P′ refines or is a refinement of another partition P if every block of P′ is contained in some block of P. If P′ refines P and P′ ≠ P, then P is coarser than P′.

The language partition, denoted by Pℓ, puts two states in the same block if and only if they recognize the same language (i.e., the same residual). That is, [q]Pℓ = [q′]Pℓ if and only if L(q) = L(q′). To compute Pℓ we iteratively refine an initial partition P0 while maintaining the following invariant:

Invariant: States in different blocks recognize different languages.

P0 consists of two blocks containing the final and the non-final states, respectively (or just one of the two if all states are final or all states are non-final). Formally, P0 = {F, Q \ F} if F and Q \ F are nonempty, P0 = {F} if Q \ F is empty, and P0 = {Q \ F} if F is empty. Notice that P0 satisfies the invariant, because every state of F accepts the empty word, but no state of Q \ F does.

A partition is refined by splitting a block into two blocks. To find a block to split, we first observe the following:

Fact 3.9 If L(q1) = L(q2), then L(δ(q1, a)) = L(δ(q2, a)) for every a ∈ Σ.

Now, by contraposition, if L(δ(q1, a)) ≠ L(δ(q2, a)), then L(q1) ≠ L(q2); or, rephrasing in terms of blocks: if δ(q1, a) and δ(q2, a) belong to different blocks, but q1 and q2 belong to the same block B, then B can be split, because q1 and q2 can be put in different blocks while respecting the invariant.

Definition 3.10 Let B, B′ be (not necessarily distinct) blocks of a partition P, and let a ∈ Σ. The pair (a, B′) splits B if there are q1, q2 ∈ B such that δ(q1, a) ∈ B′ and δ(q2, a) ∉ B′. The result of the split is the partition Ref_P[B, a, B′] = (P \ {B}) ∪ {B0, B1}, where B0 = {q ∈ B | δ(q, a) ∉ B′} and B1 = {q ∈ B | δ(q, a) ∈ B′}. A partition is unstable if it contains blocks B, B′ ∈ P and a ∈ Σ such that (a, B′) splits B, and stable otherwise.

The partition refinement algorithm LanPar(A) iteratively refines the initial partition of A until it becomes stable.

LanPar(A)
Input: DFA A = (Q, Σ, δ, q0, F)
Output: The language partition Pℓ.
1  if F = ∅ or Q \ F = ∅ then return {Q}
2  else P ← {F, Q \ F}
3  while P is unstable do
4    pick B, B′ ∈ P and a ∈ Σ such that (a, B′) splits B
5    P ← Ref_P[B, a, B′]
6  return P

Notice that if all states of a DFA are non-final then every state recognizes ∅, and if all are final then every state recognizes Σ∗. In both cases all states recognize the same language, and the language partition is {Q}. The algorithm terminates because every iteration increases the number of blocks by one, and a partition can have at most |Q| blocks.

Example 3.11 Figure 3.3 shows a run of LanPar() on the DFA on the right of Figure 3.1, read from top to bottom and from left to right. States that belong to the same block are given the same color. The initial partition, shown at the top, consists of the yellow and the pink states. The yellow block and the letter a split the pink block into the green block (pink states with an a-transition to the yellow block) and the rest (pink states with an a-transition to other blocks), which stay pink. In the final step, the green block and the letter b split the pink block into the magenta block (pink states with a b-transition into the green block) and the rest, which stay pink.

We prove in two steps that LanPar(A) computes the language partition. First, we show that it computes the coarsest stable refinement of P0, denoted by CSR; in other words, we show that after termination the partition P is coarser than every other stable refinement of P0. Then we prove that CSR is equal to Pℓ.

[Figure 3.3: Computing the language partition for the DFA on the right of Figure 3.1]
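The refinement loop of LanPar, together with the quotient construction defined in Section 3.2.2 below, can be transcribed naively as follows. This is a sketch under an assumed encoding — delta maps a (state, letter) pair to a state — and all helper names are made up.

```python
# Naive transcription of LanPar plus quotienting.

def find_splitter(partition, alphabet, delta):
    """Return (B, B0, B1) such that some (a, B') splits B, or None."""
    for B in partition:
        for a in alphabet:
            for Bp in partition:
                B1 = frozenset(q for q in B if delta[q, a] in Bp)
                if B1 and B1 != B:          # (a, B') splits B
                    return B, B - B1, B1
    return None

def lan_par(states, alphabet, delta, finals):
    finals = frozenset(finals)
    nonfinals = frozenset(states) - finals
    if not finals or not nonfinals:
        return {frozenset(states)}          # language partition is {Q}
    partition = {finals, nonfinals}         # the initial partition P0
    while (s := find_splitter(partition, alphabet, delta)) is not None:
        B, B0, B1 = s
        partition = (partition - {B}) | {B0, B1}
    return partition

def quotient(states, alphabet, delta, q0, finals, partition):
    """Merge the states of each block into a single state."""
    block = {q: B for B in partition for q in B}
    d = {(block[q], a): block[delta[q, a]]
         for q in states for a in alphabet}
    f = {B for B in partition if set(B) & set(finals)}
    return d, block[q0], f

# DFA for aa* with a redundant state: 0 -a-> 1 -a-> 2 -a-> 2, finals 1, 2.
states, alphabet, finals = [0, 1, 2], ["a"], [1, 2]
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2}
P = lan_par(states, alphabet, delta, finals)
assert P == {frozenset({0}), frozenset({1, 2})}   # states 1 and 2 merge
d, b0, f = quotient(states, alphabet, delta, 0, finals, P)
assert d[b0, "a"] in f     # in the quotient, one letter a leads to acceptance
```

The search for a splitter is quadratic in the number of blocks per iteration; efficient implementations (Hopcroft-style) organize the candidate splitters in a worklist instead.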

Lemma 3.12 LanPar(A) computes CSR.

Proof: LanPar(A) clearly computes a stable refinement of P0. We prove that after termination P is coarser than any other stable refinement of P0, i.e., that every stable refinement of P0 refines P. Actually, we prove more: we show that this is in fact an invariant of the program; it holds not only after termination, but at any time. Let P′ be an arbitrary stable refinement of P0. Initially P = P0, and so P′ refines P. Now we show that if P′ refines P, then P′ also refines Ref_P[B, a, B′]. Since the only difference between P and Ref_P[B, a, B′] is the splitting of B into B0 and B1, it suffices to show that no two states of the same block of P′ are separated by the split. Assume the contrary: there are states q1, q2 belonging to the same block of P′ such that exactly one of q1 and q2, say q1, belongs to B0, and the other belongs to B1. Since q2 ∈ B1, there exists a transition (q2, a, q2′) ∈ δ such that q2′ ∈ B′. Since P′ is stable, and since P′ refines P and q1, q2 belong to the same block of P′, there is also a transition (q1, a, q1′) ∈ δ such that q1′ ∈ B′. But this contradicts q1 ∈ B0.

Lemma 3.13 CSR is equal to Pℓ.

Proof: The proof has three parts:

(a) Pℓ refines P0. Obvious.

(b) Pℓ is stable. By Fact 3.9, if two states q1, q2 belong to the same block of Pℓ, i.e., if L(q1) = L(q2), then for every a ∈ Σ the states δ(q1, a) and δ(q2, a) also belong to the same block. So no block can be split.

(c) Every stable refinement P′ of P0 refines Pℓ. Let q1, q2 be states belonging to the same block of P′. We prove that they belong to the same block of Pℓ, i.e., that L(q1) = L(q2). For this, it suffices to prove that, for every word w, w ∈ L(q1) iff w ∈ L(q2). By symmetry, it suffices to show that if w ∈ L(q1) then w ∈ L(q2). We proceed by induction on the length of w. If w = ε then q1 ∈ F, and since P′ refines P0, we have q2 ∈ F, which implies w ∈ L(q2). If w = aw′, then there is a transition (q1, a, q1′) ∈ δ such that w′ ∈ L(q1′). Let B′ be the block of P′ containing q1′. Since P′ is stable, there is also a transition (q2, a, q2′) ∈ δ such that q2′ ∈ B′. By induction hypothesis, w′ ∈ L(q2′), and so w ∈ L(q2).

By (a) and (b), Pℓ is a stable refinement of P0, and so CSR is coarser than or equal to Pℓ; by (c) applied to P′ = CSR, the partition CSR refines Pℓ. So CSR = Pℓ.

3.2.2 Quotienting

It remains to define the quotient of A with respect to a partition. The states of the quotient are the blocks of the partition, and there is a transition (B, a, B′) from block B to block B′ if A contains some transition (q, a, q′) for states q and q′ belonging to B and B′, respectively. Formally:

Definition 3.14 The quotient of A with respect to a partition P is the DFA A/P = (QP, Σ, δP, q0P, FP), where:

• QP is the set of blocks of P;
• q0P is the block of P containing q0;
• (B, a, B′) ∈ δP if (q, a, q′) ∈ δ for some q ∈ B, q′ ∈ B′; and
• FP is the set of blocks of P that contain at least one state of F.

Example 3.15 Figure 3.4 shows on the right the result of quotienting the DFA on the left with respect to its language partition. The quotient has as many states as colors, and it has a transition between two colors (say, an a-transition from pink to magenta) if the DFA on the left has such a transition.

[Figure 3.4: Quotient of a DFA with respect to its language partition]

We show that A/Pℓ, the quotient of A with respect to the language partition, is the minimal DFA for L. In particular, L(A) = L(A/Pℓ), because the quotient recognizes the same language as the original automaton. The main part of the argument is contained in the following lemma. Loosely speaking, it says that any partition in which states of the same block recognize the same language "is good" for quotienting.

Lemma 3.16 Let P be a refinement of Pℓ, let q be a state of A, and let B be the block of P containing q. Then LA(q) = LA/P(B).

Proof: We prove that for every w ∈ Σ∗ we have w ∈ LA(q) iff w ∈ LA/P(B). The proof is by induction on |w|.

|w| = 0. Then w = ε, and we have

ε ∈ LA(q) iff q ∈ F
          iff B ⊆ F          (P refines Pℓ, and so also P0)
          iff B ∈ FP
          iff ε ∈ LA/P(B).

|w| > 0. Then w = aw′ for some a ∈ Σ and w′ ∈ Σ∗. Let q′ = δ(q, a), and let B′ be the block containing q′. So w ∈ LA(q) iff there is a transition (q, a, q′) ∈ δ such that w′ ∈ LA(q′). By the definition of A/P we have (B, a, B′) ∈ δP, and so:

aw′ ∈ LA(q) iff w′ ∈ LA(q′)      (definition of q′)
            iff w′ ∈ LA/P(B′)     (induction hypothesis)
            iff aw′ ∈ LA/P(B)     ((B, a, B′) ∈ δP).

Theorem 3.17 The quotient A/Pℓ is the minimal DFA for L.

Proof: By Lemma 3.16, the states of A/Pℓ recognize residuals of L. Moreover, two states of A/Pℓ recognize different residuals, by definition of Pℓ. So A/Pℓ has as many states as residuals, and we are done.

3.3 Reducing NFAs

There is no canonical minimal NFA for a given regular language. The simplest witness of this fact is the language aa∗, which is recognized by the two non-isomorphic, minimal NFAs of Figure 3.5. Moreover, computing any of the minimal NFAs equivalent to a given NFA is computationally hard.

[Figure 3.5: Two minimal NFAs for aa∗]

Recall that the universality problem is PSPACE-complete: given an NFA A over an alphabet Σ, decide whether L(A) = Σ∗. Using this result, we can easily prove that deciding the existence of a small NFA for a language is PSPACE-complete.

Theorem 3.18 The following problem is PSPACE-complete: given an NFA A and a number k ≥ 1, decide if there exists an NFA equivalent to A having at most k states.

Proof: To prove membership in PSPACE, we use NPSPACE = PSPACE = co-PSPACE. Observe first that if A has at most k states, then we can answer "yes" immediately. So assume that A has more than k states. Since PSPACE = co-PSPACE, it suffices to give a procedure to decide if no NFA with at most k states is equivalent to A. For this we construct all NFAs with at most k states (over the same alphabet as A), reusing the same space for each of them, and check that none of them is equivalent to A.

Since NPSPACE = PSPACE, for the equivalence check it suffices to exhibit a nondeterministic algorithm that, given an NFA B with at most k states, checks that B is not equivalent to A (and runs in polynomial space). The algorithm nondeterministically guesses a word, one letter at a time, while maintaining the sets of states in both A and B reached from the initial states by the word guessed so far. The algorithm stops when it observes that the current word is accepted by exactly one of A and B.

PSPACE-hardness is easily proved by reduction from the universality problem. If an NFA is universal, then it is equivalent to an NFA with one state, namely the NFA B with one final state and a loop for each alphabet letter. Now, to decide if a given NFA A is universal we can proceed as follows. Check first if A accepts all words of length 1. If not, then A is not universal. Otherwise, check if some NFA with one state is equivalent to A. If not, then A is not universal. Otherwise, since A accepts all words of length 1, A is equivalent to the NFA B above, and so A is universal.

3.3.1 The reduction algorithm

We fix for the rest of the section an NFA A = (Q, Σ, δ, q0, F) recognizing a language L. A look at Definition 3.14 and Lemma 3.16 shows that the definition of the quotient automaton extends to NFAs, and that Lemma 3.16 still holds for NFAs with exactly the same proof. So L(A) = L(A/P) holds for every refinement P of Pℓ, and the largest reduction is obtained for P = Pℓ. But Pℓ is hard to compute for NFAs. On the other extreme, the partition that puts each state in a separate block is always a refinement of Pℓ, but it does not provide any reduction, since then A/P = A. However, we can reuse part of the theory for the DFA case to obtain an efficient algorithm to possibly reduce the size of a given NFA. To find a reasonable trade-off we examine again Lemma 3.16. Its proof only uses the following property of stable partitions: if q1, q2 belong to the same block of a stable partition and there is a transition (q2, a, q2′) ∈ δ such that q2′ ∈ B′ for some block B′, then there is also a transition (q1, a, q1′) ∈ δ such that q1′ ∈ B′. We extend the definition of stability to NFAs so that stable partitions still satisfy this property: we just replace the condition δ(q1, a) ∈ B′ of Definition 3.10 by δ(q1, a) ∩ B′ ≠ ∅.

Definition 3.19 (Refinement and stability for NFAs) Let B, B′ be (not necessarily distinct) blocks of a partition P, and let a ∈ Σ. The pair (a, B′) splits B if there are q1, q2 ∈ B such that δ(q1, a) ∩ B′ ≠ ∅ and δ(q2, a) ∩ B′ = ∅. The result of the split is the partition Ref^NFA_P[B, a, B′] = (P \ {B}) ∪ {B0, B1}, where B0 = {q ∈ B | δ(q, a) ∩ B′ = ∅} and B1 = {q ∈ B | δ(q, a) ∩ B′ ≠ ∅}. A partition is unstable if it contains blocks B, B′ such that (a, B′) splits B for some a ∈ Σ, and stable otherwise.
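Stepping back to the universality problem used in the hardness proof above: it can also be decided deterministically with an on-the-fly subset construction, at worst-case exponential cost, which matches the PSPACE flavour of the problem. A sketch, assuming an NFA encoding where delta maps a (state, letter) pair to a set of successor states:

```python
# On-the-fly subset construction for NFA universality: A is universal
# iff every reachable set of states contains a final state.
from collections import deque

def is_universal(alphabet, delta, q0, finals):
    start = frozenset({q0})
    seen, work = {start}, deque([start])
    while work:
        S = work.popleft()
        if not (S & finals):
            return False        # the word leading to S is rejected by A
        for a in alphabet:
            T = frozenset(r for q in S for r in delta.get((q, a), ()))
            if T not in seen:
                seen.add(T)
                work.append(T)
    return True

# The NFA for aa* is not universal (it already rejects ε); a one-state
# NFA with a final state and a loop on every letter is universal.
assert not is_universal(["a"], {(0, "a"): {1}, (1, "a"): {1}}, 0, frozenset({1}))
assert is_universal(["a"], {(0, "a"): {0}}, 0, frozenset({0}))
```

The nondeterministic polynomial-space algorithm of the proof avoids storing `seen`: it only keeps the current sets of states of A and B, guessing the word letter by letter.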

Using this definition we generalize LanPar(A) to NFAs in the obvious way: we allow NFAs as inputs, and replace Ref_P by Ref^NFA_P as the new notion of refinement.

CSR(A)
Input: NFA A = (Q, Σ, δ, q0, F)
Output: The partition CSR of A.
1  if F = ∅ or Q \ F = ∅ then P ← {Q}
2  else P ← {F, Q \ F}
3  while P is unstable do
4    pick B, B′ ∈ P and a ∈ Σ such that (a, B′) splits B
5    P ← Ref^NFA_P[B, a, B′]
6  return P

Notice that line 1 is different from line 1 in algorithm LanPar. If all states of an NFA are non-final, then every state recognizes ∅; but if all are final we can no longer conclude that every state recognizes Σ∗. In fact, all states might recognize different languages, as we will see later. Notice also that in the special case of DFAs CSR(A) reduces to LanPar(A), because Ref_P and Ref^NFA_P coincide for DFAs.

Lemma 3.12 still holds, with exactly the same proof, but with respect to the new notion of refinement. So we get:

Theorem 3.20 Let A = (Q, Σ, δ, q0, F) be an NFA. Then CSR(A) computes the coarsest stable refinement CSR of its initial partition.

Lemma 3.16 also still holds for NFAs, with exactly the same proof:

Lemma 3.21 Let A = (Q, Σ, δ, q0, F) be an NFA, and let P be a stable refinement of its initial partition. Then L(A/P) = L(A).

Theorem 3.20 and Lemma 3.21 lead to the final result:

Corollary 3.23 If A is an NFA, then L(A/CSR) = L(A).

In the case of DFAs we had Lemma 3.13, stating that CSR is equal to Pℓ. The lemma does not hold anymore for NFAs. However, part (c) of its proof, which showed that CSR refines Pℓ, still holds, so quotienting with respect to CSR never merges states recognizing different languages.

Example 3.22 Consider the NFA at the top of Figure 3.6. Initially we have the partition with two blocks shown at the top of the figure: the block {1, . . . , 14} of non-final states and the block {15} of final states. A possible run of CSR(A) is graphically represented at the bottom of the figure as a tree. The first refinement uses (a, {15}) to split the block of non-final states into the states with an a-transition to {15} and those without one. The leaves of the tree are the blocks of CSR; in the NFA, CSR is the partition indicated by the colors. The quotient automaton is shown in Figure 3.7. In this example we have CSR ≠ Pℓ: for instance, states 3 and 5 recognize the same language, but they belong to different blocks of CSR.

We finish the section with a remark. If all states of an NFA recognize different languages, then A/Pℓ = A, and so even quotienting with respect to Pℓ may not yield a minimal NFA. The NFA of Figure 3.8 is an example: all states accept different languages, and so A/Pℓ = A, but the NFA is not minimal, since the state at the bottom can be removed without changing the language, namely (a + b)∗aa(a + b)∗.
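The NFA refinement step of Definition 3.19 can be transcribed naively as follows. This is a sketch under an assumed encoding — delta maps a (state, letter) pair to a set of states — and the helper names are made up.

```python
# CSR(A): partition refinement with the NFA splitting condition
# delta(q, a) ∩ B' ≠ ∅ versus delta(q, a) ∩ B' = ∅.

def split_once(partition, alphabet, delta):
    """Return (B, B1) such that some (a, B') splits B, or None."""
    for B in partition:
        for a in alphabet:
            for Bp in partition:
                # B1: states of B with some a-successor in B'
                B1 = frozenset(q for q in B
                               if delta.get((q, a), frozenset()) & Bp)
                if B1 and B1 != B:
                    return B, B1
    return None

def csr(states, alphabet, delta, finals):
    finals = frozenset(finals)
    nonfinals = frozenset(states) - finals
    # line 1 of CSR(A): start from {Q} if finals or non-finals is empty
    partition = {B for B in (finals, nonfinals) if B}
    while (s := split_once(partition, alphabet, delta)) is not None:
        B, B1 = s
        partition = (partition - {B}) | {B - B1, B1}
    return partition

# An NFA for aa* with two interchangeable final states: they get merged.
delta = {(0, "a"): {1, 2}, (1, "a"): {1}, (2, "a"): {2}}
P = csr([0, 1, 2], ["a"], delta, [1, 2])
assert P == {frozenset({0}), frozenset({1, 2})}
```

Quotienting by the computed partition then reduces the NFA, as justified by Corollary 3.23; unlike in the DFA case, the result is not guaranteed to be minimal.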

[Figure 3.6: An NFA and a run of CSR() on it]

[Figure 3.7: The quotient of the NFA of Figure 3.6]

[Figure 3.8: An NFA A such that A/Pℓ is not minimal]

It is not difficult to show that if two states q1, q2 belong to the same block of CSR, then they not only recognize the same language, but also satisfy the following far stronger property: for every a ∈ Σ and for every q1′ ∈ δ(q1, a), there exists q2′ ∈ δ(q2, a) such that L(q1′) = L(q2′). This can be used to show that two states belong to different blocks of CSR. For instance, consider states 2 and 3 of the NFA on the left of Figure 3.9. They recognize the same language, but state 2 has a c-successor that recognizes {d}, namely state 4, while state 3 has no such successor. So states 2 and 3 belong to different blocks of CSR. A possible run of the CSR algorithm on this NFA is shown on the right of the figure. For this NFA, CSR has as many blocks as states.

[Figure 3.9: An NFA such that CSR ≠ Pℓ]

3.4 A Characterization of the Regular Languages

We present a useful byproduct of the results of Section 3.1.

Theorem 3.24 A language L is regular iff it has finitely many residuals.

Proof: If L is not regular, then no DFA recognizes it. Since, by Proposition 3.6, the canonical automaton CL recognizes L, CL cannot be a DFA: it necessarily has infinitely many states, and so L has infinitely many residuals. If L is regular, then some DFA A recognizes it. By Lemma 3.3, the number of states of A is greater than or equal to the number of residuals of L, and so L has finitely many residuals.

This theorem provides a useful technique for proving that a given language L ⊆ Σ∗ is not regular: exhibit an infinite set of words W ⊆ Σ∗ with pairwise different residuals, i.e., such that L^w ≠ L^v for every two distinct words w, v ∈ W. Let us apply the technique to some typical examples.
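The separating words used in such proofs can be checked mechanically. For instance, for L = {a^n b^n | n ≥ 0}, the words a^i have pairwise different residuals, separated by the witnesses b^i; a small sanity-check sketch (the helper name is made up):

```python
# Empirical check of the residual argument for L = {a^n b^n | n >= 0}:
# for i != j, the word b^i lies in the a^i-residual but not in the
# a^j-residual, so L has infinitely many residuals.
def in_L(w: str) -> bool:
    n = w.count("a")
    # w must have the shape a^n b^m with m = n
    return w == "a" * n + "b" * (len(w) - n) and 2 * n == len(w)

for i in range(6):
    for j in range(6):
        if i != j:
            assert in_L("a" * i + "b" * i)        # b^i in L^{a^i}
            assert not in_L("a" * j + "b" * i)    # b^i not in L^{a^j}
```

Of course the finite check proves nothing by itself; the point is that the general argument in the proof holds for all i ≠ j.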

• {a^n b^n | n ≥ 0} is not regular. Let W = {a^k | k ≥ 0}. For every two distinct words a^i, a^j ∈ W (i ≠ j), we have b^i ∈ L^{a^i} but b^i ∉ L^{a^j}, and so L has infinitely many residuals.

• {ww | w ∈ Σ∗} is not regular. Let W = Σ∗. For every two distinct words w, v ∈ W of the same length, we have w ∈ L^w but w ∉ L^v; since there are arbitrarily many words of the same length, L has infinitely many residuals.

• {a^(n²) | n ≥ 0} is not regular. Let W = {a^(k²) | k ≥ 0} (W = L in this case). For every two distinct words a^(i²), a^(j²) ∈ W with i < j, the word a^(2i+1) belongs to the a^(i²)-residual of L, because a^(i² + 2i + 1) = a^((i+1)²), but not to the a^(j²)-residual, because j² < j² + 2i + 1 < (j + 1)², and so j² + 2i + 1 is not a square number.

Exercises

Exercise 18 Consider the most-significant-bit-first encoding (msbf encoding) of natural numbers over the alphabet Σ = {0, 1}. Recall that every number has infinitely many encodings, because all the words of 0∗w encode the same number as w.

1. Construct the minimal DFA representing the language {w ∈ {0, 1}∗ | msbf⁻¹(w) is divisible by 3}.
2. Construct the minimal DFAs accepting the following languages:
   • {w | msbf⁻¹(w) mod 3 = 0} ∩ Σ⁴
   • {w | msbf⁻¹(w) is a prime} ∩ Σ⁴

Exercise 19 Consider the family of languages Lk = {ww | w ∈ Σ^k}, where k ≥ 2.
• Construct the minimal DFA for L4.
• How many states has the minimal DFA accepting Lk?

Exercise 20 Consider the following DFA over the alphabet Σ = {00, 01, 10, 11}.

[DFA diagram over the letters 00, 01, 10, 11]

We interpret a word w ∈ Σ∗ as a pair of natural numbers (X(w), Y(w)) ∈ ℕ0 × ℕ0 by reading the bits at odd positions as the msbf encoding of X(w), and the bits at even positions as the msbf encoding of Y(w). For example, if w = 00001011 then X(w) = 3 and Y(w) = 1, because 00001011 → (0011, 0001) → (3, 1). Show that a word w is accepted by the DFA iff X(w) = 3 · Y(w).
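The pairing of Exercise 20 can be sketched as a small decoding helper: the bits at odd positions (1st, 3rd, . . .) form the msbf encoding of X(w), and the bits at even positions that of Y(w). The function name is made up.

```python
# Decode a word over {0,1}* of even length into the pair (X, Y) of Exercise 20.
def decode(w: str):
    x = int(w[0::2] or "0", 2)    # 1st, 3rd, 5th, ... letters of w
    y = int(w[1::2] or "0", 2)    # 2nd, 4th, 6th, ... letters of w
    return x, y

assert decode("00001011") == (3, 1)   # 0011 -> 3 and 0001 -> 1
```

With this helper one can brute-force the claim X(w) = 3 · Y(w) against any concrete DFA for the exercise by enumerating all short words.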

Exercise 21 Consider the language partition algorithm LanPar. Since every execution of its while loop increases the number of blocks by one, the loop can be executed at most |Q| − 1 times. Show that this bound is tight, i.e., give a family of DFAs for which the loop is executed |Q| − 1 times. Hint: You can take a one-letter alphabet.

Exercise 22 Consider the following NFA A.

[NFA diagram: initial state q0, states q1, q2, q3, q4, and final state qf, with transitions over a and b]

1. Describe L(A) in words.
2. Determine the CSR of A, i.e., the coarsest stable refinement of P0, using the algorithm presented in the lecture.

Exercise 23 Determine the residuals of the following languages over Σ = {a, b}:
• (ab + ba)∗
• (aa)∗
• {a^n b^n c^n | n ≥ 0}
• {w ∈ {a, b}∗ | w contains the same number of occurrences of ab and ba}

Exercise 24 Given a language L ⊆ Σ∗ and w ∈ Σ∗, we denote ^wL = {u ∈ Σ∗ | uw ∈ L}. A language L′ ⊆ Σ∗ is an inverse residual of L if L′ = ^wL for some w ∈ Σ∗.
• Show that a language is regular iff it has finitely many inverse residuals.
• Determine the inverse residuals of the first two languages in Exercise 23.
• Does a language always have as many residuals as inverse residuals?

Exercise 25 Given a language L ⊆ Σ∗ and w ∈ Σ∗, the w-context of L is the set of pairs {(u, v) ∈ Σ∗ × Σ∗ | uwv ∈ L}. A language is a context of L if it is a w-context for at least one w ∈ Σ∗.
• Determine the contexts of L2 in Exercise 23.
• Can a language have more residuals than contexts? And more contexts than residuals?

Exercise 26 A DFA with negative transitions (DFA-n) is a DFA whose transitions are partitioned into positive and negative transitions. A run of a DFA-n is accepting if:
• it ends in a final state and the number of occurrences of negative transitions is even, or
• it ends in a non-final state and the number of occurrences of negative transitions is odd.
The intuition is that taking a negative transition "inverts the polarity" of the acceptance condition: after taking the transition we accept iff we would not accept were the transition positive.
• Give a DFA-n for a regular language having fewer states than the minimal DFA for the language.
• Prove that the languages recognized by DFAs with negative transitions are regular.
• Show that the minimal DFA-n for a language is not unique (even for languages whose minimal DFA-n's have fewer states than their minimal DFAs).

Exercise 27 A residual of a regular language L is composite if it is the union of other residuals of L. A residual is prime if it is not composite. Show that every regular language L is recognized by an NFA whose number of states is equal to the number of prime residuals of L.

Exercise 28 Show that the following languages over {0, 1, 2} are not regular:
• {0^n 1^n 2^m | n, m ≥ 0}
• {0^2n 1 2^3n | n ≥ 0}
• {w | |w|0 = |w|2}, where |w|σ denotes the number of occurrences of the letter σ in the word w.

Exercise 29 Prove or disprove:
• A subset of a regular language is regular.
• A superset of a regular language is regular.
• If L1 and L1 L2 are regular, then L2 is regular.
• If L2 and L1 L2 are regular, then L1 is regular.

Exercise 30 (T. Henzinger) Which of these languages over the alphabet {0, 1} are regular?
• The set of words containing the same number of 0's and 1's.
• The set of words containing the same number of occurrences of the strings 01 and 10. (E.g., 01010001 contains three occurrences of 01 and two occurrences of 10.)
• Same for the pairs of strings 00 and 11, the pair 001 and 110, and the pair 001 and 100.

Exercise 31 A word w = a1 ... an is a subword of v = b1 ... bm, denoted by w ⪯ v, if there are indices 1 ≤ i1 < i2 < ... < in ≤ m such that a_j = b_{i_j} for every j ∈ {1, ..., n}. A language L ⊆ Σ∗ is upward-closed if for every two words w, v: if w ∈ L and w ⪯ v, then v ∈ L. Higman's lemma states that every infinite set of words over a finite alphabet contains two words w1, w2 such that w1 ⪯ w2.
• Prove using Higman's lemma that every upward-closed language is regular.
The upward-closure of a language L is the upward-closed language obtained by adding to L all words v such that w ⪯ v for some w ∈ L.
• Give an algorithm that transforms a regular expression for a language into a regular expression for its upward-closure.
• Give regular expressions for the upward-closures of the languages in Exercise 28.

Exercise 32 A word of a language L is minimal if it is not a proper subword of another word of L. We denote the set of minimal words of L by min(L). Give a family of regular languages {L_n}, n ≥ 1, such that every L_n is recognized by an NFA with O(n) states, but the smallest NFA recognizing min(L_n) has O(2^n) states.

Exercise 33 Consider the alphabet Σ = {up, down, left, right}. A word over Σ corresponds to a line in a grid consisting of concatenated segments drawn in the direction specified by the letters. In the same way, a language corresponds to a set of lines. For instance, the set of all staircases can be specified as the set of lines given by the regular language (up right)∗, and so it is a regular language.
• Specify the set of all skylines as a regular language.
• Show that the set of all rectangles is not regular.


Chapter 4

Operations on Sets: Implementations

Recall the list of operations on sets that should be supported by our data structures, where U is the universe of objects, X, Y are subsets of U, and x is an element of U:

Member(x, X)        : returns true if x ∈ X, false otherwise.
Complement(X)       : returns U \ X.
Intersection(X, Y)  : returns X ∩ Y.
Union(X, Y)         : returns X ∪ Y.
Empty(X)            : returns true if X = ∅, false otherwise.
Universal(X)        : returns true if X = U, false otherwise.
Included(X, Y)      : returns true if X ⊆ Y, false otherwise.
Equal(X, Y)         : returns true if X = Y, false otherwise.

We fix an alphabet Σ, assume that each object of the universe is encoded by a word, and assume that there exists a bijection between U and Σ∗, i.e., that each word is the encoding of some object. Under this assumption the operations on sets and elements become operations on languages and words. For instance, the first two operations become Member(w, L) and Complement(L).

The assumption that each word encodes some object may seem too strong. Indeed, the language E of encodings is usually only a subset of Σ∗. However, once we have implemented the operations above under this strong assumption, we can easily modify them so that they work under a much weaker assumption that almost always holds: the assumption that the language E of encodings is regular. Assume, for instance, that E is a regular subset of Σ∗ and L is the language of encodings of a set X. Then we implement Complement(X) so that it returns not the full complement of L, but Intersection(Complement(L), E).

For each operation we present an implementation that, given automata representations of the operands, returns an automaton representing the result (or a boolean, when that is the return type).

Sections 4.1 and 4.2 consider the cases in which the representation is a DFA and an NFA, respectively.

4.1 Implementation on DFAs

In order to evaluate the complexity of the operations we must first make explicit our assumptions on the complexity of basic operations on a DFA A = (Q, Σ, δ, q0, F). We assume that dictionary operations (lookup, add, remove) on Q and δ can be performed in constant time using hashing. We assume further that, given a state q, we can decide in constant time if q = q0 and if q ∈ F, and that, given a state q and a letter a ∈ Σ, we can find in constant time the unique state δ(q, a).

4.1.1 Membership. Recall that a DFA has exactly one run for each word. To check membership for a word w we just execute the run of the DFA on w; the run is accepting iff it reaches a final state. It is convenient for future use to have an algorithm MemDFA[A](w, q) that takes as parameter a DFA A, a state q of A, and a word w, and checks if w is accepted with q as initial state. Member(w, L) can then be implemented by MemDFA[A](w, q0), where A is the automaton representing L. Writing head(aw) = a and tail(aw) = w for a ∈ Σ and w ∈ Σ∗, the algorithm looks as follows:

MemDFA[A](w, q)
Input: DFA A = (Q, Σ, δ, q0, F), state q ∈ Q, word w ∈ Σ∗
Output: true if w ∈ L(q), false otherwise

1 if w = ε then return q ∈ F
2 else return MemDFA[A](tail(w), δ(q, head(w)))

The complexity of the algorithm is O(|w|).

4.1.2 Complement. Implementing the complement operation on DFAs is easy: if we swap final and non-final states, the run on a word becomes accepting iff it was non-accepting, and so the new DFA accepts the word iff the old one did not accept it. So we get the following linear-time algorithm:

CompDFA(A)
Input: DFA A = (Q, Σ, δ, q0, F)
Output: DFA B = (Q′, Σ, δ′, q0′, F′) with L(B) = Σ∗ \ L(A)

1 Q′ ← Q; δ′ ← δ; q0′ ← q0
2 F′ ← ∅
3 for all q ∈ Q do
4     if q ∉ F then add q to F′
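As a concrete illustration, the two algorithms above can be sketched in Python. The representation (a tuple (states, alphabet, delta, q0, final) with delta a dict from (state, letter) to state) and all names below are our own assumptions, not the book's.

```python
# Sketch of MemDFA and CompDFA; representation and names are ours.
def mem_dfa(dfa, w, q=None):
    """True iff w is accepted starting from state q (by default q0)."""
    states, alphabet, delta, q0, final = dfa
    q = q0 if q is None else q
    for a in w:                  # iterative version of the recursive MemDFA
        q = delta[(q, a)]        # the unique successor, found in O(1)
    return q in final

def comp_dfa(dfa):
    """Swap final and non-final states: the new DFA recognizes the complement."""
    states, alphabet, delta, q0, final = dfa
    return (states, alphabet, delta, q0, states - final)

# Example: DFA over {a} recognizing the words of even length
mult2 = ({0, 1}, {"a"}, {(0, "a"): 1, (1, "a"): 0}, 0, {0})
```

Note that comp_dfa copies Q, δ and q0 unchanged, exactly as CompDFA does.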

Observe that complementation of DFAs preserves minimality. By construction, each state of CompDFA(A) recognizes the complement of the language recognized by the same state in A. Therefore, if the states of A recognize pairwise different languages, so do the states of CompDFA(A). Apply now Corollary 3.8, stating that a DFA is minimal iff its states recognize pairwise different languages.

4.1.3 Binary Boolean Operations

Instead of specific implementations for union and intersection, we give a generic implementation for all binary boolean operations. Given two DFAs A1 and A2 and a binary boolean operation like union, intersection, or difference, the implementation returns a DFA recognizing the result of applying the operation to L(A1) and L(A2).

The DFAs for the different boolean operations always have the same states and transitions; they differ only in the set of final states. We call this DFA with a yet unspecified set of final states the pairing of A1 and A2. Formally:

Definition 4.1 Let A1 = (Q1, Σ, δ1, q01, F1) and A2 = (Q2, Σ, δ2, q02, F2) be DFAs. The pairing [A1, A2] of A1 and A2 is the tuple (Q, Σ, δ, q0) where:
• Q = { [q1, q2] | q1 ∈ Q1, q2 ∈ Q2 };
• δ = { ([q1, q2], a, [q1′, q2′]) | (q1, a, q1′) ∈ δ1, (q2, a, q2′) ∈ δ2 };
• q0 = [q01, q02].

The run of [A1, A2] on a word of Σ∗ is defined as for DFAs. It follows immediately from this definition that the run of [A1, A2] over a word w = a1 a2 ... an is also a "pairing" of the runs of A1 and A2 over w. Formally,

q01 −a1→ q11 −a2→ q21 ... q(n−1)1 −an→ qn1
q02 −a1→ q12 −a2→ q22 ... q(n−1)2 −an→ qn2

are the runs of A1 and A2 on w if and only if

[q01, q02] −a1→ [q11, q12] −a2→ [q21, q22] ... [q(n−1)1, q(n−1)2] −an→ [qn1, qn2]

is the run of [A1, A2] on w.

DFAs for the different boolean operations are obtained by adding an adequate set of final states to [A1, A2]. For intersection, [A1, A2] must accept w if and only if A1 accepts w and A2 accepts w. This is achieved by declaring a state [q1, q2] final if and only if q1 ∈ F1 and q2 ∈ F2. For the difference L1 \ L2 of the languages L1 = L(A1) and L2 = L(A2), [A1, A2] must accept w if and only if A1 accepts w and A2 does not accept w, and so we declare [q1, q2] final if and only if q1 ∈ F1 and not q2 ∈ F2. For union, we just replace and by or.

Example 4.2 Figure 4.1 shows at the top two DFAs over the alphabet Σ = {a}. They recognize the words whose length is a multiple of 2 and a multiple of 3, respectively. We denote these languages by Mult 2 and Mult 3. The figure then shows the pairing of the two DFAs (for clarity the states carry labels x, y instead of [x, y]), and three DFAs recognizing Mult 2 ∩ Mult 3, Mult 2 ∪ Mult 3, and Mult 2 \ Mult 3.

[Figure 4.1: Two DFAs, their pairing, and DFAs for the intersection, union, and difference of their languages.]

Example 4.3 The tour of conversions of Chapter 2 started with a DFA for the language of all words over {a, b} containing an even number of a's and an even number of b's. This language is the intersection of the language of all words containing an even number of a's and the language of all words containing an even number of b's. Figure 4.2 shows DFAs for these two languages, and the DFA for their intersection.

We can now formulate a generic algorithm that, given two DFAs recognizing languages L1, L2 and a binary boolean operation, returns a DFA recognizing the result of "applying" the boolean operation to L1, L2. First we formally define what this means.

[Figure 4.2: Two DFAs and a DFA for their intersection.]

Given an alphabet Σ and a binary boolean operator ⊙ : {true, false} × {true, false} → {true, false}, we lift ⊙ to a function ⊙ : 2^(Σ∗) × 2^(Σ∗) → 2^(Σ∗) on languages as follows:

L1 ⊙ L2 = { w ∈ Σ∗ | (w ∈ L1) ⊙ (w ∈ L2) }

That is, in order to decide if w belongs to L1 ⊙ L2, we first evaluate (w ∈ L1) and (w ∈ L2) to true or false, and then apply ⊙ to the results. For instance, we have L1 ∩ L2 = L1 ∧ L2. The generic algorithm, parameterized by ⊙, looks as follows:

BinOp[⊙](A1, A2)
Input: DFAs A1 = (Q1, Σ, δ1, q01, F1), A2 = (Q2, Σ, δ2, q02, F2)
Output: DFA A = (Q, Σ, δ, q0, F) with L(A) = L(A1) ⊙ L(A2)

 1 Q, δ, F ← ∅
 2 q0 ← [q01, q02]
 3 W ← {q0}
 4 while W ≠ ∅ do
 5     pick [q1, q2] from W
 6     add [q1, q2] to Q
 7     if (q1 ∈ F1) ⊙ (q2 ∈ F2) then add [q1, q2] to F
 8     for all a ∈ Σ do
 9         q1′ ← δ1(q1, a); q2′ ← δ2(q2, a)
10         if [q1′, q2′] ∉ Q then add [q1′, q2′] to W
11         add ([q1, q2], a, [q1′, q2′]) to δ

Popular choices of boolean language operations are summarized in the left column below, while the right column shows the corresponding boolean operation needed to instantiate BinOp[⊙].

Language operation                              Boolean operation b1 ⊙ b2
Union                                           b1 ∨ b2
Intersection                                    b1 ∧ b2
Set difference (L1 \ L2)                        b1 ∧ ¬b2
Symmetric difference (L1 \ L2 ∪ L2 \ L1)        b1 ⇔ ¬b2

The output of BinOp is a DFA with O(|Q1| · |Q2|) states, regardless of the boolean operation being implemented. To show that the bound is reachable, let Σ = {a}, and for every n ≥ 1 let Mult n denote the language of words whose length is a multiple of n. As in Figure 4.1, the minimal DFA recognizing Mult n is a cycle of n states, with the initial state being also the only final state. For any two relatively prime numbers n1 and n2 we have Mult n1 ∩ Mult n2 = Mult (n1 · n2), and any DFA for Mult (n1 · n2) has at least n1 · n2 states. In particular, if we denote the minimal DFA for Mult k by Ak, then BinOp[∧](An, Am) = A(n·m).

Notice, however, that in general minimality is not preserved: the product of two minimal DFAs may not be minimal. For instance, given any regular language L, the minimal DFA for the intersection of L and its complement has one state, but the result of the product construction is a DFA with the same number of states as the minimal DFA for L.

4.1.4 Emptiness. A DFA accepts the empty language if and only if it has no final states (recall our normal form, where all states must be reachable!).

Empty(A)
Input: DFA A = (Q, Σ, δ, q0, F)
Output: true if L(A) = ∅, false otherwise
1 return F = ∅

The runtime depends on the implementation. If checking F = ∅ requires a linear scan over Q, then the complexity is O(|Q|). If we keep a boolean indicating whether the DFA has some final state, then the complexity of Empty() is O(1).

4.1.5 Universality. A DFA accepts Σ∗ iff all its states are final.

UnivDFA(A)
Input: DFA A = (Q, Σ, δ, q0, F)
Output: true if L(A) = Σ∗, false otherwise
1 return F = Q

Again, this is an algorithm with complexity O(|Q|) for a linear scan, and O(1) given a suitable implementation, e.g., a counter of final states.
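To illustrate BinOp[⊙] and the tightness claim for Mult n1 ∩ Mult n2, here is a Python sketch. The tuple representation of DFAs and all names are our own assumptions, not the book's.

```python
# Sketch of BinOp[op]: explore the pairing from [q01, q02] on the fly;
# a pair is final iff op(q1 in F1, q2 in F2) holds.
def bin_op(op, dfa1, dfa2):
    (Q1, sigma, d1, q01, F1) = dfa1
    (Q2, _, d2, q02, F2) = dfa2
    q0 = (q01, q02)
    states, delta, final = set(), {}, set()
    work = [q0]
    while work:
        pair = work.pop()
        if pair in states:
            continue
        states.add(pair)
        q1, q2 = pair
        if op(q1 in F1, q2 in F2):
            final.add(pair)
        for a in sigma:
            succ = (d1[(q1, a)], d2[(q2, a)])
            delta[(pair, a)] = succ
            if succ not in states:
                work.append(succ)
    return (states, sigma, delta, q0, final)

def accepts(dfa, w):
    _, _, delta, q, final = dfa
    for a in w:
        q = delta[(q, a)]
    return q in final

# Mult 2 and Mult 3 as cyclic DFAs over {a}
mult2 = ({0, 1}, {"a"}, {(0, "a"): 1, (1, "a"): 0}, 0, {0})
mult3 = ({0, 1, 2}, {"a"}, {(i, "a"): (i + 1) % 3 for i in range(3)}, 0, {0})

inter = bin_op(lambda b1, b2: b1 and b2, mult2, mult3)  # recognizes Mult 6
union = bin_op(lambda b1, b2: b1 or b2, mult2, mult3)
```

Since 2 and 3 are relatively prime, the product for intersection has exactly 2 · 3 = 6 reachable states, matching the Mult (n1 · n2) bound above.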

4.1.6 Inclusion. Given two regular languages L1, L2, the following lemma characterizes when L1 ⊆ L2 holds.

Lemma 4.4 Let A1 = (Q1, Σ, δ1, q01, F1) and A2 = (Q2, Σ, δ2, q02, F2) be DFAs. L(A1) ⊆ L(A2) if and only if every state [q1, q2] of the pairing [A1, A2] satisfying q1 ∈ F1 also satisfies q2 ∈ F2.

Proof: Let L1 = L(A1) and L2 = L(A2). We have L1 ⊈ L2 iff L1 \ L2 ≠ ∅ iff at least one state [q1, q2] of the DFA for L1 \ L2 is final, i.e., satisfies q1 ∈ F1 and q2 ∉ F2.

The condition of the lemma can be checked by slightly modifying BinOp. The resulting algorithm checks inclusion on the fly:

InclDFA(A1, A2)
Input: DFAs A1 = (Q1, Σ, δ1, q01, F1), A2 = (Q2, Σ, δ2, q02, F2)
Output: true if L(A1) ⊆ L(A2), false otherwise

 1 Q ← ∅; W ← { [q01, q02] }
 2 while W ≠ ∅ do
 3     pick [q1, q2] from W
 4     add [q1, q2] to Q
 5     if (q1 ∈ F1) and (q2 ∉ F2) then return false
 6     for all a ∈ Σ do
 7         q1′ ← δ1(q1, a); q2′ ← δ2(q2, a)
 8         if [q1′, q2′] ∉ Q then add [q1′, q2′] to W
 9 return true

4.1.7 Equality. For equality, just observe that L(A1) = L(A2) holds if and only if the symmetric difference of L(A1) and L(A2) is empty. The algorithm is obtained by replacing the line

if (q1 ∈ F1) and (q2 ∉ F2) then return false

of InclDFA(A1, A2) by

if ((q1 ∈ F1) and (q2 ∉ F2)) or ((q1 ∉ F1) and (q2 ∈ F2)) then return false

4.2 Implementation on NFAs

For NFAs we make the same assumptions on the complexity of basic operations as for DFAs. For DFAs we additionally assumed that, given a state q and a letter a ∈ Σ, we can find in constant time the unique state δ(q, a); this assumption no longer makes sense for NFAs, since δ(q, a) is a set.
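The on-the-fly check InclDFA of Section 4.1.6 can be sketched as follows; representation and names are our own assumptions.

```python
# Sketch of InclDFA: explore the pairing and fail as soon as some pair
# [q1, q2] with q1 final in A1 and q2 non-final in A2 is reached.
def incl_dfa(dfa1, dfa2):
    (_, sigma, d1, q01, F1) = dfa1
    (_, _, d2, q02, F2) = dfa2
    seen, work = set(), [(q01, q02)]
    while work:
        pair = work.pop()
        if pair in seen:
            continue
        seen.add(pair)
        q1, q2 = pair
        if q1 in F1 and q2 not in F2:
            return False        # some word of L(A1) escapes L(A2)
        for a in sigma:
            work.append((d1[(q1, a)], d2[(q2, a)]))
    return True

# Mult 4 is included in Mult 2, but not the other way around
mult2 = ({0, 1}, {"a"}, {(0, "a"): 1, (1, "a"): 0}, 0, {0})
mult4 = ({0, 1, 2, 3}, {"a"}, {(i, "a"): (i + 1) % 4 for i in range(4)}, 0, {0})
```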

4.2.1 Membership. Membership testing is slightly more involved for NFAs than for DFAs. An NFA may have many runs on the same word, and examining all of them one after the other in order to see if at least one is accepting is a bad idea: the number of runs may be exponential in the length of the word. The algorithm below does better: for each prefix of the word it computes the set of states in which the automaton may be after having read the prefix.

MemNFA[A](w)
Input: NFA A = (Q, Σ, δ, q0, F), word w ∈ Σ∗
Output: true if w ∈ L(A), false otherwise

1 W ← {q0}
2 while w ≠ ε do
3     U ← ∅
4     for all q ∈ W do
5         add δ(q, head(w)) to U
6     W ← U
7     w ← tail(w)
8 return (W ∩ F ≠ ∅)

For the complexity, observe first that the while loop is executed |w| times. The for loop is executed at most |Q| times, and each execution takes at most time O(|Q|), because δ(q, head(w)) contains at most |Q| states. So the overall runtime is O(|w| · |Q|²).

Example 4.5 Consider the NFA of Figure 4.3 and the word w = aaabba. The successive values of W, the sets of states A can reach after reading the prefixes of w, are shown on the right of the figure. Since the final set contains final states, the algorithm returns true.

[Figure 4.3: An NFA A over {a, b} with states 1 to 4, and the run of MemNFA[A](aaabba): a table listing, for each prefix of aaabba read, the set W of states reached.]
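MemNFA translates almost literally into Python. The NFA representation (delta maps (state, letter) to a set of states) and the example automaton below are our own, not the one of Figure 4.3.

```python
# Sketch of MemNFA: propagate the set W of reachable states letter by letter.
def mem_nfa(nfa, w):
    states, alphabet, delta, q0, final = nfa
    W = {q0}
    for a in w:
        U = set()
        for q in W:
            U |= delta.get((q, a), set())
        W = U
    return bool(W & final)

# Example NFA over {a, b} accepting the words that contain "ab"
nfa = ({1, 2, 3}, {"a", "b"},
       {(1, "a"): {1, 2}, (1, "b"): {1}, (2, "b"): {3},
        (3, "a"): {3}, (3, "b"): {3}},
       1, {3})
```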

4.2.2 Complement. Recall that an NFA A may have multiple runs on a word w, and that it accepts w if at least one of them is accepting. In particular, an NFA can accept w because of an accepting run ρ1, but have another non-accepting run ρ2 on w. It follows that the complementation operation for DFAs cannot be extended to NFAs: after exchanging final and non-final states the run ρ1 becomes non-accepting, but ρ2 becomes accepting. So the new NFA still accepts w (at least ρ2 accepts), and so it does not recognize the complement of L(A). For this reason, complementation for NFAs is carried out by converting to a DFA and complementing the result.

CompNFA(A)
Input: NFA A
Output: DFA B with L(B) = Σ∗ \ L(A)
1 B ← CompDFA(NFAtoDFA(A))

Since making the NFA deterministic may cause an exponential blow-up in the number of states, the number of states of B may be O(2^|Q|).

4.2.3 Union and intersection. On NFAs it is no longer possible to uniformly implement binary boolean operations. The pairing operation can be defined exactly as in Definition 4.1, and the runs of a pairing [A1, A2] of NFAs on a given word are defined as for NFAs. The difference with respect to the DFA case is that the pairing may have multiple runs or no run at all on a word. But we still have that

q01 −a1→ q11 −a2→ q21 ... q(n−1)1 −an→ qn1 and q02 −a1→ q12 −a2→ q22 ... q(n−1)2 −an→ qn2

are runs of A1 and A2 on w if and only if

[q01, q02] −a1→ [q11, q12] −a2→ ... −an→ [qn1, qn2]

is a run of [A1, A2] on w. Let us now discuss separately the cases of intersection, union, and set difference.

Intersection. Let [q1, q2] be a final state of [A1, A2] if q1 is a final state of A1 and q2 is a final state of A2. Then it is still the case that [A1, A2] has an accepting run on w if and only if A1 has an accepting run on w and A2 has an accepting run on w. So, with this choice of final states, [A1, A2] recognizes L(A1) ∩ L(A2).

So we get the following algorithm:

IntersNFA(A1, A2)
Input: NFAs A1 = (Q1, Σ, δ1, q01, F1), A2 = (Q2, Σ, δ2, q02, F2)
Output: NFA A1 ∩ A2 = (Q, Σ, δ, q0, F) with L(A1 ∩ A2) = L(A1) ∩ L(A2)

 1 Q, δ, F ← ∅
 2 q0 ← [q01, q02]
 3 W ← { [q01, q02] }
 4 while W ≠ ∅ do
 5     pick [q1, q2] from W
 6     add [q1, q2] to Q
 7     if (q1 ∈ F1) and (q2 ∈ F2) then add [q1, q2] to F
 8     for all a ∈ Σ do
 9         for all q1′ ∈ δ1(q1, a), q2′ ∈ δ2(q2, a) do
10             if [q1′, q2′] ∉ Q then add [q1′, q2′] to W
11             add ([q1, q2], a, [q1′, q2′]) to δ

Notice that we overload the symbol ∩: as an operation on NFAs, we denote the output by A1 ∩ A2. The automaton A1 ∩ A2 is often called the product of A1 and A2. It is easy to see that ∩ is associative and commutative in the following sense:

L((A1 ∩ A2) ∩ A3) = L(A1) ∩ L(A2) ∩ L(A3) = L(A1 ∩ (A2 ∩ A3))
L(A1 ∩ A2) = L(A1) ∩ L(A2) = L(A2 ∩ A1)

For the complexity, observe that in the worst case the algorithm must examine all pairs [t1, t2] of transitions of δ1 × δ2, but every pair is examined at most once. So the runtime is O(|δ1| |δ2|).

Example 4.6 Consider the two NFAs of Figure 4.4 over the alphabet {a, b}. The first one recognizes the words containing at least two blocks with two consecutive a's each, the second one those containing at least one such block. The result of applying IntersNFA() is the NFA of Figure 3.6. Observe that this NFA has 15 states, i.e., all pairs of states are reachable. Observe also that in this example the intersection of the languages recognized by the two NFAs is equal to the language of the first NFA. So there is an NFA with 5 states that recognizes the intersection, which means that the output of IntersNFA() is far from optimal in this case. Even after applying the reduction algorithm for NFAs we only obtain the 10-state automaton of Figure 3.7.

Union. The argumentation for intersection still holds if we replace and by or, and so an algorithm obtained from IntersNFA() by substituting or for and correctly computes an NFA for L(A1) ∪ L(A2). However, there is an algorithm that produces a smaller NFA. Observe first that an NFA-ε for L(A1) ∪ L(A2) can be constructed by putting A1 and A2 "side by side" and adding a new initial state q0, together with ε-transitions from q0 to q01 and q02. To get an NFA, we then apply the algorithm for removing ε-transitions.
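The product construction IntersNFA can be sketched as follows; the two example NFAs are our own (not those of Figure 4.4), and so are all names.

```python
# Sketch of IntersNFA: explore the pairing; a pair is final iff both
# components are final. NFAs are (alphabet, delta, q0, final) with
# delta mapping (state, letter) to a set of states.
def inters_nfa(nfa1, nfa2):
    (sigma, d1, q01, F1) = nfa1
    (_, d2, q02, F2) = nfa2
    q0 = (q01, q02)
    states, delta, final = set(), {}, set()
    work = [q0]
    while work:
        pair = work.pop()
        if pair in states:
            continue
        states.add(pair)
        q1, q2 = pair
        if q1 in F1 and q2 in F2:
            final.add(pair)
        for a in sigma:
            for s1 in d1.get((q1, a), ()):
                for s2 in d2.get((q2, a), ()):
                    delta.setdefault((pair, a), set()).add((s1, s2))
                    if (s1, s2) not in states:
                        work.append((s1, s2))
    return (sigma, delta, q0, final)

def mem(nfa, w):
    sigma, delta, q0, final = nfa
    W = {q0}
    for a in w:
        W = {s for q in W for s in delta.get((q, a), ())}
    return bool(W & final)

# words containing "aa", and words containing "bb"
has_aa = ({"a", "b"},
          {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2},
           (2, "a"): {2}, (2, "b"): {2}}, 0, {2})
has_bb = ({"a", "b"},
          {(0, "a"): {0}, (0, "b"): {0, 1}, (1, "b"): {2},
           (2, "a"): {2}, (2, "b"): {2}}, 0, {2})
prod = inters_nfa(has_aa, has_bb)
```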

[Figure 4.4: Two NFAs.]

The algorithm below directly constructs the final result, with the assumption that q0 is a fresh state not belonging to Q1 or Q2. Removing the ε-transitions may make the old initial states q01, q02 unreachable from the new initial state q0. Notice that they become unreachable if and only if they had no incoming transitions in A1 and A2, respectively, and in this case we remove them. We denote by δi⁻¹(q0i) the set of states q such that (q, a, q0i) ∈ δi for some a ∈ Σ.

UnionNFA(A1, A2)
Input: NFAs A1 = (Q1, Σ, δ1, q01, F1), A2 = (Q2, Σ, δ2, q02, F2)
Output: NFA A1 ∪ A2 with L(A1 ∪ A2) = L(A1) ∪ L(A2)

 1 Q ← Q1 ∪ Q2 ∪ {q0}
 2 δ ← δ1 ∪ δ2
 3 F ← F1 ∪ F2
 4 for all i = 1, 2 do
 5     if q0i ∈ Fi then add q0 to F
 6     for all (q0i, a, q) ∈ δi do add (q0, a, q) to δ
 7     if δi⁻¹(q0i) = ∅ then
 8         remove q0i from Q
 9         for all a ∈ Σ, q ∈ δi(q0i, a) do remove (q0i, a, q) from δ
10 return (Q, Σ, δ, q0, F)

If emptiness of δi⁻¹(q0i) can be checked in constant time, then the complexity is O(m1 + m2), where mi is the number of transitions of Ai starting at q0i. Figure 4.5 is a graphical representation of how the algorithm works (up to the removal of the old initial states).

Set difference. The generalization of the procedure for DFAs fails. Let [q1, q2] be a final state of [A1, A2] if q1 is a final state of A1 and q2 is not a final state of A2. Then [A1, A2] has an accepting run on w if and only if A1 has an accepting run on w and A2 has a non-accepting run on w.

[Figure 4.5: Union for NFAs.]

But "A2 has a non-accepting run on w" is no longer equivalent to "A2 has no accepting run on w": this only holds in the DFA case. An algorithm producing an NFA A1 \ A2 recognizing L(A1) \ L(A2) can instead be obtained from the algorithms for complement and intersection through the equality L(A1) \ L(A2) = L(A1) ∩ (Σ∗ \ L(A2)).

4.2.4 Emptiness and Universality. Emptiness for NFAs is decided using the same algorithm as for DFAs: just check whether the NFA has at least one final state. Universality requires a new algorithm. Since an NFA may have multiple runs on a word, an NFA may be universal even if some states are non-final: for every word having a run that leads to a non-final state there may be another run leading to a final state. An example is the NFA of Figure 4.6.

A language L is universal if and only if its complement is empty, and so universality of an NFA A can be checked by applying the emptiness test to the complement of A. Since complementation, however, involves a worst-case exponential blowup in the size of A, the algorithm requires exponential time and space. We show in this section that the universality problem is PSPACE-complete, and so the superpolynomial blowup cannot be avoided unless P = PSPACE, which is unlikely.

Theorem 4.7 The universality problem for NFAs is PSPACE-complete.

Proof: We only sketch the proof. To prove that the problem is in PSPACE, we show that it belongs to NPSPACE and apply Savitch's theorem. The polynomial-space nondeterministic algorithm for universality looks as follows. Given an NFA A = (Q, Σ, δ, q0, F), the algorithm guesses a run of B = NFAtoDFA(A) leading from {q0} to a non-final state, i.e., to a set of states of A containing no final state (if such a run exists). The algorithm only stores the current state of the run, not the whole run, and so it only needs linear space in the size of A.

. but not its transitions. an NFA A(M. Proof: Since A and B recognize the same language. #cn . Deﬁnition 4. where the ci ’s are the encodings of the conﬁgurations. and we are done. x) encoding the accepting run of M on x. Second. #cn . then no word encodes an accepting run of M on x. A state Q of B is minimal if no other state Q satisﬁes Q ⊂ Q . A is universal iff B is universal. . . So A is universal iff every state of B is ﬁnal. • If M rejects x.8 Let A be a NFA. We show that it is not necessary to completely construct A. A is universal iff every minimal state of B is ﬁnal. First. then there is a word w(M. . x)}. x)) = Σ∗ . and let B = NFAtoDFA(A). We prove PSPACE-hardness by reduction from the acceptance problem for linearly bounded automata. where the ci ’s are conﬁgurations.2. So M accepts x if and only if L(A(M. and so it only needs linear space in the size of A. and so every state of B is ﬁnal iff every minimal state of B is ﬁnal. and so L(A(M. . A Subsumption Test. x) accepting all the words of Σ∗ that do not satisfy at least one of the conditions (a)-(d) above.9 Let A be a NFA. The reduction shows how to construct in polynomial time. given a linearly bounded automaton M and an input x. IMPLEMENTATION ON NFAS 73 ﬁnal state (if such a run exists). it is not necessary to store all states. the universality check for DFA only examines the states of the DFA. and (d) for every 0 ≤ i ≤ n − 1: ci+1 is the successor conﬁguration of ci according to the transition relation of M. A linearly bounded automaton is a deterministic Turing machine that always halts and only uses the part of the tape containing the input. A conﬁguration of the Turing machine on an input of length k is coded as a word of length k. Given an input x. x)) = Σ∗ \ {w(M. x)) = Σ∗ . (b) c0 is the initial conﬁguration. The algorithm only does not store the whole run.4. Let Σ be the alphabet used to encode the run of the machine. not the transitions. (c) cn is an accepting conﬁguration. and so L(A(M. 
The run of the machine on an input can be encoded as a word c0 #c1 . So instead of NFAtoDFA(A) we can apply a modiﬁed version that only stores the states of A. We then have • If M accepts x. M accepts if there exists a word w of Σ∗ satisfying the following properties: (a) w has the form c0 #c1 . Proposition 4. and let B = NFAtoDFA(A). But a state of B is ﬁnal iff it contains some ﬁnal state of A. only the current state.

Example 4.10 Figure 4.6 shows an NFA on the left, and the equivalent DFA obtained through the application of NFAtoDFA() on the right, with the minimal states shaded. Only the states {1}, {2}, and {3, 4} are minimal. Since all states of the DFA are final, the NFA is universal.

[Figure 4.6: An NFA, and the result of converting it into a DFA; the minimal states {1}, {2}, and {3, 4} are shaded.]

Algorithm UnivNFA(A) below constructs the states of B as in NFAtoDFA(A), but introduces at line 7 a subsumption test: before adding δ(Q′, a) to the worklist, it checks whether W ∪ Q already contains some state Q′′ ⊆ δ(Q′, a). In this case either δ(Q′, a) has already been constructed (case Q′′ = δ(Q′, a)) or it is non-minimal (case Q′′ ⊂ δ(Q′, a)). In both cases, the state is not added to the worklist. Proposition 4.9 shows that it suffices to construct and store the minimal states of B.

UnivNFA(A)
Input: NFA A = (Q, Σ, δ, q0, F)
Output: true if L(A) = Σ∗, false otherwise

1 Q ← ∅; W ← { {q0} }
2 while W ≠ ∅ do
3     pick Q′ from W
4     if Q′ ∩ F = ∅ then return false
5     add Q′ to Q
6     for all a ∈ Σ do
7         if W ∪ Q contains no Q′′ ⊆ δ(Q′, a) then
8             add δ(Q′, a) to W
9 return true
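UnivNFA with the subsumption test can be sketched in Python as follows; names, representation, and the example automata are our own assumptions.

```python
# Sketch of UnivNFA: subset construction storing only sets of states,
# with the subsumption test of line 7.
def univ_nfa(nfa):
    sigma, delta, q0, final = nfa
    processed = []                  # the set Q of the pseudocode
    work = [frozenset({q0})]        # the worklist W
    while work:
        Q = work.pop()
        if not (Q & final):         # reached a set with no final state
            return False
        processed.append(Q)
        for a in sigma:
            succ = frozenset(s for q in Q for s in delta.get((q, a), ()))
            # subsumption: skip succ if some stored set is contained in it
            if not any(R <= succ for R in processed + work):
                work.append(succ)
    return True

# Universal NFA over {a}: every word keeps a run in the final state 1.
nfa_u = ({"a"}, {(1, "a"): {1, 2}, (2, "a"): {1}}, 1, {1})
# Non-universal NFA: the word "a" only reaches the non-final state 2.
nfa_n = ({"a"}, {(1, "a"): {2}, (2, "a"): {1}}, 1, {1})
```

The subsumption check here is a naive linear scan over all stored sets; as the text remarks, hash tables or radix trees are advisable for larger automata.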

The next proposition shows that UnivNFA(A) constructs all minimal states of B. (If UnivNFA(A) would first generate all states of B and then remove all non-minimal states, the proof would be trivial. However, the algorithm removes non-minimal states whenever they appear, and we must show that this does not prevent the future generation of other minimal states.)

Proposition 4.11 Let A = (Q, Σ, δ, q0, F) be an NFA, and let B = NFAtoDFA(A). After termination of UnivNFA(A), the set Q contains all minimal states of B.

Proof: Let Qt be the value of Q after termination of UnivNFA(A). We show that no path of B leads from a state of Qt to a minimal state of B not in Qt. Since {q0} ∈ Qt and all states of B are reachable from {q0}, it follows that every minimal state of B belongs to Qt.

Assume there is a path π = Q1 −a1→ Q2 −a2→ ... −an→ Qn of B such that Q1 ∈ Qt, Qn ∉ Qt, and Qn is minimal. Assume further that π is as short as possible. This implies Q2 ∉ Qt (otherwise Q2 −a2→ ... −an→ Qn would be a shorter path satisfying the same properties), and so Q2 is never added to the worklist. Since Q1 ∈ Qt, the state Q1 is eventually added to and picked from the worklist. When Q1 is processed at line 7 the algorithm considers Q2 = δ(Q1, a1), but does not add it to the worklist in line 8. So at that moment the worklist or Q contains a state Q2′ ⊆ Q2, and since Q2 itself is never added, Q2′ ⊂ Q2. This state Q2′ is eventually added to Q (if it is not already there), and so Q2′ ∈ Qt. Since Q2′ ⊂ Q2, B has a path π′ = Q2′ −a2→ Q3′ ... −an→ Qn′ for some states Q3′, ..., Qn′ satisfying Q3′ ⊆ Q3, ..., Qn′ ⊆ Qn (notice that we may have Q3′ = Q3). By the minimality of Qn we get Qn′ = Qn, and so π′ leads from Q2′, which belongs to Qt, to Qn, which is minimal and not in Qt. This contradicts the assumption that π is as short as possible.

Notice that the complexity of the subsumption test may be considerable, because the new set δ(Q′, a) must be compared with every set in W ∪ Q. Good use of data structures (hash tables or radix trees) is advisable.

4.2.5 Inclusion and Equality. Recall Lemma 4.4: given two DFAs A1, A2, the inclusion L(A1) ⊆ L(A2) holds if and only if every state [q1, q2] of [A1, A2] satisfying q1 ∈ F1 also satisfies q2 ∈ F2. This lemma no longer holds for NFAs. To see why, let A be any NFA having two runs for some word w, one of them leading to a final state q1, the other to a non-final state q2. We have L(A) ⊆ L(A), but the pairing [A, A] has a run on w leading to [q1, q2], whose first component is final while its second is not.

To obtain an algorithm for checking inclusion, we observe that L1 ⊆ L2 holds if and only if L1 ∩ (Σ∗ \ L2) = ∅. This condition can be checked using the constructions for intersection and for the emptiness check. However, as in the case of universality, we can apply a subsumption check.

Definition 4.12 Let A1, A2 be NFAs, and let B2 = NFAtoDFA(A2). A state [q1, Q2] of [A1, B2] is minimal if no other state [q1′, Q2′] satisfies q1′ = q1 and Q2′ ⊂ Q2.

Proposition 4.13 Let A1 = (Q1, Σ, δ1, q01, F1) and A2 = (Q2, Σ, δ2, q02, F2) be NFAs, and let B2 = NFAtoDFA(A2). Then L(A1) ⊆ L(A2) if and only if every state [q1, Q2] of [A1, B2] satisfying q1 ∈ F1 also satisfies Q2 ∩ F2 ≠ ∅.

Proof: Since A2 and B2 recognize the same language, L(A1) ⊆ L(A2) iff L(A1) ∩ L(A2) = ∅ iff L(A1) ∩ L(B2) = ∅ iff [A1, B2] has no state [q1, Q2] such that q1 ∈ F1 and Q2 ∩ F2 = ∅.

So we get the following algorithm to check inclusion:

InclNFA(A1, A2)
Input: NFAs A1 = (Q1, Σ, δ1, q01, F1), A2 = (Q2, Σ, δ2, q02, F2)
Output: true if L(A1) ⊆ L(A2), false otherwise

 1  Q ← ∅; W ← { [q01, {q02}] }
 2  while W ≠ ∅ do
 3      pick [q1, Q2] from W
 4      if (q1 ∈ F1) and (Q2 ∩ F2 = ∅) then return false
 5      add [q1, Q2] to Q
 6      for all a ∈ Σ, q1' ∈ δ1(q1, a) do
 7          Q2' ← δ2(Q2, a)
 8          if W ∪ Q contains no [q1', Q2'] then add [q1', Q2'] to W
 9  return true

The number of pairs [q1, Q2] that can be added to the worklist is at most |Q1| · 2^|Q2|. For each of them we have to execute the for loop O(|Q1|) times, each of them taking O(|Q2|^2) time. So the algorithm runs in |Q1|^2 · 2^O(|Q2|) time and space.

As was the case for universality, we can apply a subsumption check: replace the test of line 8 by "if W ∪ Q contains no [q1'', Q2''] such that q1' = q1'' and Q2'' ⊆ Q2' then add [q1', Q2'] to W". With this test it suffices to inspect the minimal states: L(A1) ⊆ L(A2) iff every minimal state [q1, Q2] of [A1, B2] satisfying q1 ∈ F1 also satisfies Q2 ∩ F2 ≠ ∅, because [A1, B2] has some state violating this condition iff it has some minimal state violating it. Notice, however, that in unfavorable cases the overhead of the subsumption test may not be compensated by a reduction in the number of states.

The exponential cannot be avoided unless P = PSPACE:

Proposition 4.14 The inclusion problem for NFAs is PSPACE-complete.

Proof: We first prove membership in PSPACE. Since PSPACE = co-PSPACE = NPSPACE, it suffices to give a polynomial space nondeterministic algorithm that decides non-inclusion. Given NFAs A1 and A2, the algorithm guesses a word w ∈ L(A1) \ L(A2) letter by letter, maintaining the sets Q1', Q2' of states that A1 and A2 can reach by the word guessed so far. When the guessing ends, the algorithm checks that Q1' contains some final state of A1 and that Q2' contains no final state of A2. Hardness follows from the fact that A is universal iff Σ* ⊆ L(A), and so the universality problem, which is PSPACE-complete, is a subproblem of the inclusion problem.
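The on-the-fly exploration of [A1, B2] performed by InclNFA can be sketched in Python (our own illustration, not the book's code; NFAs are assumed to be given as dicts mapping (state, letter) to sets of successor states, and the subsumption test is omitted):

```python
from collections import deque

def incl_nfa(delta1, q01, F1, delta2, q02, F2, alphabet):
    """Check L(A1) <= L(A2) by pairing states of A1 with subsets of A2's
    states, determinizing A2 on the fly (worklist algorithm)."""
    start = (q01, frozenset([q02]))
    seen, work = {start}, deque([start])
    while work:
        q1, S2 = work.popleft()
        # A1 can accept here but A2 cannot: witness of non-inclusion.
        if q1 in F1 and not (S2 & F2):
            return False
        for a in alphabet:
            S2p = frozenset(t for q in S2 for t in delta2.get((q, a), set()))
            for q1p in delta1.get((q1, a), set()):
                pair = (q1p, S2p)
                if pair not in seen:
                    seen.add(pair)
                    work.append(pair)
    return True
```

For example, with A1 recognizing {a} and A2 recognizing a*, the call returns true; swapping the arguments returns false, since ε ∈ a* but ε ∉ {a}.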

There is, however, an important case with polynomial complexity, namely when A2 is a DFA. In this case the states of the pairing are pairs [q1, q2] of states of A1 and A2, and the number of pairs that can be added to the worklist is then at most |Q1||Q2|. The for loop is still executed O(|Q1|) times, but each iteration takes O(1) time. So the algorithm runs in O(|Q1|^2 |Q2|) time and space.

Equality. Equality of two languages is decided by checking that each of them is included in the other. The equality problem is again PSPACE-complete. The only point worth observing is that, unlike the inclusion case, we do not get a polynomial algorithm when A2 is a DFA: to decide equality we must also check L(A2) ⊆ L(A1), and there the second automaton, A1, is still an NFA.

Exercises

Exercise 34 Consider the following languages over the alphabet Σ = {a, b}:

• L1 is the set of all words where between any two occurrences of b's there is at least one a.
• L2 is the set of all words where every non-empty maximal sequence of consecutive a's has odd length.
• L3 is the set of all words where a occurs only at even positions.
• L4 is the set of all words where a occurs only at odd positions.
• L5 is the set of all words of odd length.
• L6 is the set of all words with an even number of a's.

Remark: For this exercise we assume that the first letter of a nonempty word is at position 1.

Construct an NFA for the language (L1 \ L2) ∪ (L3 △ L4) ∩ L5 ∩ L6, where L △ L' denotes the symmetric difference of L and L', while sticking to the following rules:

• Start from NFAs for L1, ..., L6.
• Any further automaton must be constructed from already existing automata via an algorithm introduced in the chapter, e.g. UnionNFA, BinOp, Comp, NFAtoDFA, etc.

Try to find an order on the construction steps such that the intermediate automata and the final result have as few states as possible.

Exercise 35 Prove or disprove: the minimal DFAs recognizing a language and its complement have the same number of states.

Exercise 36 Let r be the regular expression ((0 + 1)(0 + 1))* over Σ = {0, 1}.

• Give a regular expression r' such that L(r') is the complement of L(r).
• Construct such a regular expression r' using the algorithms of this and the preceding chapters.

Exercise 37 Let A = (Q, Σ, δ, q0, F) be an NFA. Show that with the universal accepting condition of Exercise 9 the automaton A' = (Q, Σ, δ, q0, Q \ F) recognizes the complement of the language recognized by A. Hint: See Exercise 10.

Exercise 38 Recall the alternating automata introduced in Exercise ??.

• Let A = (Q1, Q2, Σ, δ, q0, F) be an alternating automaton, where Q1 and Q2 are the sets of existential and universal states, respectively, and δ: (Q1 ∪ Q2) × Σ → P(Q1 ∪ Q2). Show that the alternating automaton A' = (Q2, Q1, Σ, δ, q0, Q \ F) recognizes the complement of the language recognized by A. I.e., show that alternating automata can be complemented by exchanging existential and universal states, and final and non-final states.
• Give linear time algorithms that take two alternating automata recognizing languages L1, L2 and deliver a third alternating automaton recognizing L1 ∪ L2 and L1 ∩ L2. Hint: The algorithms are very similar to UnionDFA.
• Show that the emptiness problem for alternating automata is PSPACE-complete. Hint: Use Exercise 41.

Exercise 39 Find a family {An}, n = 1, 2, ..., of NFAs with O(n) states such that every NFA recognizing the complement of L(An) has at least 2^n states.

Exercise 40 Consider again the regular expressions (1 + 10)* and 1*(101*)* of Exercise 2.

• Construct NFAs for the expressions and use InclNFA to check if their languages are equal.
• Construct DFAs for the expressions and use InclDFA to check if their languages are equal.
• Construct minimal DFAs for the expressions and check whether they are isomorphic.

Exercise 41

• Prove that the following problem is PSPACE-complete:
  Given: DFAs A1, ..., An over the same alphabet Σ.
  Decide: Is the intersection of L(A1), ..., L(An) empty?
  Hint: Reduction from the acceptance problem for deterministic, linearly bounded automata.
• Prove that if Σ is a 1-letter alphabet then the problem is "only" NP-complete. Hint: reduction from 3-SAT.
• Prove that if the DFAs are acyclic (but the alphabet arbitrary) then the problem is again NP-complete.

Exercise 42 An NFA A = (Q, Σ, δ, q0, F) is complete if δ(q, a) ≠ ∅ for every q ∈ Q and every a ∈ Σ. Consider the variant of IntersNFA in which line 7

    if (q1 ∈ F1) and (q2 ∈ F2) then add [q1, q2] to F

is replaced by

    if (q1 ∈ F1) or (q2 ∈ F2) then add [q1, q2] to F

Let A1 ⊗ A2 be the result of applying this variant to two NFAs A1, A2. We call A1 ⊗ A2 the or-product of A1 and A2.

• Prove: If A1 and A2 are complete NFAs, then L(A1 ⊗ A2) = L(A1) ∪ L(A2).
• Give NFAs A1, A2 such that L(A1 ⊗ A2) = L(A1) ∪ L(A2) but neither A1 nor A2 is complete.

Exercise 43 Given a word w = a1 a2 ... an over an alphabet Σ, we define the even part of w as the word a2 a4 ... a2⌊n/2⌋. Given an NFA for a language L, construct an NFA recognizing the even parts of the words of L.

Exercise 44 Given regular languages L1, L2 over an alphabet Σ, the left quotient of L1 by L2 is the language

    L2\L1 := {v ∈ Σ* | ∃u ∈ L2 : uv ∈ L1}.

(Note that the left quotient L2\L1 is different from the set difference L2 \ L1.)

1. Given NFAs A1, A2, construct an NFA A such that L(A) = L(A1)\L(A2).
2. Do the same for the right quotient of L1 by L2, defined as L1/L2 := {u ∈ Σ* | ∃v ∈ L2 : uv ∈ L1}.
3. Determine the inclusion relations between the following languages: L1, (L1/L2)L2, and (L1L2)/L2.

Exercise 45 We have shown in Exercise 31 that every upward-closed language is regular. A language L ⊆ Σ* is downward-closed if for every two words w, v ∈ Σ*: if w ∈ L and v is a subword of w, then v ∈ L.

• Prove that every downward-closed language is regular.

Exercise 46 (Abdulla, Bouajjani, and Jonsson) An atomic expression over an alphabet Σ is an expression of the form ∅, (a + ε), or (a1 + ... + an)*, where a, a1, ..., an ∈ Σ. A product is a concatenation e1 e2 ... en of atomic expressions. A simple regular expression is a sum p1 + ... + pn of products.

• Prove that the language of a simple regular expression is downward-closed.
• Prove that every downward-closed language can be represented by a simple regular expression. Hint: since every downward-closed language is regular, it is represented by a regular expression. Prove that this expression is equivalent to a simple regular expression.
• Give an algorithm that transforms a regular expression for a language into a regular expression for its downward-closure.

Exercise 47 Let Li = {w ∈ {a}* | the length of w is divisible by i}.

1. Construct an NFA for L := L4 ∪ L6 with at most 11 states.
2. Construct the minimal DFA for L.

Exercise 48

• Modify algorithm Empty so that when the DFA or NFA is nonempty it returns a witness, i.e., a word accepted by the automaton.
• Same for a shortest witness.

Exercise 49 Check by means of UnivNFA whether the following NFA is universal.

[Figure: the NFA of Exercise 49, with states q1, ..., q5, initial state q3, and transitions labeled by a and b.]

Exercise 50 Let Σ be an alphabet, and define the shuffle operator || : Σ* × Σ* → 2^(Σ*) as follows, where a, b ∈ Σ and w, v ∈ Σ*:

    w || ε = {w}
    ε || w = {w}
    aw || bv = {a}(w || bv) ∪ {b}(aw || v)

For example, we have:

    b || d  = {bd, db}
    ab || d = {abd, adb, dab}
    ab || cd = {abcd, acbd, acdb, cabd, cadb, cdab}

Given DFAs recognizing languages L1, L2 ⊆ Σ*, construct an NFA recognizing their shuffle L1 || L2, defined as the union of the sets u || v over all u ∈ L1 and v ∈ L2.

Exercise 51 Let Σ1, Σ2 be two alphabets. A homomorphism is a map h: Σ1* → Σ2* such that h(ε) = ε and h(w1 w2) = h(w1) h(w2) for every w1, w2 ∈ Σ1*. Observe that if Σ1 = {a1, ..., an} then h is completely determined by the values h(a1), ..., h(an).

1. Let h: Σ1* → Σ2* be a homomorphism and let A be an NFA over Σ1. Describe how to construct an NFA for the language h(L(A)) := {h(w) | w ∈ L(A)}.
2. Let h: Σ1* → Σ2* be a homomorphism and let A be an NFA over Σ2. Describe how to construct an NFA for the language h^(-1)(L(A)) := {w ∈ Σ1* | h(w) ∈ L(A)}.
3. Recall that the language {0^n 1^n | n ∈ IN} is not regular. Use the preceding results to show that {(0 1^k 2)^n 3^n | k, n ∈ IN} is also not regular.

Exercise 52 Given alphabets Σ and ∆, a substitution is a map f: Σ → 2^(∆*) assigning to each letter a ∈ Σ a language La ⊆ ∆*. A substitution f can be canonically extended to a map 2^(Σ*) → 2^(∆*) by defining f(ε) = ε, f(wa) = f(w) f(a), and f(L) as the union of the sets f(w) over all w ∈ L. Note that a homomorphism can be seen as the special case of a substitution in which all the La's are singletons.

Let Σ = {Name, :, Tel, #}, let ∆ = {A, ..., Z, 0, 1, ..., 9, :, #}, and let f be the substitution given by

    f(Name) = (A + ··· + Z)*
    f(:)    = {:}
    f(Tel)  = 0049 (1 + ··· + 9)(0 + 1 + ··· + 9)^10 + 00420 (1 + ··· + 9)(0 + 1 + ··· + 9)^8
    f(#)    = {#}

1. Draw a DFA recognizing L = Name:Tel(#Telephone)*.
2. Sketch an NFA-reg recognizing f(L).
3. Give an algorithm that takes as input an NFA A, a substitution f, and for every a ∈ Σ an NFA recognizing f(a), and returns an NFA recognizing f(L(A)).

Exercise 53 Given two NFAs A1 and A2, let B = NFAtoDFA(IntersNFA(A1, A2)) and C = IntersDFA(NFAtoDFA(A1), NFAtoDFA(A2)). Show that B and C are isomorphic, and so in particular have the same number of states. (A superficial analysis gives that for NFAs with n1 and n2 states, B and C have O(2^(n1·n2)) and O(2^(n1+n2)) states, respectively, wrongly suggesting that C might be more compact.)

Exercise 54 A DFA is synchronizing if there is a word w and a state q such that after reading w from any state the DFA is always in state q.

1. Show that the following DFA is synchronizing.

[Figure: a DFA with a marked start state and transitions labeled by a and b, as in the original figure.]

2. Give an algorithm to decide if a given DFA is synchronizing.
3. Give a polynomial time algorithm to decide if a given DFA is synchronizing.
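One classical approach to the polynomial-time test can be sketched as follows (Python, our own illustration, not an algorithm from the chapter; the dict-based DFA encoding is an assumption of the sketch): a DFA is synchronizing if and only if every pair of states can be sent to a common state by some word, which can be decided by a backward search over pairs.

```python
from itertools import combinations

def is_synchronizing(states, alphabet, delta):
    """delta: dict mapping (state, letter) -> state (total DFA).
    True iff some word takes every state to one common state."""
    # Inverse transitions: inv[(s, a)] = set of states mapped to s by a.
    inv = {}
    for q in states:
        for a in alphabet:
            inv.setdefault((delta[(q, a)], a), set()).add(q)
    # Seed: pairs that collapse to a single state in one step.
    mergeable, work = set(), []
    for p, q in combinations(states, 2):
        if any(delta[(p, a)] == delta[(q, a)] for a in alphabet):
            mergeable.add(frozenset((p, q)))
            work.append((p, q))
    # Backward closure: a pair stepping into a mergeable pair is mergeable.
    while work:
        p, q = work.pop()
        for a in alphabet:
            for pp in inv.get((p, a), ()):
                for qq in inv.get((q, a), ()):
                    pair = frozenset((pp, qq))
                    if pp != qq and pair not in mergeable:
                        mergeable.add(pair)
                        work.append((pp, qq))
    # Classical fact: pairwise mergeability suffices for synchronizability.
    return all(frozenset((p, q)) in mergeable
               for p, q in combinations(states, 2))
```

The pair graph has O(|Q|^2) nodes, so the test runs in polynomial time.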


Chapter 5

Applications I: Pattern Matching

As a first example of a practical application of automata, we consider the pattern matching problem. Given w, w' ∈ Σ*, we say that w' is a factor of w if there are words w1, w2 ∈ Σ* such that w = w1 w' w2. If w1 and w1 w' have lengths k and k', respectively, we say that w' is the [k, k']-factor of w. The pattern matching problem is defined as follows: given a word t ∈ Σ+ (called the text) and a regular expression p over Σ (called the pattern), determine the smallest k ≥ 0 such that some [k', k]-factor of t belongs to L(p). We call k the first occurrence of p in t.

Example 5.1 Let p = a(ab*a)*b and t = aabab. Since ab, aabab ∈ L(p), the [1, 3]-, [3, 5]-, and [0, 5]-factors of aabab belong to L(p). So the first occurrence of p in aabab is 3.

5.1 The general case

We present two different solutions to the pattern matching problem, using nondeterministic and deterministic automata, respectively.

Solution 1. Clearly, some word of L(p) occurs in t if and only if some prefix of t belongs to L(Σ*p). So we construct an NFA A for the regular expression Σ*p using the rules of Figure 2.9 (i.e., the algorithm RegtoNFA), and then simulate A on t as in MemNFA[A](t) on page 62. Recall that the simulation algorithm reads the text letter by letter, maintaining the set of states reachable from the initial state by the prefix read so far; so the simulation reaches a set of states S containing a final state if and only if the prefix read so far belongs to L(Σ*p). Usually one is interested in finding not only the ending position k of the [k', k]-factor, but also its starting position k'; adapting the algorithms to also provide this information is left as an exercise. Here is the pseudocode for this algorithm:

PatternMatchingNFA(t, p)
Input: text t = a1 ... an ∈ Σ+, pattern p
Output: the first occurrence k of p in t, or ⊥ if no such occurrence exists

1  A ← RegtoNFA(Σ*p)
2  S ← {q0}
3  for all k = 0 to n − 1 do
4      if S ∩ F ≠ ∅ then return k
5      S ← δ(S, ak+1)
6  if S ∩ F ≠ ∅ then return n
7  return ⊥

If we assume that the alphabet Σ has fixed size, then the complexity of PatternMatchingNFA for a text of length n and a pattern of length m can be estimated as follows. Since RegtoNFA(p) takes O(m) time, RegtoNFA(Σ*p) also takes O(m) time (if Σ does not have fixed size, then constructing the NFA for Σ*p takes O(m + |Σ|) time). The loop is executed at most n times, and, for an automaton with k states, each line of the loop's body takes at most O(k^2) time; since here k ∈ O(m), the loop runs in O(nm^2) time. The overall runtime is thus O(m + nm^2) = O(nm^2).

Solution 2. We proceed as in the previous case, but constructing a DFA for Σ*p instead of an NFA:

PatternMatchingDFA(t, p)
Input: text t = a1 ... an ∈ Σ+, pattern p
Output: the first occurrence k of p in t, or ⊥ if no such occurrence exists

1  A ← NFAtoDFA(RegtoNFA(Σ*p))
2  q ← q0
3  for all k = 0 to n − 1 do
4      if q ∈ F then return k
5      q ← δ(q, ak+1)
6  if q ∈ F then return n
7  return ⊥

Notice that there is a trade-off: while the conversion to a DFA can take (much) longer than the conversion to an NFA, the membership check for a DFA is faster. The complexity of PatternMatchingDFA for a text of length n and a pattern of length m can be easily estimated: RegtoNFA(Σ*p) takes O(m) time, and so the call to NFAtoDFA (see Table 2.1) takes 2^O(m) time and space. The loop is executed at most n times, and each line of the body takes constant time, so the overall runtime is O(n) + 2^O(m).
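The simulation performed by PatternMatchingNFA can be sketched as follows (Python, our own illustration; we assume the NFA for Σ*p has already been produced by some RegtoNFA implementation and is given as a transition dict):

```python
def pattern_matching_nfa(text, delta, q0, final):
    """Return the first occurrence k of the pattern in text, or None.
    delta: dict (state, letter) -> set of successors, for an NFA of Sigma*p."""
    S = {q0}
    for k, a in enumerate(text):
        if S & final:          # some prefix of length k belongs to L(Sigma*p)
            return k
        S = {t for q in S for t in delta.get((q, a), set())}
    return len(text) if S & final else None

# A RegtoNFA implementation would produce the NFA for a general pattern;
# here we hard-code a tiny NFA for Sigma*ab over {a, b} as an illustration.
delta = {
    (0, 'a'): {0, 1}, (0, 'b'): {0},   # state 0 loops on Sigma, guesses the start
    (1, 'b'): {2},                     # read the final b
}
```

For the text aabab the first occurrence of the pattern ab is 3, matching the definition of first occurrence as an ending position.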

5.2 The word case

We study the special but very common case of the pattern matching problem in which the pattern is a word. It is easy to find a faster algorithm for this special case without any use of automata theory: just move a "window" of length m over the text t, one letter at a time, and check after each move whether the content of the window is p. The number of moves is n − m + 1, and a check requires O(m) letter comparisons, giving a runtime of O(nm). Notice that in some applications both n and m can be large, and then the difference between O(nm) and O(m + n) is very significant. In the rest of the section we present a faster algorithm with time complexity O(m + n) independently of the size of the alphabet (i.e., the alphabet could have as many letters as the text).

For the rest of the section we consider an arbitrary but fixed text t = a1 ... an and an arbitrary but fixed word pattern p = b1 ... bm. Figure 5.1(a) shows the obvious NFA Ap recognizing Σ*p for p = nano. Formally, Ap = (Q, Σ, δ, q0, F), where Q = {0, 1, ..., m}, q0 = 0, F = {m}, and

    δ = {(i, bi+1, i + 1) | i ∈ [0, m − 1]} ∪ {(0, a, 0) | a ∈ Σ}.

Ap can reach state k whenever the word read so far ends with b1 ... bk.

Figure 5.1(b) shows the DFA Bp obtained by applying NFAtoDFA to Ap. It has as many states as Ap, and there is a natural correspondence between the states of Ap and Bp: each state of Ap is the largest element of exactly one state of Bp. For instance, 3 is the largest element of {3, 1, 0}, and 4 is the largest element of {4, 0}. We call this largest element the head of the state. If we label a state with head k by the string b1 ... bk, then we can easily interpret how Bp works: it keeps track of how close it is to finding nano. For instance:

• if Bp is in state n and reads an a, then it "makes progress" and moves to state na;
• if Bp is in state nan and reads an a, then it is "thrown back" to state na. Not to ε, because if the next two letters are n and o, then Bp should accept!

Bp has another property that will be very important later on: for each state S of Bp (with the exception of S = {0}), the tail of S, i.e., the result of removing the head of S, is again a state of Bp. For instance, the tail of {3, 1, 0} is {1, 0}, which is also a state of Bp. We show that these properties hold in general, and not only in the special case p = nano. Denote the head and tail of S by h(S) and t(S), respectively, and, given a set S ⊆ {0, 1, ..., m}, use the notation δ(S, a) for the union of the sets δ(q, a) over all q ∈ S. We prove the following invariant of NFAtoDFA (which is shown again in Table 5.1 for convenience) when applied to a word pattern p.

Proposition 5.2 Let p = b1 ... bm, and let Sk be the k-th set picked from the worklist during the execution of NFAtoDFA(Ap). Then (1) h(Sk) = k (and therefore k ≤ m), and

By (3). So S ∈ Q. or k > 0 and t(S k ) ∈ Q. and so S k+1 = δ(S k . The proof is by induction on k. Moreover. a) = δ(t(S k ). a). h(S k ) = k and t(S k ) ∈ Q by (1) and (2).1: NFA A p and DFA B p for p = nano (2) either k = 0 and t(S k ) = ∅. moreover. We claim k < m. Then δ(h(S k ). it is added while S k is being processed at lines 7-10. 0 o 4. 0 a n 2. S has later been picked from the worklist and added to Q. but also (3) after picking S k the worklist is empty. a) = ∅ for every a ∈ Σ. a) = δ(m. For k = 0 we have S 0 = {0}. 0 n a 3. We show that if (1)-(3) hold for S k and a state S k+1 is added to the list. Since t(S k ) ∈ Q by (2). . S has already added to the worklist before. a) for some a ∈ Σ. respectively. and (1)-(3) hold.88 Σ 0 n CHAPTER 5. 0 n other (b) n other other other other n n a n na n a nan o nano Figure 5. Assume a state S k+1 is added to the worklist. and so at line 8 we have S = δ(S k . by (3). Assume k = m. APPLICATIONS I: PATTERN MATCHING (a) 1 a 2 n 3 o 4 n other (b) 0 n other other other other n 1. Proof: We prove not only (1) and (2). 1. then (1)-(3) also hold for S k+1 .

For (2). bk ). the claim is proved. bk )) = δ(k.2. For every a bk we have δ(h(S k ). The remaining O(m) transitions for each state can be constructed and stored using O(m2 ) space and time. a) = δ(k. Now we claim S k+1 = δ(S k . Σ. F) with L(B) = L(A) 1 2 3 4 5 6 7 8 9 10 89 Q.Even more: any DFA for Σ∗ p must have at least m + 1 states. However. a) is not added to the worklist. F) Output: DFA B = (Q. since t(S k+1 ) = δ(t(S k ). So NFAtoDFA does not incur in any exponential blowup for word patterns. this improvement only leads to a O(n + m2 ) . Transitions of B p labeled by letters that do not appear in p always lead to state 0. and so the same argument used above to prove k < m yields that S = δ(S k . bk ) = k + 1.2. Q0 . a) if S Q then add S to W add (S . and so its construction requires at least O(nm) time. bk ) = k + 1. B p has at most m + 1 states for a pattern of length m. and the claim is proved. Σ. we recall that after picking S k the worklist is empty. a) for some a ∈ Σ. For (1) we observe that δ(k.1: NFAtoDFA(A) and S is not added to the worklist again. B p has size O(nm). we prove that S k+1 satisﬁes (1)-(3). by (3). Finally. and that in the second claim we have proved that while processing S k the only state added to the worklist is S k+1 . picked from it and added to Q. p2 of p the residuals (Σ∗ p) p1 and (Σ∗ p) p2 are also distinct. S ) to ∆ Table 5. so after picking S k+1 the worklist becomes empty again.5. a. F ← ∅. ∆. and so they do not need to be explicitely stored. So we get Corollary 5. ∆. Since S k+1 = δ(S k . we have that t(S k+1 ) has already been added to the list and. Since Σ has size O(n). bk ) and (2) holds for S k . By Proposition 5. a) = ∅. For (3). THE WORD CASE NFAtoDFA(A) Input: NFA A = (Q. This contradicts that some S k+1 is added to the worklist.3 B p is the minimal DFA recognizing Σ∗ p. B p has m + 1 states and m |Σ| transitions. δ. We can do better by storing B p more compactly. q0 . because for any two distinct preﬁxes p1 . 
NFAtoDFA(A)
Input: NFA A = (Q, Σ, δ, q0, F)
Output: DFA B = (Q, Σ, ∆, Q0, F) with L(B) = L(A)

 1  Q, ∆, F ← ∅
 2  Q0 ← {q0}; W ← {Q0}
 3  while W ≠ ∅ do
 4      pick S from W
 5      add S to Q
 6      if S ∩ F ≠ ∅ then add S to F
 7      for all a ∈ Σ do
 8          S' ← δ(S, a)
 9          if S' ∉ Q then add S' to W
10          add (S, a, S') to ∆

Table 5.1: NFAtoDFA(A)

By Proposition 5.2, Bp has at most m + 1 states for a pattern of length m, and so NFAtoDFA does not incur any exponential blowup for word patterns. Even more: any DFA for Σ*p must have at least m + 1 states, because for any two distinct prefixes p1, p2 of p the residuals (Σ*p)^p1 and (Σ*p)^p2 are also distinct. So we get:

Corollary 5.3 Bp is the minimal DFA recognizing Σ*p.

Bp has m + 1 states and (m + 1)|Σ| transitions. Since Σ can have size O(n), Bp has size O(nm), and so its construction requires at least O(nm) time. We can do better by storing Bp more compactly. Transitions of Bp labeled by letters that do not appear in p always lead to state {0}, and so they do not need to be explicitly stored; the remaining O(m) transitions for each state can be constructed and stored using O(m^2) space and time. However, this improvement only leads to an O(n + m^2) algorithm. To achieve O(n + m) time we introduce an even more compact data structure: the lazy DFA for Σ*p.

5.2.1 Lazy DFAs

A DFA can be seen as the control unit of a machine that reads input from a tape divided into cells by means of a reading head (see Figure 5.2). Initially, the tape contains the word to be read, and the reading head is positioned on the first cell of the tape. At each step, the machine reads the content of the cell occupied by the reading head, updates the current state according to the transition function, and advances the head one cell to the right. The machine accepts a word if the state reached after reading it completely is final.

[Figure 5.2: Tape with reading head.]

Intuitively, a lazy DFA only differs from a DFA in the transition function, which has the form δ: Q × Σ → Q × {R, N}, where R stands for "move Right" and N for "No move": at each step the machine may advance the head one cell to the right, or keep it on the same cell. Which of the two takes place is a function of the current state and the current letter read by the head. Formally, a transition of a lazy DFA is a four-tuple (q, a, q', d), where d ∈ {R, N} is the move of the head. Intuitively, a transition (q, a, q', N) means that state q is lazy, and delegates processing the letter a to the state q'.

In order to construct a lazy DFA for Σ*p, we first recall Proposition 5.2: for every k ≥ 1, the head of the k-th set Sk picked from the worklist (the k-th state added to Bp) is k, and its tail is also a state of Bp. If a = bk+1, then a is a hit, "the letter Bp is hoping for"; otherwise a is a miss. Letting δB denote the transition function of Bp, there are two cases for the successors δB(Sk, a) of a state Sk:

• If a is a hit, then δB(Sk, a) = Sk+1, and Bp "moves forward".
• If a is a miss, then δB(Sk, a) = δ(t(Sk), a): Bp moves to the same state it would move to from state t(Sk), independently of the letter being read (as long as it is a miss).

Using this, we construct a lazy DFA Cp with the same states as Bp, and transition function δC(Sk, a) for a state Sk and a letter a given by:

• If a is a hit, then δC(Sk, a) = (Sk+1, R), i.e., Cp behaves as Bp and advances the head.
• If a is a miss and k > 0, then Sk delegates to the state t(Sk): we "summarize" the transitions for all misses into a single transition δC(Sk, miss) = (t(Sk), N). If k = 0, then t(Sk) is not a state; in this case Cp stays put and advances the head, i.e., δC(S0, a) = (S0, R) for every miss a.

Figure 5.3 shows again the DFA, and the lazy DFA with summarized transitions, for p = nano. (We also write just k instead of Sk in the states.) Consider the behaviours of Bp and Cp from state S3 when the next letter is an n. Since n is a miss at S3, Bp moves to δ(t(S3), n) = S1, i.e., to the state it would move to if it were in state t(S3). Cp instead delegates to S1, which delegates to S0, which reads the n and moves to state S1: the move of Bp is simulated in Cp by a chain of delegations, followed by a last move that advances the reading head to the right (in the worst case the chain of delegations reaches S0, who cannot delegate to anybody). The final destination is clearly the same in both cases.

[Figure 5.3: DFA and lazy DFA for p = nano. In the lazy DFA, each state k has a hit transition labeled bk+1 with move R and a single summarized miss transition; the miss transition of state 3 leads to state 1 with move N, those of states 1, 2 and 4 lead to state 0 with move N, and the miss transition of state 0 is a self-loop with move R.]

Cp has m + 1 states and 2(m + 1) transitions, and so size O(m). We show that it can be constructed in O(m) time. Clearly, the only problem is to compute in O(m) time the target state miss(Si) of the miss transition of every state Si of Cp. By definition, miss(Si) = t(Si). Since t(S1) = {0} = S0, this already gives miss(S1) = S0. For i > 1 we have Si = {i} ∪ t(Si), and so

    t(Si) = t(δB(Si−1, bi)) = t(δ(i − 1, bi) ∪ δ(t(Si−1), bi)) = t({i} ∪ δB(t(Si−1), bi)) = δB(t(Si−1), bi).

So we get

    miss(Si) = δB(miss(Si−1), bi).

For instance, for p = nano, once we know that miss(S3) = S1 and miss(Si) = S0 for i = 0, 1, 2, 4, we can easily construct Cp.
bi )) = δB (t(S i−1 ). 4 we can easily construct C p . b) B j if b = b j+1 (hit) if b b j+1 (miss) and j = 0 if b b j+1 (miss) and j 0 . and so t(S i ) = t(δB (S i−1 . for p = nano once we know that miss(S 3 ) = S 1 and miss(S i ) = S 0 for i = 0. R 0 n.. We show that it can be constructed in O(m) time. 2.e. i. in the case of a miss S i delegates to t(S i ) for every i > 0. For i > 1 we have S i = {i} ∪ t(S i ). which moves to state S 1 : the move of B p is simulated in C p by a chain of delegations. the only problem is to compute in O(m) time the target state miss(S i ) of the miss transition of every state S i of C p . R miss. R miss. bi ) . b) = S 0 δ (t(S ). n other (b) 0 n other other other other n 1. and so size O(m). followed by a last move that advances the reading head to the right (in the worst case the chain of delegations reaches S 0 . R o. bi ) ∪ δ(t(S i−1 . 1. bi ). N Figure 5. N 1 a.3: DFA and lazy DFA for p = nano C p has m+1 states and 2(m+1) transitions. bi )) = t(δ(i − 1. 0 a n 2. 0 n a 3. Since t(S 1 ) = {0} = S 0 . this already gives miss(S 1 ) = S 0 . while C p delegates to S 1 . Clearly. So we get miss(S i ) = δB (miss(S i−1 ). R 2 3 4 miss. who cannot delegate to anybody). bi )) = t({i} ∪ δ(t(S i−1 ). 0 o 4. N n. THE WORD CASE 91 were in state S 1 ). By deﬁnition.2. which delegates to S 0 . The ﬁnal destination is clearly the same in both cases. 0 miss. For instance. 1. we have S j+1 δB (S j . Moreover. N miss.5. miss(S i ) = t(S i ).

.2 lead to the algorithms shown in Table 5. 1 2 3 miss(0) ← 0. because initially j i=2 is set to miss(i − 1). Exercise 56 cadabra.2: Algorithm Miss(p) computing the targets of the miss transitions It remains to show that all calls to DeltaB during the execution of Miss(p) take together O(m) time.2. To this end. Miss(p) Input: word pattern p = b1 · · · bm . and computes with integers: miss(i) = j means miss(S i ) = S j .1 and 5. 1 2 3 DeltaB( j. . Output: head of the state δB (S j . . . . . m do miss(i) ← DeltaB( miss(i − 1) . . and lazy DFAs for the patterns mammamia and abra- . b). and the return value assigned to miss(i) is at most the ﬁnal value of j plus 1. m}. Miss(p) computes for each index i ∈ {0. We prove m ni ≤ m. Construct NFAs. which returns (the head of) the state δB (S j . and recalling that miss(S 0 ) = S 0 . . b). . So we get m m ni ≤ i=2 i=2 miss(i − 1) − miss(i) + 1) ≤ miss(1) − miss(m) + m ≤ miss(1) + m = m Exercises Exercise 55 Design an algorithm that solves the following problem for an alphabet Σ.92 CHAPTER 5. each iteration of the loop decreases j by at least 1 (line 2). b) = S 0 δ (miss(S ). miss(1) ← 0 for i ← 2. b) B j (5. APPLICATIONS I: PATTERN MATCHING Putting these two last equations together. . letter b. Discuss the complexity of your solution. Since state S i has head i. • Given: w ∈ Σ∗ and a regular expression r over Σ. bi ) if i = 0 or i = 1 if i > 1 if b = b j+1 (hit) if b b j+1 (miss) and j = 0 if b b j+1 (miss) and j 0 (5. . the algorithm identiﬁes a state with its head.1) S j+1 δB (S j . . m} the target state miss(S i ). Let ni be the number of iterations of the while loop at line 1 during the call DeltaB(miss(i − 1).2) Equations 5. DFAs. we ﬁnally obtain miss(S i ) = S0 δB (miss(S i−1 ). b). • Find: A shortest preﬁx w1 ∈ Σ∗ of w such that there exists a preﬁx w1 w2 of w and w2 ∈ L(r). bi ) while b b j+1 and j 0 do j ← miss( j) if b = b j+1 then return j + 1 else return 0 Table 5. observe that ni ≤ miss(i − 1) − (miss(i) − 1). 
b) Input: number j ∈ {0. Miss(p) calls DeltaB( j. bi ). Output: heads of targets of miss transitions. shown on the right of the Table.
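The algorithms Miss and DeltaB of Table 5.2 translate directly into code (a Python sketch with 0-based string indexing; computing miss is exactly the failure-function computation of the Knuth-Morris-Pratt algorithm):

```python
def compute_miss(p):
    """Heads of the targets of the miss transitions: miss[i] = j means
    miss(S_i) = S_j. Amortized O(m) time, as proved in the text."""
    m = len(p)
    miss = [0] * (m + 1)
    def delta_b(j, b):
        # head of delta_B(S_j, b): follow miss transitions until a hit or S_0
        while b != p[j] and j != 0:      # p[j] is b_{j+1} in the book's notation
            j = miss[j]
        return j + 1 if b == p[j] else 0
    for i in range(2, m + 1):
        miss[i] = delta_b(miss[i - 1], p[i - 1])
    return miss

def first_occurrence(t, p):
    """First occurrence (ending position) of a nonempty word p in text t,
    via the lazy DFA C_p; O(n + m) overall."""
    miss = compute_miss(p)
    j = 0
    for k, a in enumerate(t):
        while a != p[j] and j != 0:      # delegation: the head does not move
            j = miss[j]
        j = j + 1 if a == p[j] else 0    # hit, or a miss landing in S_0
        if j == len(p):
            return k + 1
    return None
```

For p = nano the computed table is miss = [0, 0, 0, 1, 0], matching miss(S3) = S1 and miss(Si) = S0 for the other states.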

Exercise 57 We have shown that lazy DFAs are more concise than DFAs for languages of the form Σ*p. There is a space vs. running-time trade-off, however: this improvement does not entirely come for free, because, due to the steps in which the head does not move, a lazy DFA may need more than n steps to read a word of length n. Find a text t and a word pattern p such that the run of Bp on t takes at most n steps and the run of Cp takes at least 2n − 1 steps. Hint: a simple pattern of the form a^k for some k ≥ 0 is sufficient.

Exercise 58 Two-way finite automata are an extension of lazy automata in which the reading head may not only move right or stay put, but also move left. The tape extends infinitely long both to the left and to the right of the input, with all cells outside the input empty. A word is accepted if the control state is a final state at the moment the head reaches an empty cell to the right of the input for the first time.

• Give a two-way DFA with O(n) states for the language (0 + 1)*1(0 + 1)^n.
• Give algorithms for membership and emptiness of two-way automata.
• (Difficult!) Prove that the languages recognized by two-way DFAs are regular.


Chapter 6

Operations on Relations: Implementations

We show how to implement operations on relations over a (possibly infinite) universe U. As we shall see, their implementations will be very similar to those of the language case.

We are interested in a number of operations. A first group contains the operations we already studied for sets, but lifted to relations: we consider the operation Membership((x, y), R), which returns true if (x, y) ∈ R and false otherwise, and Complement(R), which returns the complement (X × X) \ R of R. A second group contains three fundamental operations proper to relations. Given R, R1, R2 ⊆ U × U:

Projection_1(R):  returns the set π1(R) = {x | ∃y (x, y) ∈ R};
Projection_2(R):  returns the set π2(R) = {y | ∃x (x, y) ∈ R};
Join(R1, R2):     returns R1 ◦ R2 = {(x, z) | ∃y (x, y) ∈ R1 ∧ (y, z) ∈ R2}.

Finally, given X ⊆ U we are interested in two derived operations:

Post(X, R):  returns postR(X) = {y ∈ U | ∃x ∈ X : (x, y) ∈ R};
Pre(X, R):   returns preR(X) = {y ∈ U | ∃x ∈ X : (y, x) ∈ R}.

Observe that Post(X, R) = Projection_2(Join(IdX, R)) and Pre(X, R) = Projection_1(Join(R, IdX)), where IdX = {(x, x) | x ∈ X}.

Even though we will encode the elements of U as words, when implementing relations it is convenient to think of U as an abstract universe, and not as the set Σ* of words over some alphabet Σ. The reason is that for some operations we encode an element of U not by one word, but by many, actually by infinitely many. In the case of operations on sets this is not necessary, and one can safely identify an object and its encoding as a word.
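For finite relations, the identities Post(X, R) = Projection_2(Join(IdX, R)) and Pre(X, R) = Projection_1(Join(R, IdX)) can be checked directly (a small Python sketch over explicit sets of pairs, before any automata-based encoding enters the picture):

```python
def join(r1, r2):
    """R1 o R2 = {(x, z) | exists y: (x, y) in R1 and (y, z) in R2}."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def post(xs, r):
    """post_R(X), computed as Projection_2(Join(Id_X, R))."""
    id_x = {(x, x) for x in xs}
    return {y for (_, y) in join(id_x, r)}

def pre(xs, r):
    """pre_R(X), computed as Projection_1(Join(R, Id_X))."""
    id_x = {(x, x) for x in xs}
    return {y for (y, _) in join(r, id_x)}
```

For instance, with R = {(1, 2), (2, 3)} we get post({1}, R) = {2} and pre({3}, R) = {2}.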

6.1 Encodings

We encode elements of U as words over an alphabet Σ. It is convenient to assume that Σ contains a padding letter #: an element x has a shortest encoding sx ∈ Σ*, whose last letter we assume to be different from #, and its other encodings are obtained by appending to the shortest encoding an arbitrary number of padding letters, i.e., x is encoded by all the words sx #^n with n ≥ 0. We assume that the shortest encodings of two distinct elements are also distinct; it follows that the sets of encodings of two distinct elements are disjoint. The advantage of padding is that for any two elements x, y there is a number n (in fact, infinitely many) such that both x and y have encodings of length n.

Example 6.1 We encode the number 6 not only by its least-significant-bit-first binary representation 011, but by any word of L(0110*). In this case we have Σ = {0, 1} with 0 as padding letter. Notice, however, that taking 0 as padding letter requires us to take the empty word as the shortest encoding of the number 0 (otherwise the last letter of the encoding of 0 would be the padding letter). In the rest of this chapter, we will use this particular encoding of natural numbers without further notice.

Padding also yields a natural encoding of pairs. We say that (wx, wy) encodes the pair (x, y) if wx encodes x, wy encodes y, and wx, wy have the same length. Notice that if (wx, wy) encodes (x, y), then so does (wx #^k, wy #^k) for every k ≥ 0. If sx, sy are the shortest encodings of x and y, respectively, and |sx| ≤ |sy|, then the shortest encoding of (x, y) is (sx #^(|sy|−|sx|), sy).
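The encoding of Example 6.1 can be reproduced in code (Python, our own illustration; trailing 0s act as padding):

```python
def lsbf_shortest(n):
    """Shortest lsbf encoding of n; the empty word for 0 (0 is the padding letter)."""
    s = ""
    while n > 0:
        s += str(n % 2)
        n //= 2
    return s

def decode_lsbf(w):
    """Decode any encoding, padded or not: trailing 0s do not change the value."""
    return sum(int(b) << i for i, b in enumerate(w))
```

Note that every word of 0110* decodes to 6, and that the empty word is the shortest encoding of 0, as required when 0 is the padding letter.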
We call it the least signiﬁcant bit ﬁrst encoding and write lsbf(6) to denote the language L(0110∗ ) If we encode an element of U by more than one word. Does it suﬃce that the automaton accepts(rejects) some encoding. In the rest of this chapter.1 We encode the number 6 not only by its small end binary representation 011. wy ) encodes (x. sy ). or does it have to accept (reject) all of them? Several deﬁnitions are possible. Notice. The advantage is that for any two elements x. Let A be an NFA. and |s x | ≤ |sy |. then the shortest encoding of (x. that taking 0 as padding letter requires to take the empty word as the shortest encoding of the number 0 (otherwise the last letter of the encoding of 0 is the padding letter). wy encodes y. If s x . Example 6.

6.2 Transducers and Regular Relations

We assume that an encoding of the universe U over the alphabet Σ has been fixed. Transducers are also called Mealy machines.

Definition 6.3 A transducer over Σ is an NFA over the alphabet Σ × Σ.

According to this definition a transducer accepts sequences of pairs of letters, but it is convenient to look at it as a machine accepting pairs of words:

Definition 6.4 Let T be a transducer over Σ. Given words w1 = a1 ... an and w2 = b1 ... bn, we say that T accepts the pair (w1, w2) if it accepts the word (a1, b1) ... (an, bn) ∈ (Σ × Σ)*. In other words, we identify the sets ∪_{i≥0} (Σ^i × Σ^i) and (Σ × Σ)* = ∪_{i≥0} (Σ × Σ)^i.

We now define when a transducer accepts a pair (x, y) ∈ X × X, which allows us to define the relation recognized by a transducer. The definition is completely analogous to Definition 6.2:

Definition 6.5 Let T be a transducer.
• T accepts a pair (x, y) ∈ X × X if it accepts all encodings of (x, y).
• T rejects a pair (x, y) ∈ X × X if it accepts no encoding of (x, y).
• T recognizes a relation R ⊆ X × X if L(T) = {(w_x, w_y) ∈ (Σ × Σ)* | (w_x, w_y) encodes some pair of R}.

A relation is regular if it is recognized by some transducer. As for NFAs, it is important to emphasize that not every transducer recognizes a relation, because it may recognize only some, but not all, the encodings of a pair (x, y). We say that a transducer is well-formed if it recognizes some relation, and ill-formed otherwise.

Example 6.6 The Collatz function is the function f : N → N defined as follows:

    f(n) = 3n + 1 if n is odd
    f(n) = n/2    if n is even

Figure 6.1 shows a transducer that recognizes the relation {(n, f(n)) | n ∈ N} with lsbf encoding and with Σ = {0, 1}. The elements of Σ × Σ are drawn as column vectors with two components. The transducer accepts for instance the pair (7, 22): we have lsbf(7) = L(1110*) and lsbf(22) = L(011010*), and the transducer accepts the pairs (111000^k, 011010^k) for every k ≥ 0, that is, the words (1,0)(1,1)(1,1)(0,0)(0,1)(0,0)(0,0)^k over Σ × Σ.
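Viewing a transducer as an NFA over Σ × Σ, acceptance of a pair of equal-length words is plain NFA simulation on the zipped word. A minimal sketch of ours (the dictionary representation and the identity-relation example are assumptions for illustration, not taken from the text):

```python
def accepts_pair(delta, q0, finals, w1, w2):
    """Run an NFA over pairs of letters on the word (a1,b1)...(an,bn)."""
    if len(w1) != len(w2):           # only equal-length pairs are encoded
        return False
    current = {q0}
    for pair in zip(w1, w2):
        current = {r for q in current for r in delta.get((q, pair), set())}
    return bool(current & finals)

# Hypothetical one-state transducer for the identity relation on {0,1}*.
identity = {(0, ("0", "0")): {0}, (0, ("1", "1")): {0}}
```

For instance, the identity transducer accepts (0110, 0110) but no pair of distinct words.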

Figure 6.1: A transducer for Collatz's function.

Determinism

A transducer is deterministic if it is a DFA. In particular, a state of a deterministic transducer over the alphabet Σ × Σ has exactly |Σ|² outgoing transitions. There is another possibility to define determinism of transducers, in which the letter (a, b) is interpreted as "the transducer receives the input a and produces the output b". In this view, a transducer is called deterministic if for every state q and every letter a there is exactly one transition of the form (q, (a, b), q'). Observe that these two definitions of determinism are not equivalent. The transducer of Figure 6.1 is deterministic in this second sense, when an appropriate sink state is added.

We do not give separate implementations of the operations for deterministic and nondeterministic transducers: the new operations (projection and join) can only be reasonably implemented on nondeterministic transducers, and so the deterministic case does not add anything new to the discussion of Chapter 4.

6.3 Implementing Operations on Relations

In Chapter 4 we made two assumptions on the encoding of objects of the universe U as words:
• every word is the encoding of some object, and
• every object is encoded by exactly one word.
We have relaxed the second assumption, and allowed for multiple encodings (in fact, infinitely many) of an object. Fortunately, as long as the first assumption still holds, the implementations of the boolean operations remain correct, in the following sense: if the input automata are well-formed, then the output automaton is also well-formed. Consider for instance the complementation operation on DFAs. Since every word encodes some object, the set of all words can be partitioned into

equivalence classes, each of them containing all the encodings of an object. If the input automaton A is well-formed, then for every object x of the universe, A either accepts all the words in an equivalence class, or none of them. The complement automaton then satisfies the same property, but accepts a class iff the original automaton does not accept it. Notice further that membership of an object x in a set represented by a well-formed automaton can be checked by taking any encoding w_x of x, and checking if the automaton accepts w_x.

6.3.1 Projection

Given a transducer T recognizing a relation R ⊆ X × X, we construct an automaton over Σ recognizing the set π1(R). The initial idea is very simple: loosely speaking, we go through all transitions and replace their labels (a, b) by a. This transformation yields an NFA that has an accepting run on a word a1 ... an iff the transducer has an accepting run on some pair (w, w') with w = a1 ... an. Formally, this step is carried out in lines 1-4 of the following algorithm (line 5 is explained below):

Proj_1(T)
Input: transducer T = (Q, Σ × Σ, δ, q0, F)
Output: NFA A = (Q', Σ, δ', q0', F') with L(A) = π1(L(T))
1  Q' ← Q; q0' ← q0; F' ← F
2  δ' ← ∅
3  for all (q, (a, b), q') ∈ δ do
4      add (q, a, q') to δ'
5  F' ← PadClosure((Q', Σ, δ', q0', F'), #)

However, this initial idea is not fully correct. Consider the relation R = {(1, 4)} over N. A transducer T recognizing R recognizes the language {(10^(n+2), 0010^n) | n ≥ 0}, and so the NFA constructed after lines 1-4 recognizes {10^(n+2) | n ≥ 0}. However, it does not recognize the number 1, because it does not accept all its encodings: the encodings 1 and 10 are rejected.

This problem can be easily repaired. We introduce an auxiliary construction that "completes" a given NFA: the padding closure of an NFA is another NFA that accepts a word w if and only if the first NFA accepts w#^n for some n ≥ 0, where # is the padding symbol. The padding closure augments the set of final states and returns the new set. Here is the algorithm constructing the padding closure:

PadClosure(A, #)
Input: NFA A = (Q, Σ, δ, q0, F), padding symbol #
Output: new set F' of final states
1  W ← F; F' ← ∅
2  while W ≠ ∅ do
3      pick q from W
4      add q to F'
5      for all (q', #, q) ∈ δ do
6          if q' ∉ F' then add q' to W
7  return F'

Projection onto the second component is implemented analogously. The complexity of Proj_i() is clearly O(|δ| + |Q|), since every transition is examined at most twice: once in line 3 of Proj_i(), and possibly a second time at line 5 of PadClosure().

Observe that projection does not preserve determinism: two transitions leaving a state and labeled by two different (pairs of) letters (a, b) and (a, c) become, after projection, two transitions labeled with the same letter a. In practice the projection of a transducer is hardly ever deterministic. Since, typically, a sequence of operations manipulating transducers contains at least one projection, deterministic transducers are relatively uninteresting.

Example 6.7 Figure 6.2 shows the NFAs obtained by projecting the transducer for the Collatz function onto the first and second components. Recall that the transducer recognizes the relation R = {(n, f(n)) | n ∈ N}, where f denotes the Collatz function. So we have π1(R) = {n | n ∈ N} = N, and a moment of thought shows that π2(R) = {f(n) | n ∈ N} = N as well. So both NFAs should be universal, and the reader can easily check that this is indeed the case. Observe that both projections are nondeterministic, although the transducer is deterministic. States 4 and 5 of the NFA at the top (first component) are made final by PadClosure(), because they can both reach the final state 6 through a chain of 0s (recall that 0 is the padding symbol in this case). The same happens to state 3 of the NFA at the bottom (second component), which can reach the final state 2 with a 0.

6.3.2 Join, Post, and Pre

We give an implementation of the Join operation, and then show how to modify it to obtain implementations of Pre and Post. Given transducers T1, T2 recognizing relations R1 and R2, we construct a transducer T1 ∘ T2 recognizing R1 ∘ R2. The intuitive idea is to slightly modify the product construction. Recall that the pairing [A1, A2] of two NFAs A1, A2 has a transition [q, r] −a→ [q', r'] if and only if A1 has a transition q −a→ q' and A2 has a transition r −a→ r'. Similarly, we first construct a transducer T with the following property: T accepts (w, w'') iff there is a word w' such that T1 accepts (w, w') and T2 accepts (w', w''). For this, T1 ∘ T2 has a transition [q, r] −(a,b)→ [q', r'] if there is a letter c such that T1 has a transition q −(a,c)→ q' and T2 has a transition r −(c,b)→ r'.
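Lines 1-4 of Proj_1 and the padding closure can be sketched over a dictionary representation of transition relations. This is a sketch of ours, not the book's code, and the R = {(1, 4)} transducer below is the one from the projection example:

```python
def proj1(delta):
    """Lines 1-4 of Proj_1: replace each label (a, b) by its first component a."""
    nfa_delta = {}
    for (q, (a, b)), succs in delta.items():
        nfa_delta.setdefault((q, a), set()).update(succs)
    return nfa_delta

def pad_closure(delta, finals, pad):
    """Make q final whenever a final state is reachable from q via
    transitions labeled with the padding symbol."""
    new_finals, worklist = set(), list(finals)
    while worklist:
        q = worklist.pop()
        new_finals.add(q)
        for (p, a), succs in delta.items():
            if a == pad and q in succs and p not in new_finals:
                worklist.append(p)
    return new_finals

# Transducer for R = {(1, 4)}: accepts the pairs (1 0^(n+2), 001 0^n).
t = {(0, ("1", "0")): {1}, (1, ("0", "0")): {2},
     (2, ("0", "1")): {3}, (3, ("0", "0")): {3}}
a = proj1(t)                        # accepts 10^(n+2) only
finals = pad_closure(a, {3}, "0")   # now the encodings 1 and 10 are accepted too
```

After the closure, states 1 and 2 become final, so all encodings 10^n of the number 1 are accepted.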

Figure 6.2: Projections of the transducer for the Collatz function onto the first and second components.

Intuitively, T can output b on input a if there is a letter c such that T1 can output c on input a, and T2 can output b on input c. The transducer T has a run

[q01, q02] −(a1,b1)→ [q11, q12] −(a2,b2)→ [q21, q22] ... [q(n−1)1, q(n−1)2] −(an,bn)→ [qn1, qn2]

iff T1 and T2 have runs

q01 −(a1,c1)→ q11 −(a2,c2)→ q21 ... q(n−1)1 −(an,cn)→ qn1
q02 −(c1,b1)→ q12 −(c2,b2)→ q22 ... q(n−1)2 −(cn,bn)→ qn2

Formally, if T1 = (Q1, Σ × Σ, δ1, q01, F1) and T2 = (Q2, Σ × Σ, δ2, q02, F2), then T = (Q, Σ × Σ, δ, q0, F) is the transducer generated by lines 1-9 of the algorithm below:

Join(T1, T2)
Input: transducers T1 = (Q1, Σ × Σ, δ1, q01, F1) and T2 = (Q2, Σ × Σ, δ2, q02, F2)
Output: transducer T1 ∘ T2 = (Q, Σ × Σ, δ, q0, F)
1   Q, δ, F ← ∅
2   q0 ← [q01, q02]
3   W ← {[q01, q02]}
4   while W ≠ ∅ do
5       pick [q1, q2] from W; add [q1, q2] to Q
6       if q1 ∈ F1 and q2 ∈ F2 then add [q1, q2] to F
7       for all (q1, (a, c), q1') ∈ δ1, (q2, (c, b), q2') ∈ δ2 do
8           add ([q1, q2], (a, b), [q1', q2']) to δ
9           if [q1', q2'] ∉ Q then add [q1', q2'] to W
10  F ← PadClosure((Q, Σ × Σ, δ, q0, F), (#, #))

Let T be the transducer computed by lines 1-9. However, T is not yet the transducer we are looking for. The problem is similar to the one of the projection operation. Consider the relations on numbers R1 = {(2, 4)} and R2 = {(4, 2)}. Then T1 and T2 recognize the languages {(010^(n+1), 0010^n) | n ≥ 0} and {(0010^n, 010^(n+1)) | n ≥ 0}, and so T recognizes {(010^(n+1), 010^(n+1)) | n ≥ 0}. But then, according to our definition, T does not accept the pair (2, 2) ∈ N × N, because it does not accept all its encodings: the encoding (01, 01) is missing. So we add a padding closure again at line 10, this time using [#, #] as padding symbol. As for the standard product construction, the number of states of Join(T1, T2) is O(|Q1| · |Q2|).

Example 6.8 Recall that the transducer of Figure 6.1, shown again at the top of Figure 6.3, recognizes the relation {(n, f(n)) | n ∈ N}, where f is the Collatz function. The bottom part of Figure 6.3 shows the transducer T ∘ T as computed by Join(T, T). For example, the transition leading from [2, 3] to [3, 2], labeled by (0, 0), is the result of "pairing" the transition from 2 to 3 labeled by (0, 1) and the one from 3 to 2 labeled by (1, 0). Observe that T ∘ T is not deterministic, even though T is deterministic, because for instance [1, 1] is the source of two transitions labeled by (0, 0).
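Lines 1-9 of Join, pairing transitions that agree on the middle letter, can be sketched as follows (a dictionary-based Python sketch of ours; the padding closure of line 10 is omitted, and the two transducers below are the ones from the R1 = {(2, 4)}, R2 = {(4, 2)} example):

```python
def join(delta1, q01, finals1, delta2, q02, finals2):
    """Pair (q1,(a,c),r1) with (q2,(c,b),r2) into ([q1,q2],(a,b),[r1,r2])."""
    q0 = (q01, q02)
    states, delta, finals = set(), {}, set()
    worklist = [q0]
    while worklist:
        q = worklist.pop()
        if q in states:
            continue
        states.add(q)
        if q[0] in finals1 and q[1] in finals2:
            finals.add(q)
        for (p1, (a, c)), succs1 in delta1.items():
            if p1 != q[0]:
                continue
            for (p2, (c2, b)), succs2 in delta2.items():
                if p2 != q[1] or c2 != c:   # middle letters must agree
                    continue
                for r in ((r1, r2) for r1 in succs1 for r2 in succs2):
                    delta.setdefault((q, (a, b)), set()).add(r)
                    worklist.append(r)
    return states, delta, q0, finals

# T1 for {(2, 4)} and T2 for {(4, 2)} in lsbf encoding.
t1 = {(0, ("0", "0")): {1}, (1, ("1", "0")): {2},
      (2, ("0", "1")): {3}, (3, ("0", "0")): {3}}
t2 = {(0, ("0", "0")): {1}, (1, ("0", "1")): {2},
      (2, ("1", "0")): {3}, (3, ("0", "0")): {3}}
states, delta, q0, finals = join(t1, 0, {3}, t2, 0, {3})
```

The joined transducer accepts exactly the pairs (010 0^n, 010 0^n), i.e. the encodings of (2, 2) of length at least 3, which is why the padding closure of line 10 is still needed.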

Figure 6.3: A transducer for f(f(n)).

This transducer recognizes the relation {(n, f(f(n))) | n ∈ N}. A little calculation gives

f(f(n)) = n/4          if n ≡ 0 mod 4
f(f(n)) = 3n/2 + 1     if n ≡ 2 mod 4
f(f(n)) = 3n/2 + 1/2   if n ≡ 1 mod 4 or n ≡ 3 mod 4

The three components of the transducer reachable from the state [1, 1] correspond to these three cases.

Post(X, R) and Pre(X, R). Given an NFA A1 = (Q1, Σ, δ1, q01, F1) recognizing a regular set X ⊆ U and a transducer T2 = (Q2, Σ × Σ, δ2, q02, F2) recognizing a regular relation R ⊆ U × U, we construct an NFA B recognizing the set post_R(X). It suffices to slightly modify the join operation. The algorithm Post(A1, T2) is the result of replacing lines 7-8 of Join() by

7  for all (q1, a, q1') ∈ δ1, (q2, (a, b), q2') ∈ δ2 do
8      add ([q1, q2], b, [q1', q2']) to δ

In order to construct an NFA recognizing pre_R(X), we replace lines 7-8 by

7  for all (q1, b, q1') ∈ δ1, (q2, (a, b), q2') ∈ δ2 do
8      add ([q1, q2], a, [q1', q2']) to δ

As for the join operation, the resulting NFA has to be postprocessed, closing it with respect to the padding symbol. Notice that both post and pre are computed with the same complexity as the pairing construction, namely the product of the number of states of the transducer and the NFA.

Example 6.9 We construct an NFA recognizing the image under the Collatz function of all multiples of 3, i.e., the set {f(3n) | n ∈ N}. That is, we compute post_R(Y), where, as usual, R = {(n, f(n)) | n ∈ N}, and Y is the set of all lsbf encodings of the multiples of 3. For this we first need an automaton recognizing Y; a three-state DFA with states 1, 2, 3 does the job. For instance, this DFA recognizes 0011 (encoding of 12) and 01001 (encoding of 18), but not 0101 (encoding of 10). The result of Post is the NFA shown in Figure 6.4. For instance, its transition [1, 1] −1→ [1, 3] is generated by the transition 1 −0→ 1 of the DFA and the transition 1 −(0,1)→ 3 of the transducer for the Collatz function. State [2, 3] becomes final due to the closure with respect to the padding symbol 0.
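The membership test performed by the DFA for Y can be checked against plain arithmetic: reading least significant bit first, bit i contributes 2^i mod 3, which alternates 1, 2, 1, 2, ... A hedged Python sketch of ours (not the book's construction):

```python
def lsbf_multiple_of_3(word):
    """Does this lsbf word over {0,1} encode a multiple of 3?
    Tracks the remainder mod 3 and the weight 2^i mod 3 of the next bit."""
    remainder, weight = 0, 1
    for bit in word:
        if bit == "1":
            remainder = (remainder + weight) % 3
        weight = (weight * 2) % 3    # alternates 1, 2, 1, 2, ...
    return remainder == 0
```

This agrees with the examples in the text: 0011 (12) and 01001 (18) pass, 0101 (10) does not.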

Figure 6.4: Computing f(n) for all multiples of 3.

The NFA of Figure 6.4 is not difficult to interpret. The multiples of 3 are the union of the sets {6k | k ≥ 0}, all whose elements are even, and {6k + 3 | k ≥ 0}, all whose elements are odd. Applying f to them yields the sets {3k | k ≥ 0} and {18k + 10 | k ≥ 0}, respectively. The first of them is again the set of all multiples of 3, and it is recognized by the upper part of the NFA. (In fact, this upper part is a DFA, and if we minimize it we obtain exactly the DFA given above.) The lower part of the NFA recognizes the second set; it is minimal, and it is easy to find for each state a word recognized by it.

It is interesting to observe that an explicit computation of the set {f(3k) | k ≥ 0}, in which we apply f to each multiple of 3, does not terminate, because the set is infinite. In a sense, our solution "speeds up" the computation by an infinite factor!

6.4 Relations of Higher Arity

The implementations described in the previous sections can be easily extended to relations of higher arity over the universe U. We briefly describe the generalization. Fix an encoding of the universe U over the alphabet Σ with padding symbol #. A tuple (w1, ..., wk) of words over Σ encodes the tuple (x1, ..., xk) ∈ U^k if wi encodes xi for every 1 ≤ i ≤ k, and w1, ..., wk have the same length. A k-transducer over Σ is an NFA over the alphabet Σ^k. Acceptance of a k-transducer is defined as for normal transducers, and boolean operations are defined as for NFAs.

The projection operation can be generalized to projection over an arbitrary subset of components. Given an index set I = {i1, ..., in} ⊆ {1, ..., k}, let x_I denote the projection of a tuple x = (x1, ..., xk) ∈ U^k onto I, defined as the tuple (x_{i1}, ..., x_{in}) ∈ U^n. Given a relation R ⊆ U^k, we define

Projection_I(R): returns the set π_I(R) = {x_I | x ∈ R}.

The operation is implemented analogously to the case of a binary relation. Given a k-transducer T recognizing R, the n-transducer recognizing Projection_I(R) is computed as follows:
• Replace every transition (q, (a1, ..., ak), q') of T by the transition (q, (a_{i1}, ..., a_{in}), q').
• Compute the padding closure of the result: for every transition (q, (#, ..., #), q'), if q' is a final state, then add q to the set of final states.

The join operation can also be generalized. Given two tuples x = (x1, ..., xn) and y = (y1, ..., ym), we denote the tuple (x1, ..., xn, y1, ..., ym) of dimension n + m by x · y. Given relations R1 ⊆ U^{k1} and R2 ⊆ U^{k2} of arities k1 and k2, respectively, and index sets I1 ⊆ K1 = {1, ..., k1} and I2 ⊆ K2 = {1, ..., k2} of the same cardinality, we define

Join_{I1,I2}(R1, R2): returns R1 ∘_{I1,I2} R2 = {x_{K1\I1} · y_{K2\I2} | x ∈ R1, y ∈ R2, x_{I1} = y_{I2}}.
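At the level of explicit tuple sets, the generalized projection and join read as follows. This is a Python illustration of ours with 1-based index sets, as in the text; it manipulates finite relations directly rather than transducers:

```python
def projection(relation, index_set):
    """pi_I(R): keep the components with (1-based) indices in I."""
    idx = sorted(index_set)
    return {tuple(x[i - 1] for i in idx) for x in relation}

def join_rel(r1, r2, i1, i2):
    """R1 join_{I1,I2} R2: glue tuples agreeing on the selected components,
    keeping the remaining components of both."""
    i1, i2 = sorted(i1), sorted(i2)
    result = set()
    for x in r1:
        for y in r2:
            if tuple(x[i - 1] for i in i1) == tuple(y[i - 1] for i in i2):
                rest_x = tuple(v for i, v in enumerate(x, 1) if i not in i1)
                rest_y = tuple(v for i, v in enumerate(y, 1) if i not in i2)
                result.add(rest_x + rest_y)
    return result
```

Binary relational composition is the special case I1 = {2}, I2 = {1}.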

The arity of Join_{I1,I2}(R1, R2) is k1 + k2 − |I1| − |I2|. The operation is implemented analogously to the case of binary relations. We proceed in two steps. The first step constructs a transducer according to the following rule: if the transducer recognizing R1 has a transition (q, a, q'), the transducer recognizing R2 has a transition (r, b, r'), and a_{I1} = b_{I2}, then add to the transducer for Join_{I1,I2}(R1, R2) a transition ([q, r], a_{K1\I1} · b_{K2\I2}, [q', r']). In the second step, we compute the padding closure of the result. The generalization of the Pre and Post operations is analogous.

Exercises

Exercise 59 As we have seen, the application of the Post and Pre operations to transducers requires computing the padding closure in order to guarantee that the resulting automaton accepts either all or none of the encodings of an object. The padding closure has been defined for encodings where padding occurs on the right, i.e., if w encodes an object, then so does w#^k for every k ≥ 0. However, in some natural encodings, like the most-significant-bit-first encoding of natural numbers, padding occurs on the left. Give an algorithm for calculating the padding closure of a transducer when padding occurs on the left.

Exercise 60 Let A be an NFA over the alphabet Σ.
• Show how to construct a transducer T over the alphabet Σ × Σ such that (w, v) ∈ L(T) iff wv ∈ L(A) and |w| = |v|.
• Give an algorithm that accepts an NFA A as input and returns an NFA A/2 such that L(A/2) = {w ∈ Σ* | ∃v ∈ Σ* : wv ∈ L(A) ∧ |w| = |v|}.

Exercise 61 In phone dials, letters are mapped into digits as follows:

ABC → 2    DEF → 3    GHI → 4    JKL → 5
MNO → 6    PQRS → 7   TUV → 8    WXYZ → 9

This map can be used to assign a telephone number to a given word; for instance, the number for AUTOMATON is 288662866. Consider the problem of, given a telephone number, finding the set of English words that are mapped into it. For instance, the set of words mapping to 233 contains at least ADD, BED, and BEE. Assume a DFA N over the alphabet {A, ..., Z} recognizing the set of all English words is given. Show how to construct a nondeterministic transducer over Σ × Σ for Σ = {A, ..., Z, 0, ..., 9} recognizing the set of pairs (n, w) such that n ∈ {0, ..., 9}*, w ∈ {A, ..., Z}*, and w is mapped to n.

Exercise 62 We have defined transducers as NFAs whose transitions are labeled by pairs of symbols (a, b) ∈ Σ × Σ. With this definition transducers can only accept pairs of words (a0 a1 ... al, b0 b1 ... bl) of the same length. An ε-transducer is an NFA whose transitions are labeled by elements of (Σ ∪ {ε}) × (Σ ∪ {ε}). An ε-transducer accepts a pair (w, w') of words if it has a run

q0 −(a1,b1)→ q1 −(a2,b2)→ ... −(an,bn)→ qn

with ai, bi ∈ Σ ∪ {ε} such that w = a1 a2 ... an and w' = b1 b2 ... bn. Note that |w| ≤ n and |w'| ≤ n. The relation accepted by the ε-transducer T is denoted by L(T).

1. Construct ε-transducers T1, T2 recognizing the relations R1 = {(a^n b^m, c^{2n}) | n, m ≥ 0} and R2 = {(a^n b^m, c^{2m}) | n, m ≥ 0}.
2. Apply IntersNFA to T1 and T2. Which is the language recognized by the resulting ε-transducer?
3. Show that no ε-transducer recognizes R1 ∩ R2.

Exercise 63 Transducers can be used to capture the behaviour of simple programs. Let P be the following program:

1   x ← ?
2   write x
3   while true do
4     do
5       read y
6     until x = y
7     if eof then
8       write y
9       end
10    do
11      x ← x − 1
12      or y ← y + 1
13    until x ≠ y

P communicates with the environment through the boolean variables x and y, both with 0 as initial value. Let [i, x, y] denote the state of P in which the next instruction to be executed is the one at line i, and the values of the variables are x and y. The initial state of P is [1, 0, 0]. By executing the first instruction, P moves nondeterministically to one of the states [2, 0, 0] and [2, 1, 0]; at these transitions no input symbol is read and no output symbol is written, and so the transition relation

δ for P contains the transitions ([1, 0, 0], (ε, ε), [2, 0, 0]) and ([1, 0, 0], (ε, ε), [2, 1, 0]). Similarly, by executing its second instruction, the program P moves from, say, [2, 1, 0] to [3, 1, 0] while reading nothing and writing 1; hence, δ also contains the transition rule ([2, 1, 0], (ε, 1), [3, 1, 0]).

1. Draw an ε-transducer modelling the behaviour of P.
2. Can an overflow error occur?
3. What are the possible values of x and y upon termination, i.e., upon reaching end?
4. Is there an execution during which P reads 101 and writes 01?
5. Let I and O be regular sets of inputs and outputs, respectively. Think of O as a set of dangerous outputs that we want to avoid. We wish to prove that the inputs from I are safe, i.e., that none of the dangerous outputs can occur. Describe an algorithm that decides, given I and O, whether there are i ∈ I and o ∈ O such that (i, o) is accepted.

Exercise 64 (Inspired by a paper by Gulwani at POPL'11.) Consider transducers whose transitions are labeled by elements of (Σ ∪ {ε}) × (Σ* ∪ {ε}). Intuitively, at each transition these transducers read one letter or no letter, and write a string of arbitrary length. These transducers can be used to perform operations on strings, like, for instance, capitalizing all the words in the string: if the transducer reads, say, "singing in the rain", it writes "Singing In The Rain". Give transducers for the following operations, each of which is informally defined by means of examples. In each example, if the transducer reads the string on the left, then it writes the string on the right.

Company\Code\index.html               Company\Code
Company\Docs\Spec\specs.doc           Company\Docs\Spec\

International Business Machines       IBM
Principles Of Programming Languages   POPL

Oege De Moor                          Moor, O.
Kathleen Fisher AT&T Labs             Fisher, K.
Eran Yahav                            Yahav, E.
Bill Gates                            Gates, B.

004989273452                          +49 89 273452
(00)4989273452                        +49 89 273452
273452                                +49 89 273452


Chapter 7

Finite Universes

In Chapter 3 we proved that every regular language has a unique minimal DFA. A natural question is whether the operations on languages and relations described in Chapters 4 and 6 can be implemented using minimal DFAs and minimal deterministic transducers as data structure. The implementations of (the first part of) Chapter 4 accept and return DFAs, but do not preserve minimality: even if the arguments are minimal DFAs, the result may be non-minimal (the only exception was complementation). In order to return a minimal DFA it is necessary to first determinize, and then minimize, at exponential cost in the worst case. The situation is worse for the projection and join operations of Chapter 6, because they do not even preserve determinacy: the result of projecting a deterministic transducer or joining two of them may be nondeterministic.

In this chapter we present implementations that directly yield the minimal DFA, with no need for an extra minimization step, for the case in which the universe U of objects is finite. When the universe is finite, all objects can be encoded by words of the same length, and this common length is known a priori. For instance, if the universe consists of the numbers in the range [0, 2^32 − 1], its objects can be encoded by words over {0, 1} of length 32. As in Chapter ??, we assume that each word encodes some object, and we can further assume that each object is encoded by exactly one word: since all encodings have the same length, padding is not required to represent tuples of objects.

Operations on objects correspond to operations on languages, but complementation requires some care. If X ⊆ U is encoded as a language L ⊆ Σ^k for some number k ≥ 0, then the complement set U \ X is not encoded by the complement language of L (which contains words of any length), but by its restriction to the words of length k, that is, by Σ^k \ L.

7.1 Fixed-length Languages and the Master Automaton

Definition 7.1 A language L ⊆ Σ* has length n ≥ 0 if every word of L has length n. If L has length n for some n ≥ 0, then we say that L is a fixed-length language, or that it has fixed length.

Some remarks are in order:
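The complementation caveat is easy to state in code: for a finite universe encoded by Σ^k, complement inside Σ^k rather than inside Σ*. A small sketch of ours, enumerating Σ^k explicitly:

```python
from itertools import product

def fixed_length_complement(lang, alphabet, k):
    """Encoding of U \\ X: all words of length k that are not in L."""
    sigma_k = {"".join(letters) for letters in product(sorted(alphabet), repeat=k)}
    return sigma_k - set(lang)
```

Explicit enumeration is exponential in k, of course; the point of this chapter is to avoid it by working on the automaton representation instead.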

• According to this definition, the empty language has length n for every n ≥ 0 (the assertion "every word of L has length n" is vacuously true). This is useful, because then the complement of a language of length n also has length n.
• There are exactly two languages of length 0: the empty language ∅, and the language {ε} containing only the empty word.
• Every fixed-length language contains only finitely many words, and so it is regular.

For the definition of the master automaton we need the notion of residual with respect to a letter: given a language L ⊆ Σ* and a ∈ Σ, its residual with respect to a is the language L^a = {w ∈ Σ* | aw ∈ L}. A simple but important observation is that if L has fixed length, then so does L^a. In particular, we have ∅^a = {ε}^a = ∅.

Definition 7.2 The master automaton over the alphabet Σ is the tuple M = (Q_M, Σ, δ_M, F_M), where
• Q_M is the set of all fixed-length languages over Σ;
• δ_M : Q_M × Σ → Q_M is given by δ_M(L, a) = L^a for every L ∈ Q_M and a ∈ Σ;
• F_M is the singleton set containing the language {ε} as only element.

The master automaton is a deterministic automaton with an infinite number of states, but no initial state. Notice that M is almost acyclic: the only cycles of M are the self-loops corresponding to δ_M(∅, a) = ∅ for every a ∈ Σ.

Example 7.3 Figure 7.1 shows a small fragment of the master automaton for the alphabet Σ = {a, b}.

Intuitively, we can look at the master automaton as a structure containing DFAs recognizing all the fixed-length languages. To make this precise, observe that, as in the case of canonical DAs, the states are languages, and the state L recognizes the language L. The following proposition was already proved in Chapter 3, but with slightly different terminology:

Proposition 7.4 Let L be a fixed-length language. The language recognized from the state L of the master automaton is L.

Proof: By induction on the length n of L. If n = 0, then L = {ε} or L = ∅, and the result is proved by direct inspection of the master automaton. For n > 0, we observe that the successors of the state L are the languages L^a for every a ∈ Σ. Since, by induction hypothesis, the state L^a recognizes the language L^a, the state L recognizes the language L.

By this proposition, each fixed-length language L determines a DFA A_L = (Q_L, Σ, δ_L, q_{0L}, F_L) as follows: Q_L is the set of states of the master automaton reachable from the state L; q_{0L} is the state L; δ_L is the projection of δ_M onto Q_L; and F_L = F_M ∩ Q_L. It is easy to show that A_L is the minimal DFA recognizing L:

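The residual operation that drives δ_M is a one-liner on explicit sets. A Python sketch of ours, for experimenting with small fixed-length languages:

```python
def residual(lang, a):
    """L^a = {w | aw in L}. For a language of length n >= 1 the result has
    length n - 1; in particular, both the empty language and {""} have
    empty residuals."""
    return {w[1:] for w in lang if w.startswith(a)}
```

For example, the residuals of {aa, ba, bb} are {a} with respect to a and {a, b} with respect to b.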
Figure 7.1: A fragment of the master automaton for the alphabet {a, b}.

Proposition 7.5 For every fixed-length language L, the automaton A_L is the minimal DFA recognizing L.

Proof: By definition, distinct states of the master automaton are distinct languages, and so, by Proposition 7.4, distinct states of A_L recognize distinct languages. By Corollary 3.8 (a DFA is minimal if and only if distinct states recognize different languages), A_L is minimal.

7.2 A Data Structure for Fixed-length Languages

Proposition 7.5 allows us to define a data structure for representing finite sets of fixed-length languages, all of them of the same length. Loosely speaking, the structure representing the languages of a set L = {L1, ..., Ln} is the fragment of the master automaton containing the states recognizing L1, ..., Ln and their descendants. It is a DFA with multiple initial states, and for this reason we call it the multi-DFA for L. Formally:

Definition 7.6 Let L = {L1, ..., Ln} be a set of languages of the same length over the same alphabet Σ. The multi-DFA A_L is the tuple A_L = (Q_L, Σ, δ_L, Q_{0L}, F_L), where Q_L is the set of states of the master automaton reachable from at least one of the states L1, ..., Ln; Q_{0L} = {L1, ..., Ln}; δ_L is the projection of δ_M onto Q_L; and F_L = F_M ∩ Q_L.
Figure 7.2: The multi-DFA for {L1, L2, L3}.

Example 7.7 Figure 7.2 shows (a DFA isomorphic to) the multi-DFA for the set {L1, L2, L3}, where L1 = {aa, ba}, L2 = {aa, ba, bb}, and L3 = {ab, bb}. For clarity, the state for the empty language has been omitted, as well as the transitions leading to it.

In order to manipulate multi-DFAs we represent them as a table of nodes. Assume Σ = {a1, ..., am}. A node is a pair ⟨q, s⟩, where q is a state identifier and s = (q1, ..., qm) is the successor tuple of the node. The multi-DFA is represented by a table containing a node for each state, with the exception of the states for the languages ∅ and {ε}; along the chapter we denote their state identifiers by q∅ and qε, respectively. The table for the multi-DFA of Figure 7.2, where state identifiers are numbers, is:

Ident. | a-succ | b-succ
  2    |   1    |   0
  3    |   1    |   1
  4    |   0    |   1
  5    |   2    |   2
  6    |   2    |   3
  7    |   4    |   4

The algorithms on multi-DFAs use a procedure make(s) that returns the state of the table T having s as successor tuple, if such a state exists; otherwise, it adds a new node ⟨q, s⟩ to T, where q is a fresh state identifier (different from all other state identifiers in T), and returns q. If s is the tuple all whose components are q∅, then make(s) returns q∅. The procedure assumes that all the states of the tuple s (with the exception of q∅ and qε) appear in T.¹

¹Notice that the procedure makes use of the fact that no two states of the table have the same successor tuple.
7.3. OPERATIONS ON FIXED-LENGTH LANGUAGES
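The node table and make() can be sketched as a hash table from successor tuples to identifiers. This is a Python sketch of ours; the choice of the concrete identifiers 0 and 1 for q∅ and qε is an assumption made for illustration:

```python
class NodeTable:
    Q_EMPTY, Q_EPS = 0, 1          # identifiers assumed for the languages {} and {""}

    def __init__(self):
        self.id_of = {}            # successor tuple -> identifier
        self.next_id = 2

    def make(self, s):
        """Return the state with successor tuple s, creating it if needed."""
        if all(q == self.Q_EMPTY for q in s):
            return self.Q_EMPTY    # every successor empty: the language is empty
        if s not in self.id_of:    # no two nodes share a successor tuple
            self.id_of[s] = self.next_id
            self.next_id += 1
        return self.id_of[s]
```

Replaying the table above row by row reproduces the identifiers 2-7; then make(2, 2) returns the existing state 5, while make(3, 2) creates the fresh node 8.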

115

L1

5 a, b 2

L1 ∪ L2

6 a 3 a b

L2

L3

7 a, b 4 b

L2 ∩ L3

8 b

a, b

1

Figure 7.3: The multi-DFA for {L1 , L2 , L3 , L1 ∪ L2 , L2 ∩ L3 } table above, then make(2, 2) returns 5, but make(3, 2) adds a new row, say 8, 3, 2, and returns 8.

7.3

Operations on ﬁxed-length languages

All operations assume that the input ﬁxed-length language(s) is (are) given as multi-DFAs represented as a table of nodes. Nodes are pairs of state identiﬁer and successor tuple. The key to all implementations is the fact that if L is a language of length n ≥ 1, then La is a language of length n − 1. This allows to design recursive algorithms that directly compute the result when the inputs are languages of length 0, and reduce the problem of computing the result for languages of length n ≥ 1 to the same problem for languages of smaller length. Fixed-length membership. The operation is implemented as for DFAs. The complexity is linear in the size of the input. Fixed-length union and intersection. Implementing a boolean operation on multi-DFAs corresponds to possibly extending the multi-DFA, and returning the state corresponding to the result of the operation. This is best explained by means of an example. Consider again the multi-DFA of Figure 7.2. An operation like Union(L1 , L2 ) gets the initial states 5 and 6 as input, and returns the state recognizing L1 ∪ L2 ; since L1 ∪ L2 = L2 , the operation returns state 6. However, if we take Intersection(L2 , L3 ), then the multi-DFA does not contain any state recognizing it. In this case the operation extends the multi-DFA for {L1 , L2 , L3 } to the multi-DFA for {L1 , L2 , L3 , L2 ∪ L3 }, shown in Figure 7.3, and returns state 8. So Intersection(L2 , L3 ) not only returns a state, but also has a side eﬀect on the multi-DFA underlying the operations.


Given two fixed-length languages L1, L2 of the same length and a boolean operation ⊙, we present an algorithm that returns the state of the master automaton recognizing L1 ∩ L2 (the algorithm for L1 ∪ L2 is analogous). The properties

• if L1 = ∅ or L2 = ∅, then L1 ∩ L2 = ∅;
• if L1 = {ε} and L2 = {ε}, then L1 ∩ L2 = {ε};
• if L1, L2 ∉ {∅, {ε}}, then (L1 ∩ L2)^a = L1^a ∩ L2^a for every a ∈ Σ;

lead to the recursive algorithm inter(q1, q2) shown in Table 7.1. Assume the states q1, q2 recognize the languages L1, L2; the algorithm returns the state identifier of L1 ∩ L2. If q1 = q∅, then L1 = ∅, which implies L1 ∩ L2 = ∅, so the algorithm returns the state identifier q∅. If q2 = q∅, the algorithm also returns q∅. If q1 = qε = q2, the algorithm returns qε. This deals with all the cases in which q1, q2 ∈ {q∅, qε} (and some more, which does no harm). If q1, q2 ∉ {q∅, qε}, then the algorithm computes the state identifiers r1, ..., rm recognizing the languages (L1 ∩ L2)^a1, ..., (L1 ∩ L2)^am, and returns make(r1, ..., rm) (creating a new node if no node of T has (r1, ..., rm) as successor tuple). But how does the algorithm compute the state identifier of (L1 ∩ L2)^ai? By the third property above we have (L1 ∩ L2)^ai = L1^ai ∩ L2^ai, and so the algorithm computes it by a recursive call inter(q1^ai, q2^ai).

The only remaining point is the role of the table G. The algorithm uses memoization to avoid recomputing the same object. The table G is initially empty. When inter(q1, q2) is computed for the first time, the result is memoized in G(q1, q2); in any subsequent call the result is not recomputed, but just read from G.

For the complexity, let n1, n2 be the numbers of states of T reachable from the states q1, q2. It is easy to see that every call to inter receives as arguments states reachable from q1 and q2, respectively. So inter is called with at most n1 · n2 possible arguments, and the complexity is O(n1 · n2).

inter(q1, q2)
Input: states q1, q2 recognizing languages of the same length
Output: state recognizing L(q1) ∩ L(q2)
1  if G(q1, q2) is not empty then return G(q1, q2)
2  if q1 = q∅ or q2 = q∅ then return q∅
3  else if q1 = qε and q2 = qε then return qε
4  else  /* q1, q2 ∉ {q∅, qε} */
5      for all i = 1, ..., m do ri ← inter(q1^ai, q2^ai)
6      G(q1, q2) ← make(r1, ..., rm)
7      return G(q1, q2)

Table 7.1: Algorithm inter

Algorithm inter is generic: in order to obtain an algorithm for another binary boolean operation it suffices to change lines 2 and 3. If we are only interested in intersection, then we can easily gain a more


efficient version. For instance, we know that inter(q1, q2) and inter(q2, q1) return the same state, and so we can improve line 1 by checking not only whether G(q1, q2) is nonempty, but also whether G(q2, q1) is. Also, inter(q, q) always returns q, so in that case there is no need to compute anything either.

Example 7.8 Consider the multi-DFA at the top of Figure 7.4, but without the blue states. State 0, accepting the empty language, is again not shown. The tree at the bottom of the figure graphically describes the run of inter(12, 13) (that is, we compute the node for the intersection of the languages recognized from states 12 and 13). A node q, q' → q'' of the tree stands for a recursive call to inter with arguments q and q' that returns q''. For instance, the node 2, 4 → 2 indicates that inter is called with arguments 2 and 4 and the call returns state 2. Let us see why this is so. The call inter(2, 4) produces two recursive calls, first inter(1, 1) (on the a-successors of 2 and 4), and then inter(0, 1) (on the b-successors). The first call returns 1, and the second 0. Therefore inter(2, 4) returns a state with 1 as a-successor and 0 as b-successor. Since this state already exists (it is state 2), inter(2, 4) returns 2. On the other hand, inter(9, 10) creates and returns a new state: its two "children calls" return 5 and 6, and so a new state with states 5 and 6 as a- and b-successors must be created. Pink nodes correspond to calls that have already been computed, and for which inter just looks up the result in G. Green nodes correspond to calls that are computed by inter, but not by the more efficient version. For instance, the result of inter(4, 4) at the bottom right can be returned immediately.

Fixed-length complement. Recall that if a set X ⊆ U is encoded by a language L of length n, then the set U \ X is encoded by the fixed-length complement Σ^n \ L, which we denote by L̄^n. Since the empty language has all lengths, we have e.g. ∅̄^2 = Σ^2, ∅̄^3 = Σ^3, and ∅̄^0 = Σ^0 = {ε}.
Given the state of the master automaton recognizing L, we compute the state recognizing L̄^n with the help of these properties:

• If L has length 0 and L = ∅, then L̄^0 = {ε}.
• If L has length 0 and L = {ε}, then L̄^0 = ∅.
• If L has length n ≥ 1, then (L̄^n)^a = (L^a)̄^(n−1) for every a ∈ Σ. (Observe that w ∈ (L̄^n)^a iff aw ∉ L iff w ∉ L^a iff w ∈ (L^a)̄^(n−1).)

We obtain the algorithm comp of Table 7.2. If the master automaton has n states reachable from q, then the operation has complexity O(n).

Example 7.9 Consider again the multi-DFA at the top of Figure 7.5, without the blue states. The tree of recursive calls at the bottom of the figure graphically describes the run of comp(4, 12) (that is, we compute the node for the complement of the language recognized from state 12, which has length 4). For instance, comp(1, 2) generates two recursive calls, first comp(0, 1) (on the a-successor of 2), and then comp(0, 0) (on the b-successor). The calls return 0 and 1, respectively, and so comp(1, 2) returns 3. Observe


Figure 7.4: An execution of inter.
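The memoized recursion behind runs like the one in Figure 7.4 is easy to prototype. Below is a hedged Python sketch of Algorithm inter, run on the small node table of Example 7.7 rather than the larger table of Figure 7.4; all names are ours.

```python
# Sketch of Algorithm inter (memoized intersection on a node table).
# The table is the one of Example 7.7; 0 and 1 stand for q_empty and q_eps.
Q_EMPTY, Q_EPS = 0, 1
succ = {2: (1, 0), 3: (1, 1), 4: (0, 1), 5: (2, 2), 6: (2, 3), 7: (4, 4)}
index = {s: q for q, s in succ.items()}

def make(s):
    # return the state with successor tuple s, creating it if necessary
    if all(q == Q_EMPTY for q in s):
        return Q_EMPTY
    if s not in index:
        q = max(succ) + 1
        succ[q], index[s] = s, q
    return index[s]

G = {}  # memoization table

def inter(q1, q2):
    if (q1, q2) in G:
        return G[(q1, q2)]
    if q1 == Q_EMPTY or q2 == Q_EMPTY:
        return Q_EMPTY
    if q1 == Q_EPS and q2 == Q_EPS:
        return Q_EPS
    # recurse on the letter-wise successors, then build the result node
    r = tuple(inter(s1, s2) for s1, s2 in zip(succ[q1], succ[q2]))
    G[(q1, q2)] = make(r)
    return G[(q1, q2)]
```

For instance, inter(5, 6) returns the existing state 5 (since L1 ∩ L2 = L1), while inter(6, 7) creates a new state for L2 ∩ L3 = {bb}, whose successor tuple is (0, 4).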

comp(n, q)
Input: length n, state q recognizing a language L of length n
Output: state recognizing L̄^n
1  if G(n, q) is not empty then return G(n, q)
2  if n = 0 and q = q∅ then return qε
3  else if n = 0 and q = qε then return q∅
4  else  /* n ≥ 1 */
5      for all i = 1, ..., m do ri ← comp(n − 1, q^ai)
6      G(n, q) ← make(r1, ..., rm)
7      return G(n, q)

Table 7.2: Algorithm comp


how the call comp(2, 0) returns 7, the state accepting {a, b}^2. Pink nodes correspond again to calls for which comp just looks up the result in G. Green nodes correspond to calls whose result is directly computed by a more efficient version of comp that applies the following rule: if comp(i, j) returns k, then comp(i, k) returns j.
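The same recursion can be sketched in Python on the small table of Example 7.7 (names are ours, and the state numbers below belong to that table, not to Figure 7.5). The complement of L3 = {ab, bb} at length 2 is {aa, ba} = L1, so comp(2, 7) must return state 5.

```python
# Sketch of Algorithm comp on the node table of Example 7.7.
Q_EMPTY, Q_EPS = 0, 1
succ = {2: (1, 0), 3: (1, 1), 4: (0, 1), 5: (2, 2), 6: (2, 3), 7: (4, 4)}
index = {s: q for q, s in succ.items()}

def make(s):
    if all(q == Q_EMPTY for q in s):
        return Q_EMPTY
    if s not in index:
        q = max(succ) + 1
        succ[q], index[s] = s, q
    return index[s]

G = {}

def comp(n, q):
    if (n, q) in G:
        return G[(n, q)]
    if n == 0:
        # complement of {} is {epsilon} and vice versa
        return Q_EPS if q == Q_EMPTY else Q_EMPTY
    # q_empty has no table row: its successors are all q_empty.
    # (Valid calls never pass q_eps with n >= 1.)
    r = tuple(comp(n - 1, qa) for qa in succ.get(q, (Q_EMPTY, Q_EMPTY)))
    G[(n, q)] = make(r)
    return G[(n, q)]
```

Note also that comp(1, 0), the complement of the empty language at length 1, returns state 3, which recognizes {a, b}.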

Fixed-length emptiness. A fixed-length language is empty if and only if the node representing it has q∅ as state identifier, and so emptiness can be checked in constant time.

Fixed-length universality. A language L of length n is fixed-length universal if L = Σ^n. The universality of a language of length n recognized by a state q can be checked in time O(n). It suffices to check, for all states reachable from q with the exception of q∅, that no transition leaving them leads to q∅. More systematically, we use the properties

• if L = ∅, then L is not universal;
• if L = {ε}, then L is universal;
• if ∅ ≠ L ≠ {ε}, then L is universal iff L^a is universal for every a ∈ Σ;

that lead to the algorithm of Table 7.3.

Fixed-length inclusion. Given two languages L1, L2 ⊆ Σ^n, in order to check L1 ⊆ L2 we compute L1 ∩ L2 and check whether it is equal to L1, using the equality check shown next. The complexity is dominated by the complexity of computing the intersection.
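The universality check based on the properties above can be sketched in Python (again on the table of Example 7.7, with our own names). State 3 recognizes {a, b} = Σ^1 and is universal; states 6 and 7 are not.

```python
# Sketch of the fixed-length universality check (Algorithm univ).
Q_EMPTY, Q_EPS = 0, 1
succ = {2: (1, 0), 3: (1, 1), 4: (0, 1), 5: (2, 2), 6: (2, 3), 7: (4, 4)}
G = {}

def univ(q):
    if q in G:
        return G[q]
    if q == Q_EMPTY:
        return False          # the empty language is never universal
    if q == Q_EPS:
        return True           # {epsilon} = Sigma^0 is universal
    # L is universal iff every residual L^a is universal
    G[q] = all(univ(qa) for qa in succ[q])
    return G[q]
```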

Figure 7.5: An execution of comp.

univ(q)
Input: state q
Output: true if L(q) is fixed-length universal, false otherwise
1  if G(q) is not empty then return G(q)
2  if q = q∅ then return false
3  else if q = qε then return true
4  else  /* q ≠ q∅ and q ≠ qε */
5      G(q) ← and(univ(q^a1), ..., univ(q^am))
6  return G(q)

Table 7.3: Algorithm univ

Fixed-length equality. Since the minimal DFA recognizing a language is unique, two languages are equal if and only if the nodes representing them have the same state identifier, leading to a constant-time algorithm. This solution, however, assumes that the two input nodes come from the same table. If they come from two different tables T1, T2, then, since state identifiers can be assigned in the two tables in different ways, it is necessary to check whether the DFAs rooted at the states q1 and q2 are isomorphic. This is done by algorithm eq2 of Table 7.4, which assumes that qi belongs to a table Ti, and that both tables assign state identifiers q∅1 and q∅2 to the empty language.

eq2(q1, q2)
Input: states q1, q2 of different tables
Output: true if L(q1) = L(q2), false otherwise
1  if G(q1, q2) is not empty then return G(q1, q2)
2  if q1 = q∅1 and q2 = q∅2 then G(q1, q2) ← true
3  else if q1 = q∅1 and q2 ≠ q∅2 then G(q1, q2) ← false
4  else if q1 ≠ q∅1 and q2 = q∅2 then G(q1, q2) ← false
5  else  /* q1 ≠ q∅1 and q2 ≠ q∅2 */
6      G(q1, q2) ← and(eq2(q1^a1, q2^a1), ..., eq2(q1^am, q2^am))
7  return G(q1, q2)

Table 7.4: Algorithm eq2

7.4 Determinization and Minimization

Let L be a fixed-length language, and let A = (Q, Σ, δ, q0, F) be a NFA recognizing L. The algorithm det&min(A) shown in Table 7.5 returns the state of the master automaton recognizing L. In other

words, det&min(A) simultaneously determinizes and minimizes A. The algorithm actually solves a more general problem. Its heart is a procedure state(S) that, given a set S of states of A, all recognizing languages of the same length, returns the state of the master automaton recognizing the language L(S) = ∪_{q∈S} L(q). Since L = L({q0}), det&min(A) just calls state({q0}).

We make the assumption that for every state q of A there is a path leading from q to some final state. This assumption can be enforced by suitable preprocessing, but usually it is not necessary, because in applications NFAs for fixed-length languages satisfy the property by construction. With this assumption, since all the states of S recognize languages of the same length, the language L(S) also has this common length, and L(S) satisfies:

• if S = ∅ then L(S) = ∅;
• if S ∩ F ≠ ∅ then L(S) = {ε} (since the states of S recognize fixed-length languages of the same length and S ∩ F ≠ ∅, the states of F necessarily recognize {ε}, and we have L(S) = {ε});
• if S ≠ ∅ and S ∩ F = ∅, then L(S) = ∪_{i=1}^{m} ai · L(Si), where Si = δ(S, ai).

These properties lead to the recursive algorithm of Table 7.5. The procedure state(S) uses a table G of results, initially empty. When state(S) is computed for the first time, the result is memoized in G(S), and any subsequent call directly reads the result from G. The algorithm has exponential complexity, because in the worst case it may call state(S) for every set S ⊆ Q.

Example 7.10 Figure 7.6 shows a NFA (upper left) and the result of applying det&min to it, where, for the sake of readability, sets of states are written without the usual parentheses (e.g. β, γ instead of {β, γ}). The run of det&min is shown at the bottom of the figure. Observe, for instance, that the algorithm assigns to {γ} the same node as to {β, γ}, because both have the states 2 and 3 as a-successor and b-successor, respectively.

To show that an exponential blowup is unavoidable, consider the family {Ln}n≥0 of languages, where Ln = {ww' | w, w' ∈ {0, 1}^n and w ≠ w'}. While Ln can be recognized by an NFA of size O(n²), its minimal DFA has O(2^n) states: for every u, v ∈ Σ^n, if u ≠ v then Ln^u ≠ Ln^v, because v ∈ Ln^u but v ∉ Ln^v.

7.5 Operations on Fixed-length Relations

Fixed-length relations can be manipulated very similarly to fixed-length languages. Boolean operations are as for fixed-length languages. The projection, join, pre, and post operations can, however, be implemented more efficiently than in Chapter 6.

We start with an observation on encodings. In Chapter 6 we assumed that if an element of X is encoded by w ∈ Σ*, then it is also encoded by w#, where # is the padding letter. This ensures that every pair (x, y) ∈ X × X has an encoding (wx, wy) such that wx and wy have the same length. Since in the fixed-length case all shortest encodings have the same length,

Figure 7.6: Run of det&min on an NFA for a fixed-length language
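The procedure state(S) is easy to prototype. Below is a hedged Python sketch with our own naming: the node table is the one of Example 7.7, and the NFA is our own toy example (not the one of Figure 7.6), chosen so that it recognizes {aa, ba, bb}; the call state({q0}) must therefore return state 6.

```python
# Sketch of det&min via the recursive, memoized procedure state(S).
Q_EMPTY, Q_EPS = 0, 1
succ = {2: (1, 0), 3: (1, 1), 4: (0, 1), 5: (2, 2), 6: (2, 3), 7: (4, 4)}
index = {s: q for q, s in succ.items()}

def make(s):
    if all(q == Q_EMPTY for q in s):
        return Q_EMPTY
    if s not in index:
        q = max(succ) + 1
        succ[q], index[s] = s, q
    return index[s]

# Toy NFA recognizing {aa, ba, bb}: delta[state][letter] = set of successors.
delta = {'q0': {'a': {'q1'}, 'b': {'q1', 'q2'}},
         'q1': {'a': {'f'}},
         'q2': {'a': {'f'}, 'b': {'f'}}}
final = {'f'}
G = {}

def state(S):
    S = frozenset(S)
    if S in G:
        return G[S]
    if not S:
        return Q_EMPTY
    if S & final:
        return Q_EPS   # fixed length: a final state can only recognize {epsilon}
    r = tuple(state(set().union(*(delta.get(q, {}).get(x, set()) for q in S)))
              for x in ('a', 'b'))
    G[S] = make(r)
    return G[S]

def det_and_min():
    return state({'q0'})
```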

state(S)
Input: set S ⊆ Q recognizing languages of the same length
Output: state recognizing L(S)
1  if G(S) is not empty then return G(S)
2  else if S = ∅ then return q∅
3  else if S ∩ F ≠ ∅ then return qε
4  else  /* S ≠ ∅ and S ∩ F = ∅ */
5      for all i = 1, ..., m do Si ← δ(S, ai)
6      G(S) ← make(state(S1), ..., state(Sm))
7  return G(S)

det&min(A)
Input: NFA A = (Q, Σ, δ, q0, F)
Output: master state recognizing L(A)
1  return state({q0})

Table 7.5: Algorithm det&min

the padding symbol is no longer necessary. So in this section we assume that each word or pair has exactly one encoding.

The basic definitions on fixed-length languages extend easily to fixed-length relations. A word relation R ⊆ Σ* × Σ* has length n ≥ 0 if for all pairs (w1, w2) ∈ R the words w1 and w2 have length n. If R has length n for some n ≥ 0, then we say that R has fixed length. Recall that a transducer T accepts a pair (w1, w2) ∈ Σ* × Σ* if w1 = a1 ... an, w2 = b1 ... bn, and T accepts the word (a1, b1) ... (an, bn) ∈ (Σ × Σ)*. A fixed-length transducer accepts a relation R ⊆ X × X if it recognizes the word relation {(wx, wy) | (x, y) ∈ R}.

Given a relation R ⊆ Σ* × Σ* and a, b ∈ Σ, we define R^[a,b] = {(w1, w2) ∈ Σ* × Σ* | (aw1, bw2) ∈ R}. Notice that if R has fixed length, then so does R^[a,b], and that in particular ∅^[a,b] = ∅. The master transducer over the alphabet Σ is the tuple MT = (QM, Σ × Σ, δM, FM), where QM is the set of all fixed-length relations, FM = {(ε, ε)}, and δM : QM × (Σ × Σ) → QM is given by δM(R, [a, b]) = R^[a,b] for every R ∈ QM and a, b ∈ Σ. As in the language case, the minimal deterministic transducer recognizing a fixed-length relation R is the fragment of the master transducer containing the states reachable from R. Like minimal DFAs, minimal deterministic transducers are represented as tables of nodes, where a node consists of a state identifier and a successor tuple with one component for each letter of Σ × Σ. However, a remark is in order: since a state of a deterministic transducer has |Σ|² successors, a row of the table has |Σ|² entries, too large when the table is only sparsely filled. Sparse transducers over Σ × Σ are better encoded as NFAs over Σ by introducing auxiliary states: a transition q —[a,b]→ q' of the transducer is "simulated" by two transitions q —a→ r —b→ q', where r is an auxiliary state with exactly one input and one output transition.

and the projection onto the second component to postR (Σ∗ ). Recall that in the ﬁxed-length case we do not need any padding symbol.a1 ] . Then. . bw2 ) | (w1 . OPERATIONS ON FIXED-LENGTH RELATIONS 125 Fixed-length projection The implementation of the projection operation of Chapter 6 may yield a nondterministic transducer. b] R = {(aw1 . preR (L) admits an inductive deﬁnition that we now derive. r2 m G(r1 . rm. which in turn can be seen as variants of join.11). We have the following properties: • ∅ ◦ R = R ◦ ∅ = ∅. .m ) return G(r1 . .c∈Σ which lead to the algorithm of Figure 7. Let [a.7. r2 ) [a .a ] [a [a ri. where union is deﬁned similarly to inter. even if the initial transducer is deterministic. . r2 ) Input: states r1 . So we need a diﬀerent implementation. The complexity is exponential in the worst case: if t(n) denotes the worst-case complexity for two states of length n. We give a recursive deﬁnition of R1 ◦ R2 . a j ) ∈ Σ × Σ do 6 7 8 [a . r2 ) 2 if r1 = q∅ or r2 = q∅ then return q∅ 3 else if r1 = q and r2 = q then return q 4 else / ∗ q∅ r1 q and q∅ r2 q ∗ / 5 for all (ai . . join(r1 .1 . • R1 ◦ R2 = [a. given a ﬁxed-length language L and a relation R. which is a special case of pre and post.b.5. b] · R[a. r2 ) = make(r1. .c] ◦ R[c. r2 ) is not empty then return G(r1 . ε] } = { [ε. We observe that projection can be reduced to pre or post: the projection of a binary relation R onto its ﬁrst component is equal to preR (Σ∗ ). because union has quadratic worst-case complexity.7. Fixed-length join.7: Algorithm join Fixed-length pre and post. .a j ] Figure 7. We prove it later for the projection operation (see Example 7. We have the following properties: . . So we defer dealing with projection until the implementation of pre and post have been discussed. j ← union join r1 i . r2 1 j . . This exponential blowup is unavoidable. 1 2 a. then we have t(n) = O(t(n − 1)2 ). • { [ε.b] . ε] }. . ε] } ◦ { [ε. join r1 i .am ] . 
r2 of transducer table Output: state recognizing L(r1 ) ◦ L(r2 ) 1 if G(r1 . . w2 ) ∈ R}. .

• if R = ∅ or L = ∅, then preR(L) = ∅;
• if R = {[ε, ε]} and L = {ε}, then preR(L) = {ε};
• if ∅ ≠ R ≠ {[ε, ε]} and ∅ ≠ L ≠ {ε}, then preR(L) = ∪_{a,b∈Σ} a · pre_{R^[a,b]}(L^b).

The first two properties are obvious. For the last one, observe that all pairs of R have length at least one, and so every word of preR(L) also has length at least one. Now, given an arbitrary word aw1 ∈ ΣΣ*, we have

aw1 ∈ preR(L)
  ⇔ ∃ bw2 ∈ L : [aw1, bw2] ∈ R
  ⇔ ∃ b ∈ Σ ∃ w2 ∈ L^b : [w1, w2] ∈ R^[a,b]
  ⇔ ∃ b ∈ Σ : w1 ∈ pre_{R^[a,b]}(L^b)
  ⇔ aw1 ∈ ∪_{a,b∈Σ} a · pre_{R^[a,b]}(L^b).

These properties lead to the recursive algorithm pre of Table 7.6, which accepts as inputs a state of the transducer table for a relation R and a state of the automaton table for a language L, and returns the state of the automaton table recognizing preR(L). The transducer table is not changed by the algorithm.

pre(r, q)
Input: state r of transducer table, state q of automaton table
Output: state recognizing pre_{L(r)}(L(q))
1  if G(r, q) is not empty then return G(r, q)
2  if r = r∅ or q = q∅ then return q∅
3  else if r = rε and q = qε then return qε
4  else
5      for all ai ∈ Σ do
6          qi ← union(pre(r^[ai,a1], q^a1), ..., pre(r^[ai,am], q^am))
7      G(r, q) ← make(q1, ..., qm)
8  return G(r, q)

Table 7.6: Algorithm pre

As promised, we can now give an implementation of the operation that projects a relation R onto its first component. It suffices to give a dedicated algorithm for preR(Σ*). As in the case of join, the complexity is exponential in the worst case; the reason is the quadratic blowup introduced by union when the recursion depth increases by one. The next example shows that projection is inherently exponential.
. . pro1 r[ai .6 Decision Diagrams Binary Decision Diagrams. In this section we show that they can be seen as minimal automata of a certain kind. 1). . BDDs for short.11 Consider the relation R ⊆ Σ2n × Σ2n given by R = {(w1 xw2 yw3 . .7. . . 1}n such that f (b1 .6. and whose second word contains only 0’s but for two 1’s at the same two positions. . but not completely equal. as shown when discussing det&min.7: Algorithm pro1 . pre. |w2 | = n and |w1 w3 | = n − 2} . and then reads (y.am ] 7 G(r) ← make(q1 . . . w ∈ Σn and w w}. qm ) 8 return G(r) Table 7. . where y x). bn ) = 1. . The minimal DFA recognizing L f is very similar to the BDD representing f . . let L f denote the set of strings b1 b2 . 127 That is. Consider the following minimal DFA for a language of length four: . Slight modiﬁcations of this example show that join. . bn ∈ {0. So any algorithm for √ projection has Ω(2 n ) complexity. . DECISION DIAGRAMS pro1 (r) Input: state r of transducer table Output: state recognizing proj1 (L(r)) 1 if G(r) is not empty then return G(r) 2 if r = r∅ then return q∅ 3 else if r = r then return q 4 else 5 for all ai ∈ Σ do 6 qi ← union pro1 r[ai . has O(2n ) states. xn ) : {0. are a very popular data structure for the representation and manipulation of boolean functions. Given a boolean function f (x1 . . Example 7. We modify the constructions of the last section to obtain an exact match. and post are inherently exponential as well. 0). it memorizes the letter x above the ﬁrst 1. reads n − 1 letters of the form (z. we have pro1 (R) = {ww | w. On the other hand. 1}n → {0. . 7. .a1 ] . 1}. R contains all pairs of words of length 2n whose ﬁrst word has a position i ≤ n such that the letters at positions i and i + n are distinct. It is easy to see that the minimal deterministic transducer for R has O(n2 ) states (intuitively. whose minimal DFA. 0|w1 | 10|w2 | 10|w3 | ) | x y.

. accept any word of length three.128 a q1 a q0 b q2 b q4 b b a a q3 b a CHAPTER 7. and can be obtained by repeatedly applying to the minimal DFA the following reduction rule: q1 l1 ••• qn l2 q a1 ••• am q1 l1 · Σ r ••• qn l2 · Σ r The converse direction also works: the minimal DFA can be recovered from the zip-automaton by “reversing” the rule. bypassing the minimal DFAs.1 shows that this zip-automaton (as we call it) is unique. and convert the result back. This already allows us to use zip-automata as a data structure for ﬁxedlength languages. ΣΣ n . accept any word of length 2. Section 7. conduct the operation.6.2 shows how to do better by directly deﬁning the operations on zip-automata. expand them to minimal DFAs. accept any two-letter word whose last letter is a b. FINITE UNIVERSES q5 b b q6 a q7 Its language can be described as follows: after reading an a. Following this description.1 Zip-automata and Kernels A zip over an alphabet Σ is a regular expression of the form aΣn = a ΣΣΣ . but only through conversion to minimal DFAs: to compute an operation using zip-automata.6. the language can also be more compactly described by an automaton with regular expressions as transitions: r0 b r1 b·Σ r2 a · Σ3 a · Σ2 b r3 Section 7.6. after reading bb. after reading ba. . 7.

b}. bba. ab.8 shows a fragment of the master (compare with Figure 7. K a ) for every kernel K and a ∈ Σ. and {ab. q0 .1). baa.6. and a transition (K. F) is deterministic if for every q ∈ Q and a ∈ Σ there is exactly one k ∈ N such that δ(q. It is easy to see that a ﬁxed-length language is recognized by a zip-DFA if and only it is a kernel: Deﬁnition 7. b ∅ a. the kernel { } as unique ﬁnal state. baa. A zip-NFA A ∈ (Q. Σ. b ∈ Σ such that La Lb . A zip-DFA is minimal if no other zip-DFA for . aΣk ) ∅. δ. bab.13 Figure 7.8: A fragment of the master zip-automaton The zip-DFA AK for a kernel K is the fragment of the master containing the states reachable from K. b Figure 7. aba. bab. ab. aΣk . The languages {a. Example 7.7. {aa. DECISION DIAGRAMS 129 (aΣΣΣ . bb} b {aab. bbb} a bΣ2 {aa. ΣΣ looks a bit like a zip). . where k is equal to the length of K a minus the length of K a . abb.1 are not kernels. aab. A zip-NFA is a NFA whose transitions are labelled by zips. ba. The master zip-automaton (we call it just “the master”) has the set of all kernels as states. bb} of Figure 7. The kernel of a ﬁxed-language L is the unique kernel L satisfying L = Σk L for some k ≥ 0.12 A ﬁxed-language L is a kernel if it is empty or if there are a. bbb} b aΣ { } a. . It is easy to see that AK recognizes K. ab. ba} b aΣ {a} b a b a aΣ {b} {aa. bb}. and so they are not states of the master either. {aaa.

proves that they are minimal. both L(r) and L(r ) are kernels. where l and K are unknowns and K must be a kernel. By (1 ⇒). (1 ⇐): We show that two zip-DFAs A and A that satisfy (i) and (ii) and recognize the same language are isomorphic. By (1). assume that some state q does not recognize a kernel. together with (1 ⇒). Since L(A) is a kernel. It suﬃces to prove that if two states q. It follows that all of them necessarily recognize L(q) . q ). But then we necessarily have L(r) = K a = L(q ). L(q)a = L(q)b for every b ∈ Σ. The resulting zip-DFA recognizes the same language as A. is K = ( K a ). because the only solution to the equation K = aΣl K . then for every a ∈ Σ the (unique) transitions (q. contradicting the hypothesis that it has been applied exhaustively. Then L(q)a = Σk L(q ). and therefore the target states of all outgoing transitions of q recognize kernels. Let k be the smallest number such that A contains a transition (q. (3) Let B be a zip-DFA obtained by exhaustively applying the reduction rule to A. aΣk . For (ii) observe that. and (ii) distinct states of A recognize distinct kernels. since L(q) is not a kernel. we remove q and any other state no longer reachable form the initial state (recall that q is neither initial nor ﬁnal). Since B contains at most one state recognizing L(q) . all outgoing transitions of q have the same target. and so A is not minimal. For (i). assume A contains a state q such that L(q) is not a kernel. r) and (q . aΣk . If two distinct states of A recognize the same kernel then the quotient has fewer states than A. So we have L(q) = a∈Σ a Σk L(q ) = Σk+1 L(q ). bΣl+k+1 . q is neither initial nor ﬁnal. (3) The result of exhaustively applying the reduction rule to the minimal DFA recognizing a ﬁxed-length language L is the minimal zip-DFA recognizing L . So it is a minimal zip-DFA. and. The following proposition shows that minimal zip-DFAs have very similar properties to minimal DFAs: Proposition 7. 
(2) AK is the unique minimal zip-DFA recognizing K.14 (1) A zip-DFA A that recognizes a kernel is minimal if and only if (i) every state of A recognizes a kernel. Now we perform the following two operations: ﬁrst. which. Proof: (1 ⇒): For (i). So A is not minimal. and so the reduction rule can be applied to q. it suﬃces to prove that B satisﬁes (i) and (ii). since every state of A recognizes a diﬀerent language. Without loss of generality. q) of A by a transition (q . aΣk . we can choose L(q) of minimal length. q ) for some letter a and some state q . observe that the quotienting operation can be deﬁned for zip-DFAs as for DFAs. q of A and A satisfy L(q) = L(q ). then. For (ii). allowing us to merge states that recognize the same kernel without chaning the language. we replace every transition (q . and has at least one state less. Uniqueness follows from the proof of (1 ⇐). so does every state of B (the reduction rule preserves the recognized languages).130 CHAPTER 7. . FINITE UNIVERSES the same language has fewer states. Let L(q) = K = L(q ). r ) satisfy k = k and L(r) = L(r ). (2) AK recognizes K and it satisﬁes conditions (a) and (b) of part (1) by deﬁnition. bΣl .

Figure 7.9: The multi-zip-DFA for {L1, L2, L3}

Example 7.15 Figure 7.9 shows the multi-zip-DFA for the set {L1, L2, L3} of Example 7.7. Recall that L1 = {aa, ba}, L2 = {aa, ba, bb}, and L3 = {ab, bb}. The multi-zip-DFA is the result of applying the reduction rule to the multi-automaton of Figure 7.2. Observe that, while L1, L2, and L3 have the same length, the kernel of L2 has a different length than the kernels of L1 and L3.

7.6.2 Operations on Kernels

We use multi-zip-DFAs to represent sets of fixed-length languages of the same length. A set L = {L1, ..., Lm} is represented by the states of the master recognizing the kernels ⟨L1⟩, ..., ⟨Lm⟩ and by the common length of L1, ..., Lm. Observe that the states and the length completely determine L.

Multi-zip-DFAs are represented as a table of kernodes. A kernode is a triple ⟨q, l, s⟩, where q is a state identifier, l is a length, and s = (q1, ..., qm) is the successor tuple of the kernode. The table for the multi-zip-DFA of Figure 7.9 is:

Ident.   Length   a-succ   b-succ
  2        1        1        0
  4        1        0        1
  6        2        2        1

This example explains the role of the new length field. If we only know that the a- and b-successors of, say, state 6 are states 2 and 1, we cannot infer which zips label the transitions from 6 to 2 and from 6 to 1: they could be a and bΣ, or aΣ and bΣ², or aΣ^n and bΣ^(n+1) for any n ≥ 0. However, once we know that state 6 accepts a language of length 2, we can deduce the correct labels: since states 2 and 1 accept languages of length 1 and 0, respectively, the labels are a and bΣ.
If the lengths of K1 and K2 are l1 . s) returns q. . For instance. • if K1 = ∅ or K2 = ∅. then we have L = m ai Σk K for some k. Let Ki be the kernel recognized by the i-th component of s. . Lam and applying kmake. s) returns the kernode for L . . 2) creates a new node having 2 as a-successor and b-successor. then kmake(l. L2 . These states can be computed recursively by means of: • if l1 < l2 then • if l1 > l2 then • if l1 = l2 then (Σl2 −l1 K1 ∩ K2 )a (K1 ∩ Σl1 −l2 K2 )a (K1 ∩ K2 )a = = = a Σl2 −l1 −1 K1 ∩ K2 a K1 ∩ Σl1 −l2 −1 K2 a a K1 ∩ K2 = = = K1 a K1 a K1 a K2 . K2 of languages L1 . . then kmake(l. if no such kernode exists. . and for simultaneous determinization and minimization. t) returns the kernode for K. All algorithms call a procedure kmake(l. Algorithms. . where L is the unique language of length l such that Lai = Ki for every ai ∈ Σ. while make(2.132 CHAPTER 7. and. and was obtained by recursively computing the states for La1 . the state of the zip-master automaton for L is L . s). Then kmake(l. Now. Km are all equal to some kernel K. (K1 ∩ Σl2 −l1 K2 )a if l1 < l2 if l1 > l2 if l1 = l2 (K1 ∩ K2 )a K1 which allows us to obtain the state for K1 (Σl1 −l2 K1 ∩ K2 )a K2 by computing states for or for every a ∈ Σ. . . Fixed-length intersection. and i=1 therefore L = Σl+1 K = K. s) creates a new kernode q. then kmake(3. def 2 We have the obvious property K2 = ∅. If K1 . . . l. s . l2 . K2 a K2 which leads to the algorithm of Table 7. complement. . Given kernels K1 . . The algorithms for operations on kernels are modiﬁcations of the algorithms of the previous section. and can be obtained by recursively computing states for La1 . then kmake(l. then K1 Assume now K1 ∅ every k. and applying kmake. In the previous section. Lam and then applying make. 2 is well deﬁned because L1 = L1 and L2 = L2 implies L1 ∩ L2 = L1 ∩ L2 . So kmake(l. . s with a fresh identiﬁer q. and returns q. FINITE UNIVERSES The procedure kmake(l. if T is the table above. 
The procedure kmake(l, s) behaves like make(s): if the current table already contains a kernode ⟨q, l, s⟩, then kmake(l, s) returns q; if no such kernode exists, it creates a new kernode ⟨q, l, s⟩ with a fresh identifier q, and returns q.

Algorithms. The algorithms for operations on kernels are modifications of the algorithms of the previous sections. All algorithms call a procedure kmake(l, s) with the following specification. Let Ki be the kernel recognized by the i-th component of s.

• If Ki ≠ Kj for some i, j, then kmake(l, s) returns the kernode for ⟨L⟩, where L is the unique language of length l such that L^ai = Ki for every ai ∈ Σ.
• If K1, ..., Km are all equal to some kernel K, then L = ∪_{i=1}^{m} ai Σ^k K for some k, and therefore ⟨L⟩ = ⟨Σ^(k+1) K⟩ = K; so kmake(l, s) returns the kernode for K. In particular, if s is the tuple all of whose components are q∅, then kmake(l, s) returns q∅.

In the previous sections, the state of the master automaton for a language L was the language L itself, and was obtained by recursively computing the states for L^a1, ..., L^am and then applying make. Now the state of the master zip-automaton for L is ⟨L⟩, and can be obtained by recursively computing the states for ⟨L^a1⟩, ..., ⟨L^am⟩ and applying kmake. For instance, if T is the table above, then kmake(3, (2, 2)) returns 2, while make((2, 2)) would create a new node having 2 as a-successor and b-successor.

We show how to modify the algorithms for intersection, complement, and for simultaneous determinization and minimization.

Fixed-length intersection. Given kernels K1, K2 of languages L1, L2 of the same length, we compute the state recognizing K1 ⊓ K2 := ⟨L1 ∩ L2⟩. The operation ⊓ is well defined because ⟨L1⟩ = ⟨L1'⟩ and ⟨L2⟩ = ⟨L2'⟩ imply ⟨L1 ∩ L2⟩ = ⟨L1' ∩ L2'⟩. We have the obvious property: if K1 = ∅ or K2 = ∅, then K1 ⊓ K2 = ∅. Assume now K1 ≠ ∅ ≠ K2, and let l1, l2 be the lengths of K1 and K2. Then

K1 ⊓ K2 = ⟨Σ^(l2−l1) K1 ∩ K2⟩   if l1 < l2
K1 ⊓ K2 = ⟨K1 ∩ Σ^(l1−l2) K2⟩   if l1 > l2
K1 ⊓ K2 = ⟨K1 ∩ K2⟩             if l1 = l2

which allows us to obtain the state for K1 ⊓ K2 by computing states for (Σ^(l2−l1) K1 ∩ K2)^a, (K1 ∩ Σ^(l1−l2) K2)^a, or (K1 ∩ K2)^a for every a ∈ Σ. These states can be computed recursively by means of:

• if l1 < l2 then (Σ^(l2−l1) K1 ∩ K2)^a = Σ^(l2−l1−1) K1 ∩ K2^a, with kernel K1 ⊓ K2^a;
• if l1 > l2 then (K1 ∩ Σ^(l1−l2) K2)^a = K1^a ∩ Σ^(l1−l2−1) K2, with kernel K1^a ⊓ K2;
• if l1 = l2 then (K1 ∩ K2)^a = K1^a ∩ K2^a, with kernel K1^a ⊓ K2^a;

which leads to the algorithm kinter of Table 7.8.
kinter(q1, q2)
Input: states q1, q2 recognizing kernels K1 = ⟨L1⟩ and K2 = ⟨L2⟩
Output: state recognizing ⟨L1 ∩ L2⟩
1  if G(q1, q2) is not empty then return G(q1, q2)
2  if q1 = q∅ or q2 = q∅ then return q∅
3  if q1 ≠ q∅ and q2 ≠ q∅ then
4    if l1 < l2 then  /* l1, l2: lengths of the kernodes for q1, q2 */
5      for all i = 1, ..., m do ri ← kinter(q1, q2^ai)
6      G(q1, q2) ← kmake(l2, r1, ..., rm)
7    else if l1 > l2 then
8      for all i = 1, ..., m do ri ← kinter(q1^ai, q2)
9      G(q1, q2) ← kmake(l1, r1, ..., rm)
10   else  /* l1 = l2 */
11     for all i = 1, ..., m do ri ← kinter(q1^ai, q2^ai)
12     G(q1, q2) ← kmake(l1, r1, ..., rm)
13 return G(q1, q2)

Table 7.8: Algorithm kinter

Fixed-length complement. Given the kernel K = ⟨L⟩ of a fixed-length language L of length n, we wish to compute the state recognizing the kernel of the complement of L. In the master automaton the complement of the empty language of length n is Σ^n, and so it differs from the complement of the empty language of length m for n ≠ m; the subscript n is necessary only because ∅ has all possible lengths. With kernels we have ⟨Σ^n⟩ = {ε} for every n ≥ 0, and so the subscript is not needed anymore.
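The table-driven recursion above can be sketched in a few lines of Python. The sketch below is hypothetical: it hash-conses fixed-length languages over a two-letter alphabet {a, b} and memoizes intersections in a table G, but it ignores the kernel length bookkeeping, so the l1 ≠ l2 branches of kinter do not arise.

```python
# Toy sketch of kinter-style memoized intersection (equal-length languages,
# no kernels). Languages are hash-consed: equal languages get the same node.
EMPTY = "empty"   # plays the role of q_empty (the empty language)
EPS = "eps"       # plays the role of q_epsilon (the language {epsilon})

table = {}        # (child_a, child_b) -> node id, the table behind kmake
nodes = {}        # node id -> (child_a, child_b)
next_id = [0]

def kmake(child_a, child_b):
    # like make(s): reuse an existing node for these successors, or create one
    if child_a == EMPTY and child_b == EMPTY:
        return EMPTY              # no node may have only empty successors
    key = (child_a, child_b)
    if key not in table:
        table[key] = f"n{next_id[0]}"
        next_id[0] += 1
        nodes[table[key]] = key
    return table[key]

G = {}                            # memoization table of kinter

def kinter(q1, q2):
    if (q1, q2) in G:
        return G[(q1, q2)]
    if q1 == EMPTY or q2 == EMPTY:
        return EMPTY
    if q1 == EPS:                 # the improvement: {eps} is neutral
        return q2
    if q2 == EPS:
        return q1
    a1, b1 = nodes[q1]
    a2, b2 = nodes[q2]
    r = kmake(kinter(a1, a2), kinter(b1, b2))
    G[(q1, q2)] = r
    return r
```

For instance, intersecting {aa, ab} with {ab, bb} yields the node for {ab}, and hash-consing guarantees it is the very same node that kmake would return for {ab}.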

Figure 7.10: An execution of kinter.

We define the complement operator on kernels by K̄ = ⟨L̄⟩, where K = ⟨L⟩ and L̄ denotes the complement of L. The operator is well defined because ⟨L⟩ = ⟨L′⟩ implies ⟨L̄⟩ = ⟨L̄′⟩. We obtain the state for K̄ by recursively computing states for the kernels (K̄)^a by means of the following properties:

• if K = ∅ then K̄ = {ε};
• if K = {ε} then K̄ = ∅;
• if ∅ ≠ K ≠ {ε} then (K̄)^a = the complement of K^a, for every a ∈ Σ.

These properties lead to the algorithm of Table 7.9.

kcomp(q)
Input: state q recognizing a kernel K
Output: state recognizing K̄
1  if G(q) is not empty then return G(q)
2  if q = q∅ then return qε
3  else if q = qε then return q∅
4  else
5    for all i = 1, ..., m do ri ← kcomp(q^ai)
6    G(q) ← kmake(l, r1, ..., rm)
7  return G(q)

Table 7.9: Algorithm kcomp

Determinization and minimization. The algorithm kdet&min, which converts an NFA recognizing a fixed-length language L into the minimal zip-DFA recognizing ⟨L⟩, differs from det&min() essentially in one letter: it uses kmake instead of make. This leads to the algorithm of Table 7.10.

Example 7.17 Figure 7.11 shows again an NFA, together with the minimal zip-DFA for the kernel of its language; the run of kdet&min is shown at the bottom of the figure. For the difference with det&min, consider the call kstate({δ, ε, ζ}): since the two recursive calls kstate({η}) and kstate({η, θ}) both return state 1 with length 1, kmake(1, (1, 1)) does not create a new state, as make(1, 1) would do, but returns state 1. The same occurs at the top call kstate({α}).

Exercises

Exercise 65 Prove that the minimal DFAs for the languages of length 3 have at most 8 states.
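The complement computation can be sketched in the same style. Again this is a hypothetical toy for plain fixed-length nodes over {a, b}: the length n is passed explicitly, which is exactly the subscript that kernels make unnecessary.

```python
# Toy sketch of kcomp-style fixed-length complement (no kernels): the length
# n of the language is carried explicitly down the recursion.
EMPTY, EPS = "empty", "eps"
table = {}

def kmake(ca, cb):
    if ca == EMPTY and cb == EMPTY:
        return EMPTY
    # hash-consing: equal successor pairs always yield the same node
    return table.setdefault((ca, cb), ("node", ca, cb))

def full(n):
    # the node for Sigma^n
    q = EPS
    for _ in range(n):
        q = kmake(q, q)
    return q

def kcomp(q, n):
    if q == EMPTY:
        return full(n)            # complement of the empty language is Sigma^n
    if n == 0:
        return EMPTY              # q is {eps}; its complement (length 0) is empty
    _, ca, cb = q
    return kmake(kcomp(ca, n - 1), kcomp(cb, n - 1))
```

Complementing {aa, ab} (length 2) gives {ba, bb}, and complementing twice returns the original node, since hash-consing makes the operation an involution on nodes.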


Figure 7.11: An NFA, the minimal zip-DFA for the kernel of its language, and the run of kdet&min on the NFA (bottom).

kdet&min(A)
Input: NFA A = (Q, Σ, δ, q0, F)
Output: state of a multi-zip-DFA recognizing ⟨L(A)⟩
1  return kstate({q0})

kstate(S)
Input: set S of states of A, all of the same length l
Output: state recognizing the kernel of the language accepted from S
1  if G(S) is not empty then return G(S)
2  else if S = ∅ then return q∅
3  else if S ∩ F ≠ ∅ then return qε
4  else  /* S ≠ ∅ and S ∩ F = ∅ */
5    for all i = 1, ..., m do Si ← δ(S, ai)
6    G(S) ← kmake(l, kstate(S1), ..., kstate(Sm))
7  return G(S)

Table 7.10: The algorithm kdet&min(A).
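The reason kdet&min yields minimal results can be seen in a small Python sketch, run here on a hypothetical three-state NFA rather than the one of Figure 7.11: the subset construction calls a hash-consing make(), so two subsets accepting the same language are mapped to the same node, and the result is minimal by construction.

```python
# Sketch of kdet&min-style determinization for a fixed-length language over
# {a, b}. Works on one NFA at a time (the memo table G is keyed by state set).
EMPTY, FINAL = "empty", "final"
table, G = {}, {}

def make(succ_a, succ_b):
    if succ_a == EMPTY and succ_b == EMPTY:
        return EMPTY
    return table.setdefault((succ_a, succ_b), ("node", succ_a, succ_b))

def state(S, delta, finals):
    # S: set of NFA states, all at the same distance from acceptance
    key = frozenset(S)
    if key in G:
        return G[key]
    if not S:
        res = EMPTY
    elif S & finals:
        res = FINAL               # plays the role of q_epsilon
    else:
        res = make(
            state({q2 for q in S for q2 in delta.get((q, "a"), ())}, delta, finals),
            state({q2 for q in S for q2 in delta.get((q, "b"), ())}, delta, finals),
        )
    G[key] = res
    return res
```

On the NFA with transitions 0 -a-> 1, 0 -b-> 1, 1 -b-> 2 and final state 2 (language {ab, bb} = (a+b)·b), the two successor subsets of {0} coincide, so the shared node for {b} is created only once.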


Exercise 66 Give an efficient algorithm that receives as input the minimal DFA of a fixed-length language and returns the number of words it contains.

Exercise 67 Let Σ = {0, 1}. Given a, b ∈ Σ, let a · b be the usual multiplication (an analog of boolean and), and let a ⊕ b be 0 if a = b = 0 and 1 otherwise (an analog of boolean or). Consider the boolean function f : Σ^6 → Σ defined by

f(x1, x2, x3, x4, x5, x6) = (x1 · x2) ⊕ (x3 · x4) ⊕ (x5 · x6)

1. Construct the minimal DFA recognizing {x1 x2 x3 x4 x5 x6 | f(x1, x2, x3, x4, x5, x6) = 1}. (For instance, the DFA accepts 111000 because f(1, 1, 1, 0, 0, 0) = 1, but not 101010, because f(1, 0, 1, 0, 1, 0) = 0.)

2. Construct the minimal DFA recognizing {x1 x3 x5 x2 x4 x6 | f(x1, x2, x3, x4, x5, x6) = 1}. (Notice the different order! Now the DFA accepts neither 111000, because f(1, 0, 1, 0, 1, 0) = 0, nor 101010, because f(1, 0, 0, 1, 1, 0) = 0.)

3. More generally, consider the function

f(x1, ..., x2n) = ⊕_{1 ≤ k ≤ n} (x_{2k−1} · x_{2k})

and the languages {x1 x2 ... x2n−1 x2n | f(x1, ..., x2n) = 1} and {x1 x3 ... x2n−1 x2 x4 ... x2n | f(x1, ..., x2n) = 1}. Show that the size of the minimal DFA grows linearly in n for the first language, and exponentially in n for the second language.
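A quick sanity check of the worked values in parts 1 and 2 (this is not a solution to the exercise; it only evaluates f as defined above, with · as boolean and and ⊕ as boolean or):

```python
# f as defined in Exercise 67: pairwise AND, combined with OR.
def f(x1, x2, x3, x4, x5, x6):
    return (x1 & x2) | (x3 & x4) | (x5 & x6)

# Part 1: the word is x1 x2 x3 x4 x5 x6, read in index order.
assert f(1, 1, 1, 0, 0, 0) == 1        # 111000 is accepted
assert f(1, 0, 1, 0, 1, 0) == 0        # 101010 is rejected

# Part 2: the word is x1 x3 x5 x2 x4 x6, so letters must be un-interleaved.
def accepts_part2(w):
    x1, x3, x5, x2, x4, x6 = (int(c) for c in w)
    return f(x1, x2, x3, x4, x5, x6) == 1

assert not accepts_part2("111000")     # f(1, 0, 1, 0, 1, 0) = 0
assert not accepts_part2("101010")     # f(1, 0, 0, 1, 1, 0) = 0
```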

Exercise 68 Given a language L of length n and a permutation π of {1, ..., n}, we define π(L) = {a_π(1) ... a_π(n) | a1 a2 ... an ∈ L}. Prove that the following problem is NP-complete.
Given: a minimal DFA A recognizing a fixed-length language, and a number k ≥ 1.
Decide: is there a permutation π of {1, ..., n}, where n is the length of L(A), such that the minimal DFA for π(L(A)) has at most k states?

Exercise 69 Given X ⊆ {0, 1, ..., 2^k − 1}, let AX be the minimal DFA recognizing the lsbf encodings of length k of the elements of X.
• Define X + 1 = {x + 1 mod 2^k | x ∈ X}. Give an algorithm that on input AX produces AX+1 as output.
• Let AX = (Q, {0, 1}, δ, q0, F). Which is the set of numbers recognized by the automaton (Q, {0, 1}, δ′, q0, F), where δ′(q, b) = δ(q, 1 − b)?

Exercise 70 Recall the definition of DFAs with negative transitions (DFA-nt's) introduced in Exercise 26, and consider the alphabet {0, 1}. Show that if only transitions labelled by 1 can be negative, then every regular language over {0, 1} has a unique minimal DFA-nt.

Exercise 71 Define emb(L) = {[v1, v2] ∈ (Σ × Σ)^n | v2 ∈ L}, and define preS(L), where S ⊆ (Σ × Σ)^* and L ⊆ Σ^*, as follows:

preS(L) = {w1 ∈ Σ^n | ∃ [v1, v2] ∈ S : v1 = w1 and v2 ∈ L}

Show that

(preS(L))^a = ⋃_{b ∈ Σ} pre_{S^[a,b]}(L^b).

Exercise 72

Exhibit a family of relations Rn ⊆ {0, 1}^n × {0, 1}^n such that

• the minimal deterministic transducer for Rn has O(n) states, and
• the minimal DFA for proj1(Rn) has 2^O(n) states.

Chapter 8

Applications II: Veriﬁcation

One of the main applications of automata theory is the automatic verification or falsification of correctness properties of hardware or software systems. Given a system (like a hardware circuit, a program, or a communication protocol) and a property (like "after termination the values of the variables x and y are equal" or "every sent message is eventually received"), we wish to automatically determine whether the system satisfies the property or not.

8.1

The Automata-Theoretic Approach to Veriﬁcation

We consider discrete systems for which a notion of configuration can be defined.¹ The system is always in a certain configuration, with instantaneous moves from one configuration to the next determined by the system dynamics. If the semantics allows a move from a configuration c to another configuration c′, then we say that c′ is a legal successor of c. A configuration may have several successors, in which case the system is nondeterministic. There is a distinguished set of initial configurations. An execution is a sequence of configurations (finite or infinite) starting at some initial configuration, in which every other configuration is a legal successor of its predecessor in the sequence. A full execution is either an infinite execution, or an execution whose last configuration has no successors. In this chapter we are only interested in finite executions. The set of executions can then be seen as a language E ⊆ C*, where the alphabet C is the set of possible configurations of the system. We call C* the set of potential executions of the system.

Example 8.1 As an example of a system, consider the following program with two boolean variables x, y:

¹ We speak of the configurations of a system, and not of its states, in order to avoid confusion with the states of automata.

1  while x = 1 do
2    if y = 1 then
3      x ← 0
4    y ← 1 − x
5  end

A configuration of the program is a triple [ℓ, nx, ny], where ℓ ∈ {1, 2, 3, 4, 5} is the current value of the program counter, and nx, ny ∈ {0, 1} are the current values of x and y. So the set C of configurations contains in this case 5 × 2 × 2 = 20 elements. The initial configurations are [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1], i.e., all configurations in which control is at line 1. The sequence [1, 1, 1] [2, 1, 1] [3, 1, 1] [4, 0, 1] [1, 0, 1] [5, 0, 1] is a full execution, while [1, 1, 0] [2, 1, 0] [4, 1, 0] [1, 1, 0] is also an execution, but not a full one. In fact, all the words of ([1, 1, 0] [2, 1, 0] [4, 1, 0])* are executions, and so the language E of all executions is infinite.

Assume we wish to determine whether the system has an execution satisfying some property of interest. If both the language E ⊆ C* of executions and the language P ⊆ C* of potential executions that satisfy the property are regular, and we can construct automata recognizing them, then we can solve the problem by checking whether the language E ∩ P is empty, which can be decided using the algorithms of Chapter 4. This is the main insight behind the automata-theoretic approach to verification.

The requirement that the language E of executions is regular is satisfied by all systems with finitely many reachable configurations (i.e., finitely many configurations c such that some execution leads from some initial configuration to c). A system NFA recognizing the executions of the system can be easily obtained from the configuration graph: the graph having the reachable configurations as nodes, and arcs from each configuration to its successors. The construction is very simple: the states of the system NFA are the reachable configurations of the program plus a new state, which is also the initial state; for every edge c → c′ of the graph there is a transition from c to c′ labeled by c′ in the NFA. All states are final.
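For this particular program the configuration graph can be computed directly from the program semantics. A small Python sketch (the successor function below is a transcription of the five lines, written for illustration):

```python
# Successor relation of the configurations (l, x, y) of the five-line program.
def succ(c):
    l, x, y = c
    if l == 1:
        return [(2, x, y)] if x == 1 else [(5, x, y)]
    if l == 2:
        return [(3, x, y)] if y == 1 else [(4, x, y)]
    if l == 3:
        return [(4, 0, y)]          # x <- 0
    if l == 4:
        return [(1, x, 1 - x)]      # y <- 1 - x
    return []                       # line 5: the program has terminated

# worklist exploration of the configuration graph
initial = [(1, x, y) for x in (0, 1) for y in (0, 1)]
reachable, work = set(initial), list(initial)
while work:
    c = work.pop()
    for d in succ(c):
        if d not in reachable:
            reachable.add(d)
            work.append(d)
```

Only 11 of the 20 possible configurations turn out to be reachable, and both example executions of Example 8.1 can be replayed step by step against succ.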
Example 8.2 Figure 8.1 shows the configuration graph of the program of Example 8.1, and below it its system NFA. Notice that the labels of the transitions of the NFA carry no information, because they are just the name of the target state. We wish to automatically determine if the system has a full execution such that initially y = 1, finally y = 0, and y never increases. Let [ℓ, x, 0] and [ℓ, x, 1] stand for the sets of configurations where y = 0 and y = 1, respectively, but the values of ℓ and x are arbitrary. Similarly, let [5, x, 0] stand


for the set of configurations with ℓ = 5 and y = 0, but x arbitrary. The set of potential executions satisfying the property is then given by the regular expression

[ℓ, x, 1] [ℓ, x, 1]* [ℓ, x, 0]* [5, x, 0]

which is recognized by the property NFA at the top of Figure 8.2. Its intersection with the system NFA of Figure 8.1 is shown at the bottom of Figure 8.2. A light pink state of the pairing labeled by [ℓ, x, y] is the result of pairing the light pink state of the property NFA and the state [ℓ, x, y] of the system NFA. Since the labels of the transitions of the pairing are always equal to the target state, they are omitted for the sake of readability. Since no state of the intersection has a dark pink color, the intersection is empty, and so the program has no execution satisfying the property.

Figure 8.1: Configuration graph and system NFA of the program of Example 8.1.

Example 8.3 We wish now to automatically determine whether the assignment y ← 1 − x in line 4 of the program of Example 8.1 is redundant and can be safely removed.
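The emptiness check of Example 8.2 can be replayed in a few lines: instead of drawing the property NFA, the sketch below pairs each configuration with the state of a small hand-written property automaton, in the spirit of the product at the bottom of Figure 8.2. The automaton ("ONES" while y stays 1, "ZEROS" once y has dropped to 0, "DEAD" if y increases) is a hypothetical transcription of the regular expression above.

```python
def succ(c):
    l, x, y = c
    if l == 1:
        return [(2, x, y)] if x == 1 else [(5, x, y)]
    if l == 2:
        return [(3, x, y)] if y == 1 else [(4, x, y)]
    if l == 3:
        return [(4, 0, y)]
    if l == 4:
        return [(1, x, 1 - x)]
    return []

def pstep(p, c):
    # property automaton: y may only go from 1 to 0, never back up
    y = c[2]
    if p == "ONES":
        return "ONES" if y == 1 else "ZEROS"
    if p == "ZEROS":
        return "ZEROS" if y == 0 else "DEAD"
    return "DEAD"

# start only from initial configurations with y = 1 ("initially y = 1")
work = [((1, x, 1), "ONES") for x in (0, 1)]
seen, violation = set(work), False
while work:
    c, p = work.pop()
    if p != "DEAD" and c[0] == 5 and c[2] == 0:
        violation = True          # a matching full execution exists
    for d in succ(c):
        n = (d, pstep(p, d))
        if n not in seen:
            seen.add(n)
            work.append(n)
```

As the text claims, no pair of the product witnesses the property, so violation stays False.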

1. qn of states. while the non-participating . n}. . 1 i 1. . . 1 3. 0. 1] [ . 1 4. 0. . then it can occur. . we say that the i-th NFA participates in a if a ∈ Σi . n} such that Ai participates in a there is a transition (qi . . . An of NFAs with pairwise disjoint sets of states. which makes them a suitable target for automatic veriﬁcation techniques. Σn are not necessarily pairwise disjoint).142 [ . 1. a. In this section we exploit this structure to deﬁne conﬁguration graphs in a more systematic way. . where qi ∈ Qi for every i ∈ {1. 1] CHAPTER 8. yielding a network of automata and then deﬁne the graph by means of an automata theoretic construction. . . . 1] [1. 0. . 0. 0] 1. . A property NFA for this expression can be easily constructed.2 Networks of Automata. 1 2. x. x. . . q0n . Concurrent systems are particularly diﬃcult to design correctly. qn if for every i ∈ {1. A network of automata is a tuple A = A1 . APPLICATIONS II: VERIFICATION [ . . . . x. 1 2. x. 1 5.2: Property NFA and product NFA the assignment never changes the value of y. y]∗ . 1 3. 1 5. The initial conﬁguration is the conﬁguration q01 . 1. If a is enabled. and the assignment is indeed redundant. x. . y]∗ ( [4. Alphabet letters are called actions. . A conﬁguration of a network is a tuple q1 . and its occurrence makes all participating NFAs Ai move to the state qi . x. and its intersection with the system NFA is again empty. 0] [1. Given an action a. qi ) ∈ δi . . . 0. So the property holds. . . . An action a is enabled at a conﬁguration q1 . 1. Each NFA has its own alphabet Σi (the alphabets Σ1 . x. These systems usually consist of a number of communicating sequential components. The potential executions of the program in which the assignment changes the value of y at some point correspond to the regular expression [ . where q0i is the initial state of Ai . 1. . x. 1] + [4. We assign an NFA to each sequential component. 1 Figure 8. 0] [5. . 1 1. 0. x. x. 8. 1 1. 0] ) [ . . 1 4. 1.

The configuration reached by the occurrence of a is a successor of ⟨q1, ..., qn⟩.

Given a network of automata we define its asynchronous product as the output of the algorithm AsyncProduct of Table 8.1. Starting at the initial configuration, the algorithm repeatedly picks a configuration from the worklist and constructs its successors, adding the new successors to the worklist.

Example 8.4 The upper part of Figure 8.3 shows a network of three NFAs modeling a 3-bit counter. Each NFA but the last one has three states, two of which are marked with 0 and 1. We call the NFAs A0, A1, A2 instead of A1, A2, A3 to better reflect their meaning: Ai stands for the i-th bit. The alphabets are

Σ0 = {inc, inc1, 0, ..., 7}
Σ1 = {inc1, inc2, 0, ..., 7}
Σ2 = {inc2, 0, ..., 7}

Intuitively, the system interacts with its environment by means of the actions inc, 0, ..., 7: inc models a request of the environment to increase the counter by 1, and i ∈ {0, ..., 7} models a query of the environment asking if i is the current value of the counter. A configuration of the form [b2, b1, b0], where b2, b1, b0 ∈ {0, 1}, indicates that the current value of the counter is 4b2 + 2b1 + b0 (configurations are represented as triples of states of A2, A1, A0, in that order). (Actually, all states are final, but have been drawn as simple instead of double ellipses for simplicity.)

Example 8.5 The bottom part of Figure 8.3 shows the asynchronous product of the network modeling the 3-bit counter, shown at the top of the figure. Observe that at certain configurations the actions inc and inc2 are concurrent: they are both enabled, and the sets of automata participating in them are disjoint. This means that they can occur independently of each other.

The following proposition, whose proof is an easy consequence of the definitions, characterizes the executions of the network in terms of the asynchronous product:

Proposition 8.6 A sequence c0 c1 ... cn of configurations is an execution of the network ⟨A1, ..., An⟩ if and only if there is a word a1 ... an ∈ Σ* such that c0 --a1--> c1 --a2--> ... --an--> cn is an accepting run of A1 ⊗ ··· ⊗ An.

Notice that A1 ⊗ ··· ⊗ An can easily be transformed into the system NFA accepting the executions of the network by means of a transformation similar to the one described in Figure 8.1.

The next example shows how to model a (possibly concurrent) program with finite-domain variables (like booleans or finite ranges) as a network of automata. The key idea is to have a network component for each process and for each variable. A process-component has a state for each control point of the process, and a variable-component has one state for each possible value of the variable. Reading or writing a variable by a process is modeled as a joint action with the process-component and the variable-component as participants.
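The asynchronous product of Table 8.1 is easy to prototype. The network below is a hypothetical two-component toy (not the counter of Figure 8.3): "hand" is a joint action of both components, while "produce" and "consume" are local, so both the enabledness rule and the concurrency of actions with disjoint participants are visible.

```python
from itertools import product

# Each component: (initial state, alphabet, transitions {(state, action): set}).
components = [
    ("idle",  {"produce", "hand"},
     {("idle", "produce"): {"full"}, ("full", "hand"): {"idle"}}),
    ("empty", {"hand", "consume"},
     {("empty", "hand"): {"busy"}, ("busy", "consume"): {"empty"}}),
]
actions = {"produce", "hand", "consume"}

def successors(cfg):
    for a in sorted(actions):
        options, enabled = [], True
        for q, (_, sigma, delta) in zip(cfg, components):
            if a in sigma:                     # this component participates
                succs = delta.get((q, a), set())
                if not succs:
                    enabled = False            # a is not enabled at cfg
                    break
                options.append(sorted(succs))
            else:
                options.append([q])            # non-participant stays put
        if enabled:
            for combo in product(*options):
                yield a, combo

# worklist construction of the asynchronous product, as in AsyncProduct
init = tuple(c[0] for c in components)
reach, work = {init}, [init]
while work:
    c = work.pop()
    for _, d in successors(c):
        if d not in reach:
            reach.add(d)
            work.append(d)
```

At the configuration ("idle", "busy") the actions "produce" and "consume" are both enabled and have disjoint participants, so they are concurrent in the sense of Example 8.5.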
. but have been drawn as simple instead of double ellipses for simplicity. . . . the system interacts with its environment by means of the actions inc. 0. Notice that A1 ⊗· · ·⊗An can be easily transformed into the system NFA accepting the executions of the network by means of a transformation similar to the one described in Figure 8.3 shows a network of three NFAs modeling a 3-bit counter. The alphabets are Σ0 = {inc. .4 The upper part of Figure 8. Given a network of automata we deﬁne its asynchronous product as the output of algorithm Async in Table 8. 7} Σ2 = {inc2 . . whose proof is an easy consequence of the deﬁnitions. This means that they can occur independently of each other. . . where b2 . A0 . and a variable component has one state for each possible value of the variable. . A1 . 1}. inc models a request of the environment to increase the counter by 1. . . the algorithm repeatedly picks a conﬁguration from the worklist and constructs its successors.

Figure 8.3: A network modeling a 3-bit counter and its asynchronous product.

AsyncProduct(A1, ..., An)
Input: a network of automata A = ⟨A1, ..., An⟩, where Ai = (Qi, Σi, δi, q0i, Fi)
Output: the asynchronous product A1 ⊗ ··· ⊗ An = (Q, Σ, δ, q0, F)
1  Q, δ, F ← ∅
2  q0 ← [q01, ..., q0n]
3  W ← {[q01, ..., q0n]}
4  while W ≠ ∅ do
5    pick [q1, ..., qn] from W
6    add [q1, ..., qn] to Q
7    add [q1, ..., qn] to F
8    for all a ∈ Σ1 ∪ ... ∪ Σn do
9      for all i ∈ [1..n] do
10       if a ∈ Σi then Qi′ ← δi(qi, a) else Qi′ ← {qi}
11     for all [q1′, ..., qn′] ∈ Q1′ × ... × Qn′ do
12       if [q1′, ..., qn′] ∉ Q then add [q1′, ..., qn′] to W
13       add ([q1, ..., qn], a, [q1′, ..., qn′]) to δ
14 return (Q, Σ1 ∪ ... ∪ Σn, δ, q0, F)

Table 8.1: Asynchronous product of a tuple of automata.

Example 8.7 The network of Figure 8.4 models a version of Lamport and Burns' 1-bit mutual exclusion algorithm for two processes.² Process 0 and process 1 communicate through two shared boolean variables, b0 and b1, which initially have the value 0. Process i reads and writes variable bi and reads variable b(1−i). The NFAs for the processes are shown in Figure 8.4. Initially, process 0 is in its non-critical section (local state nc0). It can also be trying to enter its critical section (t0), or be already in its critical section (c0). (In principle, in the non-critical sections the processes can also do some unspecified work, which should be represented by transitions from nc0 and nc1 to themselves; since for the purposes of this chapter these transitions are not important, we have omitted them.) Process 0 can move from nc0 to t0 at any time by setting b0 to 1, it can move from t0 to c0 if the current value of b1 is 0, and, finally, it can move from c0 to nc0 at any time by setting b0 to 0. While nc1, t1, and c1 play the same rôle as in process 0, process 1 is a bit more complicated: the local states q1 and q1′ model a "polite" behavior. Intuitively, if process 1 sees that process 0 is either trying to enter or in the critical section, it moves to an "after you" local state q1, and then sets b1 to 0 to signal that it is no longer trying to enter its critical section (local state q1′). It can then return to the non-critical section if the value of b0 is 0. The algorithm should guarantee that processes 0 and 1 are never simultaneously in their critical sections; other properties the algorithm should satisfy are discussed later.

² L. Lamport: The mutual exclusion problem: part II, statements and solutions. JACM 1986.
. . . JACM 1986 . . q0n ]} while W ∅ do pick [q1 . An . . . . . . . it can also be trying to enter its critical section (t0 ). . . . . . . which 2 L. Σ1 . qn ]. The algorithm should guarantee that the processes 0 and 1 never are simultaneously in their critical sections. and then sets b1 to 0 to signal that it is no longer trying to enter its critical section (local state q1 ). . Σ. . .

1 Checking Properties We use Lamport’s algorithm to present some more examples of properties and how to check them automatically.4: A network of four automata modeling a mutex algorithm. s1 . s0 .5. 8.146 b0 ← 0. c0 . The mutual exclusion property can be easily formalized: it holds if the asynchronous product does not contain any conﬁguration of the form [v0 . 1}. Transitions of the asynchronous product correspond to the execution of a read or a write statement by one of the processes. where v0 . c1 ]. where v0 . v1 . Since for the purposes of this chapter these transition are not important. v1 ∈ {0. b0 = 1 b1 ← 0. and in which b0 = v0 and b1 = v1 . The property . The asynchronous product is shown in Figure 8. we have omitted them. b1 = 0 bi ← 1 0 bi ← 0 b1 = 1 nc0 b0 ← 1 t0 b0 ← 0 b0 = 1 b1 ← 0 b1 = 0 c0 1 b1 ← 1. b0 = 0 b0 ← 1 0 b0 ← 0 1 CHAPTER 8.2. b1 = 1 q1 q1 b0 = 1 b0 = 0 b0 = 0 b1 ← 1 nc1 t1 b1 ← 0 c1 Figure 8. APPLICATIONS II: VERIFICATION b0 ← 1. v1 . should be represented by transition from nc0 andnc1 to themselves. s1 represents the conﬁguration in which the NFA modelling processes 0 and 1 are in states s0 . A conﬁguration of this network is a four tuple.

nc0. 0. T i . NETWORKS OF AUTOMATA. c0. t0. q1 b0 ← 1 b0 ← 0 1. This property can be checked using the NFA E recognizing the executions of the network. t1 b0 ← 0 b1 = 1 1. The algorithm is deadlock-free if every conﬁguration of the asynchronous product has at least one successor. 1. q1 147 b1 = 0 Figure 8. 0. q1 b1 = 1 1. nc1 b1 ← 1 b0 ← 1 0. q1 b1 ← 0 1. 1. 1. 1. nc1 b0 ← 0 b1 = 0 1. q1 b0 ← 0 b1 ← 0 0. nc0. t0. t0. Let NCi . Ci be the sets of conﬁgurations in which process i is in its non-critical .5: Asynchronous product of the network of Figure 8. nc0. • Bounded overtaking. and it holds for Lamport’s algorithm. and a quick inspection of Figure 8. After process 0 signals its interest in accessing the critical section (by moving to state t0 ). 0. Other properties of interest for the algorithm are: • Deadlock freedom. the property can be checked on the ﬂy. c0.5 shows that it holds.4. b1 = 0 b0 = 0 b1 = 1 1. c1 b0 = 1 1. 1. c1 b1 = 1 b1 = 1 b1 ← 1 b1 = 1 b0 = 1 1. 1. q1 b1 = 0 b1 ← 0 0. t1 b1 ← 0 b0 ← 0 b0 = 1 0. nc0. c0. nc1 b1 ← 1 b0 ← 1 0. obtained as explained above by renaming the labels of the transitions of the asynchronous product. t0. This is always the case if we only wish to check the reachability of a conﬁguration or set of conﬁgurations. t1 b1 ← 0 1. 1. Again.8. 1. 0. nc0.2. c0. process 1 can enter the critical section at most once before process 0 enters the critical section. 0. 0. t0. Notice that in this case we do not need to construct the NFA for the executions of the program. can be easily checked on-the-ﬂy while constructing the asynchronous product.

a NFA V over Σ1 ∪ . where E is an NFA recognizing the executions of the system. An over alphabets Σ1 . (3) construct the NFA E ∩ V using intersNFA(). So the approach has exponential worst-case complexity. Since the alphabet of V is the union of the alphabets of A1 . If we model E as a network of automata. . We conclude the section with a ﬁrst easy step towards palliating this problem. the number can be as high as the product of the number of states of all the components of the network. and checks whether the automaton E ∩ V is empty. is trying to access its critical section. Decide: if L(A1 ⊗ · · · ⊗ An ⊗ V) ∅. An . or is in its critical section.3 The State-Explosion Problem The automata-theoretic approach constructs an NFA V recognizing the potential executions of the system that violate the property one is interested in. if E is the NFA recognizing the language of executions. Given: A network of automata A1 . ∪ Σn . . . Observe that the NFA E may have more states than E ∩ V: if a state of E is not reachable by any word on which V has a run. . The straightforward method to check if L(E) ∩ L(r) = ∅ has four steps: (1) construct the NFA E for the network A1 . while simultaneously checking if any of them is ﬁnal. . Now. and so it is better to directly construct E ∩ V. then A1 ⊗ · · · ⊗ An ⊗ An+1 = (A1 ⊗ · · · ⊗ An ) ∩ An+1 . . This is done by constructing the set of states of E ∩ V. Since the transitions of V are often labeled with only a small fraction of the letters in the alphabet of E. .8 The following problem is PSPACE-complete. Theorem 8. respectively. . The key problem of this approach is that the number of states of E can be as high as the product of the number of states of the the components A1 . An . . An using AsyncProduct(). .1. then it does not appear in E ∩ V. . (4) check emptiness of E ∩ V using empty(). . which can easily exceed the available memory. 
A2 corresponds to the particular case of the asynchronous product in which Σ1 ⊆ Σ2 (or vice versa): if Σ1 ⊆ Σ2 . Σn . we can then check emptiness of A1 ⊗ · · · ⊗ An ⊗ V by means of the algorithm of Table 8. The following result shows that this cannot be avoided unless P=PSPACE. This is called the state-explosion problem. which can be checked using the algorithms of Chapter 4. The regular expression r = Σ∗ T 0 (Σ \ C0 )∗ C1 (Σ \ C0 )∗ NC1 (Σ \ C0 )∗ C1 Σ∗ represents all the possible executions that violate the property. and the literature contains a wealth of proposals to deal with it.4.148 CHAPTER 8. . . . Let Σ stand for the set of all conﬁgurations. the diﬀerence in size between E and E ∩ V can be considerable. A more in depth discussion can be found in the next section. . . . (2) transform r into an NFA V (V for “violations”) using the algorithm of Section 2. APPLICATIONS II: VERIFICATION section. . . bypassing the construction of E. 8. if Σ1 ∪ · · · ∪ Σn ⊆ Σn+1 . This is easily achieved by observing that the intersection A1 ∩ A2 of two NFAs A1 . The number of states of E can be very high. the property holds if and only if L(E) ∩ L(r) = ∅. then A2 participates in every action. . More generally.2. . and the NFAs A1 ⊗ A2 and A1 ∩ A2 coincide. .

CheckViol(A1, ..., An, V)
Input: a network of automata ⟨A1, ..., An⟩, where Ai = (Qi, Σi, δi, q0i, Fi), and an NFA V = (QV, Σ1 ∪ ... ∪ Σn, δV, q0v, Fv)
Output: true if A1 ⊗ ··· ⊗ An ⊗ V is nonempty, false otherwise
1  Q ← ∅
2  q0 ← [q01, ..., q0n, q0v]
3  W ← {q0}
4  while W ≠ ∅ do
5    pick [q1, ..., qn, q] from W
6    add [q1, ..., qn, q] to Q
7    for all a ∈ Σ1 ∪ ... ∪ Σn do
8      for all i ∈ [1..n] do
9        if a ∈ Σi then Qi′ ← δi(qi, a) else Qi′ ← {qi}
10     Q′ ← δV(q, a)
11     for all [q1′, ..., qn′, q′] ∈ Q1′ × ... × Qn′ × Q′ do
12       if qi′ ∈ Fi for all i ∈ [1..n] and q′ ∈ Fv then return true
13       if [q1′, ..., qn′, q′] ∉ Q then add [q1′, ..., qn′, q′] to W
14 return false

Table 8.2: Algorithm to check violation of a property.
Proof: We only give a high-level sketch of the proof. To prove that the problem belongs to PSPACE, we show that it belongs to NPSPACE and apply Savitch's theorem. The polynomial-space nondeterministic algorithm just guesses an execution of the product, one configuration at a time, leading to a final configuration; notice that storing a configuration requires linear space.

PSPACE-hardness is proven by reduction from the acceptance problem for linearly bounded automata. A linearly bounded automaton (LBA) is a deterministic Turing machine that always halts and only uses the part of the tape containing the input. Given an LBA A, we construct in linear time a network of automata that "simulates" A. The network has one component modeling the control of A (notice that the control is essentially a DFA), and one component for each tape cell used by the input. The states of a cell-component are the possible tape symbols; the states of the control-component are pairs (q, k), where q is a control state of A and k is a head position. The transitions correspond to the possible moves of A according to its transition table. Acceptance of A corresponds to reachability of certain configurations in the network, which can be easily encoded as an emptiness problem.

8.3.1 Symbolic State-Space Exploration

Figure 8.6 shows again the program of Example 8.1, together with its flowgraph. An edge of the flowgraph

Figure 8.6: Flowgraph of the program of Example 8.1 (nodes 1 to 5; edges 1→2 labeled x = 1, 1→5 labeled x ≠ 1, 2→3 labeled y = 1, 2→4 labeled y ≠ 1, 3→4 labeled x ← 0, 4→1 labeled y ← 1 − x).

leading from node ℓ to node ℓ′ can be associated a step relation S_{ℓ,ℓ′}, containing all pairs of configurations ([ℓ, x0, y0], [ℓ′, x0′, y0′]) such that, if at control point ℓ the current values of the variables are x0 and y0, then the program can take a step after which the new control point is ℓ′ and the new values are x0′ and y0′. For instance, for the edge leading from node 4 to node 1 we have

S_{4,1} = {([4, x0, y0], [1, x0′, y0′]) | x0′ = x0, y0′ = 1 − x0}

and for the edge leading from node 1 to node 2

S_{1,2} = {([1, x0, y0], [2, x0′, y0′]) | x0 = 1 = x0′, y0′ = y0}.

It will be convenient to assign a relation to every pair of nodes of the control graph, even to those not connected by any edge: if no edge leads from a to b, then we define S_{a,b} = ∅. The complete program is then described by the global step relation

S = ⋃_{a,b ∈ C} S_{a,b}

where C is the set of control points. Given a set I of initial configurations, the set of configurations reachable from I can be computed by the following algorithm, which repeatedly applies the Post operation:

Reach(I, S)
Input: set I of initial configurations, step relation S
Output: set of configurations reachable from I
1  OldP ← ∅; P ← I
2  while P ≠ OldP do
3    OldP ← P
4    P ← Union(P, Post(P, S))
5  return P
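Reach and Post are direct to implement with explicit sets. The sketch below builds the global step relation S of the five-line program as a set of configuration pairs and iterates P ← P ∪ Post(P, S) to a fixpoint:

```python
# One step of the program from configuration (l, x, y).
def step(c):
    l, x, y = c
    if l == 1:
        return {(2, x, y)} if x == 1 else {(5, x, y)}
    if l == 2:
        return {(3, x, y)} if y == 1 else {(4, x, y)}
    if l == 3:
        return {(4, 0, y)}
    if l == 4:
        return {(1, x, 1 - x)}
    return set()

# the global step relation S, as a set of pairs over all 20 configurations
S = {(c, d)
     for l in range(1, 6) for x in (0, 1) for y in (0, 1)
     for c in [(l, x, y)] for d in step(c)}

def post(P, S):
    return {d for (c, d) in S if c in P}

def reach(I, S):
    old, P = None, set(I)
    while P != old:
        old, P = P, P | post(P, S)   # P <- Union(P, Post(P, S))
    return P
```

Starting from the four initial configurations, the fixpoint is reached after a few iterations and contains 11 configurations.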

This is the case of automata and transducers: the operations Union, Post, and the equality check in the condition of the while loop are implemented by the algorithms of Chapters 4 and 6 and, for fixed-length languages, by the algorithms of Chapter 7. Here the set P and the step relation S are represented by an automaton and a transducer, respectively, recognizing the encodings of the configurations of P and of the pairs of configurations of S. Their sizes can be much smaller than the sizes of P and S. For instance, if P is the set of all possible configurations, then its encoding is usually Σ*, which is recognized by a very small automaton.

The verification community distinguishes between explicit and symbolic data structures. Explicit data structures store separately each of the configurations of P; typical examples are lists and hash tables. Their distinctive feature is that the memory needed to store a set is proportional to the number of its elements. Symbolic data structures, on the contrary,
do not store a set by storing each of its elements. . or even inﬁnite.3. A prominent example of a symbolic data structure are ﬁnite automata and transducers: given an encoding of conﬁgurations as words over some alphabet Σ.6. x2k ∈ {0. respectively. . . Of course. An extreme case is given by the following example. . n x . a conﬁguration [ . Symbolic data structures are only useful if all the operations required by the algorithm can be implemented without having to switch to an explicit data structure. THE STATE-EXPLOSION PROBLEM 151 The algorithm can be implemented using diﬀerent data structures. and in order to illustrate the method. Symbolic data structures. Their sizes can be much smaller than the sizes of P or S . namely 400 101 401 101 410 110 411 . Symbolic data structures are interesting when the set of reachable conﬁgurations can be very large. recognizing the encodings of its elements.8. 1 which describe the action of the instruction y ← 1 − x . x2 . While the information content is independent of the variable order. We have deﬁned a conﬁguration of the program of Example 8.

0 1 Figure 8. 111111} Figure 8. APPLICATIONS II: VERIFICATION 1 1 1 2 1 5 0 1 . 0 0 1 1 0 0 0 1 . x2 . 0 1 0 1 . and by the word x1 xk+1 x2 xk+2 . In the ﬁrst case. On the other hand. 000011. 101101. it is easy to see that the minimal DFA for L2 has only 3k + 1 states. . sorted according to the state of the ﬁrst process and then to the state of . 0 0 3 4 0 0 4 1 2 3 2 4 0 1 . x2k . It is easy to see that the minimal w DFA for L1 has at least 2k states: since for every word w ∈ {0. 001111.7: Transducer for the program of Figure 8. 110000. x2k ]: by the word x1 x2 . 111111} and in the second the language L2 = {000000. . the language L1 has a diﬀerent residual for each word of length k. 0 1 0 0 1 1 0 1 .152 CHAPTER 8. The set of reachable conﬁgurations. So a good variable order can lead to a exponentially more compact representation. 011011. . . . 110110. 001100. 1}k the residual L1 is equal to {w}. .6 tuple [x1 . We can also appreciate the eﬀect of the variable order in Lamport’s algorithm. 010010. . 1 1 0 1 . the encoding of Y for k = 3 is the language L1 = {000000. . and so the minimal DFA has at least 2k states (the exact number is 2k+1 + 2k − 2). xk x2k .9 shows the minimal DFAs for the languages L1 and L2 . 100100. 110011. 001001. 111100.

The set of reachable configurations, sorted according to the state of the first process and then to the state of the second process, is:

[List of the reachable configurations over the states nc0, t0, c0 and nc1, t1, c1 and the values of the two variables; the configuration c0, 1, c1, 1 does not appear in it.]

Observe also that the set of all configurations, reachable or not, contains 120 elements.

We can also appreciate the effect of the variable order in Lamport's algorithm. If we encode a tuple s0, s1, v0, v1 by the word s0 s1 v0 v1, the set of reachable configurations is recognized by the minimal DFA on the left of Figure 8.10; if we encode it by the word v1 s1 s0 v0, we get the minimal DFA on the right.

The same example can be used to visualize how, by adding configurations to a set, the size of its minimal DFA can decrease. If we add the "missing" configuration c0, 1, c1, 1 to the set of reachable configurations (filling the "hole" in the list above), two states of the DFAs of Figure 8.10 can be merged, yielding the minimal DFAs of Figure 8.11.

8.4 Safety and Liveness Properties

Apart from the state-explosion problem, the automata-theoretic approach to automatic verification described in this chapter has a second limitation: it assumes that the violations of the property can be described by a set of finite executions. In other words, it assumes that if the property is violated, then it is already violated after a finite amount of time. Properties which are violated by finite executions are called safety properties; loosely speaking, they correspond to properties of the form "nothing bad ever happens". Typical examples are "the system never deadlocks" or, more generally, "the system never enters a set of bad states".

Not all properties are of this form. Clearly, every interesting system must also satisfy properties of the form "something good eventually happens", because otherwise the system that does nothing would already satisfy all properties. A typical example is the property "if a process requests access to the critical section, it eventually enters it" (without specifying how long it may take). After finite time we can only tell that the process has not entered the critical section yet, but it might enter it in the future. A violation of the property can only be witnessed by an infinite execution, in which we observe that the process requests access, but the access is never granted. Properties of this kind are called liveness properties, and their violations can only be witnessed by infinite executions.

Fortunately, the automata-theoretic approach can be extended to liveness properties. This requires to develop a theory of automata on infinite words, which is the subject of the second part of this book. The application of this theory to the verification of liveness properties is presented in Chapter 14.

Exercises

Exercise 73 Let Σ = {request, answer, working, idle}.

1. Build an automaton recognizing all words with the property P1: for every occurrence of request there is a later occurrence of answer.

2. Build an automaton recognizing all words with the property P2: there is an occurrence of answer, and before that occurrence only working and request occur.

3. Using the intersection construction, prove that all accepting runs of the automaton below satisfy P1, and find all accepting runs violating P2.

[Figure: automaton over Σ with a Σ-labeled loop and a transition labeled answer.]

4. P1 does not imply that every occurrence of request has "its own" answer: for instance, the sequence request request answer satisfies P1, but both requests must necessarily be mapped to the same answer. But, if words were infinite and there were infinitely many requests, would P1 guarantee that every request has its own answer? More precisely, let w = w1 w2 ... be an infinite word satisfying P1 and containing infinitely many occurrences of request, and define f : N → N such that w_f(i) is the i-th request in w. Is there always an injective function g : N → N satisfying w_g(i) = answer and f(i) < g(i) for every i?

Exercise 74 This exercise focuses on modelling and verification of mutual exclusion (mutex) algorithms. Consider two processes running the following code, where id is an identifier, a local variable having value 0 for one of the processes and value 1 for the other:

    while true do
        loop-arbitrarily-many-times non-critical-command
        enter(id)
        critical-command
        leave(id)

The procedures enter() and leave() are specified below. They use a global variable turn, initially set to 0:

    proc enter(i)
        while turn = 1 − i do skip

    proc leave(i)
        turn ← 1 − i
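For item 1, one possible solution (a sketch, not necessarily the book's intended automaton) is a two-state DFA that tracks whether some request is still unanswered:

```python
# Property P1: every occurrence of request has a later occurrence of answer.
# Two-state DFA: "ok" = nothing pending, "pending" = some unanswered request.
SIGMA = ("request", "answer", "working", "idle")

def satisfies_p1(word):
    state = "ok"
    for letter in word:
        if letter == "request":
            state = "pending"
        elif letter == "answer":
            state = "ok"      # a single answer serves all pending requests
    return state == "ok"      # accept iff no request is left unanswered

print(satisfies_p1(["request", "request", "answer"]))  # True
print(satisfies_p1(["request", "working"]))            # False
```

This works because P1 holds iff the last request, if any, is eventually followed by an answer, which is exactly what the "pending" state records.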

Design an asynchronous network of automata capturing this algorithm. Using the intersection algorithm, build an automaton recognizing all runs reaching a configuration with both agents in the critical section, and prove that there are no such runs of this system, i.e., that it is a mutex algorithm. Can a deadlock occur? Do all infinite runs satisfy that if a process wants to enter the critical section then it eventually enters it?

Consider now a different definition of enter and leave. The processes use two boolean variables flag[0] and flag[1], initially set to false:

    proc enter(i)
        flag[i] ← true
        while flag[1 − i] do skip

    proc leave(i)
        flag[i] ← false

Design an asynchronous network of automata capturing this behaviour. Can a deadlock occur?

Finally, Peterson's algorithm combines both approaches. The processes use the variable turn, initially set to 0, and the variables flag[0] and flag[1], initially set to false. The procedures enter and leave are defined as follows:

    proc enter(i)
        flag[i] ← true
        turn ← 1 − i
        while flag[1 − i] and turn = 1 − i do skip

    proc leave(i)
        flag[i] ← false

Can a deadlock occur? What kind of starvation can occur?

Exercise 75 Consider a circular railway divided into 8 tracks: 0 → 1 → ... → 7 → 0. In the railway circulate three trains, modeled by three automata T1, T2, T3. Each automaton Ti has states {qi,0, ..., qi,7}, alphabet {enter[i, j] | 0 ≤ j ≤ 7} (where enter[i, j] models that train i enters track j), transition relation {(qi,j, enter[i, j ⊕ 1], qi,j⊕1) | 0 ≤ j ≤ 7}, where ⊕ denotes addition modulo 8, and initial state qi,2i.

Define automata C0, ..., C7 (the local controllers) to make sure that two trains can never be on the same or adjacent tracks (i.e., there must always be at least one empty track between two trains), that no deadlock can occur, and that every train eventually visits every track. Each controller Cj can only have knowledge of the state of the tracks j ⊖ 1, j, and j ⊕ 1, where ⊖ denotes subtraction modulo 8: Cj has alphabet {enter[i, j ⊖ 1], enter[i, j], enter[i, j ⊕ 1] | 1 ≤ i ≤ 3}. More formally, the asynchronous product R = C0 ⊗ ... ⊗ C7 ⊗ T1 ⊗ T2 ⊗ T3 must satisfy the following specification:

• For every word w ∈ L(R): if w = w1 enter[i, j] enter[i′, j′] w2 and i ≠ i′, then j′ ∉ {j ⊖ 1, j, j ⊕ 1}. (No two trains on the same or adjacent tracks.)

• For i = 1, 2, 3: L(R)|Σi = ( enter[i, 2i ⊕ 1] ... enter[i, 2i ⊕ 7] enter[i, 2i] )∗. (Every train eventually visits every track.)

• R has no deadlocks.
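The flag-based version can be explored mechanically. The following sketch (an illustration, not part of the exercise) enumerates the reachable configurations of an interleaving model of the two processes; it confirms mutual exclusion, but finds the configuration in which both processes are stuck at the while test with both flags set:

```python
from collections import deque

# Locations: 'n' = non-critical, 'w' = flag set, at the while test, 'c' = critical.
def successors(state):
    locs, flags = state
    for i in (0, 1):
        if locs[i] == 'n':                         # flag[i] <- true, go to the test
            yield step(state, i, 'w', True)
        elif locs[i] == 'w' and not flags[1 - i]:  # other flag false: enter
            yield step(state, i, 'c', True)
        elif locs[i] == 'c':                       # leave: flag[i] <- false
            yield step(state, i, 'n', False)

def step(state, i, loc, flag):
    locs, flags = list(state[0]), list(state[1])
    locs[i], flags[i] = loc, flag
    return (tuple(locs), tuple(flags))

init = (('n', 'n'), (False, False))
reachable, frontier = {init}, deque([init])
while frontier:
    for s in successors(frontier.popleft()):
        if s not in reachable:
            reachable.add(s)
            frontier.append(s)

mutex_ok = all(locs != ('c', 'c') for locs, _ in reachable)
stuck = (('w', 'w'), (True, True))   # both spinning on each other's flag
print(mutex_ok, stuck in reachable, list(successors(stuck)))
# True True []  -- mutual exclusion holds, but the stuck state is reachable
```

In this model the busy-waiting loop makes no progress, so the state with both processes at the while test and both flags true has no successors: the algorithm can deadlock.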

[Figure 8.8: Minimal DFAs for the reachable configurations of the program of Figure 8.6.]

[Figure 8.9: Minimal DFAs for the languages L1 and L2.]

[Figure 8.10: Minimal DFAs for the reachable configurations of Lamport's algorithm. On the left, a configuration is encoded by the word s0 s1 v0 v1; on the right, by v1 s1 s0 v0.]

[Figure 8.11: Minimal DFAs for the reachable configurations of Lamport's algorithm plus the configuration c0, 1, c1, 1.]

Chapter 9
Automata and Logic

A regular expression can be seen as a set of instructions (a 'recipe') for generating the words of a language. For instance, the expression aa(a + b)∗b can be interpreted as "write two a's, repeatedly write a or b an arbitrary number of times, and then write a b". We say that regular expressions are an operational description language. Languages can also be described in declarative style, as the set of words that satisfy a property. For instance, "the words over {a, b} containing an even number of a's and an even number of b's" is a declarative description. A language may have a simple declarative description and a complicated operational description as a regular expression. For instance, the regular expression (aa + bb + (ab + ba)(aa + bb)∗(ba + ab))∗ is a natural operational description of the language above, and it is arguably less intuitive than the declarative one. This becomes even more clear if we consider the language of the words over {a, b, c} containing an even number of a's, of b's, and of c's.

In this chapter we present a logical formalism for the declarative description of regular languages. We use logical formulas to describe properties of words, and logical operators to construct complex properties out of simpler ones. We then show how to automatically translate a formula describing a property of words into an automaton recognizing the words satisfying the property. As a consequence, we obtain an algorithm to convert declarative into operational descriptions, and vice versa.

9.1 First-Order Logic on Words

In declarative style, a language is defined by its membership predicate, i.e., the property that words must satisfy in order to belong to it. Predicate logic is the standard language to express membership predicates. Starting from some natural "atomic" predicates, more complex ones can be constructed through boolean combinations and quantification. We introduce atomic predicates Qa(x), where a is a letter and x ranges over the positions of the word. The intended meaning is "the letter at position x is an a".

Definition 9.1 Let V = {x, y, z, ...} be an infinite set of variables, and let Σ = {a, b, c, ...} be a finite alphabet. The set FO(Σ) of first-order formulas over Σ is the set of expressions generated by the grammar:

ϕ := Qa(x) | x < y | ¬ϕ | (ϕ ∨ ϕ) | ∃x ϕ

As usual, variables within the scope of an existential quantifier are bound, and otherwise free. A formula without free variables is a sentence. In order to express relations between positions we use the predicate x < y, with intended meaning "position x is smaller than (i.e., lies to the left of) position y".

For instance, the property "all letters are a's" is formalized by the formula ∀x Qa(x), and the property "if the letter at a position is an a, then all letters to the right of this position are also a's" by the formula ∀x∀y ((Qa(x) ∧ x < y) → Qa(y)).

Sentences of FO(Σ) are interpreted on words over Σ. Formulas with free variables cannot be interpreted on words alone: it does not make sense to ask whether Qa(x) holds for the word ab or not. A formula with free variables is interpreted over a pair (w, I), where I assigns to each free variable (and perhaps to others) a position in the word. For instance, Qa(x) is true for the pair (ab, x → 1), because the letter at position 1 of ab is a, but false for (ab, x → 2).

Definition 9.2 An interpretation of a formula ϕ of FO(Σ) is a pair (w, I) where w ∈ Σ∗ and I is a mapping that assigns to every free variable x a position I(x) ∈ {1, ..., |w|} (the mapping may also assign positions to other variables). Notice that if ϕ is a sentence, then a pair (w, E), where E is the empty mapping that does not assign any position to any variable, is an interpretation of ϕ. Instead of (w, E) we write simply w.

We now formally define when an interpretation satisfies a formula. Given a word w and a number k, let w[k] denote the letter of w at position k.

Definition 9.3 The satisfaction relation (w, I) |= ϕ between a formula ϕ of FO(Σ) and an interpretation (w, I) of ϕ is defined by:

(w, I) |= Qa(x)     iff  w[I(x)] = a
(w, I) |= x < y     iff  I(x) < I(y)
(w, I) |= ¬ϕ        iff  (w, I) |= ϕ does not hold
(w, I) |= ϕ1 ∨ ϕ2   iff  (w, I) |= ϕ1 or (w, I) |= ϕ2
(w, I) |= ∃x ϕ      iff  |w| ≥ 1 and (w, I[i/x]) |= ϕ for some i ∈ {1, ..., |w|}

where I[i/x] is the mapping that assigns i to x and otherwise coincides with I. (Notice that I may not assign any value to x.) If (w, I) |= ϕ we say that (w, I) is a model of ϕ. Two formulas are equivalent if they have the same models. For example, ∀x Qa(x) is true for the word aa, but false for the word ab.

It follows easily from this definition that if two interpretations (w, I1) and (w, I2) of ϕ differ only in the positions assigned by I1 and I2 to bound variables, then either both interpretations are models of ϕ, or none of them is. In particular, whether an interpretation (w, I) of a sentence is a model or not depends only on w, not on I.

We use some standard abbreviations:

∀x ϕ := ¬∃x ¬ϕ
ϕ1 ∧ ϕ2 := ¬(¬ϕ1 ∨ ¬ϕ2)
ϕ1 → ϕ2 := ¬ϕ1 ∨ ϕ2

Notice that according to the definition of the satisfaction relation the empty word satisfies no formulas of the form ∃x ϕ, and all formulas of the form ∀x ϕ. While this causes no problems for our purposes, it is worth noticing that in other contexts it may lead to complications. For instance, the formulas ∃x Qa(x) and ∀y∃x Qa(x) do not hold for exactly the same words, because the empty word satisfies the second, but not the first.

Further useful abbreviations are:

first(x) := ¬∃y y < x  ("x is the first position")
last(x) := ¬∃y x < y  ("x is the last position")
y = x + 1 := x < y ∧ ¬∃z (x < z ∧ z < y)  ("y is the successor position of x")
y = x + 2 := ∃z (z = x + 1 ∧ y = z + 1)
y = x + (k + 1) := ∃z (z = x + k ∧ y = z + 1)

Example 9.4 Some examples of properties expressible in the logic:

• "The last letter is a b and before it there are only a's."
∃x Qb(x) ∧ ∀x ((last(x) → Qb(x)) ∧ (¬last(x) → Qa(x)))

• "Every a is immediately followed by a b."
∀x (Qa(x) → ∃y (y = x + 1 ∧ Qb(y)))

• "Every a is immediately followed by a b, unless it is the last letter."
∀x (Qa(x) → ∀y (y = x + 1 → Qb(y)))

• "Between every a and every later b there is a c."
∀x∀y (Qa(x) ∧ Qb(y) ∧ x < y → ∃z (x < z ∧ z < y ∧ Qc(z)))

Once we have defined which words satisfy a sentence, we can associate to a sentence the set of words satisfying it.

Definition 9.5 The language L(ϕ) of a sentence ϕ ∈ FO(Σ) is the set L(ϕ) = {w ∈ Σ∗ | w |= ϕ}. A language L ⊆ Σ∗ is FO-definable if L = L(ϕ) for some formula ϕ of FO(Σ). We also say that ϕ expresses L(ϕ).

The languages of the properties in Example 9.4 are FO-definable by definition. To get an idea of the expressive power of FO(Σ), we prove a theorem characterizing the FO-definable languages in the case of a 1-letter alphabet Σ = {a}. We prove that a language over a one-letter alphabet is FO-definable if and only if it is finite or co-finite, where a language is co-finite if its complement is finite. In particular, even a simple language like {a^n | n is even} is not FO-definable.

The plan of the proof is as follows. First, we define the quantifier-free fragment of FO({a}), denoted by QF; then we show that 1-letter languages are QF-definable iff they are finite or co-finite; finally, we prove that 1-letter languages are FO-definable iff they are QF-definable.

In this simple case we only have one predicate Qa(x), which is always true in every interpretation. So every formula is equivalent to a formula without any occurrence of Qa(x); for instance, the formula ∃y (Qa(y) ∧ y < x) is equivalent to ∃y y < x. For the definition of QF we need some more macros whose intended meaning should be easy to guess:

x + k < y := ∃z (z = x + k ∧ z < y)
x < y + k := ∃z (z = y + k ∧ x < z)
k < last := ∀x (last(x) → x > k)

In these macros k is a constant; for instance, k < last stands for the infinite family of macros 1 < last, 2 < last, 3 < last, ... Macros like k > x or x + k > y are defined similarly.

Definition 9.6 The logic QF (for quantifier-free) is the fragment of FO({a}) with syntax

f := x ≈ k | x ≈ y + k | k ≈ last | f1 ∨ f2 | f1 ∧ f2

where ≈ ∈ {<, >} and k ∈ N.

Proposition 9.7 A language over a 1-letter alphabet is QF-definable iff it is finite or co-finite.

Proof: (⇒): Let f be a sentence of QF. Since QF does not have quantifiers, f does not contain any occurrence of a variable, and so it is a positive (i.e., negation-free) boolean combination of formulas of the form k < last or k > last. We proceed by induction on the structure of f. If f = k < last, then L(f) is co-finite, and if f = k > last, then L(f) is finite. If f = f1 ∨ f2, then by induction hypothesis L(f1) and L(f2) are finite or co-finite. If L(f1) and L(f2) are finite, then so is L(f); otherwise L(f) is co-finite. The case f = f1 ∧ f2 is similar.

(⇐): A finite language {a^k1, ..., a^kn} is expressed by the formula

(last > k1 − 1 ∧ last < k1 + 1) ∨ ... ∨ (last > kn − 1 ∧ last < kn + 1).

A co-finite language is expressed similarly: if it contains all words of length greater than some m, it is expressed by a disjunction of formulas of the above form, one for each of its words of length at most m, plus the disjunct m < last.

It remains to show that every FO-definable language is QF-definable, i.e., that for every formula ϕ of FO({a}) expressing a language L there is a formula f of QF expressing the same language L.

Theorem 9.8 Every formula ϕ of FO({a}) is equivalent to a formula f of QF.

Proof: Sketch. This is proved by induction on the structure of ϕ. If ϕ(x, y) = x < y, then ϕ ≡ x < y + 0. If ϕ = ¬ψ, then by induction hypothesis ψ is equivalent to a formula of QF, and the result follows from the fact that negations can be removed using De Morgan's rules and equivalences like ¬(x < y + k) ≡ x ≥ y + k. If ϕ = ϕ1 ∨ ϕ2, the result follows directly from the induction hypothesis.

Consider now the case ϕ = ∃x ψ. By induction hypothesis, ψ is equivalent to a formula f of QF, and we can assume that f is in disjunctive normal form, say f = D1 ∨ ... ∨ Dn. Then ϕ ≡ ∃x D1 ∨ ∃x D2 ∨ ... ∨ ∃x Dn, and so it suffices to find a formula fi of QF equivalent to ∃x Di. The formula fi is a conjunction of formulas containing all conjuncts of Di with no occurrence of x, plus other conjuncts obtained as follows. For every conjunct x < t1 of Di, where t1 = k1 or t1 = x1 + k1, and every conjunct x > t2 of Di, where t2 = k2 or t2 = x2 + k2, we add to fi a conjunct equivalent to t2 + 1 < t1. For example, for the conjuncts y + 7 < x and x < z + 3 we add y + 5 < z. It is easy to see that fi ≡ ∃x Di.

Corollary 9.9 The language Even = {a^2n | n ≥ 0} is not first-order expressible.

These results show that first-order logic cannot express all regular languages, not even over a 1-letter alphabet. For this reason we now introduce monadic second-order logic.

9.2 Monadic Second-Order Logic on Words

Monadic second-order logic extends first-order logic with variables X, Y, Z, ... ranging over sets of positions, and with predicates x ∈ X, meaning "position x belongs to the set X". (In general, second-order logic allows for variables ranging over relations of arbitrary arity; the monadic fragment only allows arity 1, which corresponds to sets. It is allowed to quantify over both kinds of variables.)

Before giving a formal definition, let us informally see how this extension allows us to describe the language Even. The following formula states that X is the set of even positions: a position belongs to this set iff it is the second position, or the second successor of another position in the set:

second(x) := ∃y (first(y) ∧ x = y + 1)
Even(X) := ∀x (x ∈ X ↔ (second(x) ∨ ∃y (x = y + 2 ∧ y ∈ X)))

For the complete formula, we observe that the word has even length iff its last position is even; the formula states that the last position belongs to the set of even positions:

EvenLength := ∃X (Even(X) ∧ ∀x (last(x) → x ∈ X))
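The second-order definition of Even can be checked by brute force over small words. The sketch below (an illustration, with positions numbered from 1 as in the text) enumerates all candidate sets X for the words a^n with n ≥ 1:

```python
from itertools import chain, combinations

def even_set(n, X):
    # Even(X): position x is in X iff x is the second position
    # or the second successor of a position in X.
    return all((x in X) == (x == 2 or (x - 2) in X) for x in range(1, n + 1))

def even_length(n):
    # EvenLength: some X satisfies Even(X) and contains the last position.
    subsets = chain.from_iterable(
        combinations(range(1, n + 1), r) for r in range(n + 1))
    return any(even_set(n, set(X)) and n in X for X in subsets)

for n in range(1, 9):
    print(n, even_length(n))  # True exactly for even n
```

For every n there is exactly one set satisfying Even(X), namely {2, 4, 6, ...}, so the existential quantifier succeeds precisely when the last position is even.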

We now define the formal syntax and semantics of the logic.

Definition 9.10 Let X1 = {x, y, z, ...} and X2 = {X, Y, Z, ...} be two infinite sets of first-order and second-order variables, respectively, and let Σ = {a, b, c, ...} be a finite alphabet. The set MSO(Σ) of monadic second-order formulas over Σ is the set of expressions generated by the grammar:

ϕ := Qa(x) | x < y | x ∈ X | ¬ϕ | ϕ ∨ ϕ | ∃x ϕ | ∃X ϕ

An interpretation of a formula ϕ is a pair (w, I) where w ∈ Σ∗, and I is a mapping that assigns to every free first-order variable x a position I(x) ∈ {1, ..., |w|} and to every free second-order variable X a set of positions I(X) ⊆ {1, ..., |w|}. (The mapping may also assign values to other variables.) The satisfaction relation (w, I) |= ϕ between a formula ϕ of MSO(Σ) and an interpretation (w, I) of ϕ is defined as for FO(Σ), with the following additions:

(w, I) |= x ∈ X  iff  I(x) ∈ I(X)
(w, I) |= ∃X ϕ   iff  |w| > 0 and (w, I[S/X]) |= ϕ for some S ⊆ {1, ..., |w|}

where I[S/X] is the interpretation that assigns S to X and otherwise coincides with I, whether I is defined for X or not. Notice that in this definition the set S may be empty: so, for instance, any interpretation that assigns the empty set to X is a model of the formula ∃X ∀x ¬(x ∈ X). If (w, I) |= ϕ we say that (w, I) is a model of ϕ. Two formulas are equivalent if they have the same models. The language L(ϕ) of a sentence ϕ ∈ MSO(Σ) is the set L(ϕ) = {w ∈ Σ∗ | w |= ϕ}. A language L ⊆ Σ∗ is MSO-definable if L = L(ϕ) for some formula ϕ ∈ MSO(Σ).

We use the standard abbreviations

∀x ∈ X ϕ := ∀x (x ∈ X → ϕ)    ∃x ∈ X ϕ := ∃x (x ∈ X ∧ ϕ)

9.2.1 Expressive power of MSO(Σ)

We show that the languages expressible in monadic second-order logic are exactly the regular languages. We start with an example.

Example 9.11 Let Σ = {a, b, c, d}. We construct a formula of MSO(Σ) expressing the regular language c∗(ab)∗d∗. The membership predicate of the language can be informally formulated as follows: there is a block of consecutive positions X such that before X there are only c's; after X there are only d's; in X, a's and b's alternate; and the first letter in X is an a and the last letter is a b. The predicate is a conjunction of predicates; we give formulas for each of them.

• "X is a block of consecutive positions."
Cons(X) := ∀x ∈ X ∀y ∈ X (x < y → ∀z ((x < z ∧ z < y) → z ∈ X))

• "x lies before/after X."
Before(x, X) := ∀y ∈ X x < y
After(x, X) := ∀y ∈ X y < x

• "Before X there are only c's."
Before_only_c(X) := ∀x (Before(x, X) → Qc(x))

• "After X there are only d's."
After_only_d(X) := ∀x (After(x, X) → Qd(x))

• "a's and b's alternate in X."
Alternate(X) := ∀x ∈ X ( (Qa(x) → ∀y ∈ X (y = x + 1 → Qb(y))) ∧ (Qb(x) → ∀y ∈ X (y = x + 1 → Qa(y))) )

• "The first letter in X is an a and the last is a b."
First_a(X) := ∀x ∈ X ((∀y (y < x → ¬y ∈ X)) → Qa(x))
Last_b(X) := ∀x ∈ X ((∀y (y > x → ¬y ∈ X)) → Qb(x))

Putting everything together, we get the formula

∃X ( Cons(X) ∧ Before_only_c(X) ∧ After_only_d(X) ∧ Alternate(X) ∧ First_a(X) ∧ Last_b(X) )

Notice that the empty set of positions satisfies all the conjuncts.

Let us now directly prove one direction of the result.

Proposition 9.12 If L ⊆ Σ∗ is regular, then L is expressible in MSO(Σ).

Proof: Let A = (Q, Σ, δ, q0, F) be a DFA with Q = {q0, ..., qn} and L(A) = L. We construct a formula ϕA such that for every w: w |= ϕA iff w ∈ L(A). We start with some notations. Let w = a1 ... am be a word over Σ, and let

Pq = { i ∈ {1, ..., m} | δ̂(q0, a1 ... ai) = q }.

In words, i ∈ Pq iff A is in state q immediately after reading the letter ai. Then A accepts w iff m ∈ Pq for some q ∈ F.

The sets Pq are the unique sets satisfying the following properties:

(a) 1 ∈ Pδ(q0,a1), i.e., after reading the letter at position 1 the DFA is in state δ(q0, a1);
(b) every position i belongs to exactly one Pq, i.e., the Pq's build a partition of the set of positions; and
(c) if i ∈ Pq and δ(q, ai+1) = q′, then i + 1 ∈ Pq′, i.e., the Pq's "respect" the transition function δ.

Assume we were able to construct a formula Visits(X0, ..., Xn) with free variables X0, ..., Xn which is only true when Xi takes the value Pqi for every 0 ≤ i ≤ n, i.e., such that I(Xi) = Pqi holds for every model (w, I). Then (w, I) would be a model of

ψA := ∃X0 ... ∃Xn ( Visits(X0, ..., Xn) ∧ ∃x ( last(x) ∧ ⋁{ x ∈ Xi | qi ∈ F } ) )

iff w has a last letter and w ∈ L. If the empty word belongs to L(A), then we can extend the formula by a disjunct that is only satisfied by the empty word (e.g., ∀x x < x). So we could take

ϕA := ψA if q0 ∉ F, and ϕA := ψA ∨ ∀x x < x if q0 ∈ F.

Let us now construct the formula Visits(X0, ..., Xn). We express the properties (a)-(c) through formulas. For every a ∈ Σ, let ia be the index such that δ(q0, a) = q_ia. The formula for (a) is:

Init(X0, ..., Xn) := ∃x ( first(x) ∧ ⋀{ Qa(x) → x ∈ X_ia | a ∈ Σ } )

(in words: if the letter at position 1 is a, then the position belongs to X_ia). The formula for (b) is:

Partition(X0, ..., Xn) := ∀x ( ⋁{ x ∈ Xi | 0 ≤ i ≤ n } ∧ ⋀{ x ∈ Xi → ¬ x ∈ Xj | 0 ≤ i, j ≤ n, i ≠ j } )

The formula for (c) is:

Respect(X0, ..., Xn) := ∀x∀y ( y = x + 1 → ⋀{ (x ∈ Xi ∧ Qa(x)) → y ∈ Xj | a ∈ Σ; i, j ∈ {0, ..., n}; δ(qi, a) = qj } )

Altogether we get

Visits(X0, ..., Xn) := Init(X0, ..., Xn) ∧ Partition(X0, ..., Xn) ∧ Respect(X0, ..., Xn)
It remains to prove that MSO-deﬁnable languages are regular. Given a sentence ϕ ∈ MSO(Σ) show that L(ϕ) is regular by induction on the structure of ϕ. However, since the subformulas of a sentence are not necessarily sentences, the language deﬁned by the subformulas of ϕ is not deﬁned. We correct this. Recall that the interpretations of a formula are pairs (w, I) where I assigns positions to the free ﬁrst-order variables and sets of positions to the free second-order variables. For example, if Σ = {a, b} and if the free ﬁrst-order and second-order variables of the formula are x, y and X, Y, respectively, then two possible interpretations are x→1 aab , y → 3 X → {2, 3} Y → {1, 2} x→2 ba , y → 1 X→∅ Y → {1}

Given an interpretation (w, I), we can encode each assignment x → k or X → {k1 , . . . , kl } as a bitstring of the same length as w: the string for x → k contains exactly a 1 at position k, and 0’s everywhere else; the string for X → {k1 , . . . , kl } contains 1’s at positions k1 , . . . , kl , and 0’s everywhere else. After ﬁxing an order on the variables, an interpretation (w, I) can then be encoded as a tuple (w, w1 , . . . , wn ), where n is the number of variables, w ∈ Σ∗ , and w1 , . . . , wn ∈ {0, 1}∗ . Since all of w, w1 , . . . , wn have the same length, we can as in the case of transducers look at (w, w1 , . . . , wn ) as a word over the alphabet Σ × {0, 1}n . For the two interpretations above we get the encodings x y X Y corresponding to the words a 1 0 0 1 a 0 0 1 1 b 0 1 1 0 x y X Y b 0 1 0 1 a 1 0 0 0

and

168

CHAPTER 9. AUTOMATA AND LOGIC

a a b 1 0 0 0 0 1 0 1 1 1 1 0

and

b a 0 1 1 0 0 0 1 0

over Σ × {0, 1}4

Deﬁnition 9.13 Let ϕ be a formula with n free variables, and let (w, I) be an interpretation of ϕ. We denote by enc(w, I) the word over the alphabet Σ × {0, 1}n described above. The language of ϕ is L(ϕ) = {enc(w, I) | (w, I) |= ϕ}. Now that we have associated to every formula ϕ a language (whose alphabet depends on the free variables), we prove by induction on the structure of ϕ that L(ϕ) is regular. We do so by exhibiting automata (actually, transducers) accepting L(ϕ). For simplicity we assume Σ = {a, b}, and denote by free(ϕ) the set of free variables of ϕ. • ϕ = Qa (x). Then free(ϕ) = x, and the interpretations of ϕ are encoded as words over Σ×{0, 1}. The language L(ϕ) is given by a L(ϕ) = 1 b1 . . . ak . . . bk k ≥ 0, ai ∈ Σ and bi ∈ {0, 1} for every i ∈ {1, . . . , k}, and bi = 1 for exactly one index i ∈ {1, . . . , k} such that ai = a

and is recognized by

a b , 0 0 a 1 a b , 0 0

• ϕ = x < y. Then free(ϕ) = {x, y}, and the interpretations of φ are encoded as words over Σ × {0, 1}2 . The language L(ϕ) is given by k ≥ 0, a 1 . . . ak ai ∈ Σ and bi , ci ∈ {0, 1} for every i ∈ {1, . . . , k}, bk bi = 1 for exactly one index i ∈ {1, . . . , k}, L(ϕ) = b1 . . . c 1 ck c j = 1 for exactly one index j ∈ {1, . . . , k}, and i< j and is recognized by

**9.2. MONADIC SECOND-ORDER LOGIC ON WORDS
**

a b 0 0 0 0 a b 1 1 0 0 a b 0 , 0 0 0 a b 0 0 1 1 a b 0 0 0 0

169

• ϕ = x ∈ X. Then free(ϕ) = {x, X}, and interpretations are encoded as words over Σ × {0, 1}2 . The language L(ϕ) is given by k ≥ 0, a 1 . . . ak a ∈ Σ and b , c ∈ {0, 1} for every i ∈ {1, . . . , k}, i i i bk L(ϕ) = b1 . . . b = 1 for exactly one index i ∈ {1, . . . , k}, and i c 1 ck for every i ∈ {1, . . . , k}, if bi = 1 then ci = 1 and is recognized by

a b a b 0 , 0 , 0 , 0 0 0 1 1 a b 1 , 1 1 1 a b a b 0 , 0 , 0 , 0 0 0 1 1

• ϕ = ¬ψ. Then free(ϕ) = free(ψ), and by induction hypothesis there exists an automaton Aψ s.t. L(Aψ ) = L(ψ). Observe that L(ϕ) is not in general equal to L(ψ). To see why, consider for example the case ψ = Qa (x) and ϕ = ¬Qa (x). The word a a a 1 1 1 belongs neither to L(ψ) nor L(ϕ), because it is not the encoding of any interpretation: the bitstring for x contains more than one 1. What holds is L(ϕ) = L(ψ) ∩ Enc(ψ), where Enc(ψ) is the language of the encodings of all the interpretations of ψ (whether they are models of ψ or not). We construct an automaton Aenc recognizing Enc(ψ), and so we can ψ take Aϕ = Aψ ∩ Aenc . ψ Assume ψ has k ﬁrst-order variables. Then a word belongs to Enc(ψ) iff each of its projections onto the 2nd, 3rd, . . . , (k + 1)-th component is a bitstring containing exactly one 1. As states of Aenc we take all the strings {0, 1}k . The intended meaning of a state, say state 101 ψ for the case k = 3, is “the automaton has already read the 1’s in the bitstrings of the ﬁrst and third variables, but not yet read the 1 in the second.” The initial and ﬁnal states are 0k and 1k , respectively. The transitions are deﬁned according to the intended meaning of the states. For instance, the automaton Aenc is x<y

170

a b 0 , 0 0 0

**CHAPTER 9. AUTOMATA AND LOGIC
**

a b 1 , 1 0 0 a b 1 , 1 1 1 a b 0 , 0 0 0

a b 0 , 0 1 1

a b 0 , 0 1 1

a b 0 , 0 0 0

a b 1 , 1 0 0

a b 0 , 0 0 0

Observe that the number of states of Aenc grows exponentially in the number of free variables. ψ This makes the negation operation expensive, even when the automaton Aφ is deterministic. • ϕ = ϕ1 ∨ ϕ2 . Then free(ϕ) = f ree(ϕ1 ) ∪ free(ϕ2 ), and by induction hypothesis there are automata Aϕi , Aϕ2 such that L(Aϕ1 ) = L(ϕ1 ) and L(Aϕ2 ) = L(ϕ2 ). If free(ϕ1 ) = free(ϕ2 ), then we can take Aϕ = Aϕ1 ∪ Aϕ2 . But this need not be the case. If free(ϕ1 ) free(ϕ2 ), then L(ϕ1 ) and L(ϕ2 ) are languages over diﬀerent alphabets Σ1 , Σ2 , or over the same alphabet, but with diﬀerent intended meaning, and we cannot just compute their intersection. For example, if ϕ1 = Qa (x) and ϕ2 = Qb (y), then both L(ϕ1 ) and L(ϕ2 ) are languages over Σ × {0, 1}, but the second component indicates in the ﬁrst case the value of x, in the second the value of y. This problem is solved by extending L(ϕ1 ) and L(Aϕ2 ) to languages L1 and L2 over Σ×{0, 1}2 . In our example, the language L1 contains the encodings of all interpretations (w, {x → n1 , y → n2 }) such that the projection (w, {x → n1 }) belongs to L(Qa (x)), while L2 contains the interpretations such that (w, {y → n2 }) belongs to L(Qb (y)). Now, given the automaton AQa (x) recognizing L(Qa (x))

a b , 0 0 a 1

a b , 0 0

we transform it into an automaton A1 recognizing L1

**9.2. MONADIC SECOND-ORDER LOGIC ON WORDS
**

a a b b 0 , 0 , 0 , 0 0 1 0 1 a a 1 , 1 0 1 a a b b 0 , 0 , 0 , 0 0 1 0 1

171

After constructing A2 similarly, take Aϕ = A1 ∪ A2 . • ϕ = ∃x ψ. Then free(ϕ) = f ree(ψ) \ {x}, and by induction hypothesis there is an automaton Aψ s.t. L(Aψ ) = L(ψ). Deﬁne A∃ xψ as the result of the projection operation, where we project onto all variables but x. The operation simply corresponds to removing in each letter of each transition of Aσ the component for variable x. For example, the automaton A∃x Qa (x) is obtained by removing the second components in the automaton for AQa (x) shown above, yielding

[figure: a two-state NFA with self-loops labeled a, b on both states and an a-labeled transition from the first state to the second]
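The projection step can be sketched in code, assuming automata are stored as sets of (state, letter, state) triples with letters as tuples (an illustrative representation, not from the text):

```python
def project_away(delta, track):
    """Remove component `track` from every transition letter, as in the
    projection step for the existential quantifier. Different letters may
    collapse into the same one, so the result can be nondeterministic
    even if the input automaton was deterministic."""
    return {(q, letter[:track] + letter[track + 1:], r)
            for (q, letter, r) in delta}
```

Note how two transitions that differed only in the projected track end up carrying the same letter, which is precisely the source of nondeterminism mentioned below.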

Observe that the automaton for ∃x ψ can be nondeterministic even if the one for ψ is deterministic, since the projection operation may map different letters into the same one. • ϕ = ∃X ψ. We proceed as in the previous case.

Size of Aϕ. The procedure for constructing Aϕ proceeds bottom-up on the syntax tree of ϕ. We first construct automata for the atomic formulas in the leaves of the tree, and then proceed upwards: given automata for the children of a node in the tree, we construct an automaton for the node itself. Whenever a node is labeled by a negation, the automaton for it can be exponentially bigger than the automaton for its only child. This yields an upper bound for the size of Aϕ equal to a tower of exponentials, where the height of the tower is the largest number of negations on any path from the root of the tree to one of its leaves. It can be shown that this very large upper bound is essentially tight: there are formulas for which the smallest automaton recognizing the same language as the formula reaches the upper bound. This means that MSO logic allows us to describe some regular languages in an extremely succinct form.

Example 9.14 Consider the alphabet Σ = {a, b} and the language a∗b ⊆ Σ∗, recognized by the NFA

[figure: NFA with an a-labeled self-loop on the initial state and a b-labeled transition to the accepting state]

We derive this NFA by giving a formula ϕ such that L(ϕ) = a∗b, and then using the procedure described above. We shall see that the procedure is quite laborious. The formula states that the last letter is b, and all other letters are a's:

ϕ = ∃x (last(x) ∧ Qb(x)) ∧ ∀x (¬last(x) → Qa(x))

We first bring ϕ into the equivalent form

ψ = ∃x (last(x) ∧ Qb(x)) ∧ ¬∃x (¬last(x) ∧ ¬Qa(x))

We transform ψ into an NFA. First, we compute an automaton for last(x) = ¬∃y x < y. We need an automaton recognizing Enc(∃y x < y). Recall that the automaton for x < y is

[x < y: figure omitted]

Applying the projection operation, we get the following automaton for ∃y x < y, over the alphabet Σ × {0, 1}:

[∃y x < y: figure omitted]

Recall that computing the automaton for the negation of a formula requires more than complementing the automaton. First, we determinize and complement the automaton for ∃y x < y:

Second, we compute the intersection of the last two automata, getting an automaton whose last state is useless and can be removed, yielding the following NFA for last(x):

[last(x): figure omitted]

Next we compute an automaton for ∃x (last(x) ∧ Qb(x)), the first conjunct of ψ. We start with an NFA for Qb(x):

[Qb(x): figure omitted]

The automaton for ∃x (last(x) ∧ Qb(x)) is the result of intersecting this automaton with the NFA for last(x) and projecting onto the first component. We get

[∃x (last(x) ∧ Qb(x)): figure omitted]

Now we compute an automaton for ¬∃x (¬last(x) ∧ ¬Qa(x)), the second conjunct of ψ. We first obtain an automaton for ¬Qa(x) by intersecting the complement of the automaton for Qa(x) and the automaton for Enc(Qa(x)). The automaton for Qa(x) is

[Qa(x): figure omitted]

and after determinization and complementation we get

[figure omitted]

For the automaton recognizing Enc(Qa(x)), notice that Enc(Qa(x)) = Enc(∃y x < y), because both formulas have the same free variables, and so the same interpretations. But we have already computed an automaton for Enc(∃y x < y), namely

[figure omitted]

The intersection of the last two automata yields a three-state automaton, but after eliminating a useless state we get the following automaton for ¬Qa(x):

[¬Qa(x): figure omitted]

Notice that this is the same automaton we obtained for Qb(x), which is fine, because over the alphabet {a, b} the formulas Qb(x) and ¬Qa(x) are equivalent. To compute an automaton for ¬last(x) we just observe that ¬last(x) is equivalent to ∃y x < y, for which we have already computed an NFA, namely

[¬last(x): figure omitted]

Intersecting the automata for ¬last(x) and ¬Qa(x), and subsequently projecting onto the first component, we get an automaton for ∃x (¬last(x) ∧ ¬Qa(x)):

[∃x (¬last(x) ∧ ¬Qa(x)): figure omitted]

Determinizing, complementing, and removing a useless state yields the following NFA for ¬∃x (¬last(x) ∧ ¬Qa(x)):

[¬∃x (¬last(x) ∧ ¬Qa(x)): figure omitted]

Summarizing, the automata for the two conjuncts of ψ have an intersection that yields a 3-state automaton, which after removal of a useless state becomes

[∃x (last(x) ∧ Qb(x)) ∧ ¬∃x (¬last(x) ∧ ¬Qa(x)): figure omitted]

ending the derivation.

Exercises

Exercise 76 Characterize the languages described by the following formulas and give a corresponding automaton:

1. ∃x first(x)
2. ∀x first(x)
3. ¬∃x∃y (x < y ∧ Qa(x) ∧ Qb(y)) ∧ ∀x (Qb(x) → ∃y (x < y ∧ Qa(y))) ∧ ∃x ¬∃y (x < y ∧ Qa(x))

Exercise 77 Give a defining MSO-formula, an automaton, and a regular expression for the following languages over {a, b}:

• The set of words of even length containing only a's or only b's.
• The set of words of odd length with an odd number of occurrences of a.
• The set of words where between every two b's with no other b in between there is a block of an odd number of a's.

Exercise 78 For every n ≥ 1, give an FO-formula of polynomial length in n abbreviating y = x + 2^n. (Notice that the abbreviation y = x + k introduced earlier has length O(k), and so it cannot be directly used.) Use it to give another FO-formula ϕn, also of polynomial length in n, for the language Ln = {ww | w ∈ {a, b}∗ and |w| = 2^n}. Remark: since the minimal DFA for Ln has 2^(2^n) states (Exercise 10), this shows that the number of states of a minimal automaton equivalent to a given FO-formula may be double exponential in the length of the formula.

Exercise 79 MSO over a unary alphabet can be used to automatically prove some simple properties of the natural numbers. Consider for instance the following property: every finite set of natural numbers has a minimal element.² It is easy to see that this property holds iff the formula

∀Z ∃x ∀y (y ∈ Z → (x ≤ y ∧ x ∈ Z))

is a tautology, i.e., if it is satisfied by every word. Construct an automaton for the formula, and check that it is universal.

Exercise 80 Give formulas ϕ1, ϕ2, ϕ3, ϕ4 for the following abbreviations:

Sing(X) := "X is a singleton, i.e., X contains exactly one element"
X ⊆ Y := "X is a subset of Y"
X ⊆ Qa := "every position of X contains an a"
X < Y := "X and Y are singletons X = {x} and Y = {y} satisfying x < y"

Exercise 81 Express addition in MSO({a}): find a formula +(X, Y, Z) of MSO({a}) that is true iff x + y = z, where x, y, z are the numbers encoded, in lsbf encoding, by the sets X, Y, Z, respectively. You may use any abbreviation defined in the chapter.

² Of course, this also holds for every infinite set, but we cannot prove it using MSO over finite words.

Exercise 82 The nesting depth d(ϕ) of a formula ϕ of FO({a}) is defined inductively as follows:

• d(Qa(x)) = d(x < y) = 0;
• d(¬ϕ) = d(ϕ) and d(ϕ1 ∨ ϕ2) = max{d(ϕ1), d(ϕ2)};
• d(∃x ϕ) = 1 + d(ϕ).

Prove that every formula ϕ of FO({a}) of nesting depth n is equivalent to a formula f of QF having the same free variables as ϕ, and such that every constant k appearing in f satisfies k ≤ 2^n. Hint: the proof is similar to that of Theorem 9.8. The difficult case is the one where ϕ has the form ∃x ψ and ψ is a conjunction D of constraints. Define f as the following conjunction:

• all conjuncts of ψ not containing x are also conjuncts of f;
• for every conjunct of D of the form x ≥ k or x ≥ y + k, f contains a conjunct last ≥ k;
• for every two conjuncts of D containing x, f contains a conjunct obtained by "quantifying x away": for example, if the conjuncts are x ≥ k1 and y ≥ x + k2, then f has the conjunct y ≥ k1 + k2.

Since the constants in the new conjuncts are the sum of two old constants, the new constants are bounded by 2 · 2^d = 2^(d+1).


Chapter 10

Applications III: Presburger Arithmetic

Presburger arithmetic is a logical language for expressing properties of numbers by means of addition and comparison. A typical example of such a property is "x + 2y > 2z and 2x − 3z = 4y". The property is satisfied by some triples (nx, ny, nz) of natural numbers, like (4, 2, 0) and (8, 1, 4), but not by others, like (6, 0, 2) or (2, 2, 4). Valuations satisfying the property are called solutions or models. We show how to construct for a given formula ϕ of Presburger arithmetic an NFA Aϕ recognizing the solutions of ϕ. In Section 10.1 we introduce the syntax and semantics of Presburger arithmetic. Section 10.2 constructs an NFA recognizing all solutions over the natural numbers, and Section 10.3 an NFA recognizing all solutions over the integers.

10.1 Syntax and Semantics

Formulas of Presburger arithmetic are constructed out of an infinite set of variables V = {x, y, z, . . .} and the constants 0 and 1. The syntax of formulas is defined in three steps. First, the set of terms is inductively defined as follows:

• the symbols 0 and 1 are terms;
• every variable is a term;
• if t and u are terms, then t + u is a term.

An atomic formula is an expression t ≤ u, where t and u are terms. The set of Presburger formulas is inductively defined as follows:

• every atomic formula is a formula;
• if ϕ1, ϕ2 are formulas, then so are ¬ϕ1, ϕ1 ∨ ϕ2, and ∃x ϕ1.

As usual, variables within the scope of an existential quantifier are bound, and otherwise free. Besides standard abbreviations like ∀, ∧, →, we also introduce:

n := 1 + 1 + . . . + 1 (n times)
nx := x + x + . . . + x (n times)
t ≥ t' := t' ≤ t
t = t' := t ≤ t' ∧ t ≥ t'
t < t' := t ≤ t' ∧ ¬(t = t')
t > t' := t' < t

An interpretation is a function I : V → ℕ. An interpretation I is extended to terms in the natural way: I(0) = 0, I(1) = 1, and I(t + u) = I(t) + I(u). The satisfaction relation I |= ϕ for an interpretation I and a formula ϕ is inductively defined as follows:

I |= t ≤ u iff I(t) ≤ I(u)
I |= ¬ϕ1 iff I |≠ ϕ1
I |= ϕ1 ∨ ϕ2 iff I |= ϕ1 or I |= ϕ2
I |= ∃x ϕ iff there exists n ≥ 0 such that I[n/x] |= ϕ

where I[n/x] denotes the interpretation that assigns the number n to the variable x, and the same numbers as I to all other variables. It is easy to see that whether I satisfies ϕ or not depends only on the values I assigns to the free variables of ϕ (i.e., if two interpretations assign the same values to the free variables, then either both satisfy the formula or none does). If a formula has free variables x1, . . . , xk, then its set of solutions can be represented as a subset of ℕ^k, or as a relation of arity k over the universe ℕ. We call this subset the solution space of ϕ, and denote it by Sol(ϕ).

Example 10.1 The solution space of the formula x − 2 ≥ 0 is the set {2, 3, 4, . . .}. The solutions of the formula ∃x (2x = y ∧ 2y = z) are the pairs {(2n, 4n) | n ≥ 0}, where we assume that the first and second components correspond to the values of y and z, respectively.

Automata encoding natural numbers. As in Section 6.1 of Chapter 6, we encode natural numbers as strings over {0, 1} using the least-significant-bit-first (lsbf) encoding. We use transducers to represent, compute, and manipulate solution spaces of formulas. The language of a formula ϕ is defined as L(ϕ) = ∪_{s ∈ Sol(ϕ)} lsbf(s).
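The semantics of terms and atomic formulas above can be turned into a toy evaluator. The nested-tuple representation of terms below is my own assumption, chosen only for illustration:

```python
# Terms: the constants 0 and 1, variables as strings, sums as ("+", t, u).
def eval_term(t, interp):
    """Extend an interpretation I : V -> N to terms, as in the text:
    I(0) = 0, I(1) = 1, I(t + u) = I(t) + I(u)."""
    if t == 0 or t == 1:
        return t
    if isinstance(t, str):          # a variable
        return interp[t]
    _, u, v = t                     # a sum ("+", u, v)
    return eval_term(u, interp) + eval_term(v, interp)

def sat_atomic(t, u, interp):
    """I |= t <= u  iff  I(t) <= I(u)."""
    return eval_term(t, interp) <= eval_term(u, interp)
```

For instance, under I = {x → 3, y → 1} the atomic formula x ≤ y + 1 evaluates to false, since I(x) = 3 and I(y + 1) = 2.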
The solutions of ϕ are the projection onto the free variables of ϕ of the interpretations that satisfy ϕ. For instance, the free variables of the formula ∃x (2x = y ∧ 2y = z) are y and z. As in Section 6.1 of Chapter 6, if we fix a total order on the set V of variables, and a formula ϕ has k free variables, then its solutions are encoded as words over {0, 1}^k. For instance, the word

x1: 1 0 1 0
x2: 0 1 0 1
x3: 0 0 0 0

encodes (reading each row least significant bit first) the solution (5, 10, 0).
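A small sketch of the lsbf encoding of tuples as words of bit-columns (the helper names are mine):

```python
def lsbf_encode(tup, width):
    """Word over {0,1}^k for a tuple of naturals, least significant bit
    first; `width` must be large enough for the largest component."""
    return [tuple((n >> i) & 1 for n in tup) for i in range(width)]

def lsbf_decode(word):
    """Inverse: recover the tuple from a word of k-bit columns."""
    k = len(word[0])
    return tuple(sum(col[j] << i for i, col in enumerate(word))
                 for j in range(k))
```

Padding with all-zero columns does not change the encoded tuple, which is why every natural-number tuple has infinitely many encodings.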

where lsbf(s) denotes the set of all encodings of the tuple s of natural numbers. In other words, L(ϕ) is the encoding of the relation Sol(ϕ). Given a Presburger formula ϕ, we construct a transducer Aϕ such that L(Aϕ) = L(ϕ). Recall that Sol(ϕ) is a relation over ℕ whose arity is given by the number of free variables of ϕ. The last section of Chapter 6 implements operations on relations of arbitrary arity. These operations can be used to compute the solution space of the negation of a formula, the disjunction of two formulas, and the existential quantification of a formula:

• The solution space of the negation of a formula with k free variables is the complement of its solution space with respect to the universe ℕ^k. In general, when computing the complement of a relation we have to worry about ensuring that the NFAs we obtain only accept words that encode some tuple of elements (i.e., some clean-up may be necessary to ensure that automata do not accept words encoding nothing). For Presburger arithmetic this is not necessary, because in the lsbf encoding every word encodes some tuple of numbers.

• The solution space of a disjunction ϕ1 ∨ ϕ2 where ϕ1 and ϕ2 have the same free variables is clearly the union of their solution spaces, and can be computed as Union(Sol(ϕ1), Sol(ϕ2)). If ϕ1 and ϕ2 have different sets V1 and V2 of free variables, then some preprocessing is necessary. Define SolV1∪V2(ϕi) as the set of valuations of V1 ∪ V2 whose projection onto Vi belongs to Sol(ϕi). Transducers recognizing SolV1∪V2(ϕi) for i = 1, 2 are easy to compute from transducers recognizing Sol(ϕi), and the solution space is Union(SolV1∪V2(ϕ1), SolV1∪V2(ϕ2)).

• The solution space of a formula ∃x ϕ, where x is a free variable of ϕ, is Projection_I(Sol(ϕ)), where I contains the indices of all variables with the exception of the index of x.

It only remains to construct automata recognizing the solution space of atomic formulas.

10.2 An NFA for the Solutions over the Naturals

Consider an expression of the form

ϕ = a1 x1 + . . . + an xn ≤ b

where a1, . . . , an, b ∈ ℤ (not ℕ!). For every atomic formula there is an equivalent expression in this form (i.e., an expression with the same solution space), so if the atomic formula is not in this form, some preprocessing is necessary. For example, x ≥ y + 4 is equivalent to −x + y ≤ −4. Since we allow negative integers as coefficients, letting a = (a1, . . . , an), x = (x1, . . . , xn), and denoting the scalar product of a and x by a · x, we write ϕ = a · x ≤ b.

We construct a DFA for Sol(ϕ). The states of the DFA are integers. We choose transitions and final states of the DFA so that the following property holds:

State q ∈ ℤ recognizes the encodings of the tuples c ∈ ℕ^n such that a · c ≤ q.   (10.1)

Given a state q ∈ ℤ and a letter ζ ∈ {0, 1}^n, let us determine the target state q' of the transition q →ζ q' of the DFA. A word w' ∈ ({0, 1}^n)∗ is accepted from q' iff the word ζw'
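The clean-up-free complementation used in the first bullet can be sketched in one line; the tuple representation of DFAs below is my own choice, not prescribed by the text:

```python
def complement_solution_dfa(states, delta, q0, final):
    """Complement a *complete* DFA over {0,1}^k by flipping its
    accepting states. For lsbf-encoded Presburger solution spaces no
    further clean-up is needed, since every word over {0,1}^k encodes
    some tuple of naturals."""
    return states, delta, q0, set(states) - set(final)
```

This relies on the DFA being complete (a transition for every letter from every state); the constructions in this chapter produce complete DFAs by design.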

is accepted from q. Since we use the lsbf encoding, the tuple of natural numbers encoded by ζw' is 2c + ζ, where c ∈ ℕ^n is the tuple encoded by w'. So c is accepted from q' iff 2c + ζ is accepted from q, and in order to satisfy property 10.1 we must choose q' so that

a · c ≤ q'  iff  a · (2c + ζ) ≤ q.

A little arithmetic yields q' = ⌊(q − a · ζ)/2⌋, and so we define the transition function of the DFA by

δ(q, ζ) = ⌊(q − a · ζ)/2⌋.

As initial state we choose b. For the final states we observe that a state is final iff it accepts the empty word, iff it accepts the tuple (0, . . . , 0) ∈ ℕ^n. Therefore, in order to satisfy property 10.1 we must make state q final iff q ≥ 0. This leads to the algorithm AFtoDFA(ϕ) of Table 10.1, where for clarity the state corresponding to an integer k ∈ ℤ is denoted by sk.

AFtoDFA(ϕ)
Input: Atomic formula ϕ = a · x ≤ b
Output: DFA Aϕ = (Q, Σ, δ, q0, F) such that L(Aϕ) = L(ϕ)

 1  Q, F ← ∅; q0 ← sb
 2  W ← {sb}
 3  while W ≠ ∅ do
 4      pick sk from W
 5      add sk to Q
 6      if k ≥ 0 then add sk to F
 7      for all ζ ∈ {0, 1}^n do
 8          j ← ⌊(k − a · ζ)/2⌋
 9          if sj ∉ Q then add sj to W
10          add (sk, ζ, sj) to δ

Table 10.1: Converting an atomic formula into a DFA recognizing the lsbf encoding of its solutions.

Example 10.2 Consider the atomic formula 2x − y ≤ 2. The DFA obtained by applying AFtoDFA to it is shown in Figure 10.1. The initial state is 2. Transitions leaving state 2 are given by

δ(2, ζ) = ⌊(2 − (2, −1) · (ζx, ζy))/2⌋ = ⌊(2 − 2ζx + ζy)/2⌋

and so we have 2 →(0,0) 1, 2 →(0,1) 1, 2 →(1,0) 0, and 2 →(1,1) 0. States 2, 1, and 0 are final.
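The worklist pseudocode of AFtoDFA transcribes almost literally into executable form. This is a sketch; the dictionary-based DFA representation is my choice, and Python's floor division conveniently implements ⌊(q − a·ζ)/2⌋ for negative values too:

```python
def af_to_dfa(a, b):
    """DFA for the lsbf-encoded natural solutions of a . x <= b.
    States are integers; the initial state is b; q is final iff q >= 0;
    delta(q, z) = floor((q - a . z) / 2)."""
    n = len(a)
    letters = [tuple((z >> i) & 1 for i in range(n)) for z in range(2 ** n)]
    delta, states, work = {}, set(), [b]
    while work:
        k = work.pop()
        if k in states:
            continue
        states.add(k)
        for z in letters:
            j = (k - sum(ai * zi for ai, zi in zip(a, z))) // 2  # floor
            delta[(k, z)] = j
            if j not in states:
                work.append(j)
    final = {q for q in states if q >= 0}
    return states, delta, b, final
```

On 2x − y ≤ 2 this reproduces the DFA of Example 10.2, with states {2, 1, 0, −1, −2} and final states {2, 1, 0}.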

[Figure 10.1: DFA for the formula 2x − y ≤ 2 (figure omitted)]

Consider, for example, the word

x: 0 0 1 1 0 0
y: 0 1 0 0 1 1

which encodes x = 12 and y = 50; the DFA accepts it, and indeed 24 − 50 ≤ 2. If we remove the last letter, then the word encodes x = 12 and y = 18, and is not accepted, which indeed corresponds to 24 − 18 > 2.

Consider now the formula x + y ≥ 4. We rewrite it as −x − y ≤ −4, and apply the algorithm. The initial state is −4. Transitions leaving −4 are given by

δ(−4, ζ) = ⌊(−4 − (−1, −1) · (ζx, ζy))/2⌋ = ⌊(−4 + ζx + ζy)/2⌋

and so we have −4 →(0,0) −2, −4 →(0,1) −2, −4 →(1,0) −2, and −4 →(1,1) −1. The resulting DFA is shown in Figure 10.2. Notice that the DFA is not minimal, since states 0 and 1 can be merged.

Partial correctness of AFtoDFA is easily proved by showing that for every q ∈ ℤ and every word w ∈ ({0, 1}^n)∗, the state q accepts w iff w encodes some c ∈ ℕ^n satisfying a · c ≤ q. The proof proceeds by induction on |w|. For |w| = 0 the result follows immediately from the definition of the final states, and for |w| > 0 from the fact that δ satisfies property 10.1 and from the induction hypothesis. Termination of AFtoDFA also requires a proof: in principle the algorithm could keep generating new states forever. We show that this is not the case.

Lemma 10.3 Let ϕ = a · x ≤ b and let s = Σᵢ |ai|. All states sj added to the worklist during the execution of AFtoDFA(ϕ) satisfy −|b| − s ≤ j ≤ |b| + s.

[Figure 10.2: DFA for the formula x + y ≥ 4 (figure omitted)]

Proof: The property holds for sb, the first state added to the worklist. We show that, at any point in time, if all the states added to the worklist so far satisfy the property, then so does the next one. Let sj be this next state. Then there exists a state sk in the worklist and a letter ζ ∈ {0, 1}^n such that j = ⌊(k − a · ζ)/2⌋. Since by assumption sk satisfies the property, we have −|b| − s ≤ k ≤ |b| + s, and so

(−|b| − s − a · ζ)/2 ≤ j ≤ (|b| + s − a · ζ)/2.   (10.2)

Now we manipulate the left and right ends of (10.2). Since −s ≤ a · ζ ≤ s, a little arithmetic yields

(−|b| − 2s)/2 ≤ (−|b| − s − a · ζ)/2  and  (|b| + s − a · ζ)/2 ≤ (|b| + 2s)/2,

which together with (10.2) leads to −|b| − s ≤ j ≤ |b| + s, and we are done.

Example 10.4 We compute all natural solutions of the system of linear inequations

2x − y ≤ 2
x + y ≥ 4

such that both x and y are multiples of 4. This corresponds to computing a DFA for the Presburger formula

∃z x = 4z ∧ ∃w y = 4w ∧ 2x − y ≤ 2 ∧ x + y ≥ 4

The minimal DFA for the first two conjuncts can be computed using projections and intersections, but the result is also easy to guess: it is the DFA of Figure 10.3 (where a trap state has been omitted). The solutions are then represented by the intersection of the DFAs of Figures 10.1, 10.2 (after merging states 0 and 1), and 10.3. The result is shown in Figure 10.4 (some states from which no final state can be reached are omitted).

[Figure 10.3: DFA for the formula ∃z x = 4z ∧ ∃w y = 4w (figure omitted)]

[Figure 10.4: Intersection of the DFAs of Figures 10.1, 10.2, and 10.3 (figure omitted)]

10.2.1 Equations

A slight modification of AFtoDFA directly constructs a DFA for the solutions of a · x = b, without having to intersect DFAs for a · x ≤ b and −a · x ≤ −b. The states of the DFA are a trap state qt plus integers.

We choose transitions and final states so that the following property holds:

State q ∈ ℤ recognizes the encodings of the tuples c ∈ ℕ^n such that a · c = q.   (10.3)

For a state q ∈ ℤ and a letter ζ ∈ {0, 1}^n we determine the target state q' of the transition q →ζ q'. Given a tuple c ∈ ℕ^n, we have, by property 10.3 for q:

c ∈ L(q')  iff  2c + ζ ∈ L(q)  iff  a · (2c + ζ) = q  iff  a · c = (q − a · ζ)/2.

If q − a · ζ is odd, then, since a · c is an integer, the equation a · c = (q − a · ζ)/2 has no solution; so in order to satisfy property 10.3 we must choose q' satisfying L(q') = ∅, and we take q' = qt. If q − a · ζ is even, then we must choose q' satisfying a · c = q', and so we take q' = (q − a · ζ)/2. Therefore, the transition function of the DFA is given by:

δ(q, ζ) = qt, if q = qt or q − a · ζ is odd;
δ(q, ζ) = (q − a · ζ)/2, if q − a · ζ is even.

For the trap state qt we take δ(qt, ζ) = qt for every ζ ∈ {0, 1}^n. For the final states, recall that a state is final iff it accepts the tuple (0, . . . , 0). By property 10.3, a state q ∈ ℤ is final iff a · (0, . . . , 0) = q, so the only final state is q = 0; the state qt is nonfinal and accepts the empty language. The resulting algorithm, which does not construct the trap state, is shown in Table 10.2. Partial correctness and termination of EqtoDFA are easily proved following steps similar to the case of inequations.

EqtoDFA(ϕ)
Input: Equation ϕ = a · x = b
Output: DFA A = (Q, Σ, δ, q0, F) such that L(A) = L(ϕ) (without trap state)

 1  Q, F ← ∅; q0 ← sb
 2  W ← {sb}
 3  while W ≠ ∅ do
 4      pick sk from W
 5      add sk to Q
 6      if k = 0 then add sk to F
 7      for all ζ ∈ {0, 1}^n do
 8          if (k − a · ζ) is even then
 9              j ← (k − a · ζ)/2
10              if sj ∉ Q then add sj to W
11              add (sk, ζ, sj) to δ

Table 10.2: Converting an equation into a DFA recognizing the lsbf encodings of its solutions.

Example 10.5 Consider the formulas x + y ≤ 4 and x + y = 4. The result of applying AFtoDFA to x + y ≤ 4 is shown at the top of Figure 10.5. The bottom part of the figure shows the result of applying EqtoDFA to x + y = 4. Observe that its transitions are a subset of the transitions of the DFA for x + y ≤ 4. Notice the similarities and differences with the DFA for x + y ≥ 4 in Figure 10.2. This example also shows that the DFA is not necessarily minimal, since state −1 can be deleted.

10.3 An NFA for the Solutions over the Integers

We construct an NFA recognizing the encodings of the integer solutions (positive or negative) of a formula. In order to deal with negative numbers we use 2-complements. A 2-complement encoding of an integer x ∈ ℤ is any word a0 a1 . . . an, with n ≥ 0, satisfying

x = a0 · 2⁰ + a1 · 2¹ + · · · + a_{n−1} · 2^{n−1} − an · 2^n   (10.4)

We call an the sign bit. If the word has length 1, then its only bit is the sign bit: the word 0 encodes the number 0, and the word 1 encodes the number −1. The empty word encodes no number. Observe that 110 encodes 1 + 2 − 0 = 3, while 111 encodes 1 + 2 − 4 = −1. In fact, all of 110, 1100, 11000, . . . encode 3, and all of 1, 11, 111, . . . encode −1. In general, it is easy to see that all words of the regular expression a0 . . . a_{n−1} an an∗ encode the same number: for an = 0 this is obvious, and for an = 1 it follows from −2^{n+m} + 2^{n+m−1} + · · · + 2^n = −2^n. This property allows us to encode tuples of numbers using padding: instead of padding with 0, we pad with the sign bit.

Example 10.6 The triple (12, −3, −14) is encoded by all the words of

x: 0 0 1 1 0 (0)∗
y: 1 0 1 1 1 (1)∗
z: 0 1 0 0 1 (1)∗

that is, the 5-letter word shown, followed by arbitrarily many copies of the sign-bit letter (0, 1, 1).
[Figure 10.5: DFAs for the formulas x + y ≤ 4 and x + y = 4 (figure omitted)]

For example, the values encoded by the 5-letter and 7-letter encodings of the triple (12, −3, −14) are given by

x = 0 + 0 + 4 + 8 − 0 = 12
y = 1 + 0 + 4 + 8 − 16 = −3
z = 0 + 2 + 0 + 0 − 16 = −14

and

x = 0 + 0 + 4 + 8 + 0 + 0 − 0 = 12
y = 1 + 0 + 4 + 8 + 16 + 32 − 64 = −3
z = 0 + 2 + 0 + 0 + 16 + 32 − 64 = −14

We construct an NFA (no longer a DFA!) recognizing the integer solutions of an atomic formula a · x ≤ b. As usual we take integers for the states, and the NFA should satisfy:

State q ∈ ℤ recognizes the 2-complement encodings of the tuples c ∈ ℤ^n such that a · c ≤ q.   (10.5)

However, integer states are no longer enough, because no state q ∈ ℤ can be final: in the 2-complement encoding the empty word encodes no number, and so, since by property 10.5 q cannot accept the empty word, q must be nonfinal. But we need at least one final state, and so we add to the NFA a unique final state qf without any outgoing transitions, accepting only the empty word. Given a state q ∈ ℤ and a letter ζ ∈ {0, 1}^n, we determine the targets q' of the transitions q →ζ q' of the NFA. (As we will see, the NFA may have one or two such transitions.)
then ζw encodes the tuple of integers 2c + ζ. Its transitions are a subset of those of the NFA for 2x − y ≤ 2. because in this case ζ is the sign bit. Graphically.6 shows at the top the NFA recognizing all integer solutions of 2x−y ≤ 2.1 Equations If order to construct an NFA for the integer solutions of an equation a · x = b we can proceed as for inequations. The result is again an NFA containing all states and transitions of the DFA for the natural solutions computed in Section 10. AN NFA FOR THE SOLUTIONS OVER THE INTEGERS. and removing all transitions q − q such that − → q 1 2 (q ζ ζ − a · ζ). and in this − → case the automaton has two transitions leaving state q and labeled by ζ. q f if q + a · ζ ≥ 0 δ(q.) (2) If w = . Example 10. we deﬁne the transition function of the NFA by 1 (q − a · ζ) . 1}2 for which q f ∈ δ(−1. and all transitions q − q f such that q + a · ζ − → ζ 0. plus possibly other transitions. where c ∈ Zn is the tuple encoded by w . 1). and if so then it requires q to be a ﬁnal state. 1}2 . 1}n )∗ is accepted from some target state q iff ζw is accepted from q. the only ﬁnal state is qf .2. 0) and (1. ζ 10.3. ζ) = 2 1 (q − a · ζ) otherwise 2 Observe that the NFA contains all the states and transitions of the DFA for the natural solutions of a · x ≤ b.8 The NFA for the integer solutions of 2x − y = 2 is shown in the middle of Figure 10. In order to determine the letters ζ ∈ {0. we compute q + a · ζ = −1 + 2ζ x − ζy for each (ζ x . property 10. and compare the result to 0. The automaton has an additional ﬁnal state q f .7 Figure 10. plus some more (compare with Figure 10. Consider for instance state −1. ζy ) ∈ {0.1. plus possible some more. we can also obtain − → the NFA by starting with the NFA for a · x ≤ b. then ζw encodes the tuple of integers −ζ.5 only requires a target state q if a · (−ζ) ≤ q. (This follows easily from the deﬁnition of 2-complements. Summarizing. 
and a transition q − q f iff q + a · ζ = 0.5 requires a target state q such that a · c ≤ q iff a · (2c + cζ ) ≤ q .1). We obtain that the letters leading to q f are (1. 189 A word w ∈ ({0. ζ). All integer states are now nonﬁnal. Example 10. The ﬁnal state q f and the transitions leading to it are drawn in red.3. . In the 2-complement encoding there are two cases: (1) If w . In case (1). So we take 1 q = (q − a · ζ) 2 For case (2) property 10.10. So if q + a · ζ ≥ 0 then we add q − q f to the set of transitions.6. It has all states and transitions of the DFA for the natural solutions.

The NFA for the integer solutions of an equation has an interesting property: it has a transition q →ζ qf iff it also has a self-loop q →ζ q, because q + a · ζ = 0 holds iff (q − a · ζ)/2 = (2q)/2 = q. (For instance, state 1 of the NFA in the middle of Figure 10.6 has a red transition labeled by (0, 1) and a self-loop labeled by (0, 1).) Using this property it is not difficult to see that the powerset construction does not cause a blowup in the number of states: it only adds one extra state for each predecessor of the final state qf; each of these predecessors gets "duplicated". Moreover, the DFA obtained by means of the powerset construction is minimal. This can be proved by showing that any two states recognize different languages:

1. If both states are nonfinal, then they recognize the solutions of a · x = k and a · x = k' for two distinct constants k and k', and so their languages are not only distinct but even disjoint.
2. If both states are final, then they are the "duplicates" of two nonfinal states k and k', and their languages are those of k and k' plus the empty word, which again are distinct.
3. If exactly one of the states is final, then only it accepts the empty word, and the languages are again distinct.

Example 10.9 The DFA obtained by applying the powerset construction to the NFA for 2x − y = 2 is shown at the bottom of Figure 10.6 (the trap state has been omitted). Each of the three predecessors of qf gets "duplicated".

10.3.2 Algorithms

The algorithms for the construction of the NFAs are shown in Table 10.3. Additions with respect to the previous algorithms are shown in blue.

Exercises

Exercise 83 Let k1, k2 ∈ ℕ0 be constants. Find a Presburger arithmetic formula ϕ(x, y) with free variables x and y such that I |= ϕ(x, y) iff I(x) ≥ I(y) and I(x) − I(y) ≡ k1 (mod k2). Find a corresponding automaton for the case k1 = 0 and k2 = 2.

Exercise 84 Construct a finite automaton for the Presburger formula ∃y x = 3y using the algorithms of the chapter.

Exercise 85 AFtoDFA returns a DFA recognizing all solutions of a given linear inequation a1x1 + a2x2 + · · · + akxk ≤ b with a1, a2, . . . , ak, b ∈ ℤ (not ℕ!), encoded using the lsbf encoding of ℕ^k. We may also use the most-significant-bit-first (msbf) encoding, e.g.

msbf( (2, 3) ) = L( (0, 0)∗ (1, 1) (0, 1) )

1. Adapt AFtoDFA to the msbf encoding.
2. Construct a finite automaton for the inequation 2x − y ≤ 2 w.r.t. the msbf encoding.

q0 . IneqZtoNFA(ϕ) Input: Inequation ϕ = a · x ≤ b over Z Output: NFA A = (Q. Σ. 1. ζ. 1}n do 1 7 j ← (k − a · ζ) 2 8 if s j Q then add s j to W 9 add (sk . ζ.r. δ. δ. msbf encoding. q0 ← sb W ← {sb } while W ∅ do pick sk from W add sk to Q for all ζ ∈ {0. Adapt AFtoDFA to the msbf encoding. . q f ) to δ Table 10. F ← ∅. q0 ← sb W ← {sb } while W ∅ do pick sk from W add sk to Q for all ζ ∈ {0. δ.3. 2. ζ. F) such that L(A) = L(ϕ) Q. s j ) to δ j ←k+a·ζ if j ≥ 0 then add q f to Q and F add (sk . Construct a ﬁnite automaton for the inequation 2x − y ≤ 2 w. δ. 1}n do if (k − a · ζ) is even then if (k + a · ζ) = 0 then add k to F 1 j ← (k − a · ζ) 2 if s j Q then add s j to W add (sk .10. q0 . ζ.t. AN NFA FOR THE SOLUTIONS OVER THE INTEGERS. F ← ∅. s j ) to δ 10 j ←k+a·ζ 11 if j ≥ 0 then 12 add q f to Q and F 13 add (sk . q f ) to δ 1 2 3 4 5 6 191 EqZtoNFA(ϕ) Input: Equation ϕ = a · x = b over Z Output: NFA A = (Q. F) such that L(A) = L(ϕ) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Q.3: Converting an inequality into a NFA accepting the 2-complement encoding of the solution space. Σ.

[Figure 10.6: NFAs for the solutions of 2x − y ≤ 2 and 2x − y = 2 over ℤ, and minimal DFA for the solutions of 2x − y = 2 (figure omitted)]
Part II Automata on Inﬁnite Words 193 .

.

We denote by Σω the set of all ω-words over Σ. a formalism to deﬁne ω-languages. . . The concatenation of a language L1 and an ω-language L2 is L1 · L2 = {w1 w2 ∈ Σω | w1 ∈ L1 and w2 ∈ L2 }. The set of all ω-regular expressions over Σ is denoted by REω (Σ). where r ∈ RE(Σ) is a regular expression s ::= rω | rs1 | s1 + s2 Sometimes we write r · s1 instead of rs1 . an and an ω-word w2 = b1 b2 . and conversion algorithms between them. where {∅}∗ = { }.1 ω-languages and ω-regular expressions Let Σ be an alphabet.1 we introduce ω-regular expressions. The ω-iteration of a language L ⊆ Σ∗ is the ω-language Lω = {w1 w2 w3 . sometimes also denoted by w1 · w2 . . . | wi ∈ L \ { }}. an b1 b2 . We extend regular expressions to ω-regular expressions.Chapter 11 Classes of ω-Automata and Conversions In Section 11. . 195 . in contrast to the case of ﬁnite words. The concatenation of a ﬁnite word w1 = a1 . . a textual notation for deﬁning languages of inﬁnite words. 11. . . and • Lω (s1 + s2 ) = Lω (s1 ) ∪ Lω (s2 ). • Lω (rs1 ) = L(r) · Lω (s1 ).1 The ω-regular expressions s over an alphabet Σ are deﬁned by the following grammar. . An inﬁnite word. most of them with the same expressive power as ω-regular expressions. . . The language Lω (s) ⊆ Σ of an ω-regular expression s ∈ REω (Σ) is deﬁned inductively as • Lω (rω ) = (L(r))ω . also called an ω-word. Deﬁnition 11.. is the ω-word w1 w2 = a1 . A set L ⊆ Σω of ω-words is an inﬁnitary language or ω-language over Σ. is an inﬁnite sequence a0 a1 a2 . Observe that {∅}ω = ∅. The other sections introduce diﬀerent classes of automata on inﬁnite words. . of letters of Σ.

196 CHAPTER 11. CLASSES OF ω-AUTOMATA AND CONVERSIONS

A language L is ω-regular if there is an ω-regular expression s such that L = Lω(s). Observe that the empty ω-language is ω-regular, because Lω(∅ω) = ∅.

11.2 Büchi automata

Büchi automata have the same syntax as NFAs, but a different definition of acceptance. Suppose an NFA A = (Q, Σ, δ, q0, F) is given an infinite word w = a0 a1 a2 . . . as input. A run of A on w never terminates, and so we cannot define acceptance in terms of the state reached at the end of the run. Instead, we say that a run of a Büchi automaton is accepting if some accepting state is visited along the run infinitely often. Since the set of accepting states is finite, “some accepting state is visited infinitely often” is equivalent to “the set of accepting states is visited infinitely often”. In fact, even the name “final state” is no longer appropriate for Büchi automata, and so from now on we speak of “accepting states”, although we still denote by F the set of accepting states.

Definition 11.2 A nondeterministic Büchi automaton (NBA) over Σ is a tuple A = (Q, Σ, δ, q0, F), where Q, Σ, δ, q0, and F are defined as for NFAs. A run of A on an ω-word a0 a1 a2 . . . is an infinite sequence ρ = p0 −a0→ p1 −a1→ p2 . . . such that pi ∈ Q and pi+1 ∈ δ(pi, ai) for every i ≥ 0. Let inf(ρ) be the set {q ∈ Q | q = pi for infinitely many i’s}, i.e., the set of states that occur in ρ infinitely often. A run ρ is accepting if inf(ρ) ∩ F ≠ ∅. An NBA accepts an ω-word w ∈ Σω if it has an accepting run on w. The language recognized by an NBA A is the set Lω(A) = {w ∈ Σω | w is accepted by A}. We refer to the condition inf(ρ) ∩ F ≠ ∅ as the Büchi condition.

One can also define NBAs with ε-transitions, but we will not need them. Notice that such a definition requires some care, because infinite runs containing only finitely many non-ε-transitions are never accepting, even if they visit some accepting state infinitely often.

A Büchi automaton is deterministic if it is deterministic when seen as an automaton on finite words.

Example 11.3 Figure 11.1 shows two Büchi automata. The automaton on the left is deterministic, and recognizes all ω-words over the alphabet {a, b} that contain infinitely many occurrences of a. For instance, A accepts aω, (ab)ω, baω, or (ab100)ω, but not bω or a100 bω. To prove that this is indeed the language, we show that every ω-word containing infinitely many a’s is accepted by A, and that every word accepted by A contains infinitely many a’s. For the first part, observe that immediately after reading any a the automaton A always visits its (only) accepting state, because all transitions labeled by a lead to it. So when A reads an ω-word containing infinitely many a’s it visits its accepting state infinitely often, and so it accepts. For the second part, if w is accepted by A, then there is a run of A on w that visits the accepting state infinitely often. Since all transitions leading to the accepting state are labeled by a, the automaton must read infinitely many a’s during the run, and so w contains infinitely many a’s.
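For ultimately periodic words u vω the Büchi condition can be checked effectively on a deterministic automaton: iterate v until the state at the start of a v-iteration repeats, and inspect the states visited along the repeating cycle. The sketch below is not from the text; it assumes a DBA given as a transition dictionary, and encodes the automaton on the left of Figure 11.1 as described in Example 11.3 (every a leads to the accepting state, every b to the non-accepting one):

```python
def run(delta, q, w):
    """States visited while reading the finite word w from q (excluding q itself)."""
    trace = []
    for a in w:
        q = delta[(q, a)]
        trace.append(q)
    return trace

def dba_accepts_lasso(delta, q0, F, u, v):
    """Does the DBA (delta, q0, F) accept u v^ω?  v must be nonempty.
    The states seen in the repeating part of the run are exactly inf(rho)."""
    q = (run(delta, q0, u) or [q0])[-1]          # state after reading u
    seen = {}      # boundary state -> index of the v-iteration starting there
    passes = []    # set of states visited during each v-iteration
    while q not in seen:
        seen[q] = len(passes)
        trace = run(delta, q, v)
        passes.append(set(trace))
        q = trace[-1]
    cycle_states = set().union(*passes[seen[q]:])
    return bool(cycle_states & F)

# Assumed encoding of the left automaton of Figure 11.1: state p non-accepting
# and initial, state r accepting; all a-transitions lead to r, all b's to p.
DELTA = {('p', 'a'): 'r', ('r', 'a'): 'r', ('p', 'b'): 'p', ('r', 'b'): 'p'}
```

With this encoding, `dba_accepts_lasso(DELTA, 'p', {'r'}, "", "ab")` holds (the word (ab)ω contains infinitely many a's), while `dba_accepts_lasso(DELTA, 'p', {'r'}, "aaa", "b")` does not.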

11.2. BÜCHI AUTOMATA 197

Figure 11.1: Two Büchi automata

The automaton on the right of the figure is not deterministic, and recognizes all ω-words over the alphabet {a, b} that contain finitely many occurrences of a. The proof is similar.

Example 11.4 Figure 11.2 shows three further Büchi automata over the alphabet {a, b, c}.

Figure 11.2: Three further Büchi automata

The top-left automaton recognizes the ω-words in which for every occurrence of a there is a later occurrence of b. So, for instance, the automaton accepts (ab)ω, cω, or (bc)ω, but not acω or ab(ac)ω. The top-right automaton recognizes the ω-words that contain finitely many occurrences of a, or infinitely many occurrences of a and infinitely many occurrences of b. Finally, the automaton at the bottom recognizes the ω-words in which between every occurrence of a and the next occurrence

of c there is at most one occurrence of b; more precisely, for every two numbers i < j, if the letter at position i is an a and the first occurrence of c after i is at position j, then there is at most one number i < k < j such that the letter at position k is a b.

11.2.1 From ω-regular expressions to NBAs and back

We present algorithms for converting an ω-regular expression into an NBA, and vice versa. This provides a first “sanity check” for NBAs as a data structure: they can represent exactly the ω-regular languages.

From ω-regular expressions to NBAs. We proceed by induction on the structure of the ω-regular expression. Recall that for every regular expression r we can construct an NFA Ar with a unique final state, no transition leading to the initial state, and no transition leaving the final state; moreover, if ε ∈ L(r) then Ar has exactly one ε-transition, otherwise none. An NBA for rω is obtained by adding to Ar, for each transition leaving the initial state, a transition leading from the final state to the target of that transition, as shown at the top of Figure 11.3. The ε-transition, if present, can be easily removed. An NBA for r · s is obtained by merging states as shown in the middle of the figure. Finally, an NBA for s1 + s2 is obtained by merging the initial states of the NBAs for s1 and s2, as shown at the bottom. Notice that we always obtain an NBA with no transitions leading into the initial state.

From NBAs to ω-regular expressions. Let A = (Q, Σ, δ, q0, F) be an NBA. For every accepting state q ∈ F, let Lq ⊆ Lω(A) be the set of ω-words w such that some run of A on w visits the state q infinitely often. We have Lω(A) = ⋃ q∈F Lq. Every word w ∈ Lq can be split into an infinite sequence w1 w2 w3 . . . of finite, nonempty words, where w1 is the word read by A until it visits q for the first time, and for every i > 1, wi is the word read by the automaton between the i-th and the (i + 1)-th visits to q. For every two states q, q′ ∈ Q, let Aq,q′ be the NFA (not the NBA!) obtained from A by changing the initial state to q and the set of final states to {q′}. Using algorithm NFAtoRE we can construct a regular expression sq,q′ such that L(Aq,q′) = L(sq,q′). It follows that w1 ∈ L(sq0,q) and wi ∈ L(sq,q) for every i > 1; in fact, it is even better to construct a regular expression rq,q′ whose language contains the words accepted by Aq,q′ by means of runs that visit q′ exactly once (how to use NFAtoRE for this is left as a little exercise). We use these expressions to compute an ω-regular expression for Lω(A): we have Lq = Lω(rq0,q (rq,q)ω), and therefore ⋃ q∈F rq0,q (rq,q)ω is the ω-regular expression we are looking for.

Example 11.5 Consider the top-right NBA of Figure 11.2, shown again in Figure 11.4.

11.2. BÜCHI AUTOMATA 199

Figure 11.3: From ω-regular expressions to Büchi automata. (Top: NBA for rω, obtained from an NFA for r. Middle: NBA for r · s. Bottom: NBA for s1 + s2.)

Figure 11.4: An NBA

We have to compute r0,1 (r1,1)ω + r0,2 (r2,2)ω. Using NFAtoRE and simplifying we get

r0,1 = (a + b + c)∗ (b + c)        r1,1 = (b + c)∗
r0,2 = (a + b + c)∗ b             r2,2 = b + (a + c)(a + b + c)∗ b

and (after some simplifications) we obtain the ω-regular expression

(a + b + c)∗ (b + c)ω + (a + b + c)∗ b (b + (a + c)(a + b + c)∗ b)ω.

11.2.2 Non-equivalence of NBA and DBA

Unfortunately, DBAs do not recognize all ω-regular languages, and so they do not have the same expressive power as NBAs. We show that the language of ω-words containing finitely many a’s is not recognized by any DBA. Intuitively, the NBA for this language “guesses” the last occurrence of a, and this guess cannot be determinized using only a finite number of states.

Proposition 11.6 The language L = (a + b)∗ bω (i.e., the language of all ω-words in which a occurs only finitely often) is not recognized by any DBA.

Proof: Assume that L = Lω(A) for some DBA A = (Q, {a, b}, δ, q0, F), and extend δ to δ̂ : Q × {a, b}∗ → Q by δ̂(q, ε) = q and δ̂(q, wa) = δ(δ̂(q, w), a). Consider the ω-word w0 = bω. Since w0 ∈ L, the automaton A has an accepting run on w0, and so δ̂(q0, u0) ∈ F for some finite prefix u0 of w0. Consider now w1 = u0 a bω. We have w1 ∈ L, and so A has an accepting run on w1, which

11.3. GENERALIZED BÜCHI AUTOMATA 201

implies δ̂(q0, u0 a u1) ∈ F for some finite prefix u0 a u1 of w1. In a similar fashion we continue constructing finite words ui such that δ̂(q0, u0 a u1 a . . . a ui) ∈ F. Since Q is finite, there are indices 0 ≤ i < j such that δ̂(q0, u0 a u1 . . . a ui) = δ̂(q0, u0 a u1 . . . a ui . . . a uj). It follows that A has an accepting run on u0 a u1 a . . . a ui (a ui+1 . . . a uj−1 a uj)ω. But this word has infinitely many occurrences of a, so it does not belong to L, and we have reached a contradiction.

Note that the complement language ((a + b)∗ a)ω (the set of infinite words in which a occurs infinitely often) is accepted by the DBA on the left of Figure 11.1.

11.3 Generalized Büchi automata

Generalized Büchi automata are an extension of Büchi automata that is convenient for some constructions (for instance, intersection). A generalized Büchi automaton (NGA) differs from a Büchi automaton in its accepting condition. An NGA has a collection of sets of accepting states F = {F0, . . . , Fm−1}, and a run ρ is accepting if for every set Fi ∈ F some state of Fi is visited by ρ infinitely often. Formally, ρ is accepting if inf(ρ) ∩ Fi ≠ ∅ for every i ∈ {0, . . . , m − 1}. Abusing language, we speak of the generalized Büchi condition F. Ordinary Büchi automata correspond to the special case m = 1.

An NGA with n states and m sets of accepting states can be translated into an NBA with mn states. The construction is based on the following observation, whose proof is obvious: a run ρ visits each set of F infinitely often if and only if (1) ρ eventually visits F0, and (2) for every i ∈ {0, . . . , m − 1}, every visit of ρ to Fi is eventually followed by a later visit to Fi⊕1, where ⊕ denotes addition modulo m. (Between the visits to Fi and Fi⊕1 there can be arbitrarily many visits to other sets of F.)

This suggests to take for the NBA m “copies” of the NGA, but with a modification: the NBA moves from the i-th to the (i ⊕ 1)-th copy whenever it visits a state of Fi (i.e., the transitions of the i-th copy that leave a state of Fi are redirected from the i-th copy to the (i ⊕ 1)-th copy). Formally, the states of the NBA are pairs [q, i], where q is a state of the NGA and i ∈ {0, . . . , m − 1}. Intuitively, [q, i] is the i-th copy of q. If q ∉ Fi then the successors of [q, i] are states of the i-th copy, and otherwise states of the (i ⊕ 1)-th copy. This way, visiting the accepting states of the first copy infinitely often is equivalent to visiting the accepting states of each copy infinitely often.

NGAtoNBA(A)
Input: NGA A = (Q, Σ, δ, q0, F = {F0, . . . , Fm−1})
Output: NBA A′ = (Q′, Σ, δ′, q′0, F′)
 1  Q′, δ′, F′ ← ∅; q′0 ← [q0, 0]
 2  W ← {[q0, 0]}
 3  while W ≠ ∅ do
 4      pick [q, i] from W; add [q, i] to Q′
 5      if q ∈ F0 and i = 0 then add [q, i] to F′
 6      for all a ∈ Σ, q′ ∈ δ(q, a) do
 7          if q ∉ Fi then
 8              if [q′, i] ∉ Q′ then add [q′, i] to W
 9              add ([q, i], a, [q′, i]) to δ′
10          else /* q ∈ Fi */
11              if [q′, i ⊕ 1] ∉ Q′ then add [q′, i ⊕ 1] to W
12              add ([q, i], a, [q′, i ⊕ 1]) to δ′
13  return (Q′, Σ, δ′, q′0, F′)

Example 11.7 Figure 11.5 shows an NGA over the alphabet {a, b} on the left, and the NBA obtained by applying NGAtoNBA to it on the right. The NGA has two sets of accepting states, F0 = {q} and F1 = {r}, and so its accepting runs are those that visit both q and r infinitely often. It is easy to see that the automaton recognizes the ω-words containing infinitely many occurrences of a and infinitely many occurrences of b.

Figure 11.5: An NGA and its corresponding NBA

The NBA on the right consists of two copies of the NGA: the 0-th copy (pink) and the 1-st copy (blue). Transitions leaving [q, 0] are redirected to the blue copy, and transitions leaving [r, 1] are redirected to the pink copy. The only accepting state is [q, 0].
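The copies construction transcribes directly into code. The sketch below is not the book's pseudocode: it builds all m copies at once instead of only the reachable ones, and assumes the NGA is given as a dictionary δ from (state, letter) pairs to sets of successor states. The example automaton encodes the NGA of Example 11.7 under the assumption that its transitions are q→q on a, q→r on b, r→q on a, r→r on b:

```python
def nga_to_nba(states, alphabet, delta, q0, F):
    """m-copies translation of a generalized Buchi automaton into an NBA.
    F is a list of accepting sets.  Returns (delta', initial state, accepting)."""
    m = len(F)
    delta2 = {}
    for q in states:
        for i in range(m):
            # transitions leaving [q, i] land in copy i+1 (mod m) iff q is in F_i
            j = (i + 1) % m if q in F[i] else i
            for a in alphabet:
                delta2[((q, i), a)] = {(p, j) for p in delta.get((q, a), set())}
    accepting = {(q, 0) for q in F[0]}
    return delta2, (q0, 0), accepting

# Assumed encoding of the NGA of Example 11.7 (infinitely many a's and b's).
DELTA = {('q', 'a'): {'q'}, ('q', 'b'): {'r'},
         ('r', 'a'): {'q'}, ('r', 'b'): {'r'}}
```

Applying `nga_to_nba({'q','r'}, {'a','b'}, DELTA, 'q', [{'q'}, {'r'}])` yields the two-copy NBA of Figure 11.5: the successors of ('q', 0) lie in copy 1, those of ('r', 1) in copy 0, and ('q', 0) is the only accepting state.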

11.4. OTHER CLASSES OF ω-AUTOMATA 203

11.4 Other classes of ω-automata

We have seen that not every NBA is equivalent to a DBA. This raises the question whether there exist other classes of automata for which a determinization procedure exists. As we shall see, the answer is yes; there is even a determinization procedure for Büchi automata, but the first determinizable classes we find will have other problems, and so this section can be seen as a quest for automata classes satisfying more and more properties.

11.4.1 Co-Büchi automata

Like a Büchi automaton, a (nondeterministic) co-Büchi automaton (NCA) has a set F of accepting states, but a run ρ of an NCA is accepting if it only visits states of F finitely often. Formally, ρ is accepting if inf(ρ) ∩ F = ∅. So a run of an NCA is accepting iff it is not accepting as a run of an NBA (this is the reason for the name “co-Büchi”).

We show that co-Büchi automata can be determinized. We fix an NCA A = (Q, Σ, δ, q0, F) with n states, and use Figure 11.6 as running example. We construct a DCA B such that Lω(B) = Lω(A) in three steps:

1. We define a mapping dag that assigns to each w ∈ Σω a directed acyclic graph dag(w). Intuitively, dag(w) is the result of “bundling together” all the runs of A on the word w.

2. We prove that w is accepted by A iff dag(w) contains only finitely many breakpoints.

3. We construct a DCA B which accepts w if and only if dag(w) contains finitely many breakpoints.

Figure 11.6: Running example for the determinization procedure

Formally, let w = σ1 σ2 . . . be a word of Σω. The directed acyclic graph dag(w) has nodes in Q × IN and edges labelled by letters of Σ, and is inductively defined as follows:

• dag(w) contains a node ⟨q, 0⟩ for every initial state q ∈ Q0.
• If dag(w) contains a node ⟨q, i⟩ and q′ ∈ δ(q, σi+1), then dag(w) also contains a node ⟨q′, i + 1⟩ and an edge ⟨q, i⟩ −σi+1→ ⟨q′, i + 1⟩.

• dag(w) contains no other nodes or edges.

Figure 11.7: The (initial parts of) dag(abaω) and dag((ab)ω)

Figure 11.7 shows the initial parts of dag(abaω) and dag((ab)ω). Clearly, q0 −σ1→ q1 −σ2→ q2 · · · is a run of A if and only if ⟨q0, 0⟩ −σ1→ ⟨q1, 1⟩ −σ2→ ⟨q2, 2⟩ · · · is a path of dag(w), and so A accepts w if and only if some path of dag(w) visits accepting states only finitely often.

We partition the nodes of dag(w) into levels: the i-th level contains all the nodes of dag(w) of the form ⟨q, i⟩. One could be tempted to think that the accepting condition “some path of dag(w) only visits accepting states finitely often” is equivalent to “only finitely many levels of dag(w) contain accepting states”, but dag(abaω) shows that this is not true: even though all its paths visit accepting states only finitely often, infinitely many levels (in fact, all levels i ≥ 3) contain accepting states. For this reason we introduce the set of breakpoint levels of the graph dag(w), inductively defined as follows:

• The 0-th level of dag(w) is a breakpoint.
• If level l is a breakpoint, then the next level l′ > l such that every path between nodes of l and l′ visits an accepting state is also a breakpoint.

We claim that “some path of dag(w) only visits accepting states finitely often” is equivalent to “the set of breakpoint levels of dag(w) is finite”. If the breakpoint set is infinite, then every infinite path of dag(w) visits accepting states infinitely often, and we are done. If the breakpoint set is finite, let i be the largest breakpoint. Then for every j > i there is a path πj from level i to level j that does not visit any accepting state. The paths {πj}j>i build an infinite acyclic graph of bounded degree, which therefore must contain an infinite path π by König’s lemma. But obviously π never visits any accepting state, and we are done.

Now, if we were able to tell that a level is a breakpoint just by examining it, we would be done: we would take the set of all possible levels as states of the DCA (i.e., the powerset of Q, as

11.4. OTHER CLASSES OF ω-AUTOMATA 205

in the powerset construction for determinization of NFAs), the possible transitions between levels as transitions, and the breakpoints as accepting states. The run of this DCA on w would be nothing but an encoding of dag(w), and it would be accepting iff dag(w) contains only finitely many breakpoints. However, the level alone does not contain enough information for that.

The solution to this problem is to add information to the states: we take for the states of the DCA pairs [P, O], where O ⊆ P ⊆ Q, with the following intended meaning: P is the set of states of a level, and q ∈ O iff q is the endpoint of a path that starts at the last breakpoint and has not yet visited any accepting state. We call O the set of owing states (states that “owe” a visit to the accepting states); a level is a breakpoint iff its set of owing states is empty. To guarantee that O indeed has this intended meaning, we define the DCA B = (Q̃, Σ, δ̃, q̃0, F̃) as follows:

• The initial state is the pair [{q0}, ∅] if q0 ∈ F, and [{q0}, {q0}] otherwise.

• The transition function is given by δ̃([P, O], a) = [P′, O′], where P′ = δ(P, a), and:
  – if O ≠ ∅, then O′ = δ(O, a) \ F;
  – if O = ∅ (i.e., the current level is a breakpoint, and the automaton must start searching for the next one), then O′ = δ(P, a) \ F, i.e., all non-accepting states of the next level become owing.

• The accepting states are those at which a breakpoint is reached: [P, O] ∈ F̃ iff O = ∅.

With this definition, a run of B visits F̃ exactly when it reaches a breakpoint, and so, as required by the co-Büchi acceptance condition, a run is accepting iff it contains only finitely many breakpoints. The algorithm for the construction is:

NCAtoDCA(A)
Input: NCA A = (Q, Σ, δ, q0, F)
Output: DCA B = (Q̃, Σ, δ̃, q̃0, F̃) with Lω(B) = Lω(A)
 1  Q̃, δ̃, F̃ ← ∅
 2  if q0 ∈ F then q̃0 ← [{q0}, ∅] else q̃0 ← [{q0}, {q0}]
 3  W ← {q̃0}
 4  while W ≠ ∅ do
 5      pick [P, O] from W; add [P, O] to Q̃
 6      if O = ∅ then add [P, O] to F̃
 7      for all a ∈ Σ do
 8          P′ ← δ(P, a)
 9          if O ≠ ∅ then O′ ← δ(O, a) \ F else O′ ← δ(P, a) \ F
10          add ([P, O], a, [P′, O′]) to δ̃
11          if [P′, O′] ∉ Q̃ then add [P′, O′] to W

Figure 11.8 shows the result of applying the algorithm to our running example.
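The breakpoint construction can be sketched as follows. This is not the book's code; it assumes the NCA is given as a dictionary δ from (state, letter) pairs to sets of successors, and the example automaton below is a hypothetical stand-in for the running example (the exact transitions of Figure 11.6 are not legible here):

```python
from itertools import chain

def nca_to_dca(alphabet, delta, q0, F):
    """Determinize a co-Buchi automaton with the [P, O] breakpoint
    construction.  Returns (transition map, initial state, set of
    accepting states, i.e. states where the owing set O is empty)."""
    def post(S, a):
        return frozenset(chain.from_iterable(delta.get((q, a), ()) for q in S))

    init = (frozenset([q0]), frozenset() if q0 in F else frozenset([q0]))
    trans, acc, seen, work = {}, set(), {init}, [init]
    while work:
        P, O = work.pop()
        if not O:
            acc.add((P, O))              # breakpoint reached
        for a in alphabet:
            P2 = post(P, a)
            # owing states: continue tracking them, or restart from the level
            O2 = (post(O, a) - F) if O else (P2 - F)
            succ = (P2, O2)
            trans[((P, O), a)] = succ
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return trans, init, acc

# Hypothetical NCA: q0 = 'q', F = {'r'}, q branches to {q, r} on a.
DELTA = {('q', 'a'): {'q', 'r'}, ('q', 'b'): {'q'}, ('r', 'b'): {'r'}}
```

Since 'q' is not accepting, the initial state is [{q}, {q}], and reading an a leads to [{q, r}, {q}]: the accepting state r is dropped from the owing set while the level grows.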

Figure 11.8: NCA of Figure 11.6 (top), DCA (lower left), and DFA (lower right)

The NCA is at the top, and the DCA below it on the left. On the right we show the DFA obtained by applying the plain powerset construction. Observe that it does not yield a correct result: interpreted as a DCA, it accepts for instance the word bω, which is not accepted by the original NCA.

For the complexity, observe that the number of states of the DCA is bounded by the number of pairs [P, O] such that O ⊆ P ⊆ Q. For every state q ∈ Q there are three mutually exclusive possibilities: q ∈ O, q ∈ P \ O, and q ∈ Q \ P. So if A has n states then B has at most 3^n states.

Unfortunately, co-Büchi automata do not recognize all ω-regular languages. In particular, we claim that no NCA recognizes the language L of ω-words over {a, b} containing infinitely many a’s. To see why, assume the contrary; we reach a contradiction. Then there would also be a DCA for the same language. Now we swap accepting and non-accepting states, and interpret the result as a DBA. It is easy to see that this DBA accepts the complement of L. But the complement of L is (a + b)∗ bω, which by Proposition 11.6 is not accepted by any DBA, proving the claim.

So we now ask whether there is a class of ω-automata that (1) recognizes all ω-regular languages and (2) has a determinization procedure.

11.4.2 Muller automata

A (nondeterministic) Muller automaton (NMA) has a collection {F0, . . . , Fm−1} of sets of accepting states. A run ρ is accepting if the set of states ρ visits infinitely often is equal to one of the Fi’s. Formally, ρ is accepting if inf(ρ) = Fi for some i ∈ {0, . . . , m − 1}. We speak of the Muller condition {F0, . . . , Fm−1}.

11.4. OTHER CLASSES OF ω-AUTOMATA 207

NMAs have the nice feature that any boolean combination of predicates of the form “state q is visited infinitely often” can be formulated as a Muller condition: it suffices to put in the collection all sets of states for which the predicate holds. For instance, the condition (q ∈ inf(ρ)) ∧ ¬(q′ ∈ inf(ρ)) corresponds to the Muller condition containing all sets of states F such that q ∈ F and q′ ∉ F. In particular, the Büchi and generalized Büchi conditions are special cases of the Muller condition (as well as the Rabin and Streett conditions introduced in the next sections). A Büchi automaton with states Q = {q0, . . . , qn} and Büchi condition {qn} is transformed into an NMA with the same states and transitions, but with the Muller condition {F ⊆ Q | qn ∈ F}, a collection containing 2^n sets of states. The obvious disadvantage is that this translation of a Büchi condition into a Muller condition involves an exponential blow-up.

We can easily give a deterministic Muller automaton for the language L = (a + b)∗ bω, which, as shown in Proposition 11.6, is not recognized by any DBA. The automaton has two states q0 (initial) and q1; it moves from q0 to q1 on b and back from q1 to q0 on a, with a self-loop on a at q0 and a self-loop on b at q1, and carries the Muller condition {{q1}}. The accepting runs are the runs ρ such that inf(ρ) = {q1}, i.e., the runs visiting state q1 infinitely often and state q0 finitely often. Those are the runs that initially move between states q0 and q1, but eventually jump to q1 and never visit q0 again. It is easy to see that these runs accept exactly the words containing finitely many occurrences of a.

Deterministic Muller automata recognize all ω-regular languages:

Theorem 11.8 (Safra) Any NBA with n states can be effectively transformed into a DMA of size n^O(n).

The proof of this result is complicated, and we omit it here. It follows that Muller and Büchi automata have the same expressive power.

We finally show that an NMA can be translated into an NBA. We start with a simple observation: given a Muller automaton A = (Q, Σ, δ, q0, {F0, . . . , Fm−1}), it is easy to see that Lω(A) = ⋃ i Lω(Ai), where Ai = (Q, Σ, δ, q0, {Fi}). Equipped with this observation we proceed in three steps: first, we convert each NMA Ai into an NGA A′i; then we convert A′i into an NBA A″i using NGAtoNBA(); finally, we construct A′ = ⋃ i A″i using the algorithm for union of NFAs. In Chapter 4 we have presented an algorithm that, given NFAs A1 and A2, computes an NFA A1 ∪ A2 recognizing L(A1) ∪ L(A2); the algorithm puts A1 and A2 “side by side”, and adds a new initial state and transitions, as graphically represented in Figure 11.9. It is easy to see that exactly the same algorithm works for NBAs.

For the first step, observe that an accepting run ρ of Ai satisfies inf(ρ) = Fi, so from some point on the run only visits states of Fi, and visits each of them infinitely often: ρ consists of an initial finite part, say ρ0, that may visit all states, and an infinite part, say ρ1, that only visits states of Fi. The idea for
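On ultimately periodic words the Muller condition can be evaluated directly for a deterministic automaton: compute the set of states visited in the repeating part of the run and compare it with the condition. The following sketch is not from the text; the example automaton encodes the two-state DMA for (a + b)∗ bω with the transitions as described above (an assumption about the figure):

```python
def dma_accepts_lasso(delta, q0, muller, u, v):
    """Does the deterministic Muller automaton (delta, q0, muller) accept
    u v^ω?  muller is a collection of state sets; v must be nonempty."""
    q = q0
    for a in u:                       # read the finite prefix u
        q = delta[(q, a)]
    seen, passes = {}, []
    while q not in seen:              # iterate v until a boundary state repeats
        seen[q] = len(passes)
        visited = []
        for a in v:
            q = delta[(q, a)]
            visited.append(q)
        passes.append(visited)
    # states in the repeating part are exactly inf(rho)
    inf_states = set().union(*(set(p) for p in passes[seen[q]:]))
    return any(inf_states == set(S) for S in muller)

# Assumed DMA for (a+b)* b^ω: q0 loops on a, q0 -b-> q1, q1 loops on b,
# q1 -a-> q0; Muller condition {{q1}}.
DELTA = {('q0', 'a'): 'q0', ('q0', 'b'): 'q1',
         ('q1', 'b'): 'q1', ('q1', 'a'): 'q0'}
MULLER = [{'q1'}]
```

For example, ab·bω is accepted (inf(ρ) = {q1}), while (ab)ω is rejected, since its run visits both states infinitely often and {q0, q1} is not in the condition.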

Figure 11.9: Union for NBAs

the construction of A′i is to take two copies of Ai. The first one is a “full” copy, while the second one only contains copies of the states of Fi. Intuitively, A′i simulates ρ by executing ρ0 in the first copy, and ρ1 in the second. For every transition [q, 0] −a→ [q′, 0] of the first copy such that q′ ∈ Fi there is another transition [q, 0] −a→ [q′, 1] leading to the “twin brother” [q′, 1]. The condition that ρ1 must visit each state of Fi infinitely often is enforced as follows: if Fi = {q1, . . . , qk}, then we take for A′i the generalized Büchi condition {{[q1, 1]}, . . . , {[qk, 1]}}.

Example 11.9 Figure 11.10 shows an NMA A = (Q, Σ, δ, q0, F), where F = {{q}, {r}}. While A is syntactically identical to the NGA of Figure 11.5, we now interpret F as a Muller condition: a run ρ is accepting if inf(ρ) = {q} or inf(ρ) = {r}. In other words, an accepting run ρ eventually moves to q and stays there forever, or eventually moves to r and stays there forever. It follows that A accepts the ω-words that contain finitely many a’s or finitely many b’s. The top-right part of the figure shows the two NGAs A′0, A′1 defined above. Since in this particular case F0 and F1 only contain singleton sets, A′0 and A′1 are in fact NBAs, and so we have A″0 = A′0 and A″1 = A′1. The bottom-right part shows the final NBA A′ = A″0 ∪ A″1.

Formally, the algorithm carrying out the first step of the construction looks as follows:

NMA1toNGA(A)
Input: NMA A = (Q, Σ, δ, q0, {F})
Output: NGA A′ = (Q′, Σ, δ′, q′0, F′)
 1  Q′, δ′, F′ ← ∅; q′0 ← [q0, 0]
 2  W ← {[q0, 0]}
 3  while W ≠ ∅ do
 4      pick [q, i] from W; add [q, i] to Q′
 5      if q ∈ F and i = 1 then add {[q, 1]} to F′
 6      for all a ∈ Σ, q′ ∈ δ(q, a) do
 7          if i = 0 then
 8              add ([q, 0], a, [q′, 0]) to δ′
 9              if [q′, 0] ∉ Q′ then add [q′, 0] to W
10              if q′ ∈ F then
11                  add ([q, 0], a, [q′, 1]) to δ′
12                  if [q′, 1] ∉ Q′ then add [q′, 1] to W
13          else /* i = 1 */
14              if q′ ∈ F then
15                  add ([q, 1], a, [q′, 1]) to δ′
16                  if [q′, 1] ∉ Q′ then add [q′, 1] to W
17  return (Q′, Σ, δ′, q′0, F′)

Complexity. Assume Q contains n states and F contains m accepting sets. Each of the NGAs A′0, . . . , A′m−1 has at most 2n states, and an acceptance condition containing at most n acceptance sets. So each of the NBAs A″0, . . . , A″m−1 has at most 2n² states, and the final NBA has at most 2n²m + 1 states.

It can be shown that the exponential blow-up in the conversion from NBA to NMA cannot be avoided. Observe in particular that while the conversion from NBA to NMA involves a possibly exponential blow-up, the conversion from NMA to NBA does not. This leads to the next step in our quest: is there a class of ω-automata that (1) recognizes all ω-regular languages, (2) has a determinization procedure, and (3) has polynomial conversion algorithms to and from NBA?

11.4.3 Rabin automata

The acceptance condition of a nondeterministic Rabin automaton (NRA) is a set of pairs F = {⟨F1, G1⟩, . . . , ⟨Fm, Gm⟩}, where the Fi’s and Gi’s are sets of states. A run ρ is accepting if there is a pair ⟨Fi, Gi⟩ such that ρ visits some state of Fi infinitely often and all states of Gi finitely often. Formally, ρ is accepting if there is i ∈ {1, . . . , m} such that inf(ρ) ∩ Fi ≠ ∅ and inf(ρ) ∩ Gi = ∅.

A Büchi automaton can be easily transformed into a Rabin automaton and vice versa, without any exponential blow-up.

NBA → NRA. A Büchi condition {q1, . . . , qk} corresponds to the Rabin condition {⟨{q1}, ∅⟩, . . . , ⟨{qk}, ∅⟩}.

NRA → NBA. Given a Rabin automaton A = (Q, Σ, δ, q0, {⟨F1, G1⟩, . . . , ⟨Fm, Gm⟩}), it is easy to see that Lω(A) = ⋃ i Lω(Ai), where Ai = (Q, Σ, δ, q0, {⟨Fi, Gi⟩}). In this case we directly translate each Ai into an NBA A′i. As for Muller automata, we take two copies of Ai; the second copy only contains the states of Q \ Gi. Since an accepting run ρ of Ai satisfies inf(ρ) ∩ Gi = ∅, from some point on the run only visits states of Q \ Gi. So ρ consists of an initial finite part, say ρ0, that may visit all states, and an infinite part, say ρ1, that only visits states of Q \ Gi. Intuitively, A′i simulates ρ by executing ρ0 in the first copy, and ρ1 in the second. The condition that ρ1 must visit some state of Fi infinitely often is enforced by taking (the copies of) Fi as Büchi condition.

Example 11.10 Figure 11.10 can be reused to illustrate the conversion of a Rabin automaton into a Büchi automaton. Consider the automaton on the left, but this time with Rabin accepting condition {⟨F0, G0⟩, ⟨F1, G1⟩}, where F0 = {q} = G1 and G0 = {r} = F1. Then the automaton accepts the ω-words that contain finitely many a’s or finitely many b’s. The Büchi automata A′0, A′1 are as shown at the top-right part, but now instead of NGAs they are NBAs with accepting states [q, 1] and [r, 1], respectively. The final NBA is exactly the same one.

For the complexity, observe that each of the A′i has at most 2n states, and so the final Büchi automaton has at most 2nm + 1 states.

Since NRAs and NBAs are equally expressive by the conversions above, to show that deterministic Rabin automata recognize all ω-regular languages it suffices to show that DRAs are as expressive as NBAs. The proof is complicated, and we only present the result, without proof:

Theorem 11.11 (Safra) Any NBA with n states can be effectively transformed into a DRA of size n^O(n) and O(n) accepting pairs.

Streett automata

The final class of ω-automata we consider are Streett automata. The accepting condition of Rabin automata is not “closed under negation”. Recall the condition: there is i ∈ {1, . . . , m} such that inf(ρ) ∩ Fi ≠ ∅ and inf(ρ) ∩ Gi = ∅. Its negation is of the form: for every i ∈ {1, . . . , m}, inf(ρ) ∩ Fi = ∅ or inf(ρ) ∩ Gi ≠ ∅. This is called the Streett condition. In a Streett automaton, the acceptance condition is again a set of pairs {⟨F1, G1⟩, . . . , ⟨Fm, Gm⟩}, where the Fi’s and Gi’s are sets of states. A run ρ is accepting if for every pair ⟨Fi, Gi⟩, whenever ρ visits some state of Fi infinitely often, then it also

visits some state of Gi infinitely often. Formally, a run ρ is accepting if for every i ∈ {1, . . . , m}, inf(ρ) ∩ Fi ≠ ∅ implies inf(ρ) ∩ Gi ≠ ∅ (equivalently, if inf(ρ) ∩ Fi = ∅ or inf(ρ) ∩ Gi ≠ ∅ for every i ∈ {1, . . . , m}). The reader has probably observed that this is the dual of Rabin’s condition: a run satisfies the Streett condition {⟨F1, G1⟩, . . . , ⟨Fm, Gm⟩} if and only if it does not satisfy the Rabin condition {⟨F1, G1⟩, . . . , ⟨Fm, Gm⟩}.

A Büchi automaton can be easily transformed into a Streett automaton and vice versa. However, the conversion from Streett to Büchi is exponential.

NBA → NSA. A Büchi condition {q1, . . . , qk} corresponds to the Streett condition {⟨Q, {q1, . . . , qk}⟩}.

NSA → NBA. We can transform an NSA into an NBA by following the path NSA → NMA → NBA. This yields, for an NSA with n states, an NBA with 2n² 2ⁿ states. Streett automata can indeed be exponentially more succinct than Büchi automata, and it can be shown that the exponential blow-up is unavoidable.

Example 11.12 Let Σ = {0, 1, 2}. We represent an infinite sequence x1 x2 . . . of vectors of dimension n with components in Σ by the ω-word x1 x2 . . . over Σⁿ. For n ≥ 1, let Ln be the language of sequences in which, for each component i ∈ {1, . . . , n}, xj(i) = 1 for infinitely many j’s if and only if xk(i) = 2 for infinitely many k’s. It is easy to see that Ln can be accepted by an NSA with 3n states and 2n accepting pairs, but cannot be accepted by any NBA with less than 2ⁿ states.

Exercises

Exercise 86 A finite set of finite words is always a regular language, but a finite set of ω-words is not always an ω-regular language: find an ω-word w ∈ {a, b}ω such that no Büchi automaton recognizes the language {w}.

Exercise 87 Construct Büchi automata and ω-regular expressions recognizing the following languages over the alphabet {a, b, c}:

1. {w ∈ {a, b, c}ω | {a, b} ⊇ inf(w)}
2. {w ∈ {a, b, c}ω | {a, b} = inf(w)}
3. {w ∈ {a, b, c}ω | {a, b} ⊆ inf(w)}
4. {w ∈ {a, b, c}ω | {a, b, c} = inf(w)}
5. {w ∈ {a, b, c}ω | if a ∈ inf(w) then {b, c} ⊆ inf(w)}

Hint: It may be easier to construct a generalized Büchi automaton first and then transform it into a Büchi automaton.
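The duality between the Rabin and Streett conditions can be checked mechanically on any candidate set inf(ρ). The sketch below (not from the text; it assumes an encoding of the condition as a list of (F, G) set pairs) evaluates both conditions and exercises the duality on the condition of Example 11.10:

```python
def rabin_accepts(inf_states, pairs):
    """Rabin condition: some pair with inf ∩ F nonempty and inf ∩ G empty."""
    return any(inf_states & F and not (inf_states & G) for F, G in pairs)

def streett_accepts(inf_states, pairs):
    """Streett condition: for every pair, inf ∩ F empty or inf ∩ G nonempty."""
    return all(not (inf_states & F) or (inf_states & G) for F, G in pairs)

# The Rabin condition of Example 11.10: F0 = {q} = G1, G0 = {r} = F1.
PAIRS = [({'q'}, {'r'}), ({'r'}, {'q'})]
```

For any inf(ρ), `streett_accepts` on a list of pairs is the negation of `rabin_accepts` on the same list, exactly as stated above; e.g. inf(ρ) = {q} satisfies the Rabin condition but not the Streett one, while inf(ρ) = {q, r} does the opposite.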

CHAPTER 11. CLASSES OF ω-AUTOMATA AND CONVERSIONS

Exercise 88 Find algorithms for the following decision problems:

• Given finite words u, v, x, y ∈ Σ*, decide whether the ω-words u vω and x yω are equal.
• Given a Büchi automaton A and finite words u, v, decide whether A accepts the ω-word u vω.

Exercise 89 Find ω-regular expressions for the following languages:

1. {w ∈ {a, b}ω | k is even for each substring ba^k b of w}
2. {w ∈ {a, b}ω | w has no occurrence of bab}

Exercise 90 A Büchi automaton is deterministic in the limit if all its accepting states and their descendants are deterministic states. Formally, A = (Q, Σ, δ, Q0, α) is deterministic in the limit if |δ(q, σ)| ≤ 1 for every state q ∈ Q that is reachable from some state of α, and for every σ ∈ Σ. Prove that every language recognized by nondeterministic Büchi automata is also accepted by Büchi automata deterministic in the limit.

Exercise 91 The parity acceptance condition for ω-automata is defined as follows. Every state q of the automaton is assigned a natural number nq. A run ρ is accepting if the number max{ns | s ∈ inf(ρ)} is even.

• Find a parity automaton accepting the language L = {w ∈ {a, b}ω | w has exactly two occurrences of ab}.
• Show that each language accepted by a parity automaton is also accepted by a Rabin automaton and vice versa.
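For the first item of Exercise 88, one possible approach (a sketch, not necessarily the intended solution) uses the fact that beyond position max(|u|, |x|) both words are periodic with period lcm(|v|, |y|), so u vω = x yω if and only if the two words agree on a prefix of length max(|u|, |x|) + lcm(|v|, |y|). The helper names below are ours; math.lcm needs Python 3.9 or later:

```python
from math import lcm

def omega_word(prefix, period, n):
    """First n letters of the ω-word prefix · period^ω (period must be nonempty)."""
    out = list(prefix)
    while len(out) < n:
        out.extend(period)
    return "".join(out)[:n]

def equal_upw(u, v, x, y):
    """Decide whether u v^ω == x y^ω."""
    # Beyond position max(|u|,|x|) both words are periodic with period lcm(|v|,|y|),
    # so agreement on a prefix of that combined length forces equality everywhere.
    n = max(len(u), len(x)) + lcm(len(v), len(y))
    return omega_word(u, v, n) == omega_word(x, y, n)

assert equal_upw("a", "ba", "ab", "ab")   # a(ba)^ω == ab(ab)^ω == (ab)^ω
assert not equal_upw("", "ab", "", "ba")  # (ab)^ω != (ba)^ω
```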

Figure 11.10: A Muller automaton A0 with accepting condition F = {F0, F1}, where F0 = {q} and F1 = {r}, and its conversion into a NBA A; the states of A are pairs [state, index], with accepting sets F0 = { [q, 1] } and F1 = { [r, 1] }.


Chapter 12

Boolean operations: Implementations

The list of operations of Chapter 4 can be split into two parts, with the boolean operations union, intersection, and complement in the first part, and the emptiness, inclusion, and equality tests in the second. This chapter deals with the boolean operations; the tests are discussed in the next one. We provide implementations for ω-languages represented by NBAs and NGAs. We do not discuss implementations on DBAs, because they cannot represent all ω-regular languages.

Observe that we now leave the membership test out. A membership test for arbitrary ω-words does not make sense, because no description formalism can represent arbitrary ω-words. For ω-words of the form w1(w2)ω, where w1, w2 are finite words, membership in an ω-regular language L can be implemented by checking if the intersection of L and {w1(w2)ω} is empty.

In Section 12.1 we show that union and intersection can be easily implemented using constructions already presented in Chapter 2. The rest of the chapter is devoted to the complement operation, which is more involved.

12.1 Union and intersection

As already observed in Chapter 2, the algorithm for union of regular languages represented as NFAs also works for NBAs and for NGAs. One might be tempted to think that, similarly, the intersection algorithm for NFAs also works for NBAs. However, this is not the case. Consider the two Büchi automata A1 and A2 of Figure 12.1.

Figure 12.1: Two Büchi automata accepting the language aω

The Büchi automaton A1 ∩ A2 obtained by applying algorithm IntersNFA(A1, A2) (more precisely, by interpreting the output of the algorithm as a Büchi automaton, after removing the states that are not reachable from the initial state) is shown in Figure 12.2. It has no accepting states, and so Lω(A1 ∩ A2) = ∅, although Lω(A1) = Lω(A2) = {aω}.

What happened? A run ρ of A1 ∩ A2 on an ω-word w is the result of pairing runs ρ1 and ρ2 of A1 and A2 on w. Since the accepting set of A1 ∩ A2 is the cartesian product of the accepting sets of A1 and A2, ρ is accepting if ρ1 and ρ2 simultaneously visit accepting states infinitely often. This condition is too strong, and as a result Lω(A1 ∩ A2) can be a strict subset of Lω(A1) ∩ Lω(A2).

This problem is solved by means of the observation we already made when dealing with NGAs: the run ρ visits states of F1 and F2 infinitely often if and only if the following two conditions hold: (1) ρ eventually visits F1, and (2) every visit of ρ to F1 is eventually followed by a visit to F2 (with possibly further visits to F1 in-between), and every visit to F2 is eventually followed by a visit to F1 (with possibly further visits to F2 in-between).

We proceed as in the translation NGA → NBA. Intuitively, we take two “copies” of the pairing [A1, A2], and place them one on top of the other. The first and second copies of a state [q1, q2] are called [q1, q2, 1] and [q1, q2, 2], respectively. The transitions leaving the states [q1, q2, 1] such that q1 ∈ F1 are redirected to the corresponding states of the second copy: every transition of the form [q1, q2, 1] −a→ [q1', q2', 1] is replaced by [q1, q2, 1] −a→ [q1', q2', 2]. Similarly, the transitions leaving the states [q1, q2, 2] such that q2 ∈ F2 are redirected to the first copy. We choose [q01, q02, 1] as initial state, and declare the states [q1, q2, 1] such that q1 ∈ F1 as accepting.
Figure 12.2: The automaton A1 ∩ A2

Figure 12.3 shows the result of the construction for the NBAs A1 and A2 of Figure 12.1. Since q0 is not an accepting state of A1, the transition [q0, r0, 1] −a→ [q1, r1, 1] is not redirected. Since q1 is an accepting state of A1, the transitions leaving [q1, r1, 1] must jump to the second copy, and so we replace [q1, r1, 1] −a→ [q0, r0, 1] by [q1, r1, 1] −a→ [q0, r0, 2]. Similarly, since r0 is an accepting state of A2, the transitions leaving [q0, r0, 2] must return to the first copy, and so we replace [q0, r0, 2] −a→ [q1, r1, 2] by [q0, r0, 2] −a→ [q1, r1, 1]. The only accepting state is [q1, r1, 1], and the language accepted by the NBA is aω.

Figure 12.3: The NBA A1 ∩ω A2 for the automata A1 and A2 of Figure 12.1

Algorithm IntersNBA(), shown below, returns an NBA A1 ∩ω A2 with Lω(A1 ∩ω A2) = Lω(A1) ∩ Lω(A2). As usual, the algorithm only constructs the states reachable from the initial state.

IntersNBA(A1, A2)
Input: NBAs A1 = (Q1, Σ, δ1, q01, F1), A2 = (Q2, Σ, δ2, q02, F2)
Output: NBA A1 ∩ω A2 = (Q, Σ, δ, q0, F) with Lω(A1 ∩ω A2) = Lω(A1) ∩ Lω(A2)

1   Q, δ, F ← ∅
2   q0 ← [q01, q02, 1]
3   W ← { [q01, q02, 1] }
4   while W ≠ ∅ do
5       pick [q1, q2, i] from W; add [q1, q2, i] to Q
6       if q1 ∈ F1 and i = 1 then add [q1, q2, 1] to F
7       for all a ∈ Σ do
8           for all q1' ∈ δ1(q1, a), q2' ∈ δ2(q2, a) do
9               if i = 1 and q1 ∉ F1 then
10                  add ([q1, q2, 1], a, [q1', q2', 1]) to δ
11                  if [q1', q2', 1] ∉ Q then add [q1', q2', 1] to W
12              if i = 1 and q1 ∈ F1 then
13                  add ([q1, q2, 1], a, [q1', q2', 2]) to δ
14                  if [q1', q2', 2] ∉ Q then add [q1', q2', 2] to W
15              if i = 2 and q2 ∉ F2 then
16                  add ([q1, q2, 2], a, [q1', q2', 2]) to δ
17                  if [q1', q2', 2] ∉ Q then add [q1', q2', 2] to W
18              if i = 2 and q2 ∈ F2 then
19                  add ([q1, q2, 2], a, [q1', q2', 1]) to δ
20                  if [q1', q2', 1] ∉ Q then add [q1', q2', 1] to W
21  return (Q, Σ, δ, q0, F)

To see that the construction works, observe first that a run ρ of the new NBA still corresponds to the pairing of two runs ρ1 and ρ2 of A1 and A2. Since all transitions leaving the accepting states jump to the second copy, ρ is accepting iff it visits both copies infinitely often, which is the case iff ρ1 and ρ2 visit states of F1 and F2, respectively, infinitely often.
1] Q then add [q1 . 2]. 2] − − 1 . i] from W add [q1 . 1] − →[q − 0 . a). [q1 .

There is an important case in which the construction for NFAs can also be applied to NBAs: namely, when all the states of one of the two NBAs, say A1, are accepting. In this case the condition that two runs ρ1 and ρ2 on an ω-word w simultaneously visit accepting states infinitely often is equivalent to the weaker condition that does not require simultaneity: any visit of ρ2 to an accepting state is automatically a simultaneous visit of ρ1 and ρ2 to accepting states.

It is also important to observe a difference with the intersection for NFAs. In the finite word case, given NFAs A1, ..., Ak with n1, ..., nk states, we can compute an NFA for L(A1) ∩ ... ∩ L(Ak) with at most n1 · ... · nk states by repeatedly applying the intersection operation, and this construction is optimal (i.e., there is a family of instances of arbitrary size such that the smallest NFA for the intersection of the languages has this size). Since IntersNBA() introduces an additional factor of 2 in the number of states, the repeated application of IntersNBA() is not optimal: for Lω(A1) ∩ ... ∩ Lω(Ak) it yields an NBA with 2^(k−1) · n1 · ... · nk states. We obtain a better construction proceeding as in the translation NGA → NBA: we produce k copies of A1 × ... × Ak, and move from the i-th copy to the (i + 1)-th copy when we hit an accepting state of Ai. This construction yields an NBA with k · n1 · ... · nk states.

12.2 Complement

So far we have been able to adapt the constructions for NFAs to NBAs. The situation is considerably more involved for complement.

12.2.1 The problems of complement

Recall that for NFAs a complement automaton is constructed by first converting the NFA into a DFA by the subset construction, and then exchanging the final and non-final states of the DFA. For NBAs this approach breaks down completely:

(a) The subset construction does not preserve ω-languages: a NBA and the result of applying the subset construction to it do not necessarily accept the same ω-language. The NBA on the left of Figure 12.4 accepts the empty language; the result of applying the subset construction to it, shown on the right, accepts aω. Notice that both automata accept the same finite words.
Figure 12.4: The subset construction does not preserve ω-languages

(b) The subset construction cannot be replaced by another determinization procedure, because no such procedure exists: as we have seen in Chapter 11, some languages are accepted by NBAs, but not by DBAs.

(c) The automaton obtained by exchanging accepting and non-accepting states in a given DBA does not necessarily recognize the complement of its language. Consider a DBA A1 over the letter a accepting the words of even length, and the DBA A2 obtained from it by exchanging final and non-final states. As automata on finite words they accept the words of even and odd length, respectively. However, as Büchi automata both A1 and A2 accept the language aω.

Despite these discouraging observations, NBAs turn out to be closed under complement. For the rest of the chapter we fix an NBA A = (Q, Σ, δ, q0, F) with n states, and use Figure 12.5 as running example. We wish to build an automaton Ā satisfying:

Ā accepts w if and only if no path of dag(w) visits accepting states of A i.o.

Figure 12.5: Running example for the complementation procedure

We give a summary of the procedure. First, we define the notion of ranking. For the moment it suffices to say that a ranking of w is the result of decorating the nodes of dag(w) with numbers. This can be done in different ways, and so, while a word w has one single dag dag(w), it may have many rankings. The essential property of rankings will be:

no path of dag(w) visits accepting states of A i.o. if and only if for some ranking R(w) every path of dag(w) visits nodes of odd rank i.o.

In the second step we profit from the determinization construction for co-Büchi automata. Recall that the construction maps dag(w) to a run ρ of a new automaton such that every path of dag(w) visits accepting states of A i.o. (in other words, some run of w in A visits accepting states of A i.o.) if and only if ρ visits accepting states of the new automaton i.o. We apply the same construction to map every ranking R(w) to a run ρ of a new automaton B such that

every path of dag(w) visits nodes of odd rank i.o. (in R(w)) if and only if the run ρ visits accepting states of B i.o.

This immediately implies Lω(B) = Lω(Ā). However, the automaton B may in principle have an infinite number of states! In the final step, we show that a finite subautomaton Ā of B already recognizes the same language as B, and we are done.
In the following, we abbreviate “infinitely often” to “i.o.”.

12.2.2 Rankings and ranking levels

Recall that, given w ∈ Σω, the directed acyclic graph dag(w) is the result of bundling together the runs of A on w. A ranking of dag(w) is a mapping R(w) that associates to each node of dag(w) a natural number, called a rank, satisfying the following two properties: (a) the rank of a node is greater than or equal to the rank of its children, and (b) the rank of an accepting node is even. Figure 12.6 shows rankings for dag(abaω) and dag((ab)ω).

Figure 12.6: Rankings for dag(abaω) and dag((ab)ω)

The ranks of the nodes along an infinite path form a non-increasing sequence, and so there is a node such that all its (infinitely many) successors have the same rank; we call this number the stable rank of the path. The rankings of Figure 12.6 each have one single infinite path, with stable rank 1 and 0, respectively. We now prove the fundamental property of rankings:

Proposition 12.2 No path of dag(w) visits accepting nodes of A i.o. if and only if dag(w) has a ranking in which every infinite path has odd stable rank (equivalently, visits nodes of odd rank i.o.).

Proof: If all infinite paths of a ranking R have odd stable rank, then each of them contains only finitely many nodes with even rank. Since accepting nodes have even ranks, no path visits accepting nodes i.o.

For the other direction, assume that no path of dag(w) visits accepting nodes of A i.o. Give each accepting node ⟨q, l⟩ the rank 2k, where k is the maximal number of accepting nodes in the paths starting at ⟨q, l⟩, and give a non-accepting node the rank 2k + 1, where 2k is the maximal rank of its descendants with even rank. In the ranking so obtained every infinite path visits nodes of even rank only finitely often, and therefore it visits nodes of odd rank i.o.

Recall that the i-th level of dag(w) is defined as the set of nodes of dag(w) of the form ⟨q, i⟩. A level ranking is a mapping lr: Q → ℕ ∪ {⊥}. Any ranking R of dag(w) can be decomposed into an infinite sequence lr0, lr1, lr2, ... of level rankings by defining lri(q) = R(⟨q, i⟩) if ⟨q, i⟩ is a node of dag(w), and lri(q) = ⊥ otherwise. For instance, if we represent a level ranking lr of our running example by the column vector with components lr(q0) and lr(q1), then each ranking of Figure 12.6 corresponds to an infinite, eventually periodic sequence of such vectors, one per level.
For two level rankings lr and lr' and a letter a ∈ Σ, we write lr −a→ lr' if for every q' ∈ Q:

• lr'(q') = ⊥ if and only if no q ∈ Q with lr(q) ≠ ⊥ satisfies q −a→ q', and
• lr(q) ≥ lr'(q') for every q ∈ Q with lr(q) ≠ ⊥ satisfying q −a→ q'.

12.2.3 A (possibly infinite) complement automaton

We construct an NBA B with an infinite number of states (and many initial states) whose runs on an ω-word w are the rankings of dag(w). The automaton accepts a ranking R iff every infinite path of R visits nodes of odd rank i.o. We start with an automaton without any accepting condition:

• The states are all the possible level rankings.
• The initial states are the level rankings lrn, for n ∈ ℕ, defined by lrn(q0) = n and lrn(q) = ⊥ for every q ≠ q0.
• The transitions are the triples (lr, a, lr') such that lr −a→ lr'.

The runs of this automaton on w correspond exactly to the rankings of dag(w).
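The successor relation on level rankings is easy to implement. The sketch below (our own encoding, with None standing for ⊥) enumerates all level rankings lr' with lr −a→ lr'; it enforces only the non-increase condition, since the evenness condition for accepting states is imposed separately when the complement automaton is built:

```python
from itertools import product

def lr_successors(lr, a, delta, states, max_rank):
    """All level rankings lr' with lr --a--> lr'.

    lr maps each state to a rank or None; delta maps (q, a) to a set of
    successor states; ranks range over 0..max_rank."""
    choices = []
    for q2 in states:
        # q2 is populated in lr' iff it has some populated a-predecessor in lr
        preds = [q for q in states
                 if lr[q] is not None and q2 in delta.get((q, a), ())]
        if not preds:
            choices.append([None])
        else:
            # ranks cannot increase along edges: lr'(q2) <= min over predecessors
            bound = min(lr[q] for q in preds)
            choices.append(list(range(min(bound, max_rank) + 1)))
    for combo in product(*choices):
        yield dict(zip(states, combo))

lr = {"q0": 2, "q1": None}
delta = {("q0", "a"): {"q0", "q1"}}
succs = list(lr_successors(lr, "a", delta, ["q0", "q1"], max_rank=2))
# both q0 and q1 are populated with a rank in 0..2, so 3 x 3 = 9 successors
assert len(succs) == 9
```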

Now we apply the same construction we used for determinization of co-Büchi automata. We decorate the level rankings with a set of “owing” states, namely those that owe a visit to a state of odd rank. We get the Büchi automaton B:

• The states are all pairs [lr, O], where lr is a level ranking and O is a subset of the states q for which lr(q) ∈ ℕ.
• The initial states are all pairs of the form [lr, {q0}] where lr(q0) is an even number and lr(q) = ⊥ for every q ≠ q0, and all pairs of the form [lr, ∅] where lr(q0) is an odd number and lr(q) = ⊥ for every q ≠ q0.
• The transitions are the triples [lr, O] −a→ [lr', O'] such that lr −a→ lr' and
  – O ≠ ∅ and O' = {q' ∈ δ(O, a) | lr'(q') is even}, or
  – O = ∅ and O' = {q' ∈ Q | lr'(q') is even}.
• The accepting states (breakpoints) are the pairs [lr, ∅], i.e., the levels with an empty set of “owing” states.

As we saw in the construction for co-Büchi automata, B accepts a ranking iff it contains infinitely many breakpoints; this is the case iff every infinite path of dag(w) visits nodes of odd rank i.o., and so iff A does not accept w.

The remaining problems with this automaton are that its number of states is infinite, and that it has many initial states. Both can be solved by proving the following assertion: there exists a number k such that for every word w, if dag(w) admits an odd ranking, then it admits an odd ranking in which every node has rank at most k.
If we are able to prove this, then we can eliminate all states corresponding to level rankings in which some node is mapped to a number larger than k: they are redundant. Moreover, the initial state is now fixed: it is the level ranking that maps q0 to k and all other states to ⊥. The assertion holds for k = 2n:

Proposition 12.3 Let n be the number of states of A. For every word w ∈ Σω, if w is rejected by A, then dag(w) has a ranking such that (a) every infinite path of dag(w) visits nodes of odd rank i.o., and (b) the initial node ⟨q0, 0⟩ has rank 2n.

(Notice that, since ranks cannot increase along paths, in such a ranking every node has rank at most 2n. Notice also that if dag(w) has an odd ranking in which ⟨q0, 0⟩ has some rank k ≤ 2n, then we can just change the rank of the initial node to 2n, since the change preserves the properties of a ranking.)

Proof: In the proof we call a ranking satisfying (a) an odd ranking. Given two dags D, D', we denote by D' ⊆ D the fact that D' can be obtained from D through deletion of some nodes and their adjacent edges.

Assume that A rejects w. We describe an odd ranking for dag(w) in which ⟨q0, 0⟩ has rank 2n. We say that a node ⟨q, l⟩ is red in a (possibly finite) dag D ⊆ dag(w) iff only finitely many nodes of D are reachable from ⟨q, l⟩. The node ⟨q, l⟩ is yellow in D iff all the nodes reachable from ⟨q, l⟩ (including itself) are not accepting.

Observe that the children of a red node are red, and the children of a yellow node are red or yellow. We inductively define an infinite sequence D0 ⊇ D1 ⊇ D2 ⊇ ... of dags as follows:

• D0 = dag(w);
• D2i+1 = D2i \ { ⟨q, l⟩ | ⟨q, l⟩ is red in D2i };
• D2i+2 = D2i+1 \ { ⟨q, l⟩ | ⟨q, l⟩ is yellow in D2i+1 }.

Figure 12.7 shows the dags D0, D1, and D2 for dag(abaω); in this example D3 is the empty dag.

Figure 12.7: The dags D0, D1, D2 for dag(abaω)

Consider the function f that assigns to each node of dag(w) a natural number as follows:

f(⟨q, l⟩) = 2i if ⟨q, l⟩ is red in D2i, and f(⟨q, l⟩) = 2i + 1 if ⟨q, l⟩ is yellow in D2i+1.

We prove that f is an odd ranking. The proof is divided into three parts:

(1) f assigns all nodes a number in the range [0 .. 2n].
(2) If ⟨q', l'⟩ is a child of ⟨q, l⟩, then f(⟨q', l'⟩) ≤ f(⟨q, l⟩).
(3) If ⟨q, l⟩ is an accepting node, then f(⟨q, l⟩) is even.

Part (1). We show that for every i ≥ 0 there exists a number li such that for all l ≥ li, the dag D2i contains at most n − i nodes of the form ⟨q, l⟩. This implies that D2n is finite, and so that D2n+1 is empty, which in turn implies that f assigns all nodes a number in the range [0 .. 2n].

The proof is by induction on i. The case i = 0 follows from the definition of D0: in dag(w) all levels l ≥ 0 have at most n nodes of the form ⟨q, l⟩. Assume now that the hypothesis holds for i; we prove it for i + 1. If D2i is finite, then D2i+1 is empty, D2i+2 is empty as well, and we are done. So assume that D2i is infinite.

We claim that D2i+1 contains some yellow node. Assume, by way of contradiction, that no node in D2i+1 is yellow. Let ⟨q0', l0⟩ be an arbitrary node of D2i+1. Since it is not yellow, there exists an accepting node ⟨q1, l1⟩ reachable from ⟨q0', l0⟩; since ⟨q1, l1⟩ is in D2i+1 and is not yellow either, there exists an accepting node ⟨q2, l2⟩ reachable from ⟨q1, l1⟩. Repeating the argument, we construct an infinite sequence of nodes ⟨qj, lj⟩ such that for all j the node ⟨qj+1, lj+1⟩ is reachable from ⟨qj, lj⟩ and ⟨qj, lj⟩ is accepting for every j ≥ 1. Such a sequence, however, corresponds to a path in dag(w) visiting infinitely many accepting nodes, which contradicts the assumption that A rejects w.

So let ⟨q, l⟩ be a yellow node in D2i+1. We claim that we can take li+1 = l, that is, that for every j ≥ l the dag D2i+2 contains at most n − (i + 1) nodes of the form ⟨q', j⟩. Since ⟨q, l⟩ is in D2i+1, it is not red in D2i, and so infinitely many nodes of D2i are reachable from ⟨q, l⟩. Since, by the induction hypothesis, D2i contains at most n − i nodes of each level j ≥ li, by König's Lemma D2i contains an infinite path ⟨q, l⟩, ⟨q1, l + 1⟩, ⟨q2, l + 2⟩, ... For all k ≥ 1, infinitely many nodes of D2i are reachable from ⟨qk, l + k⟩, and so ⟨qk, l + k⟩ is not red in D2i; moreover, being reachable from the yellow node ⟨q, l⟩, all the nodes ⟨qk, l + k⟩ are yellow as well. Hence the whole path survives into D2i+1, all its nodes are yellow there, and none of them is in D2i+2. It follows that for all j ≥ l the number of nodes of the form ⟨q', j⟩ in D2i+2 is strictly smaller than their number in D2i, and so at most n − (i + 1), and the claim is proved.

Part (2). Follows from the fact that the children of a red node in D2i are red, and the children of a yellow node in D2i+1 are yellow: if a node has rank i, all its successors have rank i or lower.

Part (3). Let ⟨q, l⟩ be an accepting node. Nodes that get an odd rank are yellow in D2i+1 for some i, and yellow nodes are not accepting. So f(⟨q, l⟩) is even.
Example 12.4 We construct the complements of the two possible NBAs over the alphabet {a} having one state and one transition: B1 = ({q}, {a}, δ, q, {q}) and B2 = ({q}, {a}, δ, q, ∅), where δ(q, a) = {q}. We have Lω(B1) = aω and Lω(B2) = ∅, so the complement B̄1 must recognize the empty language, and the complement B̄2 must recognize aω.

We begin with B̄1. A state of B̄1 is a pair ⟨lr, O⟩, where lr is a level ranking (since there is only one state, we can identify lr and lr(q)). Since n = 1, the initial state is ⟨2, {q}⟩: the initial rank is even, and so q “owes” a visit to a node of odd rank. Let us compute the successors ⟨lr', O'⟩ of ⟨2, {q}⟩ under the letter a. Since δ(q, a) = {q}, we have lr' ≠ ⊥. Since ranks cannot increase along a path, we have lr' ≤ 2, and, since q is accepting, lr' must be even. So either lr' = 0 or lr' = 2, and in both cases the visit to a node of odd rank is still “owed”, so O' = {q}. Hence the successors of ⟨2, {q}⟩ are ⟨2, {q}⟩ and ⟨0, {q}⟩, and by the same argument the only successor of ⟨0, {q}⟩ is ⟨0, {q}⟩. Since the set of owing states is never empty, B̄1 has no accepting states, and so it recognizes the empty language. B̄1 is shown on the left of Figure 12.8.

Let us now construct B̄2. The difference with B̄1 is that, since q is no longer accepting, it can also have the odd rank 1. The initial state is again ⟨2, {q}⟩, but it now has three successors: ⟨2, {q}⟩, ⟨1, ∅⟩, and ⟨0, {q}⟩; when the new rank is 1, the “owed” visit to a node of odd rank has taken place, and the new set of owing states is empty. The successors of ⟨1, ∅⟩ are ⟨1, ∅⟩ and ⟨0, {q}⟩, and the only successor of ⟨0, {q}⟩ is ⟨0, {q}⟩. The only accepting state is ⟨1, ∅⟩, which has a self-loop, and so B̄2 recognizes aω.

Figure 12.8: The NBAs B̄1 and B̄2

The pseudocode for the complementation algorithm is shown below.

CompNBA(A)
Input: NBA A = (Q, Σ, δ, q0, F)
Output: NBA Ā = (Q̄, Σ, δ̄, q̄0, F̄) recognizing the complement of Lω(A)

1   Q̄, δ̄, F̄ ← ∅
2   q̄0 ← [lr0, {q0}]
3   W ← { [lr0, {q0}] }
4   while W ≠ ∅ do
5       pick [lr, P] from W; add [lr, P] to Q̄
6       if P = ∅ then add [lr, P] to F̄
7       for all a ∈ Σ, lr' ∈ R such that lr −a→ lr' do
8           if P ≠ ∅ then P' ← {q ∈ δ(P, a) | lr'(q) is even}
9           else P' ← {q ∈ Q | lr'(q) is even}
10          add ([lr, P], a, [lr', P']) to δ̄
11          if [lr', P'] ∉ Q̄ then add [lr', P'] to W
12  return (Q̄, Σ, δ̄, q̄0, F̄)

In the code, R denotes the set of all level rankings, and lr0 denotes the level ranking given by lr0(q0) = 2|Q| and lr0(q) = ⊥ for every q ≠ q0.
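The whole construction fits in a short program. The sketch below is our own transcription of CompNBA() (a level ranking is encoded as a tuple of ranks over the sorted states, with None for ⊥; the emptiness helper is ours), checked against Example 12.4:

```python
from itertools import product

def comp_nba(Q, Sigma, delta, q0, F):
    """Rank-based complementation of an NBA, following CompNBA().

    delta maps (q, a) to a set of states. Returns (states, trans, q0bar, Fbar)."""
    Qs = sorted(Q)
    idx = {q: i for i, q in enumerate(Qs)}
    maxr = 2 * len(Qs)

    def lr_succ(lr, a):
        # all level rankings lr' with lr --a--> lr' (accepting states get even ranks)
        choices = []
        for q2 in Qs:
            preds = [q for q in Qs
                     if lr[idx[q]] is not None and q2 in delta.get((q, a), ())]
            if not preds:
                choices.append([None])
            else:
                bound = min(lr[idx[q]] for q in preds)   # ranks cannot increase
                choices.append([r for r in range(bound + 1)
                                if q2 not in F or r % 2 == 0])
        return [tuple(c) for c in product(*choices)]

    lr0 = tuple(maxr if q == q0 else None for q in Qs)
    q0bar = (lr0, frozenset([q0]))
    states, trans, W = set(), set(), [q0bar]
    while W:
        lr, P = st = W.pop()
        if st in states:
            continue
        states.add(st)
        for a in Sigma:
            for lr2 in lr_succ(lr, a):
                if P:
                    cand = {r for q in P for r in delta.get((q, a), ())}
                else:
                    cand = set(Qs)
                P2 = frozenset(q for q in cand
                               if lr2[idx[q]] is not None and lr2[idx[q]] % 2 == 0)
                trans.add((st, a, (lr2, P2)))
                if (lr2, P2) not in states:
                    W.append((lr2, P2))
    Fbar = {st for st in states if not st[1]}   # breakpoints: empty owing set
    return states, trans, q0bar, Fbar

def nonempty(states, trans, q0, F):
    """An NBA is nonempty iff an accepting state is reachable and on a cycle."""
    adj = {}
    for p, _, q in trans:
        adj.setdefault(p, set()).add(q)
    def reach(src):
        seen, stack = set(), [src]
        while stack:
            for q in adj.get(stack.pop(), ()):
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen
    return any(f in reach(f) for f in (reach(q0) | {q0}) & F)

# Example 12.4: B1 accepts a^ω (complement empty), B2 accepts ∅ (complement nonempty)
d = {("q", "a"): {"q"}}
assert not nonempty(*comp_nba({"q"}, {"a"}, d, "q", {"q"}))   # complement of B1
assert nonempty(*comp_nba({"q"}, {"a"}, d, "q", set()))       # complement of B2
```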
Complexity. Since a level ranking is a mapping lr : Q → {⊥} ∪ [0 .. 2n], there are at most (2n + 2)ⁿ level rankings, and so Ā has at most (2n + 2)ⁿ · 2ⁿ ∈ n^O(n) states. Since n^O(n) = 2^O(n log n), we have introduced an extra log n factor in the exponent with respect to the subset construction for automata on finite words. The next section shows that this factor is unavoidable.

12.2.4 The size of Ā

We exhibit a family {Ln}n≥1 of ω-languages such that Ln is accepted by an automaton with n + 2 states, while any Büchi automaton accepting the complement of Ln has at least n! ∈ 2^Θ(n log n) states.

Let Σn = {a1, ..., an, #}. We associate to a word w ∈ Σnω the following directed graph G(w): the nodes of G(w) are 1, ..., n, and there is an edge from i to j if and only if w contains infinitely many occurrences of the word ai aj. Define Ln as the language of infinite words w ∈ Σnω for which G(w) has a cycle, and define L̄n as the complement of Ln.
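For ultimately periodic words w = u(v)ω the graph G(w) is easy to compute, since the two-letter factors occurring infinitely often in w are exactly those occurring in vv. The sketch below (our own helper, encoding the letter ai as the integer i) builds G(w) and tests membership in Ln by cycle detection:

```python
def g_edges(v):
    """Edges of G(u v^ω): pairs (i, j) with a_i a_j a factor of v^ω (i.e. of vv)."""
    vv = v + v
    return {(vv[k], vv[k + 1]) for k in range(len(v))
            if vv[k] != "#" and vv[k + 1] != "#"}

def has_cycle(edges):
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
    WHITE, GRAY, BLACK = 0, 1, 2   # iterative DFS detecting a back edge
    color = {}
    for start in list(adj):
        if color.get(start, WHITE) != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(adj.get(start, ())))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                c = color.get(nxt, WHITE)
                if c == GRAY:          # back edge: cycle found
                    return True
                if c == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(adj.get(nxt, ()))))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False

# (τ#)^ω for a single permutation τ yields an acyclic G(w) ...
assert not has_cycle(g_edges([1, 2, 3, "#"]))
# ... while mixing two distinct permutations infinitely often yields a cycle
assert has_cycle(g_edges([1, 2, 3, "#", 2, 1, 3, "#"]))
```

The two assertions are exactly observations (a) and (b) of the proof of Proposition 12.5 below, instantiated for n = 3.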

We first show that for all n ≥ 1, Ln is recognized by a Büchi automaton with n + 2 states. Let An be the automaton shown in Figure 12.9; we show that An accepts Ln.

Figure 12.9: The automaton An

(1) If w ∈ Ln, then An accepts w. Choose a cycle i1, i2, ..., ik, i1 of G(w). We construct an accepting run of An by moving to qi1 and iteratively applying the following rule: if the current state is qij, stay there until the next occurrence of the word aij aij+1 in w, then use aij to move to r, and use aij+1 to move to qij+1 (indices modulo k). In this run, r is visited infinitely often, and so w is accepted.

(2) If An accepts w, then w ∈ Ln. Let ρ be a run of An accepting w, and let Qρ = inf(ρ) ∩ {q1, ..., qn}. Since ρ is accepting, r is visited infinitely often; since the run cannot stay in any of the qi forever, for every qi ∈ Qρ there is qj ∈ Qρ such that the sequence qi r qj appears infinitely often in ρ, and so such that ai aj appears infinitely often in w, i.e., (i, j) is an edge of G(w). In other words, every node i with qi ∈ Qρ has an outgoing edge leading to a node j with qj ∈ Qρ. Since Qρ is finite and nonempty, G(w) contains a cycle, and so w ∈ Ln.

It remains to prove:

Proposition 12.5 For all n ≥ 1, every NBA recognizing L̄n, the complement of Ln, has at least n! states.

Proof: We need some preliminaries. Given a permutation τ = τ(1), ..., τ(n) of 1, ..., n, we identify τ and the word aτ(1) ... aτ(n). We make two observations:

(a) (τ#)ω ∈ L̄n for every permutation τ. Indeed, the edges of G((τ#)ω) are (τ(1), τ(2)), (τ(2), τ(3)), ..., (τ(n−1), τ(n)), and so G((τ#)ω) is acyclic.

(b) If a word w contains infinitely many occurrences of two different permutations τ and τ', then w ∈ Ln. Since τ and τ' are different, there are i and j in {1, ..., n} such that i precedes j in τ and j precedes i in τ'. Since w contains infinitely many occurrences of τ, G(w) has a path from i to j; since it contains infinitely many occurrences of τ', G(w) has a path from j to i. So G(w) contains a cycle, and so w ∈ Ln.

We proceed by contradiction. Let A be a Büchi automaton recognizing L̄n, and let τ, τ' be two arbitrary distinct permutations of (1, ..., n). By (a), there exist runs ρ and ρ' of A accepting (τ#)ω and (τ'#)ω, respectively. We prove that the intersection of inf(ρ) and inf(ρ') is empty; this implies that A contains at least as many final states as permutations of (1, ..., n), which proves the Proposition. Assume, by way of contradiction, that some state q belongs to inf(ρ) ∩ inf(ρ').

We build an accepting run of A by “combining” ρ and ρ' as follows:

(0) Starting from the initial state, go to q following the run ρ.
(1) Starting from q, follow ρ until having gone through a final state and having read at least once the word τ; then go back to q (always following ρ).
(2) Starting from q, follow ρ' until having gone through a final state and having read at least once the word τ'; then go back to q (always following ρ').
(3) Go to (1).

The word accepted by this run contains infinitely many occurrences of both τ and τ'. By (b), this word belongs to Ln, contradicting the assumption that A recognizes L̄n.

Exercises

Exercise 92 Show that for every DBA A with n states there is an NBA B with 2n states such that Lω(B) is the complement of Lω(A).

Exercise 93 Give algorithms that directly complement deterministic Muller and parity automata, without going through Büchi automata.

Exercise 94 Let A = (Q, Σ, δ, q0, { ⟨F0, G0⟩, ..., ⟨Fm−1, Gm−1⟩ }) be a deterministic automaton with accepting pairs. Which is the relation between the languages recognized by A as a deterministic Rabin automaton and as a deterministic Streett automaton?

Exercise 95 Consider Büchi automata with universal accepting condition (UBA): an ω-word w is accepted if every run of the automaton on w is accepting, i.e.,
if every run of the automaton on w visits final states infinitely often. Recall that automata on finite words with existential and universal accepting conditions recognize the same languages. Prove that this is no longer the case for automata on ω-words by showing that for every UBA there is a DBA that recognizes the same language. (This implies that the ω-languages recognized by UBAs are a proper subset of the ω-regular languages.)

Hint: On input w, the DBA checks that every path of dag(w) visits some final state infinitely often. (See the construction for the complement of a Büchi automaton.) The states of the DBA are pairs (Q', O) of sets of states of the UBA, where O ⊆ Q' is a set of “owing” states. The transition relation is defined to satisfy the following property: after reading a prefix w' of w, the DBA is at the state (Q', O) given by:

• Q' is the set of states reached by the runs of the UBA on w';
• O is the subset of states of Q' that “owe” a visit to a final state of the UBA.

Chapter 13

Emptiness check: Implementations

We present efficient algorithms for the emptiness check. Since in many applications we have to deal with very large Büchi automata, we are interested in on-the-fly algorithms that do not require to know the Büchi automaton in advance. More precisely, we assume the existence of an oracle that, provided with a state q, returns the set δ(q) of its successors.

We fix an NBA A = (Q, Σ, δ, q0, F). Since transition labels are irrelevant for checking emptiness, in this chapter we redefine δ from a subset of Q × Σ × Q into a subset of Q × Q as follows:

δ := { (q, q') ∈ Q × Q | (q, a, q') ∈ δ for some a ∈ Σ }

We need a few graph-theoretical notions. If (q, r) ∈ δ, then r is a successor of q and q is a predecessor of r. A path is a sequence q0, q1, ..., qn of states such that qi+1 is a successor of qi for every i ∈ {0, ..., n − 1}; we say that the path leads from q0 to qn. Notice that a path may consist of only one state; in this case the path is empty, and leads from a state to itself. A cycle is a path that leads from a state to itself. We write q ⇝ r to denote that there is a path from q to r. A lasso consists of a path q0, ..., qi, followed by a nonempty cycle qi, qi+1, ..., qn−1, qi. The lasso is accepting if at least one of {qi, ..., qn−1} is accepting. Clearly, A is nonempty if and only if it has an accepting lasso. We are interested in emptiness checks that on input A report EMPTY or NONEMPTY, and in the latter case return an accepting lasso as a witness of nonemptiness.
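As a baseline before the efficient algorithms of this chapter, here is a naive sketch (our own, far less efficient than the DFS-based algorithms below) that decides emptiness directly from the lasso characterization: an NBA is nonempty iff some accepting state is reachable from q0 and lies on a cycle, and the witness lasso is assembled from the two search trees.

```python
def find_accepting_lasso(succ, q0, F):
    """Return an accepting lasso (path, cycle) or None.

    succ maps a state to an iterable of successors (the 'oracle')."""
    def bfs_parents(src):
        # breadth-first search recording a parent for every reachable state
        parent, frontier = {src: None}, [src]
        while frontier:
            nxt = []
            for q in frontier:
                for r in succ(q):
                    if r not in parent:
                        parent[r] = q
                        nxt.append(r)
            frontier = nxt
        return parent

    def unwind(parent, target):
        path = []
        while target is not None:
            path.append(target)
            target = parent[target]
        return path[::-1]

    reach = bfs_parents(q0)
    for f in F:
        if f not in reach:
            continue
        for r in succ(f):
            back = bfs_parents(r)
            if f in back:   # a cycle through the accepting state f
                return unwind(reach, f), [f] + unwind(back, f)[:-1] + [f]
    return None

succ = lambda q: {"q0": ["q1"], "q1": ["q0"]}[q]
lasso = find_accepting_lasso(succ, "q0", {"q1"})
assert lasso == (["q0", "q1"], ["q1", "q0", "q1"])
```

Note that this restarts a search from every accepting state, which is exactly the quadratic behavior that the algorithms presented next avoid.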
If all of q’s outgoing transitions have been explored, then the search “backtracks” to the state from which q was discovered.

In the rest of the chapter we assume that every state of the automaton is reachable from the initial state. Here is a pseudocode implementation of the search (ignore the algorithm DFS Tree for the moment):

DFS(A)
Input: NBA A = (Q, Σ, δ, q0, F)
1  S ← ∅
2  dfs(q0)

proc dfs(q)
3  add q to S
4  for all r ∈ δ(q) do
5      if r ∉ S then dfs(r)
6  return

DFS Tree(A)
Input: NBA A = (Q, Σ, δ, q0, F)
Output: time-stamped tree (S, T, d, f)
1  S ← ∅
2  T ← ∅; t ← 0
3  dfs(q0)

proc dfs(q)
4  t ← t + 1; d[q] ← t
5  add q to S
6  for all r ∈ δ(q) do
7      if r ∉ S then
8          add (q, r) to T; dfs(r)
9  t ← t + 1; f[q] ← t
10 return

Since, by the hypothesis above, every state is reachable from the initial state, we always have S = Q after termination. Observe that DFS is nondeterministic, because we do not fix the order in which the states of δ(q) are examined by the for-loop.
The process continues until q0 becomes the current state again and all its outgoing transitions have been explored. It is easy to modify DFS so that it returns a so-called DFS-tree: after termination, every state q ≠ q0 has a distinguished input transition, namely the one that, when explored by the search, led to the discovery of q. It is well known that the graph with the states as nodes and these transitions as edges is a tree with root q0, called a DFS-tree. If some path of the DFS-tree leads from q to r, then we say that q is an ascendant of r, and r is a descendant of q (in the tree).

The algorithm DFS Tree shown above returns a DFS-tree, together with timestamps for the states. Each state q is assigned two timestamps: the first, d[q], records when q is first discovered, and the second, f[q], records when the search finishes examining the outgoing transitions of q. Since we are only interested in the relative order in which states are discovered and finished, we can assume that the timestamps are integers ranging between 1 and 2|Q|. While timestamps are not necessary for conducting the search itself, many algorithms based on depth-first search use them for other purposes. In the rest of the chapter, in order to present the algorithms in more compact form, we omit the instructions for computing the timestamps, and just assume they are there. Figure 13.1 shows an example: an NBA and a possible run of DFS Tree on it.
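The timestamped search can be sketched as follows (our own Python rendering of DFS Tree, not the book's code; the successor relation is assumed to be a dict mapping each state to its set of successors, and the dictionary d doubles as the set S of discovered states):

```python
# A sketch of DFS Tree: besides the discovered states it records, for
# every state q, the discovery time d[q] and the finishing time f[q],
# using a single counter t ranging over 1..2|Q|.

def dfs_tree(succ, q0):
    """Return (tree_edges, d, f) for a DFS of the graph `succ` from q0."""
    d, f, tree = {}, {}, []
    t = 0

    def dfs(q):
        nonlocal t
        t += 1
        d[q] = t                            # q is discovered (turns grey)
        for r in sorted(succ.get(q, ())):   # fix an exploration order
            if r not in d:                  # r not yet discovered
                tree.append((q, r))         # (q, r) becomes a tree edge
                dfs(r)
        t += 1
        f[q] = t                            # q is blackened
        return

    dfs(q0)
    return tree, d, f
```

Sorting the successors resolves the nondeterminism mentioned above; any other fixed order would give another legal DFS-tree.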

In our analyses we also assume that at every time point a state is white, grey, or black: white if it has not yet been discovered, grey if it has already been discovered but still has unexplored outgoing edges, and black if all its outgoing edges have been explored. So a state q is white during the interval [0, d[q]], grey during the interval (d[q], f[q]], and black during the interval (f[q], 2|Q|]. It is easy to see that at all times the grey states form a path (the grey path) starting at q0 and ending at the state being currently explored, i.e., at the state q such that dfs(q) is being currently executed; moreover, this path is always part of the DFS-tree.

We recall two important properties of depth-first search. Both follow easily from the fact that a procedure call suspends the execution of the caller, which is only resumed after the execution of the callee terminates.

Theorem 13.1 (Parenthesis Theorem) Let I(q) denote the interval (d[q], f[q]], and let I(q) ≺ I(r) denote that f[q] < d[r] holds. In a DFS-tree, for any two states q and r, exactly one of the following four conditions holds:

• I(q) ⊆ I(r), and q is a descendant of r; or
• I(r) ⊆ I(q), and r is a descendant of q; or
• I(q) ≺ I(r), and neither q is a descendant of r, nor r is a descendant of q; or
• I(r) ≺ I(q), and neither q is a descendant of r, nor r is a descendant of q.

Figure 13.1: An NBA (the labels of the transitions have been omitted) and a possible run of DFS Tree on it. The numeric intervals, shown in the format [d[q], f[q]], are the discovery and finishing times of the states.
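The Parenthesis Theorem can be checked mechanically on the timestamps produced by any DFS. The helper below is our own (not from the book); it tests that every pair of intervals I(q) = (d[q], f[q]] is either nested or disjoint:

```python
# Check the parenthesis property of DFS timestamps: for any two states,
# the intervals (d[q], f[q]] and (d[r], f[r]] never "cross".

def parenthesis_ok(d, f):
    states = list(d)
    for q in states:
        for r in states:
            if q == r:
                continue
            nested = (d[r] < d[q] and f[q] < f[r]) or \
                     (d[q] < d[r] and f[r] < f[q])
            disjoint = f[q] < d[r] or f[r] < d[q]
            if not (nested or disjoint):
                return False                # intervals cross: impossible
    return True
```

Timestamps of a real DFS always pass this test; hand-made "crossing" intervals, such as (1, 3] and (2, 4], fail it.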

Theorem 13.2 (White-path Theorem) In a DFS-tree, r is a descendant of q (and so I(r) ⊆ I(q)) if and only if at time d[q] state r can be reached from q in A along a path of white states.

13.1.1 The nested-DFS algorithm

To determine whether A is empty we can search for the accepting states of A, and check whether at least one of them belongs to a cycle. A naïve implementation proceeds in two phases, searching for accepting states in the first, and for cycles in the second. The first phase is carried out by a DFS. The runtime is quadratic: since an automaton with n states and m transitions has O(n) accepting states, and since searching for a cycle containing a given state takes O(n + m) time, we obtain an O(n² + nm) bound.

The nested-DFS algorithm runs in time O(n + m) by using the first phase not only to discover the reachable accepting states, but also to sort them: the accepting states are sorted by increasing finishing (not discovery!) time. This is known as the postorder induced by the DFS. The searches of the second phase are conducted according to the order determined by the sorting. As we shall see, conducting the searches in this order avoids repeated visits to the same state.

Assume that in the second phase we have already performed a search starting from a state q, and that this search has failed, i.e., no cycle of A contains q. Suppose we proceed with a search from another state r (which implies f[q] < f[r]), and this search discovers some state s that had already been discovered by the search starting at q. We claim that it is not necessary to explore the successors of s again, because the exploration cannot return any cycle containing r, and so it is useless to explore the successors of s. To prove the claim, we assume that s ⇝ r holds and derive a contradiction: since s was already discovered by the search starting at q, we have q ⇝ s, and so q ⇝ r; since f[q] < f[r], by the following lemma some cycle of A contains q, contradicting the assumption that the search from q failed.

Lemma 13.3 If q ⇝ r and f[q] < f[r] in some DFS-tree, then some cycle of A contains q.

Proof: Let π be a path leading from q to r, and let s be the first node of π that is discovered by the DFS. By the definition of s, we have d[s] ≤ d[q]. We prove that q ⇝ s and s ⇝ q hold, which implies that some cycle of A contains q. Since s belongs to π, we have q ⇝ s, so it remains to prove s ⇝ q. If s = q, then at time d[q] the path π is white, and so, by the White-path Theorem, r is a descendant of q and I(r) ⊆ I(q), contradicting f[q] < f[r]. So s ≠ q, and, by the Parenthesis Theorem, either I(q) ⊆ I(s) or I(s) ≺ I(q).
We claim that I(s) ≺ I(q) is not possible: since at time d[s] the subpath of π leading from s to r is white, we have I(r) ⊆ I(s), but I(r) ⊆ I(s) and I(s) ≺ I(q) together contradict f[q] < f[r]. So I(q) ⊆ I(s), and hence q is a descendant of s, which implies s ⇝ q and proves the claim.

Example 13.4 The NBA of Figure 13.1 contains a path from q1 to q0, and the DFS-tree displayed satisfies f[q1] = 11 < 12 = f[q0]. As guaranteed by Lemma 13.3, some cycle of A contains q1, namely the cycle q1 q6 q0 q1.

Hence, during the second phase we only need to explore a transition at most once, namely when its source state is discovered for the first time. This guarantees the correctness of the following algorithm:

• Perform a DFS on A from q0, and output the accepting states of A in postorder. (Notice that this does not require applying any sorting algorithm: it suffices to output an accepting state immediately after blackening it.) Let q1, ..., qk be the output of the search, i.e., f[q1] < ... < f[qk].
• For i = 1 to k, perform a DFS from the state qi, with the following changes:
  – If the search visits a state q that was already discovered by any of the searches starting at q1, ..., qi−1, then the search backtracks.
  – If the search visits qi, it stops and returns NONEMPTY.
• If none of the searches from q1, ..., qk returns NONEMPTY, return EMPTY.

The running time of the algorithm can be easily determined. The first DFS requires O(|Q| + |δ|) time. During the searches of the second phase each transition is explored at most once, and so they can be executed together in O(|Q| + |δ|) time.

Example 13.5 We apply the algorithm to the example of Figure 13.1, and assume that the first DFS runs as shown there. The search outputs the accepting states in postorder, in the order q2, q1. Figure 13.2 shows the transitions explored during the searches of the second phase. The search from q2 explores the transitions labelled by 2.1, 2.2, and 2.3, and fails. The search from q1 explores the transitions 1.1, 1.2, etc.; notice that it backtracks after exploring 1.4, because the state q2 was already visited by the previous search. Transition 1.5 reaches state q1, and so a cycle containing q1 has been found: the search is successful, and the algorithm stops and returns NONEMPTY.

Nesting the two searches

Recall that we are looking for algorithms that return an accepting lasso when A is nonempty. Define the DFS-path of a state as the unique path of the DFS-tree leading from the initial state to it. When a search of the second phase returns NONEMPTY, it has found a cycle containing the accepting state qi; to turn this cycle into an accepting lasso, it suffices to prefix it with
the DFS-path of the state being currently explored. then the search backtracks. qk returns NONEMPTY. and so q r. Nesting the two searches Recall that we are looking for algorithms that return an accepting lasso when A is nonempty.5. 2. Figure 13. 2. . return EMPTY. and output the accepting states of A in postorder2 . Hence. and so they can be executed together in O(|Q| + |δ|) time. some cycle contains q1 .1. but Notice that this does not require to apply any sorting algorithm. we have q s. it stops and returns NONEMPTY. The search from q1 explores the transitions 1. . .1. . with the following changes: – If the search visits a state q that was already discovered by any of the searches starting at q1 . qk be the output of the search.e. Example 13.2. i. .
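The two-phase, postorder-based algorithm described above can be sketched as follows (our own Python rendering, not the book's code; the names and the dict-based successor relation are assumptions, and the second phase is written iteratively):

```python
# Two-phase emptiness check: phase one outputs the reachable accepting
# states in postorder; phase two runs one cycle-search per accepting
# state, never re-entering states discovered by an earlier search.

def postorder_emptiness(succ, q0, accepting):
    discovered, postorder = set(), []

    def dfs1(q):
        discovered.add(q)
        for r in succ.get(q, ()):
            if r not in discovered:
                dfs1(r)
        if q in accepting:                 # output q when it is blackened
            postorder.append(q)

    dfs1(q0)

    seen = set()                           # states visited in phase two
    for target in postorder:
        stack = [target]
        while stack:
            q = stack.pop()
            if q in seen:                  # discovered by this or an
                continue                   # earlier search: backtrack
            seen.add(q)
            for r in succ.get(q, ()):
                if r == target:            # cycle through target found
                    return "NONEMPTY"
                if r not in seen:
                    stack.append(r)
    return "EMPTY"
```

Because `seen` is shared by all second-phase searches, each transition is explored at most once in phase two, giving the O(n + m) bound.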

Figure 13.2: The transitions explored during the search starting at qi are labelled by the index i.

However, the algorithm we have described is not good for this purpose: since the first phase cannot foresee the future, it does not know which accepting state, if any, will be identified by the second phase as belonging to an accepting lasso. So either the first search must store the DFS-paths of all the accepting states it discovers, or a third phase is necessary, in which a new DFS-path is recomputed, or both. This problem can be solved by nesting the first and the second phases: whenever the first DFS blackens an accepting state q, we immediately launch a second DFS to check whether q is reachable from itself. If this second DFS visits q again (i.e., if it explores some transition leading to q), we stop with NONEMPTY; for an accepting lasso we can prefix the cycle found by the second DFS with the DFS-path of q obtained during the first phase. Otherwise, when the second DFS terminates, we continue with the first DFS. We obtain the nested-DFS algorithm, due to Courcoubetis, Vardi, Wolper, and Yannakakis:

• Perform a DFS from q0.
• Whenever the search blackens an accepting state q, launch a new DFS from q. If this second DFS visits q again, stop with NONEMPTY.
• If the first DFS terminates, output EMPTY.

A pseudocode implementation is shown below. For clarity, the program on the left does not include the instructions for returning an accepting lasso. The instruction report X produces the output X and stops the execution. A variable seed is used to store the state from which the second DFS is launched.
The set S is usually implemented by means of a hash table. Notice that it is not necessary to store the pairs [q, 1] and [q, 2] separately: a state q is stored at its hash address when it is discovered for the first time, either during the first or the second search, and two extra bits record which of the following three possibilities holds: only [q, 1] has been discovered so far, only [q, 2], or both. So, if a state is encoded by a bitstring of length c, then the algorithm needs c + 2 bits of memory per state.

NestedDFS(A)
Input: NBA A = (Q, Σ, δ, q0, F)
Output: EMP if Lω(A) = ∅, NEMP otherwise
1  S ← ∅
2  dfs1(q0)
3  report EMP

proc dfs1(q)
4  add [q, 1] to S
5  for all r ∈ δ(q) do
6      if [r, 1] ∉ S then dfs1(r)
7  if q ∈ F then { seed ← q; dfs2(q) }
8  return

proc dfs2(q)
9  add [q, 2] to S
10 for all r ∈ δ(q) do
11     if r = seed then report NEMP
12     if [r, 2] ∉ S then dfs2(r)
13 return

NestedDFSwithWitness(A)
Input: NBA A = (Q, Σ, δ, q0, F)
Output: EMP if Lω(A) = ∅, NEMP otherwise
1  S ← ∅; succ ← false
2  dfs1(q0)
3  report EMP

proc dfs1(q)
4  add [q, 1] to S
5  for all r ∈ δ(q) do
6      if [r, 1] ∉ S then
7          dfs1(r)
8          if succ = true then return [q, 1]
9  if q ∈ F then
10     seed ← q; dfs2(q)
11     if succ = true then return [q, 1]
12 return

proc dfs2(q)
13 add [q, 2] to S
14 for all r ∈ δ(q) do
15     if [r, 2] ∉ S then
16         dfs2(r)
17         if succ = true then return [q, 2]
18     if r = seed then
19         report NEMP; succ ← true; return [q, 2]
20 return

The algorithm on the right shows how to modify NestedDFS so that it also returns an accepting lasso. It uses a global boolean variable succ (for success), initially set to false. When the algorithm finds that r = seed holds — the counterpart of line 11 of the program on the left — it reports NEMP and sets succ to true. This causes the pending calls of dfs1(q) and dfs2(q) to execute return [q, 1] and return [q, 2], respectively, which output the states of an accepting lasso; the lasso is thus produced in reverse order, with the initial state at the end.
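A Python sketch of NestedDFS (our own rendering of the pseudocode on the left, not the book's code): it reports only the result, and signals nonemptiness with an exception rather than stopping the program.

```python
# NestedDFS: dfs1 is the outer search; when it blackens an accepting
# state q it sets seed = q and launches dfs2, which raises Nonempty if
# it explores a transition leading back to seed.

class Nonempty(Exception):
    pass

def nested_dfs(succ, q0, accepting):
    S1, S2 = set(), set()       # the two bits per state: [q,1] and [q,2]
    seed = None

    def dfs2(q):
        S2.add(q)
        for r in succ.get(q, ()):
            if r == seed:
                raise Nonempty  # a cycle through seed has been found
            if r not in S2:
                dfs2(r)

    def dfs1(q):
        nonlocal seed
        S1.add(q)
        for r in succ.get(q, ()):
            if r not in S1:
                dfs1(r)
        if q in accepting:      # q is being blackened and is accepting
            seed = q
            dfs2(q)

    try:
        dfs1(q0)
        return "EMP"
    except Nonempty:
        return "NEMP"
```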

A small improvement

We show that dfs2(q) can already report NEMP if it discovers a state that belongs to the DFS-path of q in dfs1. Let qk be an accepting state. Assume that dfs1(q0) discovers qk, and that the DFS-path of qk in dfs1 is q0 q1 ... qk−1 qk. Assume further that dfs2(qk) discovers qi for some 0 ≤ i ≤ k − 1, and that the DFS-path of qi in dfs2 is qk qk+1 ... qk+l qi. Then the path q0 q1 ... qk−1 qk qk+1 ... qk+l qi is a lasso, and, since qk is accepting, it is an accepting lasso. So stopping with NONEMPTY is correct.

Implementing this modification requires keeping track, during dfs1, of the states that belong to the DFS-path of the state being currently explored. Notice, however, that we do not need any information about their order, and so we can use a set P to store the states of the path, and implement P as, e.g., a hash table. We do not need the variable seed anymore, because the case r = seed is subsumed by the more general test r ∈ P.

ImprovedNestedDFS(A)
Input: NBA A = (Q, Σ, δ, q0, F)
Output: EMP if Lω(A) = ∅, NEMP otherwise
1  S ← ∅; P ← ∅
2  dfs1(q0)
3  report EMP

proc dfs1(q)
4  add [q, 1] to S; add q to P
5  for all r ∈ δ(q) do
6      if [r, 1] ∉ S then dfs1(r)
7  if q ∈ F then dfs2(q)
8  remove q from P
9  return

proc dfs2(q)
10 add [q, 2] to S
11 for all r ∈ δ(q) do
12     if r ∈ P then report NEMP
13     if [r, 2] ∉ S then dfs2(r)
14 return

Evaluation

The strong point of the nested-DFS algorithm is its very modest space requirements: apart from the space needed to store the stack of calls to the recursive dfs procedures, the algorithm just needs two extra bits for each state of A. In many practical applications A can easily have millions or tens of millions of states, and each state may require many bytes of storage; in these cases, the two extra bits per state are negligible. The algorithm, however, also has two important weak points: it cannot be extended to NGAs, and it is not optimal, in a formal sense defined below. We discuss these two points separately.
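ImprovedNestedDFS can be rendered in Python along the same lines (again our own sketch, not the book's code); the set P plays the role of the DFS-path of the state currently being explored by dfs1:

```python
# ImprovedNestedDFS: dfs1 maintains the set P of states on the current
# DFS-path; dfs2 reports nonemptiness as soon as it meets any state of P.
# The seed variable is gone, since seed is always a member of P.

class Nonempty(Exception):
    pass

def improved_nested_dfs(succ, q0, accepting):
    S1, S2, P = set(), set(), set()

    def dfs2(q):
        S2.add(q)
        for r in succ.get(q, ()):
            if r in P:          # r lies on the dfs1 path to the seed
                raise Nonempty
            if r not in S2:
                dfs2(r)

    def dfs1(q):
        S1.add(q)
        P.add(q)                # q joins the current DFS-path
        for r in succ.get(q, ()):
            if r not in S1:
                dfs1(r)
        if q in accepting:
            dfs2(q)
        P.remove(q)             # q is blackened and leaves the path

    try:
        dfs1(q0)
        return "EMP"
    except Nonempty:
        return "NEMP"
```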

Figure 13.3: Two bad examples for NestedDFS.

A search-based algorithm for emptiness checking explores the automaton A starting from the initial state. At each point t in time, the algorithm has explored a subset of the states and transitions of the automaton, which form a sub-NBA At = (Qt, Σ, δt, q0, Ft) of A (i.e., Qt ⊆ Q, δt ⊆ δ, and Ft ⊆ F). Clearly, a search-based algorithm can only have reported NONEMPTY at a time t if At contains an accepting lasso. A search-based algorithm is optimal if the converse holds, i.e., if it reports NONEMPTY at the earliest time t such that At contains an accepting lasso.

It is easy to see that NestedDFS is not optimal. Consider the automaton on top of Figure 13.3. Initially, the algorithm chooses between the transitions (q0, q1) and (q0, q2). Assume it chooses (q0, q1) as first transition (the algorithm does not know that there is a long tail behind q2). Since q0 has not been blackened yet, dfs1 continues its execution with (q0, q2), and explores all transitions before dfs2 is called for the first time; moreover, dfs2(qn) is executed before dfs2(q0), since qn precedes q0 in postorder. The sub-NBA At already contains an accepting lasso long before the report occurs, and so the time elapsed between the first moment at which the algorithm has enough information to report NONEMPTY and the moment at which the report occurs can be arbitrarily large.

The automaton at the bottom of Figure 13.3 shows another problem of NestedDFS related to non-optimality. If the algorithm selects (q0, q2) first, it succeeds, but it reports the lasso q0 q2 ... qn qn+1 qn, instead of the much shorter lasso q0 q1 q0.

The nested-DFS algorithm works by identifying the accepting states first, and then checking whether they belong to some cycle. This principle no longer works for the acceptance condition of NGAs, where we look for cycles containing at least one state of each family of accepting states. No better procedure than translating the NGA into an NBA has been described so far.
For NGAs having a large number of accepting families, the translation may involve a substantial penalty in performance.

In the next section we describe an optimal algorithm that can easily be extended to NGAs. The price to pay is a higher memory consumption: as we shall see, besides the two extra bits, the new algorithm needs to assign a number to each state and store it (apart from maintaining other data structures).

13.1.2 The two-stack algorithm

Recall that the nested-DFS algorithm searches for the accepting states of A, and then checks whether they belong to some cycle. The two-stack algorithm proceeds the other way round: it searches for the states that belong to some cycle of A by means of a single DFS, and checks whether they are accepting.

A first observation is that by the time the DFS blackens a state, it has already explored enough of A to decide whether the state belongs to a cycle:

Lemma 13.6 Let At be the sub-NBA of A containing the states and transitions explored by the DFS up to (and including) time t. If a state q belongs to some cycle of A, then it already belongs to some cycle of A_f[q].

Proof: Let π be a cycle containing q, and consider the snapshot of the DFS at time f[q]. Let r be the last state of π after q such that all states in the subpath of π from q to r are black. We have f[r] ≤ f[q]. If r is the last state of π, then all the transitions of π have been explored at time f[q], and π is a cycle of A_f[q], so we are done. Otherwise, let s be the successor of r in π (see Figure 13.4). Since all successors of r have necessarily been discovered at time f[r], and s is not black, we have f[r] < f[q] < f[s], and, by the Parenthesis Theorem, d[s] < f[r] < f[q] < f[s]; in particular, s is grey at time f[q], and so s is a DFS-ascendant of q. Let π′ be the cycle obtained by concatenating the DFS-path from s to q, the prefix of π from q to r, and the transition (r, s). All the transitions of π′ have been explored at time f[q], and so the cycle π′ belongs to A_f[q].

Figure 13.4: Illustration of the proof of Lemma 13.6.

This lemma suggests maintaining during the DFS a set C of candidates, containing the states for which it is not yet known whether they belong to some cycle or not. More precisely, at any time t, the candidates are the currently grey states that do not belong to any cycle of At. A state is added to C when it is discovered. While the state is grey, the algorithm tries to find a cycle containing it. If it succeeds,
then the state is removed from C; if not, the state is removed from C when it is blackened.

Assume that at time t the set C indeed contains the current set of candidates, and that the DFS explores a new transition (q, r). We need to update C. Let us assume we can ask an oracle whether r ⇝ q holds. If r has not been discovered yet (i.e., if it does not belong to At), then the addition of r and (q, r) to At does not create any new cycle, and the update just adds r to C. If r belongs to At but no path of At leads from r to q, then again no new cycle is created, and we do nothing. But if r belongs to At and r ⇝ q holds, then the addition of (q, r) does create new cycles. We have then learnt that both q and r belong to some cycle of A, and so both of them must be removed from C; however, we may have to remove other states as well. Consider the DFS of Figure 13.5: after adding (q4, q1) to the set of explored transitions at time 5, all of q1, q2, q3, q4 belong to a cycle, and so all of them must be removed from C.

Figure 13.5: A DFS on an automaton.

The fact that these are exactly the states discovered by the DFS between the discoveries of q1 and q4 suggests implementing C as a stack: states are pushed into C when they are discovered, and removed when they are blackened or earlier (if some cycle contains them). Then the update to C after exploring a transition (q, r) with r ⇝ q can be performed very easily: it suffices to pop from C until r is hit (in Figure 13.5 we pop q4, q3, q2, and q1, in that order). This leads to our first attempt at an algorithm, shown on the left of Figure 13.6. When a state q is discovered, it is pushed into C (line 5),
and its successors are explored (lines 6–12). When exploring a successor r, if r has not been discovered yet, then dfs(r) is called (line 7); otherwise the oracle is consulted (line 8), and, if r ⇝ q holds at the time (i.e., in the part of A explored so far), states are popped from C until r is hit (lines 9–12). Observe that removing q from C when it is blackened does not require inspecting the complete stack: since every state is removed at the latest when it is blackened, if q is still in C, then it is necessarily at the top. So it suffices to inspect the top of the stack (line 13): if q is at the top, we pop it; otherwise q has already been removed, and we do nothing. (This is the case of state q0 in Figure 13.5.)

However, this first attempt leads to an incorrect result: the NBA below FirstAttempt in Figure 13.6 shows that the algorithm needs to be patched. Consider the DFS displayed there. After exploring (q4, q1), the states q4, q3, q2, q1 are popped from C, and C contains only q0. Now the DFS backtracks to state q3, and explores (q3, q5), pushing q5 into the stack. Then the DFS explores (q5, q1), and the oracle answers ‘yes’, so the algorithm pops from C until it hits q1. But, since q1 no longer belongs to C, the algorithm pops all states from C, and when it pops q0 it reports NEMP if q0 is accepting — although q0 need not belong to any cycle.

FirstAttempt(A)
Input: NBA A = (Q, Σ, δ, q0, F)
Output: EMP if Lω(A) = ∅, NEMP otherwise
1  S, C ← ∅
2  dfs(q0)
3  report EMP

proc dfs(q)
4  add q to S
5  push(q, C)
6  for all r ∈ δ(q) do
7      if r ∉ S then dfs(r)
8      else if r ⇝ q then
9          repeat
10             s ← pop(C)
11             if s ∈ F then report NEMP
12         until s = r
13 if top(C) = q then pop(C)

SecondAttempt(A)
Input: NBA A = (Q, Σ, δ, q0, F)
Output: EMP if Lω(A) = ∅, NEMP otherwise
1  S, C ← ∅
2  dfs(q0)
3  report EMP

proc dfs(q)
4  add q to S
5  push(q, C)
6  for all r ∈ δ(q) do
7      if r ∉ S then dfs(r)
8      else if r ⇝ q then
9          repeat
10             s ← pop(C)
11             if s ∈ F then report NEMP
12         until s = r
13         push(r, C)
14 if top(C) = q then pop(C)

Figure 13.6: Two incorrect attempts at an emptiness checking algorithm.

This problem can be solved as follows: if the DFS explores (q, r) and r ⇝ q holds, then the algorithm pops from C until r is popped, and pushes r back into the stack. This second attempt is shown at the top right of the figure. In the example above, after exploring (q4, q1) the states q4, q3, q2, q1 are popped from C, q1 is pushed again, and C contains now q1 q0. However, the NBA below SecondAttempt shows that this attempt is again incorrect: in a run of the DFS that later pushes q5 and explores (q5, q4) with stack content q5 q1 q0, the oracle answers ‘yes’, because q4 ⇝ q5 holds, and the algorithm pops from C until q4 is found. But, since q4 does not belong to C, the algorithm pops q5 and q1 — and then q0 — and, again, the result is incorrect.
A patch for this problem is to change the condition of the repeat loop: if the DFS explores (q, r) and r ⇝ q holds, then, since we cannot be sure that r is still in the stack, we pop until either r or some state discovered before r is hit, and then we push this last popped state back again. This new patch leads to the OneStack algorithm:

OneStack(A)
Input: NBA A = (Q, Σ, δ, q0, F)
Output: EMP if Lω(A) = ∅, NEMP otherwise
1  S, C ← ∅
2  dfs(q0)
3  report EMP

proc dfs(q)
4  add q to S
5  push(q, C)
6  for all r ∈ δ(q) do
7      if r ∉ S then dfs(r)
8      else if r ⇝ q then
9          repeat
10             s ← pop(C); if s ∈ F then report NEMP
11         until d[s] ≤ d[r]
12         push(s, C)
13 if top(C) = q then pop(C)
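A Python sketch of OneStack (our own rendering, not the book's code). Since the oracle is only implemented later in the chapter, the sketch answers the query r ⇝ q naively, by a reachability search over the transitions explored so far:

```python
# OneStack with a naive oracle: candidates live on the stack C, ordered
# by decreasing discovery time; the repeat loop pops down to the first
# state s with d[s] <= d[r] and pushes that state back.

class Nonempty(Exception):
    pass

def one_stack(succ, q0, accepting):
    S, C, d = set(), [], {}
    explored = {}                    # transitions explored so far (A_t)
    time = [0]

    def reaches(r, q):               # naive oracle: does r ~> q hold
        seen, work = set(), [r]      # in the explored part of A?
        while work:
            x = work.pop()
            if x == q:
                return True
            if x in seen:
                continue
            seen.add(x)
            work.extend(explored.get(x, ()))
        return False

    def dfs(q):
        time[0] += 1
        d[q] = time[0]
        S.add(q)
        C.append(q)                  # push q into the candidate stack
        for r in succ.get(q, ()):
            explored.setdefault(q, []).append(r)
            if r not in S:
                dfs(r)
            elif reaches(r, q):
                while True:          # pop until d[s] <= d[r]
                    s = C.pop()
                    if s in accepting:
                        raise Nonempty
                    if d[s] <= d[r]:
                        break
                C.append(s)          # push the last popped state back
        if C and C[-1] == q:         # blacken q: remove it if still there
            C.pop()

    try:
        dfs(q0)
        return "EMP"
    except Nonempty:
        return "NEMP"
```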

Example 13.7 Figure 13.7 shows a run of OneStack on the NBA shown at the top of the figure. Observe that the NBA has three sccs, one of them the singleton {h}. The NBA has no accepting states; during the run we will see that the algorithm answers NONEMPTY (resp. EMPTY) when f (resp. h) is made the only accepting state. Below the NBA, the figure shows different snapshots of the run. At each snapshot, the current grey path is shown in red/pink; the dotted states and transitions have not been discovered yet, while dark red states have already been blackened. The discovery and finishing times are shown at the top. The current content of stack C is shown on the right of each snapshot (ignore stack L for the moment).

• The first snapshot is taken immediately before state j is blackened. The algorithm has just explored the transition (j, i).
• The second snapshot is taken immediately after state i is blackened. State i has been popped from C at line 13. Observe that after this the algorithm backtracks to dfs(h); since state h is at the top of the stack, it pops h from C at line 13. So h is never popped by the repeat loop, and even if h is accepting the algorithm does not report NONEMPTY.
• The third snapshot is taken immediately before state f is blackened. The states popped by the repeat loop up to this point have been identified as belonging to some cycle; in particular, if state f is accepting, the algorithm reports NONEMPTY when the repeat loop pops f.
• The fourth snapshot is taken immediately after state e is discovered. The state has been pushed into C.
• The final snapshot is taken immediately before a is blackened and the run terminates. Observe that when the algorithm explores the transition (b, c) it calls the oracle, which answers ‘no’, and no states are popped from C. State a is going to be removed from C at line 13; at this point the algorithm reports EMPTY and stops.

The algorithm now looks plausible, but we still must prove it correct. We have two proof obligations:

• If A is nonempty, then OneStack reports NONEMPTY. This is equivalent to: every state that belongs to some cycle is eventually popped during the repeat loop.
• If OneStack reports NONEMPTY, then A is nonempty. This is equivalent to: every state popped during the repeat loop belongs to some cycle.

These properties are shown in Propositions 13.8 and 13.11 below.
Proposition 13.8 If q belongs to a cycle, then q is eventually popped by the repeat loop.

Proof: Let π be a cycle containing q. Let q′ be the last successor of q along π such that at time d[q] there is a white path from q to q′, and let r be the successor of q′ in π. By the White-path Theorem, q′ is a descendant of q. By the choice of q′, the state r is grey or black at time d[q], and so d[r] ≤ d[q] ≤ d[q′]. Since q′ is a descendant of q, the transition (q′, r) is explored before q is blackened; moreover, when (q′, r) is explored, r ⇝ q′ holds, because π leads from r back to q and from q to q′. So the repeat loop is executed, and it pops states until it reaches a state s with d[s] ≤ d[r]. Since d[r] ≤ d[q], and q has not been popped at line 13 (q is not yet black), either q has already been popped at some former execution of the repeat loop, or it is popped now.

Figure 13.7: A run of OneStack. The discovery and finishing times of the states are shown at the top; the snapshots are taken at times t = 9−, 12+, 15−, 16, and 20−.

Actually, the proof of Proposition 13.8 proves not only that q is eventually popped by the repeat loop, but also that, for every cycle π containing q, the repeat loop pops q immediately after all transitions of π have been explored, or earlier. But this is precisely the optimality property, which leads to:

Corollary 13.9 OneStack is optimal.

The property that every state popped during the repeat loop belongs to some cycle has a more involved proof. A strongly connected component (scc) of A is a maximal set of states S ⊆ Q such that q ⇝ r for every q, r ∈ S. (Notice that a path consisting of just one state q and no transitions is a path leading from q to q.) The first state of a reachable scc that is discovered by the DFS is called the root of the scc (with respect to this DFS); so, if ρ is the root of an scc, then d[ρ] ≤ d[r] for every state r of the scc.

The following lemma states an important invariant of the algorithm: loosely speaking, the repeat loop cannot remove a root from the stack. More precisely, if the loop removes a root, then the push instruction at line 12 reintroduces it again.

Lemma 13.10 Let ρ be a root. If ρ belongs to C before an execution of the repeat loop at lines 9–11, then ρ still belongs to C after the repeat loop has terminated and line 12 has been executed.

Proof: Let t be the time right before the execution of the repeat loop starts, while the for-loop is exploring a transition (q, r), and assume that ρ belongs to C at time t. Since the algorithm has reached line 9, we have r ∈ δ(q) (line 6) and r ⇝ q (line 8), and so q and r belong to the same scc of A; let ρ′ be the root of this scc. Since states are removed from C at the latest when they are blackened, ρ is still grey at time t; ρ′ is grey as well, and the grey states always form a path of the DFS-tree. So either ρ is a DFS-ascendant of ρ′, or ρ′ is a proper DFS-ascendant of ρ. In the latter case we have ρ′ ⇝ ρ ⇝ q along the grey path, and q ⇝ ρ′, because q belongs to the scc of ρ′; so ρ and ρ′ belong to the same scc, and d[ρ′] < d[ρ] contradicts that ρ is the root of this scc. So ρ is a DFS-ascendant of ρ′, and in particular d[ρ] ≤ d[ρ′] ≤ d[r]. Since states are added to C when they are discovered, the states of C are always ordered by decreasing discovery time, starting from the top. The repeat loop stops at the first popped state s satisfying d[s] ≤ d[r]; since d[ρ] ≤ d[r], if the loop ever pops ρ, the execution of the repeat loop terminates with it, and ρ is pushed again at line 12.

Proposition 13.11 Any state popped during the repeat loop (at line 10) belongs to some cycle.

Proof: Consider the time t right before a state s is about to be popped at line 10, while the for-loop (lines 6–12) is exploring a transition (q, r). (Notice that the body of the for-loop may have already been executed for other transitions (q, r′).) We show that s belongs to a cycle.
but also that for every cycle π containing q. and ρ still belongs to C after the repeat loop has terminated and line 12 has been executed. then.10 Let ρ be a root. and ρ is pushed again at line 12. the states of C are always ordered by decreasing discovery time. either ρ is a DFS-ascendant of ρ . the execution of the repeat loop terminates. ρ is still grey at time t. The property that every state popped during the repeat loop belongs to some cycle has a more involved proof. we have r q (line 8). and grey states always are a path of the DFS-tree. before the repeat loop is executed.3 The ﬁrst state of a reachable scc that is discovered by the DFS is called the root of the scc (with respect to this DFS). both q and r belong to the same scc. Since states are removed from C when they are blackened or earlier. We show that s belongs to a cycle. then it still belongs to C after the loop ﬁnishes and the last popped state is pushed back.9 OneStack is optimal. (Notice that the body of the for-loop may have already been executed for other transitions (q. EMPTINESS CHECK: IMPLEMENTATIONS Actually. r ∈ S . the proof of Proposition 13.

Since s belongs to C at time t. where a state is active if its scc has not yet been completely explored. both s and q are grey at time t. (2) ρ is a DFS-ascendant of s. q is the last state in the path. and so at least one state s ∈ R is grey. and so they belong to the current path of grey states. the grey path contains s and ends at q. Together with (1). By Lemma 13. At ﬁrst sight the oracle seems diﬃcult to implement. Assume that OneStack calls the oracle at line 8 at some time t. ALGORITHMS BASED ON DEPTH-FIRST SEARCH 245 (1) s is a DFS-ascendant of q. Let R be the scc of A satisfying r ∈ R. and is easy to check. and. r). By Lemma 13. Then I(q) ⊆ I(r). r) is being explored. Assume every state of R is black. Then r q iff some state of R is not black. By (1). if some state of the scc is not black. checking if r q holds amounts to checking if all states of R are black or not. since dfs(q) is being currently executed.12. and removing all the states of a scc right after the last of them is blackened. at time d[ρ] there is a white path from ρ to q. ρ is a DFS-ascendant of q. This can be done as follows: we maintain a set V of actiVe states. Then r and q belong to R.12 Assume that OneStack(A) is currently exploring a transition (q.. Lemma 13. Since grey states form a path ending at the state whose output transitions are being currently explored. The next lemma shows that the last of them is always the root: Lemma 13.1. We show that this is not the case. So s q. By the White-path and the Parenthesis Theorems. . (In words: The root is the ﬁrst state of a scc to be grayed. we have ρ s q r ρ. and the last to be blackened) Proof: By the deﬁnition of a root. and r q. We look for a condition that holds at time t if and only if r q. and since q is not black because (q. and let q be a state such that ρ q ρ.13. i. So s is a DFS-ascendant of q. checking r q reduces to checking whether r is active.13 Let ρ be a root. 
and so by the Parenthesis Theorem ρ is a DFS-ascendant of s.10 we have d[ρ] ≤ d[s].e. Proof: Assume r q. at time d[ρ] there is a white path from ρ to q. we have r q. since s and r belong to R. Not all states of R are white because r has already been discovered. and so s belongs to a cycle. R is not black. I(q) ⊆ I(r). By the Whitepath and Parenthesis Theorem. (2). and the state r has already been discovered. Then. this implies that either ρ is a DFS-ascendant of s or s is a DFS-ascendant of ρ. Implementing the oracle Recall that OneStack calls an oracle to decide at time t if r q holds. Since ρ is a root of the scc of q. The set V can be maintained by adding a state to it whenever it is discovered. Moreover.

these are the states c. we have d[ρ] ≤ d[q]. g.13. When the root i is blackened. or it removes it now. all states of V above it (including the root itself) are popped. and none of these states is pushed back at line 12. and maintaining it as follows: • when a state is discovered (greyed). when some other root ρ ρ is blackened at time f [ρ ]. So whenever the DFS blackens a state q. states f and e are discovered.10. i. We show that this cannot be the case. the state q does not belong to C and in particular q top(C) Tasks (2) can be performed very elegantly by implementing V as a second stack.14 When OneStack executes line 13. and so top(C) = q at line 13. Proof: Assume q is a root. either OneStack has already removed q from C. L contains all states in the grey path. when OneStack executes line 14 for dfs(q). which form a scc. unless it has been popped before. When OneStack explores (q . j. Assume now that q is not a root. Then. q still belongs to C after the for loop at lines 6-12 is executed. Proof: Let q be a state of ρ’s scc.246 CHAPTER 13. remove all the states of q’s scc. So q is popped when ρ is blackened. and (2) if q is a root. q is a descendant of q. r) is explored. State h is also a root. q is a root if and only if top(C) = q.7 shows the content of L at diﬀerent times when the policy above is followed. Then there is a path from q to the root of q’s scc. Finally. Checking if q is a root is surprisingly simple: Lemma 13. all states above it (including i itself).e. f. in order to maintain V it suﬃces to remove all the states of an scc whenever its root is blackened.15 Figure 13. which correspond to the third and last scc. and it is . r). when the root a is blackened. We show that the states popped when ρ is blackened are exactly those that belong to ρ’s scc. it pops all states s from C satisfying d[s] > d[r]. Right before state j is blackened. because q is in the stack at time f [ρ ] (implying d[q] < f [ρ ]). are popped. and pushed into V. Since ρ is a root.. 
a are popped. Let r be the ﬁrst state in the path satisfying d[r] < d[q]. and it is also popped. and so when transition (q. By the White-path theorem. b. q is not yet black. Lemma 13. By Lemma 13. and • when a root is blackened. In particular. and so q lies above ρ in L. Since q has not been blackened yet. EMPTINESS CHECK: IMPLEMENTATIONS By this lemma. We collect some facts: (a) d[ρ] ≤ d[q] ≤ d[q] ≤ f [ρ] by Lemma 13. i. we have to perform two tasks: (1) check if q is a root.16 The states popped from V right after blackening a root ρ are exactly those belonging to ρ’s scc. d. and let q be the predecessor of r in the path. Assume q is popped before ρ is blackened. (b) d[ρ ] ≤ d[q] < f [ρ ]. states c. Example 13. it is pushed into the stack (so states are always ordered in V by increasing discovery time).

k − 1}. . ALGORITHMS BASED ON DEPTH-FIRST SEARCH 247 above ρ in the stack (implying d[ρ ] ≤ d[q]).13. But then. C). and so belong to diﬀerent sccs. V ← ∅. The solution is to implement V both as a stack and use an additional bit in the hash table for S to store whether the state belongs to V or not. if s ∈ F then report NEMP until d[s] ≤ d[r] push(s. . Σ. From (a)-(c) and the Parenthesis Theorem we get I(q) ⊆ I(ρ ) ⊆ I(ρ).1. When the algorithm blackens a root (line 13). . Recall that a NGA has in general several sets {F0 . because q has not been popped yet at time f [ρ ]. where K = {0. δ. V) for all r ∈ δ(q) do if r S then dfs(r) else if r ∈ V then repeat s ← pop(C). 2 dfs(q0 ) 3 report EMP 4 5 6 7 8 9 10 11 12 13 14 15 proc dfs(q) add q to S . and that a run ρ is accepting if inf ρ ∩ Fi ∅ for every i ∈ {0. . . contradicting that ρ and ρ are diﬀerent roots. . So we have the following characterization of nonemptiness. push(q. and so ρ cannot have been blackened yet. and so in particular ρ ρ q. we get ρ ρ ρ. (c) f [ρ ] < f [ρ]. and q itself (line 15). because at line 8 we have to check if a state belongs to the stack or not. Observe that V cannot be implemented only as a stack. . The check at line 8 is performed by checking the value of the bit. C. k − 1}: . . which is possible because V ⊆ S holds at all times. . since q ρ. F) Output: EMP if Lω (A) = ∅. Fk−1 } of accepting states. Extension to GBAs We show that TwoStack can be easily transformed into an emptiness check for generalized B¨ chi u automata that does not require to construct an equivalent NBA. . . The oracle r q is replaced by r ∈ V (line 8). This ﬁnally leads to the two-stack algorithm TwoStack(A) Input: NBA A = (Q. C) if top(C) = q then pop(C) repeat s ← pop(V) until s = q The changes with respect to OneStack are shown in blue. push(q. . it pops from V all elements above q. NEMP otherwise 1 S . q0 .
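The TwoStack pseudocode above can be rendered in executable form. The following Python sketch is my own rendering, not the book's code: the encoding of the NBA as a successor map `delta`, an accepting set `acc`, and an initial state `q0` is an assumption. As the text suggests, discovery times double as the visited set S, and membership in V is tracked with a separate set so that the check at line 8 is constant time.

```python
def two_stack_empty(q0, delta, acc):
    """Return True iff no reachable cycle contains an accepting state."""
    d = {}               # discovery times; doubles as the visited set S
    C = []               # stack of candidate roots
    V = []               # stack of active states
    in_V = set()         # constant-time membership test for V (line 8)
    found = [False]      # set when an accepting state is popped at line 10

    def dfs(q):
        d[q] = len(d)
        C.append(q)
        V.append(q); in_V.add(q)
        for r in delta.get(q, ()):
            if found[0]:
                return
            if r not in d:
                dfs(r)
            elif r in in_V:
                # r is active: collapse every candidate root newer than r
                while True:
                    s = C.pop()
                    if s in acc:          # s lies on a cycle (Prop. 13.11)
                        found[0] = True
                    if d[s] <= d[r]:
                        break
                C.append(s)               # push the last popped state back
        if C and C[-1] == q:              # q is the root of its scc
            C.pop()
            while True:                   # deactivate the whole scc
                s = V.pop(); in_V.discard(s)
                if s == q:
                    break

    dfs(q0)
    return not found[0]
```

For example, `two_stack_empty("a", {"a": ["b"], "b": ["a"]}, {"b"})` finds the accepting lasso through b, while a chain with an accepting state off every cycle is reported empty.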

. I)for some I then pop(C) repeat s ← pop(V) until s = q For the correctness of the algorithm. in particular. J] ← pop(C). F(q)] to S . it pops from L one scc. . F(q)]. . The proof shows that a state s popped at line 10 belongs to the ssc of state r. if I = K then report NEMP until d[s] ≤ d[r] push([s. δ. Σ. NEMP otherwise 1 S . Fk−1 }. If so. C) if top(C) = (q. observe that. V ← ∅.248 CHAPTER 13. C). and keep checking whether I = K. Fk−1 }) Output: EMP if Lω (A) = ∅. C. . at every time t.17 Let A be a NGA with accepting condition {F0 . V) for all r ∈ δ(q) do if r S then dfs(r) else if r ∈ V then I←∅ repeat [s. where F(q) denotes the set of all indices i ∈ K such that q ∈ Fi : TwoStackNGA(A) Input: NGA A = (Q. The key invariant for the correctness proof is the following: . . Since every time the repeat loop at line 15 is executed. the resulting algorithm would not be optimal. A is nonempty iff some scc S of A satisﬁes S ∩ Fi ∅ for every i ∈ K. This yields algorithm TwoStackNGA. {F0 . and each of these components has a root. we attach the set I to the state s that is pushed back into the C at line 12. I ← I ∪ J. EMPTINESS CHECK: IMPLEMENTATIONS Fact 13. To solve this problem. . Otherwise. So. . So we collect the set I of indices of the sets of accepting states they belong to. 2 dfs(q0 ) 3 report EMP 4 5 6 7 8 9 10 11 12 13 14 15 16 17 proc dfs(q) add [q. because the condition is not checked until the scc has been completely explored. However. we have a closer look at Proposition 13. . q0 . we can easily check this condition by modifying line 15 accordingly.11. then we report NONEMPTY. all the states popped during an execution of the repeat loop at lines 9-11 belong to the same scc. push(q. the states of teh subautomaton At can be partitioned into strongly connected components. push([q. I].
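The NGA generalization can be sketched the same way. The code below is my own rendering of TwoStackNGA under the same assumed encoding (`delta` a successor map, `F` a list of accepting sets F0, ..., Fk−1): each entry of C carries the set of indices collected so far, and the algorithm reports nonempty as soon as some collapsed scc has met every Fi.

```python
def two_stack_nga_empty(q0, delta, F):
    """NGA emptiness: nonempty iff some reachable scc meets every F[i]."""
    k = len(F)
    d = {}
    C, V, in_V = [], [], set()
    idx = lambda q: {i for i in range(k) if q in F[i]}   # the set F(q)
    found = [False]

    def dfs(q):
        d[q] = len(d)
        C.append((q, idx(q)))
        V.append(q); in_V.add(q)
        for r in delta.get(q, ()):
            if found[0]:
                return
            if r not in d:
                dfs(r)
            elif r in in_V:
                I = set()                 # indices met by the collapsed scc
                while True:
                    s, J = C.pop()
                    I |= J
                    if len(I) == k:       # scc meets every accepting set
                        found[0] = True
                    if d[s] <= d[r]:
                        break
                C.append((s, I))          # re-attach I to the root candidate
        if C and C[-1][0] == q:
            C.pop()
            while True:
                s = V.pop(); in_V.discard(s)
                if s == q:
                    break

    dfs(q0)
    return not found[0]
```

On the two-state cycle with F = [{0}, {1}] the collapsed scc collects both indices and the automaton is reported nonempty.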

By Proposition 13. We show that this is exactly what the execution of lines 9-14 achieves. and let ρ be the state of NR with minimal discovery time.9. K]. then some scc S of A satisﬁes S ∩ Fi ∅ for every i ∈ K. r) are alsoroots of At . then the repeat loop at lines 10-13 pops some pair [q. r) has not changed the sccs of the automaton explored so far.1. The strong point of the the nested-DFS algorithm were its very modest space requirements: just two extra bits for each state of A. Both are strong points of the two-stack algorithm. F(r)] must be added to C. then it has no successors in At . Let t be the time immediately after (q.18. because they have discovery time smaller than or equal to d[r]. I] iff q is a root of At . For. By Lemma 13. We can now easily prove: Proposition 13. r) is explored. . let NR be the set of states of At that belong to a cycle containing both q and r. and some states stop being roots. We show that whenever a new transition (q.18. then r ρ. If r is a new state. r) creates new cycles. or are removed now at lines 9-14. the addition of (q. TwoStackNGA carries out the necessary changes to keep the invariant. with root r. and it is not optimal. ALGORITHMS BASED ON DEPTH-FIRST SEARCH 249 Lemma 13.19 TwoStackNGA(A) reports NONEMPTY iff A is nonempty. It remains to show that all states popped at line 11 belong to NR (that ρ is not removed follows then from the fact that the state with the lowest discovery time is pushed again at line 14. By Lemma 13. and we are done. r) is explored. all roots before the exploration of (q. we have a closer look at the proof of Proposition 13. So a new pair [r. Moreover. Then the algorithm must remove from C all states of NR with the exception of ρ (and no others). More precisely. and I is the subset of indices i ∈ K such that some state of Fi belongs to q’s scc. Proof: If TwoStackNGA(A) reports NONEMPTY. Let us examine the space needed by the two-stack algorithm. then the addition of (q. and so it is optimal. 
So there is an earliest time t such that At contains an scc S t ⊆ S satisfying the same property. because in this case both the nested-DFS and the two-stack algorithms must visit all states. this.13. q belongs to a cycle of A containing some state of Fi for every i ∈ K. and that state is ρ). all the states of NR \ {ρ} have already been removed at some former execution of the loop. and that is what dfs(r) does. and so it builds an scc of At by itself. but also that they all belong to cycles that containing q and r (see the last line of the proof). If r ∈ L. TwoStackNGA(A) reports NONEMPTY at time t or earlier. and nothing must be done.8 and Corollary 13.11. If r L. It is conveniet to compute it for empty automata. Proof: Initially the invariant holds because both At and C are empty. the stack C contains a pair [q. If A is nonempty. Evaluation Recall that the two weak points of the nested-DFS algorithm were that it cannot be directly extended to NGAs. TwoStackNGA is optimal. Moreover.18 At every time t. The proposition shows not only that the states popped by the repeat loop belong to some cycle.

however. In a forward search. making the algorithms competitive. However. q0 . because if the number states of A exceeds 2w . So the hash table S requires c + w + 1 bits per state (the extra bit being the one used to check membership in L at line 8). A BFS from a set Q0 of states (in this section we consider searches from an arbitrary set of states of A) initializes both the set of discovered states and its frontier to Q0 . Σ. the gain obtained by the use of the data structure more than compensates for the quadratic worse-case behaviour. The pseudocode implementations of both BFS variants shown below use two variables S and B to store the set of discovered states and the boundary. and so this section may look at ﬁrst sight superﬂuous. δ. This is done by extending the hash table S . and they become the next frontier. EMPTINESS CHECK: IMPLEMENTATIONS Because of the check d[s] ≤ d[r]. In most cases w << c. repeat B ← δ(B) \ S S ←S ∪B until B = ∅ S . Breadth-ﬁrst search (BFS) maintains the set of states that have been discovered but not yet explored. B ← Q0 .2 Algorithms based on breadth-ﬁrst search In this section we describe algorithms based on breadth-ﬁrst search (BFS). ForwardBFS[A](Q0 ) Input: NBA A = (Q. then log n bits are needed to store d[q]. Σ. So the two-stack algorithm uses a total of c + 3w + 1 bits per state (c + 3w + k + 1 in the version for NGA). often called the frontier or boundary. Ignoring hashing collisions. Q0 ⊆ Q 1 2 3 4 5 BackwardBFS[A](Q0 ) Input: NBA A = (Q. we must also add the k bits needed to store the subset of K in the second comu ponent of the elements of C. F). δ. A backward BFS proceeds similarly. where w is the number of bits of a word. The stacks C and L do not store the states themselves. repeat B ← δ−1 (B) \ S S ←S ∪B until B = ∅ . We assume the existence of oracles that. in practice d[q] is stored using a word of memory. then A cannot be stored in main memory anyway. B ← Q0 . and then proceeds in rounds. In many cases. 
the algorithm needs to store the discovery time of each state. the new states found during the round are added to the set of discovered states. BFSbased algorithms can be suitably described using operations and checks on sets. Q0 ⊆ Q 1 2 3 4 5 S . 13. q0 . compared to the c + 2 bits required by the nested-DFS algorithm. For generalized B¨ chi automata. and so the inﬂuence of the additional memory requirements on the performance is small. this requires 2w additional bits per state. return either δ(B) = q∈B δ(q) or δ−1 (B) = q∈B δ−1 (q). respectively. which allows us to implement them using automata as data structures. given the current boundary B. but explores the incoming instead of the outgoing transitions. If a state q can be stored using c bits. but the memory addresses at which they are stored. a round explores the outgoing transitions of the states in the current frontier. F). No linear BFS-based emptiness check is known.250 CHAPTER 13.
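The set-based BFS above is short enough to sketch directly. This is my own Python rendering of ForwardBFS, assuming the transition relation is given as a successor map `delta`; BackwardBFS is the identical computation run on the inverted map.

```python
def forward_bfs(Q0, delta):
    """All states reachable from the set Q0, computed frontier by frontier."""
    S = set(Q0)                      # discovered states
    B = set(Q0)                      # current frontier (boundary)
    while B:
        # delta(B) \ S: successors of the frontier not yet discovered
        B = {r for q in B for r in delta.get(q, ())} - S
        S |= B
    return S
```

Each state's successors are computed exactly once, since a state enters the frontier only in the round in which it is discovered.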

To compute L[n + 1] given L[n] we observe that a state is n + 1-live if some nonempty path leads from it to an n-live accepting state. the ﬁxpoint L[i] exists. we can represent δ ⊆ Q × Q by a ﬁnite transducer T δ .2. . a state is n-live if starting at it it is possible to visit accepting states n-times. there is a least index i ≥ 0 such that L[i + 1] = L[i]). and and B j+1 ∩ S j = ∅ by line 3. There is a path containing at least one transition that leads from q to an accepting state r ∈ L[n]. To prove (b). • a state q is (n + 1)-live if some path containing at least one transition leads from q to an accepting n-live state. let B1 . Clearly.. since Q is ﬁnite. δ) and Pre(B. So B j ∩ Bi = ∅ for every j > i.. the n-live states of A are inductively deﬁned as follows: • every state is 0-live. if in the course of the algorithm the oracle is called twice with arguments Bi and B j . . For every n ≥ 0. Proof: We prove (a) by induction on n. reaches a ﬁxpoint L[i] (i. ﬁrst notice that. δ). be the sequence of values of the variables B and S right before the i-th execution of line 3. Emerson-Lei’s algorithm computes the ﬁxpoint L[i] of the sequence L[0] ⊇ L[1] ⊇ L[2] . (b) The sequence L[0] ⊇ L[1] ⊇ L[2] . .1 Emerson-Lei’s algorithm A state q of A is live if some inﬁnite path starting at q visits accepting states inﬁnitely often. and reduce the computation of δ(B) and δ−1 (B) in line 3 to computing Post(B. ALGORITHMS BASED ON BREADTH-FIRST SEARCH 251 Both BFS variants compute the successors or predecessors of a state exactly once. L ⊆ L[i] for every i ≥ 0. . Loosely speaking. and so L[n + 1] = BackwardBFS( Pre(L[n] ∩ F) ) . Let L[n] denote the set of n-live states of A. We have Bi ⊆ S i by the invariant. and let q ∈ L[n + 1]. every state of L[i] has a proper descendant that is accepting and belongs to L[i]. The case n = 0 is trivial. respectively. B2 . i.e. Clearly. and use automata for ﬁxed-length languages to represent both S and B. .13. Now. We have: Lemma 13. respectively. 
Let L be the set of live states. To prove this in the forward case (the backward case is analogous). then Bi ∩ B j = ∅.2. Moreover.e. So L[i] ⊆ L. and that the value of S never decreases. But we can also take the set Q of states of A as ﬁnite universe. Moreover. and L[i] is the set of live states. r ∈ L[n − 1]. . .. respectively. observe that B ⊆ S is an invariant of the repeat loop. S i ⊆ S j for every j ≥ i. We describe an algorithm due to Emerson and Lei for computing the set of live states. and so q ∈ L[n]. since L[i] = L[i + 1].20 (a) L[n] ⊇ L[n + 1] for every n ≥ 0. Assume n > 0. S 1 . By induction hypothesis. 13. S 2 . As data structures for the sets S and B we can use a hash table and a queue. A is nonempty if and only if its initial state is live.
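Emerson-Lei's fixpoint computation of the live states can be sketched as follows. This is my own rendering under an assumed encoding (`pred` maps each state to its predecessors, `acc` is the accepting set); each round computes Pre(OldL ∩ F) and then closes it backward, exactly as in the recurrence L[n+1] = BackwardBFS(Pre(L[n] ∩ F)).

```python
def emerson_lei_live(Q, pred, acc):
    """Return the live states: those from which some infinite path
    visits accepting states infinitely often."""
    L = set(Q)
    while True:
        old = L
        # Pre(OldL ∩ F): states with a transition into an accepting live state
        target = {q for r in (old & acc) for q in pred.get(r, ())}
        # BackwardBFS(target): close under predecessors
        L = set(target)
        B = set(target)
        while B:
            B = {q for r in B for q in pred.get(r, ())} - L
            L |= B
        if L == old:
            return L
```

The automaton is nonempty iff q0 belongs to the returned set. For a cycle 0 → 1 → 0 with accepting state 1 plus a feeder state 2 → 0 and an isolated state 3, the live states are {0, 1, 2}.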

. δ. if q ∈ F and there is a transition (q. . . . q0 . For instance. Since each iteration takes O(|Q| + |δ|) time.21 GenEmersonLei(A) reports NEMP iff A is nonempty. then after line 4 is executed the state still belongs to L. . δ. L[2]. the variable L is used to store the elements of the sequence L[0]. the algorithm runs in O(|Q| · (|Q| + |δ|)) time. . NEMP otherwise 1 2 3 4 5 6 7 8 L←Q repeat OldL ← L L ← Pre(OldL ∩ F) L ← BackwardBFS(L) until L = OldL if q0 ∈ L then report NEMP else report NEMP L←Q repeat OldL ← L L ← Pre(OldL ∩ F) \ OldL L ← BackwardBFS(L) ∪ OldL until L = OldL if q0 ∈ L then report NEMP else report NEMP The repeat loop is executed at most |Q| + 1-times. Emerson-Lei’s algorithm can be easily generalized to NGAs (we give only the generalization of the ﬁrst version): GenEmersonLei(A) Input: NGA A = (Q. δ.. q0 . {F0 . F) Output: EMP if Lω (A) = ∅. because each iteration but the last one removes at least one state from L. q0 .252 CHAPTER 13. The version on the right avoids this problem. Σ. NEMP otherwise 1 2 3 4 5 6 7 8 EmersonLei2(A) Input: NBA A = (Q. F) Output: EMP if Lω (A) = ∅. . q). EMPTINESS CHECK: IMPLEMENTATIONS The pseudocode for the algorithm is shown below on the left-hand-side. Σ. EmersonLei(A) Input: NBA A = (Q. Fm−1 }) Output: EMP if Lω (A) = ∅. Σ. . NEMP otherwise 1 2 3 4 5 6 7 8 9 L←Q repeat OldL ← L for i=0 to m − 1 L ← Pre(OldL ∩ Fi ) L ← BackwardBFS(L) until L = OldL if q0 ∈ L then report NEMP else report NEMP Proposition 13. The algorithm may compute the predecessors of a state twice. L[1].

2. . . Given a set S ⊆ Q of states.. . Fm−1 . For every state q ∈ L[(i + 1) · m] there is a path π = πm−1 πm−2 π0 such that for every j ∈ {0. every occurrence of a state of F j is always followed by some later occurrence of a state of F( j+1) mod m . We claim that the sequence L[0] ⊇ L[m] ⊇ L[2 · m] . which we call the Modiﬁed Emerson-Lei’s algorithm (MEL). . for every i ∈ {0. Since Q is ﬁnite.13. we can easily show that L[(n + 1) · m] ⊇ L[n · m] holds for every n ≥ 0. . there is a least index i ≥ 0 such that L[(i + 1) · m] = L[i · m]). π visits states of F0 . In particular. and L[i · m] is the set of live states.2 A Modiﬁed Emerson-Lei’s algorithm There exist many variants of Emerson-Lei’s algorithm that have the same worst-case complexity. In this path. the ﬁxpoint L[i · m] exists. . 13. since L[(i + 1) · m] = L[i · m]. which proves the claim. m − 1}. . m − 1}. MEL computes Pre(inf (OldL) ∩ Fi ). . and. redeﬁne the n-live states of A as follows: every state is 0-live. Instead of computing Pre(OldL ∩ F) at each iteration step. .e. reaches a ﬁxpoint L[i · m] (i. m − 1} the segment π j contains at least one transition and leads to a state of L[i · m + j] ∩ F j . There is a path starting at q that visits F j inﬁnitely often for every j ∈ {0. . . So q ∈ L[i · m]. . So every state of L[(i + 1) · m] = L[i · m] is live. Since GenEmersonLei(A) computes the sequence L[0] ⊇ L[m] ⊇ L[2 · m] . . ALGORITHMS BASED ON BREADTH-FIRST SEARCH 253 Proof: For every k ≥ 0. Let L[n] denote the set of n-live states. . let inf (S ) denote the states q ∈ S such that some inﬁnite path starting at q contains only states of S . .2. . . We present here one of these variants.20. by means of heuristics. but try to improve the eﬃciency. it leads from a state of L[(i + 1) · m] to another state of L[(i + 1) · m]. . at least in some cases. . We now show that every state of L[i · m] is live. Let q be a live state.. 
and q is (n + 1)-live if some path having at least one transition leads from q to a n-live state of F(n mod m) . Proceeding as in Lemma 13. . after termination L contains the set of live states. .
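The inf operator that MEL adds on top of Emerson-Lei can be sketched on its own. This is my rendering of the inner fixpoint S ← S ∩ Pre(S), again assuming a predecessor map `pred`.

```python
def inf_states(S, pred):
    """inf(S): states of S from which some infinite path stays inside S."""
    S = set(S)
    while True:
        # Pre(S): states having at least one successor in S
        T = S & {q for r in S for q in pred.get(r, ())}
        if T == S:
            return S
        S = T
```

On a chain 0 → 1 → 2 plus a self-loop 3 → 3, only state 3 starts an infinite path, so `inf_states` strips the chain away in three rounds.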

Clearly. while MEL introduces the overhead of repeatedly computing inf operations. MEL computes the sequence L [0] ⊇ L [1] ⊇ L [2] . {F0 . Interestingly. we can look at this procedure as the computation of the live states of D0 using MEL. . Since inf (S ) can be computed in time O(|Q| + |δ|) for any set S . In the proof of Proposition 12. Σ. and then compare it with Emerson-Lei’s algorithm. D2i+1 was obtained from D2i by removing all nodes having only ﬁnitely many descendants. in fact. .3 we deﬁned a sequence D0 ⊇ D1 ⊇ D2 ⊇ . of inﬁnite acyclic graphs. it still makes sense in many cases because it reduces the number of executions of the repeat loop. . . As we shall see. . and so MEL computes L[i]. By the deﬁnition of liveness we have inf (L[i]) = L[i]. . So. we have already met Emerson-Lei’s algorithm in Chapter ??. To prove correctness we claim that after termination L contains the set of live states. . we have that L[i] is also the ﬁxpoint of the sequence L [0] ⊇ L [1] ⊇ L [2] . q0 . Recall that the set of live states is the ﬁxpoint L[i] of the sequence L[0] ⊇ L[1] ⊇ L[2] . and L [n + 1] = inf (pre+ (L [i] ∩ α)). . . MEL runs in O(|Q| · (|Q| + |δ|)) time. Since L[n] ⊇ L [n] ⊇ L[i] for every n > 0. repeat OldL ← L L ← inf (OldL) L ← Pre(L ∩ F) L ← BackwardBFS(L) until L = OldL if q0 ∈ L then report NEMP else report NEMP function inf(S ) repeat OldS ← S S ← S ∩ Pre(S ) until S = OldS return S In the following we show that MEL is correct.254 CHAPTER 13. NEMP otherwise 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 L ← Q. δ. and D2i+2 was obtained from D2i+1 by removing all nodes having only non-accepting descendants. . EMPTINESS CHECK: IMPLEMENTATIONS MEL(A) Input: NGA A = (Q. . In the terminology of this chapter. Deﬁne now L [0] = Q. . . Fk−1 }) Output: EMP if Lω (A) = ∅.... This corresponds to D2i+1 := inf(D2i ) and D2i+2 := pre+ (D2i+1 ∩ α).

13.2.3 Comparing the algorithms

We give two families of examples showing that MEL may outperform Emerson-Lei's algorithm, but not always.

A good case for MEL. Consider the automaton of Figure 13.8. The i-th iteration of Emerson-Lei's algorithm removes qn−i+1 as result of the call to BackwardBFS, and so the algorithm calls BackwardBFS (n + 1) times, although a simple modification allowing the algorithm to stop if L = ∅ spares the (n + 1)-th operation. On the other hand, the first inf-operation of MEL already sets the variable L to the empty set of states, and so the algorithm stops after one iteration.

q0 q1 · · · qn−1 qn

Figure 13.8: An example in which the MEL-algorithm outperforms the Emerson-Lei algorithm

A good case for Emerson-Lei's algorithm. Consider the automaton of Figure 13.9. The i-th iteration of the MEL-algorithm removes no state as result of the inf-operation, and states q(n−i+1),1 and q(n−i+1),2 as result of the call to BackwardBFS. So in this case the inf-operations are all redundant. The i-th iteration of Emerson-Lei's algorithm removes q(n−i+1),1 and q(n−i+1),2, and the number of calls to BackwardBFS is (n + 1) (the same simple modification applies).

q0,1 q1,1 · · · qn,1
q0,2 q1,2 · · · qn,2

Figure 13.9: An example in which the EL-algorithm outperforms the MEL-algorithm

Exercises

Exercise 96 Which lassos of the following NBA can be found by a run of NestedDFS? (Recall that NestedDFS is a nondeterministic algorithm, and so different runs on the same input may return different lassos.)

[Diagram of the NBA for Exercise 96 (states 0, 1, 2, 3, initial state 0) omitted; it is not recoverable from the extraction.]

Exercise 97 A Büchi automaton is weak if no strongly connected component contains both accepting and non-accepting states. Show that the following algorithm correctly decides emptiness of weak Büchi automata.

SingleDFS(A)
Input: weak NBA A = (Q, Σ, δ, q0, F)
Output: EMP if Lω(A) = ∅, NEMP otherwise
1  S ← ∅
2  dfs(q0)
3  report EMP
4  proc dfs(q)
5    add q to S
6    for all r ∈ δ(q) do
7      if r ∉ S then dfs(r)
8      if r ∈ F then report NEMP
9    return

Exercise 98 Consider Muller automata whose accepting condition contains one single set of states F, i.e., a run ρ is accepting if inf(ρ) = F. Transform TwoStack into a linear algorithm for checking emptiness of these automata. Hint: Consider the version of TwoStack for NGAs, and consider the root of a scc containing only accepting states.

Exercise 99 (1) Given R, S ⊆ Q, define pre+(R, S) as the set of ascendants q of R such that there is a path from q to R that contains only states of S. Give an algorithm to compute pre+(R, S).
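As printed, the body of SingleDFS in Exercise 97 appears garbled, so the sketch below is not the book's algorithm but my own take on the same idea for weak automata: since every scc is homogeneous, an accepting cycle exists iff the DFS finds a back edge into a grey accepting state.

```python
def weak_nba_empty(q0, delta, acc):
    """Emptiness check for weak NBA via a single DFS: a back edge into a
    grey accepting state closes an accepting cycle (and, by weakness,
    every accepting cycle contains such a back edge)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def dfs(q):
        color[q] = GREY
        for r in delta.get(q, ()):
            if color.get(r, WHITE) == WHITE:
                if dfs(r):
                    return True
            elif color[r] == GREY and r in acc:
                return True        # accepting cycle found
        color[q] = BLACK
        return False

    return not dfs(q0)
```

A chain ending in an accepting state without a cycle is correctly reported empty, while an accepting self-loop is reported nonempty.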

(2) Consider the following modification of Emerson-Lei's algorithm:

MEL2(A)
Input: NBA A = (Q, Σ, δ, q0, F)
Output: EMP if Lω(A) = ∅, NEMP otherwise
1  L ← Q
2  repeat
3    OldL ← L
4    L ← pre+(L ∩ F, L)
5  until L = OldL
6  if q0 ∈ L then report NEMP
7  else report EMP

Is MEL2 correct? What is the difference between the sequences of sets computed by MEL and MEL2?
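One way to approach Exercise 99(1) is a backward search that only admits predecessors inside S. The sketch below is mine, not the book's solution, and it reads the definition as: q ∈ pre+(R, S) if a nonempty path leads from q to R whose states before the final one lie in S (the exercise's wording leaves the status of the endpoints slightly open).

```python
def pre_plus(R, S, pred):
    """pre+(R, S): states with a nonempty path to R staying inside S.

    `pred` maps a state to its predecessors (my encoding)."""
    result = set()
    # predecessors of R that are themselves in S start the search
    B = {q for r in R for q in pred.get(r, ()) if q in S}
    while B:
        result |= B
        B = {q for r in B for q in pred.get(r, ()) if q in S} - result
    return result
```

With edges 0 → 1 → 2, R = {2} and S = {0, 1}, both 0 and 1 are in pre+(R, S); shrinking S to {1} cuts the search off at state 1.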


Chapter 14

Applications I: Verification and Temporal Logic

Recall that, intuitively, liveness properties are those stating that the system will eventually do something good, i.e., they are properties that are only violated by infinite executions of the systems: by examining only a finite prefix of an infinite execution it is not possible to determine whether the infinite execution violates the property or not. In this chapter we apply the theory of Büchi automata to the problem of automatically verifying liveness properties.

14.1 Automata-Based Verification of Liveness Properties

In Chapter 8 we introduced some basic concepts about systems: configuration, possible execution, and execution. We extend these notions to the infinite case. An ω-execution of a system is an infinite sequence c0 c1 c2 . . . of configurations where c0 is some initial configuration, and for every i ≥ 1 the configuration ci is a legal successor, according to the semantics of the system, of the configuration ci−1. Notice that according to this definition, if a configuration has no legal successors then it does not belong to any ω-execution. Usually this is undesirable, and it is more convenient to assume that such a configuration c has exactly one legal successor, namely c itself. In this way, every reachable configuration of the system belongs to some ω-execution. More formally, the set of terminating configurations can usually be identified syntactically; for instance, in a program the terminating configurations are usually those in which control is at some particular program line. The terminating executions are then the ω-executions of the form c0 . . . cn−1 (cn)ω for some terminating configuration cn.

In Chapter 8 we showed how to construct a system NFA recognizing all the executions of a given system. The same construction can be used to define a system NBA recognizing all the ω-executions.

Example 14.1 Consider the little program of Chapter 8.

1 while x = 1 do
2   if y = 1 then
3     x ← 0
4   y ← 1 − x
5 end

[Figure 14.1: System NBA for the program. The diagram of configurations [ℓ, x, y] is not recoverable from the extraction; only the caption survives.]

Its system NFA is the automaton of Figure 14.1, but without the red self-loops at states [5, 1, 0] and [5, 1, 1], from which we can easily gain its system NBA: the system NBA is the result of adding the self-loops. We construct the system NBA ωE recognizing the ω-executions of the algorithm (the NBA has just two states).

14.1.1 Checking Liveness Properties

In Chapter 8 we used Lamport's algorithm to present examples of safety properties, and how they can be automatically checked. We do the same now for liveness properties. Figure 14.2 shows again the network of automata modelling the algorithm and its asynchronous product. Observe that in this case every configuration has at least one successor, and so no self-loops need to be added. The finite waiting property for process i states that if process i tries to access its critical section, it eventually will. We can check this property using the same technique as in Chapter 8. For i ∈ {0, 1}, let NCi, Ti, Ci be the sets of configurations in which process i is in the non-critical section, is trying to access the critical section, and is in the critical section, respectively, and let Σ stand for the set of all configurations. The possible ω-executions that violate the property for process i are represented by the ω-regular expression vi = Σ∗ Ti (Σ \ Ci)ω. We transform the regular expression vi into an NBA Vi using the algorithm of Chapter 11. We then

0. b0 = 1 b1 ← 0. 1. 0. 0. t0. q1 b1 = 0 b1 ← 0 0. q1 b0 ← 0 b0 ← 1 b0 ← 0 b1 ← 0 1. b1 = 0 bi ← 1 1 b1 ← 1. nc0. 0. . c1 b0 = 1 1. q1 b0 = 0 b1 = 1 1. b1 = 1 261 q1 q1 b0 = 1 b0 = 0 b0 = 0 b1 ← 1 nc1 t1 b1 ← 0 c1 b1 = 0 0. t0. nc0. nc0. q1 b1 = 1 1. 1. t1 b0 ← 0 b1 = 1 1. nc1 b1 ← 1 b0 ← 1 0. t0.1.2: Lamport’s algorithm and its asynchronous product. t1 b1 ← 0 b0 ← 0 b0 = 1 0. c1 b1 = 1 b1 = 1 b1 ← 1 b1 = 1 b0 = 1 1. t1 b1 ← 0 1. nc1 b0 ← 0 b1 = 0 1. 1. b0 = 0 b0 ← 1 0 b0 ← 0 b1 = 1 nc0 b0 ← 1 t0 b0 ← 0 b0 = 1 b1 ← 0 b1 = 0 c0 1 0 bi ← 0 b0 ← 1. 0. 0. nc1 b1 ← 1 b0 ← 1 0. t0. 1. q1 Figure 14. c0. nc0. c0. AUTOMATA-BASED VERIFICATION OF LIVENESS PROPERTIES b0 ← 0. nc0. c0. t0. 1. 1. 1. c0. q1 b1 = 0 b1 ← 0 1. 1.14.

The result of the check for process 0 yields that the property fails because of, for instance, the ω-execution

    [0, 0, nc0, nc1] [1, 0, t0, nc1] [1, 1, t0, t1]^ω

In this execution both processes request access to the critical section, but from then on process 1 never makes any further step. Only process 0 continues, but all it does is continuously check that the current value of b1 is 1. Intuitively, this corresponds to process 1 breaking down after requesting access. But we do not expect the finite waiting property to hold if processes may break down while waiting. So, in fact, our definition of the finite waiting property is wrong. We can repair the definition by reformulating the property as follows: in any ω-execution in which both processes execute infinitely many steps, if process 0 tries to access its critical section, then it eventually will. The condition that both processes must move infinitely often is called a fairness assumption.

To check whether some ω-execution violates the reformulated property, the simplest solution is to enrich the alphabet of the system NBA. Instead of labeling a transition only with the name of the target configuration, we also label it with the number of the process responsible for the move leading to that configuration. For instance, the transition from [0, 0, nc0, nc1] to [1, 0, t0, nc1] becomes

    [0, 0, nc0, nc1] −([1, 0, t0, nc1], 0)→ [1, 0, t0, nc1]

to reflect the fact that [1, 0, t0, nc1] is reached by a move of process 0. So the new alphabet of the NBA is Σ × {0, 1}. If we denote M0 = Σ × {0} and M1 = Σ × {1} for the "moves" of process 0 and process 1, respectively, then the ω-regular expression

    inf = ( (M0 + M1)∗ M0 M1 )^ω

represents all ω-executions in which both processes move infinitely often, and L(vi) ∩ L(inf) (where vi is suitably rewritten to account for the larger alphabet) is the set of violations of the reformulated finite waiting property. We can construct NBAs for vi and inf, construct an NBA for their intersection using intersNBA(), and check its emptiness using one of the algorithms of Chapter 13. Observe that, since all states of the system NBA are accepting, we do not need to use the special algorithm for intersection of NBAs, and so we can apply the construction for NFAs.

For process 0 the check now yields that the property indeed holds. For process 1 the property still fails because of, for instance, ω-executions of the form ( [0, …, q1] [1, …, q1] ⋯ )^ω in which process 1 repeatedly tries to access its critical section, but always lets process 0 access first.
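The fairness assumption above only depends on which processes move infinitely often, and for an ultimately periodic ω-execution those are exactly the movers of the cycle. A minimal sketch (my own helper, not from the text) of this check on a lasso:

```python
def both_move_infinitely_often(cycle_moves):
    """cycle_moves: the process numbers labelling the moves of the lasso's cycle.
    The moves repeated forever are exactly those of the cycle, so the fairness
    assumption holds iff both process numbers occur there."""
    return set(cycle_moves) == {0, 1}

# Process 1 breaks down: from some point on, only process 0 keeps moving,
# so this ω-execution does not satisfy the fairness assumption.
breakdown = both_move_infinitely_often([0])

# Process 1 keeps moving but always lets process 0 access first: the fairness
# assumption holds, so this execution remains a genuine violation for process 1.
starvation = both_move_infinitely_often([0, 1])
```

This is why the breakdown execution is discarded by the reformulated property, while the starvation execution is not.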

14.2 Linear Temporal Logic

In Chapter 8 and in the previous section we have formalized properties of systems using regular or ω-regular expressions, NFAs, or NBAs. This becomes rather difficult for all but the easiest properties. For instance, the NBA or the ω-regular expression for the modified finite waiting property are already quite involved, and it is difficult to be convinced that they correspond to the intended property. In this section we introduce a new language for specifying safety and liveness properties, called Linear Temporal Logic (LTL). LTL is close to natural language, but still has a formal semantics.

Formulas of LTL are constructed from a set AP of atomic propositions. Intuitively, atomic propositions are abstract names for basic properties of configurations, whose meaning is fixed only after a concrete system is considered. Formally, given a system with a set C of configurations, the meaning of the atomic propositions is fixed by a valuation function V : AP → 2^C that assigns to each abstract name the set of configurations at which it holds. Atomic propositions are combined by means of the usual Boolean operators and the temporal operators X ("next") and U ("until").

Definition 14.2 Let AP be a finite set of atomic propositions. The set of LTL formulas over AP, denoted by LTL(AP), is the set of expressions generated by the grammar

    ϕ := true | p | ¬ϕ1 | ϕ1 ∧ ϕ2 | Xϕ1 | ϕ1 U ϕ2

Formulas are interpreted on sequences σ = σ0 σ1 σ2 …, where σi ⊆ AP for every i ≥ 0. We call these sequences computations; the set of all computations over AP is denoted by C(AP). The executable computations of a system are the computations σ for which there exists an ω-execution c0 c1 c2 … such that for every i ≥ 0 the set of basic properties satisfied by ci is exactly σi.

Definition 14.3 Given a computation σ ∈ C(AP), let σ^j denote the suffix σj σj+1 σj+2 … of σ. The satisfaction relation σ |= ϕ (read "σ satisfies ϕ") is inductively defined as follows:

• σ |= true.
• σ |= p iff p ∈ σ0.
• σ |= ¬ϕ iff σ ⊭ ϕ.
• σ |= ϕ1 ∧ ϕ2 iff σ |= ϕ1 and σ |= ϕ2.
• σ |= Xϕ iff σ^1 |= ϕ.
• σ |= ϕ1 U ϕ2 iff there exists k ≥ 0 such that σ^k |= ϕ2 and σ^i |= ϕ1 for every 0 ≤ i < k.

We use the following abbreviations: false, ∨, → and ↔, interpreted in the usual way, as well as:
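The semantics above can be executed directly on ultimately periodic computations stem·cycle^ω, since such a computation has only finitely many distinct suffixes. A sketch (my own helper, not part of the text; formulas are nested tuples, and the U case computes the least fixpoint of its one-step unfolding):

```python
def holds(phi, stem, cycle):
    """True iff the computation stem . cycle^omega satisfies phi (cycle nonempty).
    Formulas: "true", ("ap", p), ("not", f), ("and", f, g), ("X", f), ("U", f, g)."""
    n, p = len(stem), len(cycle)
    N = n + p                                   # one canonical position per suffix
    letter = lambda i: stem[i] if i < n else cycle[i - n]
    nxt = lambda i: i + 1 if i + 1 < N else n   # last position loops back into the cycle

    def table(f):
        """Set of canonical positions at which f holds."""
        if f == "true":
            return set(range(N))
        op = f[0]
        if op == "ap":
            return {i for i in range(N) if f[1] in letter(i)}
        if op == "not":
            return set(range(N)) - table(f[1])
        if op == "and":
            return table(f[1]) & table(f[2])
        if op == "X":
            t = table(f[1])
            return {i for i in range(N) if nxt(i) in t}
        if op == "U":                           # least fixpoint of the unfolding
            t1, t2, t = table(f[1]), table(f[2]), set()
            while True:
                new = {i for i in range(N) if i in t2 or (i in t1 and nxt(i) in t)}
                if new == t:
                    return t
                t = new
        raise ValueError(f)

    return 0 in table(phi)
```

For instance, p U q holds on ({p}{q})^ω but not on {p}^ω, matching the satisfaction sequences computed later in Example 14.9.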

• Fϕ = true U ϕ ("eventually ϕ"). According to the semantics above, σ |= Fϕ iff there exists k ≥ 0 such that σ^k |= ϕ.
• Gϕ = ¬F¬ϕ ("always ϕ" or "globally ϕ"). According to the semantics above, σ |= Gϕ iff σ^k |= ϕ for every k ≥ 0.

The set of computations that satisfy a formula ϕ is denoted by L(ϕ). A system satisfies ϕ if all its executable computations satisfy ϕ.

Example 14.4 Consider the little program at the beginning of the chapter. Observe that the system NBA of Figure 14.1 has exactly four ω-executions:

    e1 = [1, 0, 0] [5, 0, 0]^ω
    e2 = [1, 0, 1] [5, 0, 1]^ω
    e3 = ( [1, 1, 0] [2, 1, 0] [4, 1, 0] )^ω
    e4 = [1, 1, 1] [2, 1, 1] [3, 1, 1] [4, 0, 1] [1, 0, 1] [5, 0, 1]^ω

Let C be the set of configurations of the program. We choose AP = {at_1, …, at_5, x=0, x=1, y=0, y=1} and define the valuation function V : AP → 2^C as follows:

• V(at_i) = { [ℓ, x, y] ∈ C | ℓ = i } for every i ∈ {1, …, 5};
• V(x=0) = { [ℓ, x, y] ∈ C | x = 0 }, and similarly for x=1, y=0, and y=1.

Intuitively, at_i expresses that the program is at line i, and x=j expresses that the current value of x is j. Under this valuation, the executable computations corresponding to the four ω-executions above are

    σ1 = {at_1, x=0, y=0} {at_5, x=0, y=0}^ω
    σ2 = {at_1, x=0, y=1} {at_5, x=0, y=1}^ω
    σ3 = ( {at_1, x=1, y=0} {at_2, x=1, y=0} {at_4, x=1, y=0} )^ω
    σ4 = {at_1, x=1, y=1} {at_2, x=1, y=1} {at_3, x=1, y=1} {at_4, x=0, y=1} {at_1, x=0, y=1} {at_5, x=0, y=1}^ω

We write some formulas expressing properties of the possible ω-executions of the program:

• φ0 = x=1 ∧ X y=0 ∧ XX at_4. In natural language: the value of x in the first configuration of the execution is 1, the value of y in the second configuration is 0, and in the third configuration the program is at location 4. We have σ3 |= φ0, but σ1, σ2, σ4 ⊭ φ0.

• φ1 = F x=0. In natural language: x eventually gets the value 0. We have σ1, σ2, σ4 |= φ1, but σ3 ⊭ φ1.
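Since φ0 only inspects the first three positions and φ1 only needs one pass over a lasso, the claims above can be checked with a few lines of Python (my own sanity check, with the computations written out as sets):

```python
# The computations s1, s3 and s4 of the example, as lassos (stem, cycle).
s1 = ([{"at1", "x=0", "y=0"}], [{"at5", "x=0", "y=0"}])
s3 = ([], [{"at1", "x=1", "y=0"}, {"at2", "x=1", "y=0"}, {"at4", "x=1", "y=0"}])
s4 = ([{"at1", "x=1", "y=1"}, {"at2", "x=1", "y=1"}, {"at3", "x=1", "y=1"},
       {"at4", "x=0", "y=1"}, {"at1", "x=0", "y=1"}], [{"at5", "x=0", "y=1"}])

def pos(comp, i):
    """The i-th letter of the computation stem . cycle^omega."""
    stem, cycle = comp
    return stem[i] if i < len(stem) else cycle[(i - len(stem)) % len(cycle)]

def phi0(comp):
    """x=1 AND X y=0 AND XX at4: only the first three positions matter."""
    return "x=1" in pos(comp, 0) and "y=0" in pos(comp, 1) and "at4" in pos(comp, 2)

def phi1(comp):
    """F x=0: for a lasso it suffices to scan the stem plus one full cycle."""
    stem, cycle = comp
    return any("x=0" in s for s in stem + cycle)
```

Running phi0 and phi1 on the three computations reproduces the satisfaction claims of the example.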

• φ2 = x=0 U at_5. In natural language: x stays equal to 0 until the execution reaches location 5. Notice however that the natural language description is ambiguous: Do executions that never reach location 5 satisfy the property? Do executions that set x to 1 immediately before reaching location 5 satisfy the property? The formal definition removes the ambiguities: the answer to the first question is 'no', and to the second 'yes'. We have σ1, σ2 |= φ2 and σ3, σ4 ⊭ φ2.

• φ3 = y=1 ∧ F(y=0 ∧ at_5) ∧ ¬F(y=0 ∧ X y=1). In natural language: the first configuration satisfies y = 1, the execution terminates in a configuration with y = 0, and y never changes from 0 back to 1 during the execution.

Example 14.5 We express several properties of the Lamport–Bruns algorithm (see Chapter 8) using LTL formulas. We take AP = {NC0, NC1, T0, T1, C0, C1, M0, M1}, with the obvious valuation. As system NBA we use the one in which transitions are labeled with the name of the target configuration, and with the number of the process responsible for the move leading to that configuration.

• The mutual exclusion property is expressed by the formula

    G(¬C0 ∨ ¬C1)

The algorithm satisfies the formula, i.e., the negation of the formula is not satisfied by any ω-execution. This is one of the properties we analyzed in Chapter 8.

• The naïve finite waiting property for process i is expressed by

    G(Ti → FCi)

None of the processes satisfies the naïve version. The modified version, in which both processes must execute infinitely many moves, is expressed by

    (GFM0 ∧ GFM1) → G(Ti → FCi)

Process 0 satisfies the modified version, but process 1 does not. Observe how fairness assumptions can be very elegantly expressed in LTL: the assumption itself is expressed as a formula ψ, and the property that ω-executions satisfying the fairness assumption also satisfy φ is expressed by ψ → φ.

• The property that process i cannot access the critical section without having requested it first is expressed by

    ¬(¬Ti U Ci)

Both processes satisfy this property.

• The bounded overtaking property for process 0 is expressed by

    G( T0 → (¬C1 U (C1 U (¬C1 U C0))) )

The formula states that whenever T0 holds, the computation continues with a (possibly empty!) interval at which ¬C1 holds, followed by a (possibly empty!) interval at which C1 holds, followed by a point at which C0 holds. The property holds.

14.3 From LTL formulas to generalized Büchi automata

We present an algorithm that, given a formula ϕ ∈ LTL(AP), returns a NGA Aϕ over the alphabet 2^AP recognizing L(ϕ), and then derive a fully automatic procedure that, given a system and an LTL formula, decides whether the executable computations of the system satisfy the formula.

14.3.1 Satisfaction sequences and Hintikka sequences

We introduce the notions of satisfaction sequence and Hintikka sequence for a computation σ and a formula φ.

Definition 14.6 Given a formula φ, the negation of φ is the formula ψ if φ = ¬ψ, and the formula ¬φ otherwise. The closure cl(φ) of a formula φ is the set containing all subformulas of φ and their negations. A nonempty set α ⊆ cl(φ) is an atom of cl(φ) if it satisfies the following properties:

(a0) If true ∈ cl(φ), then true ∈ α.
(a1) For every φ1 ∧ φ2 ∈ cl(φ): φ1 ∧ φ2 ∈ α if and only if φ1 ∈ α and φ2 ∈ α.
(a2) For every ¬φ1 ∈ cl(φ): ¬φ1 ∈ α if and only if φ1 ∉ α.

The set containing all atoms of cl(φ) is denoted by at(φ).

Observe that if α is the set of all formulas of cl(φ) satisfied by a computation σ, then α is necessarily an atom. For instance, condition (a1) holds because σ satisfies a conjunction if and only if it satisfies its conjuncts.

Example 14.7 The closure of the formula p ∧ (p U q) is

    { p, ¬p, q, ¬q, p U q, ¬(p U q), p ∧ (p U q), ¬(p ∧ (p U q)) }

The only two atoms containing p ∧ (p U q) are { p, q, p U q, p ∧ (p U q) } and { p, ¬q, p U q, p ∧ (p U q) }.
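The closure and the atoms can be computed mechanically. A sketch (my own encoding, with formulas as nested tuples; the filter implements conditions (a0)–(a2)):

```python
def subformulas(f):
    out = {f}
    if isinstance(f, tuple):
        if f[0] in ("not", "X"):
            out |= subformulas(f[1])
        elif f[0] in ("and", "U"):
            out |= subformulas(f[1]) | subformulas(f[2])
    return out

def neg(f):
    """The 'negation' of Definition 14.6: strip a top-level not, else add one."""
    return f[1] if isinstance(f, tuple) and f[0] == "not" else ("not", f)

def closure(phi):
    return subformulas(phi) | {neg(f) for f in subformulas(phi)}

def atoms(phi):
    subs = sorted(subformulas(phi), key=repr)
    cands = [frozenset()]
    for f in subs:                       # choose f or its negation, as in (a2)
        cands = [a | {c} for a in cands for c in (f, neg(f))]
    conjs = [f for f in closure(phi) if isinstance(f, tuple) and f[0] == "and"]
    def ok(a):
        consistent = all(neg(f) not in a for f in a)                     # (a2)
        a1 = all((f in a) == (f[1] in a and f[2] in a) for f in conjs)   # (a1)
        a0 = "true" not in closure(phi) or "true" in a                   # (a0)
        return consistent and a1 and a0
    return {a for a in cands if ok(a)}

p, q = ("ap", "p"), ("ap", "q")
phi = ("and", p, ("U", p, q))
```

For φ = p ∧ (p U q) this reproduces the eight atoms of Example 14.7 and the two atoms containing φ.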

Let us see why. By (a1), every atom α containing p ∧ (p U q) must contain p and p U q, and so, by (a2), α contains neither ¬p nor ¬(p U q). Moreover, by (a2) an atom always contains either a subformula or its negation, but not both; in particular, every atom contains exactly one of q and ¬q.

Definition 14.8 The satisfaction sequence for a computation σ and a formula φ is the infinite sequence of atoms

    sats(σ, φ) = sats(σ, φ, 0) sats(σ, φ, 1) sats(σ, φ, 2) …

where sats(σ, φ, i) is the atom containing the formulas of cl(φ) satisfied by σ^i.

Intuitively, the satisfaction sequence of a computation σ is obtained by "completing" σ: while σ only indicates which atomic propositions hold at each point in time, the satisfaction sequence also indicates which atom holds. Observe that σ satisfies φ if and only if φ ∈ sats(σ, φ, 0), i.e., if and only if φ belongs to the first atom of sats(σ, φ).

Example 14.9 Let φ = p U q, σ1 = {p}^ω, and σ2 = ({p}{q})^ω. We have

    sats(σ1, φ) = { p, ¬q, ¬(p U q) }^ω
    sats(σ2, φ) = ( { p, ¬q, p U q } { ¬p, q, p U q } )^ω

Hintikka sequences provide a syntactic characterization of satisfaction sequences. We prove that a sequence is a satisfaction sequence if and only if it is a Hintikka sequence.¹

Definition 14.10 A pre-Hintikka sequence for φ is an infinite sequence α = α0 α1 α2 … of atoms satisfying the following conditions for every i ≥ 0:

(l1) For every Xφ1 ∈ cl(φ): Xφ1 ∈ αi if and only if φ1 ∈ αi+1.
(l2) For every φ1 U φ2 ∈ cl(φ): φ1 U φ2 ∈ αi if and only if φ2 ∈ αi, or φ1 ∈ αi and φ1 U φ2 ∈ αi+1.

A pre-Hintikka sequence is a Hintikka sequence if it also satisfies

(g) For every φ1 U φ2 ∈ αi, there exists j ≥ i such that φ2 ∈ αj.

A pre-Hintikka or Hintikka sequence α matches a computation σ if for every atomic proposition p ∈ AP and for every i ≥ 0, p ∈ αi if and only if p ∈ σi. Intuitively, α matches σ if it extends σ with new formulas.

¹ Loosely speaking, satisfaction sequences have a semantic definition: in order to know which atom holds at a point one must know the semantics of LTL. On the contrary, the definition of a Hintikka sequence does not involve the semantics of LTL, i.e., someone who ignores the semantics can still decide whether a sequence is a Hintikka sequence or not.

Observe that conditions (l1) and (l2) are local: in order to know whether α satisfies them, we only need to inspect every pair αi, αi+1 of consecutive atoms. On the contrary, condition (g) is global, since the distance between the indices i and j can be arbitrarily large.

It follows immediately from the definitions that if α = α0 α1 α2 … is a satisfaction sequence, then every pair αi, αi+1 of consecutive atoms satisfies (l1) and (l2), and the sequence α itself satisfies (g). So every satisfaction sequence is a Hintikka sequence. The following theorem shows that the converse also holds: every Hintikka sequence is a satisfaction sequence.

Theorem 14.11 Let σ be a computation and let φ be a formula. The unique Hintikka sequence for φ matching σ is the satisfaction sequence sats(σ, φ).

Proof: The proof is divided into two parts. It follows immediately from the definitions that sats(σ, φ) is a Hintikka sequence for φ matching σ. For the other direction, let α = α0 α1 α2 … be a Hintikka sequence for φ matching σ, and let ψ be an arbitrary formula of cl(φ). We prove that for every i ≥ 0: ψ ∈ αi if and only if ψ ∈ sats(σ, φ, i). The proof is by induction on the structure of ψ.

• ψ = true. Then true ∈ αi, since αi is an atom, and true ∈ sats(σ, φ, i).

• ψ = p for an atomic proposition p. Since α matches σ, we have p ∈ αi if and only if p ∈ σi, and by the definition of a satisfaction sequence, p ∈ σi if and only if p ∈ sats(σ, φ, i).

• ψ = ¬φ1 or ψ = Xφ1. The proofs are very similar to the last one.

• ψ = φ1 ∧ φ2. We have

    φ1 ∧ φ2 ∈ αi
    ⇔ φ1 ∈ αi and φ2 ∈ αi                                  (condition (a1))
    ⇔ φ1 ∈ sats(σ, φ, i) and φ2 ∈ sats(σ, φ, i)            (induction hypothesis)
    ⇔ φ1 ∧ φ2 ∈ sats(σ, φ, i)                              (definition of sats(σ, φ))

• ψ = φ1 U φ2. We prove the two directions separately.

(a) If φ1 U φ2 ∈ αi, then φ1 U φ2 ∈ sats(σ, φ, i). By condition (g), there is at least one index j ≥ i such that φ2 ∈ αj. Let jm be the smallest of these indices. We prove the result by induction on jm − i. By condition (l2) of the definition of a Hintikka sequence, we have to consider two cases:

– φ2 ∈ αi. By induction hypothesis (induction on the structure of ψ), φ2 ∈ sats(σ, φ, i), and so φ1 U φ2 ∈ sats(σ, φ, i).

– φ1 ∈ αi and φ1 U φ2 ∈ αi+1. By induction hypothesis, φ1 ∈ sats(σ, φ, i). Since φ1 U φ2 ∈ αi+1, by induction hypothesis (induction on jm − i) we have φ1 U φ2 ∈ sats(σ, φ, i + 1), and so φ1 U φ2 ∈ sats(σ, φ, i).

(b) If φ1 U φ2 ∈ sats(σ, φ, i), then φ1 U φ2 ∈ αi. By the definition of a satisfaction sequence, there is at least one index j ≥ i such that φ2 ∈ sats(σ, φ, j). Let jm be the smallest of these indices. We prove the result by induction on jm − i, considering again two cases:

. FROM LTL FORMULAS TO GENERALIZED BUCHI AUTOMATA – φ2 ∈ sats(σ. An atom belongs to Fφ1 U φ2 if it does not contain φ1 U φ2 or if it contains φ2 . for every atomic proposition p ∈ AP. p ∈ σ0 if and only if p ∈ α0 ). By the u deﬁnition of a satisfaction sequence. . is a computation. φ.The output transitions of any other state α (where α is an atom) are the triples α −→ β such − that σ matches β. . i). . αi+1 ). . Moreover. Proceed now as in case (a). we must σ0 σ1 σ2 guarantee that in every run q0 −− α0 −− α1 −− . φ. −→ −→ −→ such that σ = σ0 σ1 . By condition (l2).. .The states of Aφ (apart from the distinguished initial state q0 ) are atoms of φ. there is at least one index j ≥ i such that φ2 ∈ sats(σ. φ) is the (unique) Hintikka sequence for φ matching σ. i) and φ1 U φ2 ∈ sats(σ. φ. sats(σ. .The accepting condition contains a set Fφ1 U φ2 of accepting states for each subformula φ1 U φ2 of φ. and initial state of Aφ : . and φ ∈ α. By the deﬁnition of a Hintikka sequence. i + 1).2 Constructing the NGA Given a formula φ. by Theorem 14. By the deﬁnition of a satisfaction sequence. we construct Aφ so that its runs are all the sequences q0 −− α0 −− α1 −− .11. 269 – φ1 ∈ sats(σ. and so φ1 U φ2 ∈ αi . σ σ0 σ0 σ1 σ2 . and the pair α.e. and α = α0 α1 . .¨ 14. for every i ≥ 0 such that αi contains a −→ −→ −→ subformula φ1 U φ2 .The alphabet of Aφ is 2AP .The output transitions of the initial state q0 are the triples q0 −− α such that σ0 matches α −→ (i. . By induction hypothesis. The condition that runs must be pre-Hintikka sequences determines the alphabet. So we choose the sets of accepting states as follows: . there exists j ≥ i such that φ2 ∈ α j . . this amounts to guaranteeing that every run contains inﬁnitely many indices i such that φ2 ∈ αi . 0) and. The sets of accepting states of Aφ are determined by the condition that accepting runs must exactly correspond to the Hintikka sequences. transitions. states. 
the sets of accepting states (recall that Aφ is a NGA) are chosen so that a run is accepting if and only if the pre-Hintikka sequence is also a Hintikka sequence. 14. we construct a generalized B¨ chi automaton Aφ recognizing L(φ).3. . or inﬁnitely many indices j such that ¬(φ1 U φ2 ) ∈ α j . φ. So Aφ must recognize the computations σ such that the ﬁrst atom of the Hintikka sequence for φ matching σ contains φ. φ2 ∈ αi . To achieve this goal. a computation σ satisﬁes φ if and only if φ ∈ sats(σ. . j).3. φ. β satisﬁes conditions (l1) and (l2) (where α and β play the roles of αi resp. is a pre-Hintikka sequence of φ matching σ.

The pseudocode for the translation algorithm is shown below.

LTLtoNGA(φ)
Input: LTL formula φ over AP
Output: NGA Aφ = (Q, 2^AP, δ, q0, F) with L(Aφ) = L(φ)
 1  Q ← {q0}; δ ← ∅
 2  for all α ∈ at(φ) such that φ ∈ α do
 3      add (q0, α ∩ AP, α) to δ; add α to W
 4  while W ≠ ∅ do
 5      pick α from W
 6      add α to Q
 7      for all φ1 U φ2 ∈ cl(φ) do
 8          if φ1 U φ2 ∉ α or φ2 ∈ α then add α to Fφ1Uφ2
 9      for all P ⊆ AP do
10          for all β ∈ at(φ) matching P do
11              if α, β satisfy conditions (l1) and (l2) then
12                  add (α, P, β) to δ
13                  if β ∉ Q then add β to W
14  F ← ∅
15  for all φ1 U φ2 ∈ cl(φ) do F ← F ∪ {Fφ1Uφ2}
16  return (Q, 2^AP, δ, q0, F)

Example 14.12 We construct the automaton Aφ for the formula φ = p U q. The closure cl(φ) has eight atoms, corresponding to all the possible ways of choosing between p and ¬p, q and ¬q, and p U q and ¬(p U q). However, we can easily see that the atoms {p, q, ¬(p U q)}, {¬p, q, ¬(p U q)}, and {¬p, ¬q, p U q} have no output transitions, because they would violate condition (l2). So these states can be eliminated, and we are left with the five atoms shown in Figure 14.3, plus the initial state q0.

Let α = {¬p, ¬q, ¬(p U q)} and β = {p, ¬q, p U q}. Aφ contains a transition α −{p}→ β, because {p} matches β, and α, β satisfy conditions (l1) and (l2): condition (l1) holds vacuously, because φ contains no subformulas of the form Xψ, and condition (l2) holds because p U q ∉ α, q ∉ α, and p ∉ α. There is no transition from β to α, because it would violate condition (l2): p U q ∈ β, but neither q ∈ β nor p U q ∈ α.

Since φ only has one subformula of the form φ1 U φ2, the NGA is in fact a NBA, and we can represent the accepting states as for NBAs. Figure 14.3 uses some conventions to simplify the graphical representation: every transition of Aφ leading to an atom α is labeled by α ∩ AP, and therefore, since the label of a transition can be deduced from its target state, the labels are omitted in the figure.

It is worth noticing that the NGAs obtained from conversions of LTL formulas satisfy the following property: every word accepted by the NGA has one single accepting run. This follows from the fact that, loosely speaking, an accepting run is the satisfaction sequence of the computation it accepts, and the satisfaction sequence of a given computation is by definition unique.
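For φ = p U q the whole construction fits in a few lines. A sketch (my own compact encoding: an atom is the set of its positive members among p, q, and p U q, so its negative members are implicit):

```python
from itertools import combinations

lits = ["p", "q", "pUq"]
atoms = [frozenset(s) for r in range(4) for s in combinations(lits, r)]  # 8 atoms

def l2(a, b):
    """Condition (l2) for the consecutive pair (a, b); (l1) is vacuous (no X)."""
    return ("pUq" in a) == ("q" in a or ("p" in a and "pUq" in b))

# A transition a -> b exists whenever (l2) holds; its label is deducible
# from the target (it is b intersected with AP), so we omit it here.
delta = {(a, b) for a in atoms for b in atoms if l2(a, b)}
live = {a for a in atoms if any((a, b) in delta for b in atoms)}
accepting = {a for a in atoms if "pUq" not in a or "q" in a}   # the set F_{p U q}
initial_targets = {a for a in atoms if "pUq" in a}             # atoms containing φ
```

The computation confirms the example: three of the eight atoms have no output transitions, leaving the five atoms of Figure 14.3, and the transition from {¬p, ¬q, ¬(p U q)} to {p, ¬q, p U q} exists while its reverse does not.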

[Figure 14.3: NGA (NBA) for the formula p U q.]

14.3.3 Reducing the NGA

The reduction algorithm for NFAs shown in Chapter 3 can be adapted to NBAs and to NGAs. Recall that given an NFA A = (Q, Σ, δ, q0, F), the algorithm computes the relation CSR, the coarsest stable refinement of the equivalence relation {F, Q \ F}, and returns the quotient A/CSR. The relation CSR partitions the set of states into blocks. Since CSR is stable, for every two states q, r belonging to the same block of CSR and for every transition (q, a, q′) ∈ δ, there is a transition (r, a, r′) ∈ δ such that q′, r′ belong to the same block; moreover, the states of a block are either all final or all nonfinal (because every equivalence class of CSR is included in F or in Q \ F). This implies L(q) = L(r) for every two states q, r of a block, because every run q −a1→ q1 −a2→ q2 ⋯ qn−1 −an→ qn can be "matched" by a run r −a1→ r1 −a2→ r2 ⋯ rn−1 −an→ rn in such a way that for every i ≥ 1 the states qi, ri belong to the same block; in particular, qn is final if and only if rn is final, which implies a1 … an ∈ L(q) if and only if a1 … an ∈ L(r).

The same argument proves the property also for ω-words: every infinite run q −a1→ q1 −a2→ q2 −a3→ q3 ⋯ is "matched" by a run r −a1→ r1 −a2→ r2 −a3→ r3 ⋯ so that for every i ≥ 1 the states qi, ri are both accepting or both non-accepting. This immediately proves Lω(A) = Lω(A/CSR), and so the reduction algorithm also works for NBAs.

For the extension to NGAs, let A = (Q, Σ, δ, q0, {F1, …, Fn}) be a NGA. Consider the partition of Q into blocks given by: two states q, r belong to the same block if for every i ∈ {1, …, n} either q, r ∈ Fi or q, r ∉ Fi. Define CSR′ as the coarsest stable refinement of this new partition. Observe that we now have not only that qn and rn are both final or both nonfinal: the same holds for every pair qi, ri. Moreover, every infinite run q −a1→ q1 −a2→ q2 ⋯ is "matched" by a run r −a1→ r1 −a2→ r2 ⋯ so that for every i ≥ 1 and for every j ∈ {1, …, n} either qi, ri ∈ Fj or qi, ri ∉ Fj.

So we get Lω(A) = Lω(A/CSR′).

The algorithm can be applied to the NBA of Figure 14.3. The relation CSR′ has three equivalence classes:

    Q0 = { q0, {p, ¬q, p U q} }
    Q1 = { {p, q, p U q}, {¬p, q, p U q} }
    Q2 = { {p, ¬q, ¬(p U q)}, {¬p, ¬q, ¬(p U q)} }

which leads to the reduced NBA of Figure 14.4.

[Figure 14.4: Reduced NBA for the formula p U q.]

14.3.4 Size of the NGA

Let n be the length of the formula φ. It is easy to see that the set cl(φ) has size O(n), and so the NGA Aφ has at most O(2^n) states. Since φ contains at most n subformulas of the form φ1 U φ2, Aφ has at most n sets of final states.

We now prove a matching lower bound on the number of states. We exhibit a family of formulas {φn}n≥1 such that φn has length O(n), and every NGA recognizing Lω(φn) has at least 2^n states. Consider the family of ω-languages {Dn}n≥1 over the alphabet {0, 1, #} given by

    Dn = { w w #^ω | w ∈ {0, 1}^n }

We first show that every NGA recognizing Dn has at least 2^n states. Assume A = (Q, Σ, δ, q0, {F1, …, Fk}) recognizes Dn and |Q| < 2^n. For every word w ∈ {0, 1}^n there is a state qw such that A accepts w#^ω from qw. By the pigeonhole principle, qw1 = qw2 for two distinct words w1, w2 ∈ {0, 1}^n. But then A accepts w1 w2 #^ω, which does not belong to Dn, contradicting the hypothesis.

It now suffices to construct a family of formulas {φn}n≥1 of size O(n) such that Lω(φn) = Dn. For this, we take AP = {0, 1, #} with the obvious valuation, and construct the following formulas:

• φn1 = G( (0 ∨ 1 ∨ #) ∧ ¬(0 ∧ 1) ∧ ¬(0 ∧ #) ∧ ¬(1 ∧ #) ). This formula expresses that at every position exactly one proposition of AP holds.

• φn2 = ¬# ∧ ⋀_{i=1}^{2n−1} X^i ¬# ∧ X^{2n} G#. This formula expresses that # does not hold at any of the first 2n positions, and holds at all later positions.

• φn3 = G( (0 → X^n (0 ∨ #)) ∧ (1 → X^n (1 ∨ #)) ). This formula expresses that if the atomic proposition holding at a position is 0 or 1, then the atomic proposition holding n positions later is the same one, or #.

Clearly, φn = φn1 ∧ φn2 ∧ φn3 is the formula we are looking for. Observe that φn contains O(n) characters.
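On the words permitted by φn1 and φn2, the formula φn3 characterizes membership in Dn. A quick sanity check for n = 2 (my own script; a word w1 w2 #^ω is represented by its first 2n letters, since every later letter is #):

```python
from itertools import product

def phi_n3(first2n, n):
    """φn3 restricted to such words: the letters at positions i and i+n agree.
    (For i >= n the letter n positions later is #, which φn3 always allows.)"""
    return all(first2n[i] == first2n[i + n] for i in range(n))

n = 2
words = [w1 + w2 for w1 in product("01", repeat=n) for w2 in product("01", repeat=n)]
# Among all 2^(2n) candidate prefixes, φn3 keeps exactly the members of Dn.
dn_members = [w for w in words if phi_n3(w, n)]
```

As expected, exactly the 2^n words of the form w w survive, which is the fact exploited by the pigeonhole argument above.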

14.4 Automatic Verification of LTL Formulas

We can now sketch the procedure for the automatic verification of properties expressed by LTL formulas. The input to the procedure is

• a system NBA As, obtained either directly from the system or by computing the asynchronous product of a network of automata, where C is the set of configurations of As;
• a formula φ of LTL over a set of atomic propositions AP; and
• a valuation ν : AP → 2^C, describing for each atomic proposition the set of configurations at which the proposition holds.

The procedure follows these steps:

(1) Compute a NGA Av for the negation of the formula φ. Clearly, Av recognizes all the computations that violate φ.
(2) Compute a NGA Av ∩ As recognizing the executable computations of the system that violate the formula.
(3) Check emptiness of Av ∩ As.

Step (1) can be carried out by applying LTLtoNGA, and Step (3) by, say, the two-stack algorithm. For Step (2), observe first that the alphabets of Av and As are different: the alphabet of Av is 2^AP, while the alphabet of As is the set C of configurations. By applying the valuation ν we transform Av into an automaton with C as alphabet. Since all the states of system NBAs are accepting, the automaton Av ∩ As can then be computed by interNFA.

It is important to observe that the three steps can be carried out simultaneously. The states of Av ∩ As are pairs [α, c], where α is an atom of φ and c is a configuration. The following routine takes a pair [α, c] as input and returns its successors in the NGA Av ∩ As. The routine first computes the successors of c in As. Then, for each successor c′, it computes first the set P of atomic propositions satisfied by c′ according to the valuation, and then the set of atoms β such that (a) β matches P and (b) the pair α, β satisfies conditions (l1) and (l2). The successors of [α, c] are the pairs [β, c′].
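A concrete sketch of this successor routine (my own toy instance, not from the text: two hypothetical configurations c0, c1, and the atoms of p U q encoded as sets of their positive members):

```python
from itertools import combinations

delta_s = {"c0": ["c1"], "c1": ["c1"]}        # hypothetical system transitions
nu = {"p": {"c0"}, "q": {"c1"}}               # valuation AP -> configurations
atoms = [frozenset(s) for r in range(4) for s in combinations(["p", "q", "pUq"], r)]

def l2(a, b):
    """Condition (l2) for the pair (a, b); (l1) is vacuous for p U q."""
    return ("pUq" in a) == ("q" in a or ("p" in a and "pUq" in b))

def succ(alpha, c):
    """Successors of the product state [alpha, c], computed on the fly."""
    S = []
    for c2 in delta_s[c]:
        P = {p for p in nu if c2 in nu[p]}    # propositions satisfied by c2
        for beta in atoms:
            if beta & {"p", "q"} == P and l2(alpha, beta):
                S.append((beta, c2))
    return S
```

From the product state [{p, p U q}, c0] the only successor is [{q, p U q}, c1]: the atom must both match the propositions of c1 and discharge the pending p U q obligation.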

Succ([α, c])
1  S ← ∅
2  for all c′ ∈ δs(c) do
3      P ← ∅
4      for all p ∈ AP do
5          if c′ ∈ ν(p) then add p to P
6      for all β ∈ at(φ) matching P do
7          if α, β satisfy (l1) and (l2) then add [β, c′] to S
8  return S

This routine can be inserted in the algorithm for the emptiness check. For instance, if we use TwoStack, then we just replace line 6

6  for all r ∈ δ(q) do

by a call to the routine:

6  for all [β, c′] ∈ Succ([α, c]) do

Exercises

Exercise 100 Let AP = {p, q} and let Σ = 2^AP. Give LTL formulas defining the following languages:

• {p, q} ∅ Σ^ω
• Σ∗ {q}^ω
• Σ∗ {p} Σ∗ {q} Σ^ω
• Σ {p}∗ {q}∗ ∅∗ Σ^ω

Exercise 101 Let AP = {p, q} and let Σ = 2^AP. Give NBAs accepting the languages defined by the LTL formulas XG¬p, (GFp) ⇒ (Fq), and p ∧ ¬XFp.

Exercise 102 Two formulas φ, ψ of LTL are equivalent, denoted by φ ≡ ψ, if they are satisfied by the same computations. Which of the following equivalences hold?

    Xϕ ∧ Xψ ≡ X(ϕ ∧ ψ)                        Xϕ ∨ Xψ ≡ X(ϕ ∨ ψ)
    Fϕ ∧ Fψ ≡ F(ϕ ∧ ψ)                        Fϕ ∨ Fψ ≡ F(ϕ ∨ ψ)
    Gϕ ∧ Gψ ≡ G(ϕ ∧ ψ)                        Gϕ ∨ Gψ ≡ G(ϕ ∨ ψ)
    FXϕ ≡ XFϕ                                  GXϕ ≡ XGϕ
    (ψ1 U ϕ) ∧ (ψ2 U ϕ) ≡ (ψ1 ∧ ψ2) U ϕ       (ψ1 U ϕ) ∨ (ψ2 U ϕ) ≡ (ψ1 ∨ ψ2) U ϕ
    (ϕ U ψ1) ∧ (ϕ U ψ2) ≡ ϕ U (ψ1 ∧ ψ2)       (ϕ U ψ1) ∨ (ϕ U ψ2) ≡ ϕ U (ψ1 ∨ ψ2)
    ψ U Fϕ ≡ Fϕ                                ψ U Gϕ ≡ Gϕ
    ψ U GFϕ ≡ GFϕ                              ψ U FGϕ ≡ GFϕ

Exercise 103 Prove FGp ≡ VFGp and GFp ≡ VGFp for every sequence V ∈ {F, G}∗ of the temporal operators F and G. Hint: Observe that, since FFp ≡ Fp and GGp ≡ Gp, it suffices to prove FGp ≡ GFGp and GFp ≡ FGFp.

Exercise 104 (Santos Laboratory). Which of the following formulas of LTL are tautologies? (A formula is a tautology if all computations satisfy it.) If the formula is not a tautology, give a computation that does not satisfy it.

• Gp → Fp
• G(p → q) → (Gp → Gq)
• F(p ∧ q) ↔ (Fp ∧ Fq)
• ¬Fp → F¬Fp
• (Gp → Fq) ↔ (p U (¬p ∨ q))
• (FGp → GFq) ↔ G(p U (¬p ∨ q))
• G(p → Xp) → (p → Gp)

Exercise 105 (Santos Laboratory). Let AP = {p, q, r}. Give formulas that hold for the computations satisfying the following properties. If in doubt about what the property really means, choose an interpretation, and explicitly indicate your choice. Here are two solved examples:

• p is false before q: F(q) → (¬p U q).
• p becomes true before q: ¬q W (p ∧ ¬q).

Now it is your turn:

• p is false between q and r.
• p precedes q before r.
• p precedes q after r.
• after p and q eventually r.

Exercise 106 (Schwoon). The weak until operator W has the following semantics:

• σ |= φ1 W φ2 iff there exists k ≥ 0 such that σ^k |= φ2 and σ^i |= φ1 for all 0 ≤ i < k, or σ^k |= φ1 for every k ≥ 0.

Prove: p W q ≡ Gp ∨ (p U q) ≡ F¬p → (p U q) ≡ p U (q ∨ Gp).


z.) The satisfaction relation (w.Chapter 15 Applications II: Monadic Second-Order Logic and Linear Arithmetic In Chapter 9 we showed that the languages expressible in monadic second-order logic on ﬁnite words were exactly the regular languages. (The mapping may also assign positions to other variables. Deﬁnition 15.1 we show that the languages expressible in monadic second-order logic on ω-words are exactly the ω-regular languages. .} be two inﬁnite sets of ﬁrst-order and second-order variables. . I) where w ∈ Σω . . a logical language for expressing properties of the integers. . . 15. a language for describing properties of real numbers with exactly the same syntax as Presburger arithmetic. constructs an NFA accepting exactly the set of interpretations of the formula. and I is a mapping that assigns every free ﬁrst-order variable x a position I(x) ∈ N and every free second-order variable X a set of positions I(X) ⊆ N.} be a ﬁnite alphabet.1 Monadic Second-Order Logic on ω-Words Monadic second-oder logic on ω-words has the same syntax as and a very similar semantics to its counterpart on ﬁnite words. Z. Let Σ = {a. . In Section 15.2 we extend this result to Linear Arithmetic. The set MSO(Σ) of monadic second-order formulas over Σ is the set of expressions generated by the grammar: ϕ := Qa (x) | x < y | x ∈ X | ¬ϕ | ϕ ∨ ϕ | ∃x ϕ | ∃X ϕ An interpretation of a formula ϕ is a pair (w. Y. and derived an algorithm that. b. In Chapter 10 we introduced Presburger Arithmetic. given a formula.} and X2 = {X. . .1 Let X1 = {x. . c. and showed how to construct for a given formula ϕ of Presburger Arithmetic an NFA Aϕ recognizing the solutions of ϕ. I) |= ϕ between a formula ϕ of MSO(Σ) and an interpretation (w. y. We show that this result can be easily extended to the case of inﬁnite words: in Section 15. I) of ϕ is deﬁned as follows: 277 .

    (w, I) |= Qa(x)      iff  w[I(x)] = a
    (w, I) |= x < y      iff  I(x) < I(y)
    (w, I) |= x ∈ X      iff  I(x) ∈ I(X)
    (w, I) |= ¬ϕ         iff  (w, I) ⊭ ϕ
    (w, I) |= ϕ1 ∨ ϕ2    iff  (w, I) |= ϕ1 or (w, I) |= ϕ2
    (w, I) |= ∃x ϕ       iff  |w| ≥ 1 and some i ∈ ℕ satisfies (w, I[i/x]) |= ϕ
    (w, I) |= ∃X ϕ       iff  some S ⊆ ℕ satisfies (w, I[S/X]) |= ϕ

where w[i] is the letter of w at position i, I[i/x] is the interpretation that assigns i to x and otherwise coincides with I, and I[S/X] is the interpretation that assigns S to X and otherwise coincides with I (whether I is defined for x, X or not). If (w, I) |= ϕ we say that (w, I) is a model of ϕ. Two formulas are equivalent if they have the same models. The language L(ϕ) of a sentence ϕ ∈ MSO(Σ) is the set L(ϕ) = { w ∈ Σ^ω | w |= ϕ }. An ω-language L ⊆ Σ^ω is MSO-definable if L = L(ϕ) for some formula ϕ ∈ MSO(Σ).

15.1.1 Expressive power of MSO(Σ) on ω-words

We show that the ω-languages expressible in monadic second-order logic are exactly the ω-regular languages.

Proposition 15.2 If L ⊆ Σ^ω is ω-regular, then L is expressible in MSO(Σ).

Proof: Let A = (Q, Σ, δ, q0, F) be a NBA with Q = {q0, …, qn} and L(A) = L. The proof is very similar to its counterpart for languages of finite words (Proposition 9.12), in fact even a bit simpler. We construct a formula ϕA such that for every w ∈ Σ^ω, w |= ϕA iff w ∈ L(A). Let w = a1 a2 a3 … be an ω-word over Σ, and let

    Pq = { i ∈ ℕ | q ∈ δ̂(q0, a1 … ai) }

In words, i ∈ Pq iff A can be in state q immediately after reading the letter ai. We construct a formula Visits(X0, …, Xn) with free variables X0, …, Xn exactly as in Proposition 9.12. The formula has the property that I(Xi) = Pqi holds for every model (w, I); in other words, Visits(X0, …, Xn) is only true when Xi takes the value Pqi for every 0 ≤ i ≤ n. Then A accepts w iff Pq is infinite for some q ∈ F, and so we can take

    ϕA := ∃X0 … ∃Xn ( Visits(X0, …, Xn) ∧ ⋁_{qi ∈ F} ∀x ∃y (x < y ∧ y ∈ Xi) )

since the subformula ∀x ∃y (x < y ∧ y ∈ Xi) expresses that Xi is infinite.

We now proceed to prove that MSO-definable ω-languages are ω-regular. We proceed as for finite words, and start with some notations. Given a formula ϕ ∈ MSO(Σ), we encode an interpretation (w, I) of ϕ as an ω-word: besides a letter of Σ, each position of the encoding carries one bit for each variable of ϕ.
Consider the interpretation

    x → 2    y → 6    X → set of prime numbers    Y → set of even numbers

of the ω-word a(ab)ω. We encode it as

         a a b a b a b a b ...
    x    0 1 0 0 0 0 0 0 0 ...
    y    0 0 0 0 0 1 0 0 0 ...
    X    0 1 1 0 1 0 1 0 0 ...
    Y    0 1 0 1 0 1 0 1 0 ...

corresponding to the ω-word over Σ × {0,1}^4 whose letters are the columns of this array: (a,0,0,0,0)(a,1,0,1,1)(b,0,0,1,0)... We denote by enc(w, I) the word over the alphabet Σ × {0,1}^n described above.

Definition 15.3 Let ϕ be a formula with n free variables, and let (w, I) be an interpretation of ϕ. The ω-language of ϕ is Lω(ϕ) = {enc(w, I) | (w, I) |= ϕ}.

We can now prove by induction on the structure of ϕ that Lω(ϕ) is ω-regular. The proof is a straightforward modification of the proof for the case of finite words; the case of negation requires replacing the complementation operation for NFAs by the complementation operation for NBAs.

15.2 Linear Arithmetic

Linear Arithmetic is a language for describing properties of real numbers. It has the same syntax as Presburger arithmetic (see Chapter 10), but formulas are interpreted over the reals, instead of over the naturals or the integers. Given a formula ϕ of Linear Arithmetic (real Presburger Arithmetic), we show how to construct a NBA Aϕ recognizing the solutions of ϕ. Section 15.2.1 discusses how to encode real numbers as strings, and Section 15.3 constructs the NBA.

15.2.1 Encoding Real Numbers

We encode real numbers as infinite strings in two steps: first we encode reals as pairs of numbers, and then these pairs as strings.
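The encoding enc(w, I) is straightforward to compute letter by letter. Here is a small sketch (prefix only); the helper name enc_prefix and the convention that the caller fixes the variable order are assumptions of this sketch, not part of the text.

```python
def enc_prefix(w_prefix, interp):
    """interp: ordered list of (variable, value) pairs, where a first-order
    value is a position (int) and a second-order value is a set of positions.
    Positions are counted from 1. Returns the first len(w_prefix) letters of
    enc(w, I), each letter being a tuple (symbol, bit, ..., bit)."""
    word = []
    for i, a in enumerate(w_prefix, start=1):
        bits = tuple(int(i == v) if isinstance(v, int) else int(i in v)
                     for _, v in interp)
        word.append((a,) + bits)
    return word

# The interpretation from the text: x → 2, y → 6, X → primes, Y → evens,
# on a prefix of the ω-word a(ab)ω.
interp = [("x", 2), ("y", 6), ("X", {2, 3, 5, 7}), ("Y", {2, 4, 6, 8})]
prefix = enc_prefix("aabababab", interp)
# prefix[1] == ('a', 1, 0, 1, 1): position 2 carries x, lies in X and in Y
```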

We encode a real number x ∈ R as a pair (xI, xF), where xI ∈ Z, xF ∈ [0, 1], and x = xI + xF. We call xI and xF the integer and fractional parts of x. Every integer is encoded by two different pairs, e.g. 2 is encoded by (2, 0) and (1, 1). We are not bothered by this. (In the standard decimal representation of real numbers integers also have two representations; for example, 2 is represented by both 2 and 1.999...)

We encode a pair (xI, xF) as an infinite string wI ⋆ wF. The string wI is a 2-complement encoding of xI (see Chapter 10). However, unlike Chapter 10, we use the msbf instead of the lsbf encoding (this is not essential, but it leads to a more elegant construction). So wI is any word wI = an an−1 ... a0 ∈ {0,1}* satisfying

    xI = Σ_{i=0}^{n−1} ai · 2^i − an · 2^n        (15.1)

The string wF is any ω-word b1 b2 b3 ... ∈ {0,1}ω satisfying

    xF = Σ_{i=1}^{∞} bi · 2^−i        (15.2)

So, in particular, the encodings of the integer 1 are 0*01 ⋆ 0ω and 0*0 ⋆ 1ω. (The only word b1 b2 b3 ... for which we have xF = 1 is 1ω.) Equation 15.2 also has two solutions for fractions of the form 2^−k; for instance, the encodings of 1/2 are 0*0 ⋆ 10ω and 0*0 ⋆ 01ω. Other fractions have a unique encoding: 0*0 ⋆ (01)ω, for instance, is the unique encoding of 1/3. Note also that the decomposition into integer and fractional part is not unique: (−1, 2/3) encodes −1/3 (not −5/3), and (1, 1/3) encodes 4/3.

Example 15.4 The encodings of 3 are 0*011 ⋆ 0ω and 0*010 ⋆ 1ω. The encodings of 10/3 = 3 + 1/3 are 0*011 ⋆ (01)ω, and the encodings of −3.75 are 1*100 ⋆ 010ω and 1*100 ⋆ 001ω.

Tuples of reals are encoded using padding to make the ⋆-symbols fall in the same column. For instance, the encodings of the triple (−6, 12.3, 3) have the shape

    1 1 0 1 0 ⋆ 0 0 0 0 0 ...
    0 1 1 0 0 ⋆ 0 1 0 0 1 ...
    0 0 0 1 1 ⋆ 0 0 0 0 0 ...

(the first row encodes −6, the second 12.3 with fractional part 0.0(1001)ω, the third 3; the integer parts are padded with copies of their sign bits so that the ⋆-symbols align).

15.3 Constructing an NBA for the Real Solutions

Given a real Presburger formula ϕ, we construct a NBA Aϕ accepting the encodings of the solutions of ϕ. If ϕ is a negation, disjunction, or existential quantification, we proceed as in Chapter 10, replacing the operations on NFAs and transducers by operations on NBAs. Consider now an atomic formula ϕ = a · x ≤ b. The NBA Aϕ must accept the encodings of all the tuples c ∈ R^n satisfying a · c ≤ b. We decompose the problem into two subproblems for the integer and fractional parts of c.
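One encoding of a given rational can be computed directly from Equations 15.1 and 15.2. The sketch below always picks the decomposition with xF ∈ [0, 1), i.e. one of the two possible choices for integers and for fractions of the form 2^−k, and returns finite prefixes; the function name encode_real is an assumption of this sketch.

```python
from fractions import Fraction

def encode_real(x, frac_bits=8):
    """Return (wI, prefix of wF) for x = xI + xF, xI ∈ Z, xF ∈ [0, 1).
    wI is an msbf 2-complement word (Eq. 15.1); wF is the binary
    expansion of the fractional part (Eq. 15.2), truncated."""
    x = Fraction(x)
    xI = x.numerator // x.denominator      # floor division: xF ∈ [0, 1)
    xF = x - xI
    n = max(abs(xI), 1).bit_length() + 1   # enough bits, including sign bit
    wI = [(xI >> (n - 1 - i)) & 1 for i in range(n)]
    wF = []
    for _ in range(frac_bits):             # standard digit-by-digit expansion
        xF *= 2
        bit = int(xF >= 1)
        wF.append(bit)
        xF -= bit
    return wI, wF

# encode_real(Fraction(-15, 4)) → ([1, 1, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0]),
# i.e. 1100 ⋆ 0100…, an instance of 1*100 ⋆ 010ω from Example 15.4
```

Python's arithmetic right shift on negative integers conveniently yields exactly the sign-extended 2-complement bits.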

Given c ∈ R^n, let cI and cF be the integer and fractional parts of c for some encoding of c. For instance, if c = (7/3, −0.75, 1), then we can have cI = (2, −1, 0) and cF = (1/3, 0.25, 1), corresponding to, say, the encoding [010 ⋆ (01)ω, 111 ⋆ 010ω, 0 ⋆ 1ω], or cI = (2, −1, 1) and cF = (1/3, 0.25, 0), corresponding to, say, [010 ⋆ (01)ω, 11111 ⋆ 001ω, 01 ⋆ 0ω].

Let α+ (α−) be the sum of the positive (negative) components of a. For instance, if a = (1, 3, −2, −1), then α+ = 4 and α− = −3. If c is a solution of ϕ, then

    a · cI + a · cF ≤ b        (15.3)

Moreover, since cF ∈ [0,1]^n, we have

    α− ≤ a · cF ≤ α+        (15.4)

and therefore

    a · cI ≤ b − α−        (15.5)

Putting together 15.3-15.5, we get that cI + cF is a solution of ϕ iff:

  • a · cI ≤ b − α+, or
  • a · cI = β for some integer β ∈ [b − α+ + 1, b − α−] and a · cF ≤ b − β.

If we denote β+ = b − α+ + 1 and β− = b − α−, then we can decompose the solution space of ϕ as follows:

    Sol(ϕ) = {cI + cF | a · cI < β+}  ∪  ⋃_{β+ ≤ β ≤ β−} {cI + cF | a · cI = β and a · cF ≤ b − β}

Example 15.5 We use ϕ = 2x − y ≤ 0 as running example, i.e., a = (2, −1) and b = 0. We have α+ = 2 and α− = −1, and hence β+ = −1 and β− = 1. For β = β− = 1 the constraint 2xF − yF ≤ −1 forces xF = 0 and yF = 1; every solution with such an encoding also has encodings covered by the remaining cases, and we ignore this boundary case in the rest of the running example. So x, y ∈ R is a solution of ϕ iff:

  • 2xI − yI ≤ −2, or
  • 2xI − yI = −1 and 2xF − yF ≤ 1, or
  • 2xI − yI = 0 and 2xF − yF ≤ 0.
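The numbers α+, α−, β+, β− and the resulting case split are mechanical to compute. A minimal sketch (the function name case_split is an assumption; the boundary case β = β− is included here, even though the running example drops it):

```python
def case_split(a, b):
    """For the atomic formula a·x ≤ b, return alpha+, alpha-, the threshold
    b - alpha+ for the 'a·cI alone suffices' case, and one (beta, b - beta)
    pair per integer beta in [beta+, beta-]."""
    alpha_plus = sum(c for c in a if c > 0)
    alpha_minus = sum(c for c in a if c < 0)
    beta_plus, beta_minus = b - alpha_plus + 1, b - alpha_minus
    pairs = [(beta, b - beta) for beta in range(beta_plus, beta_minus + 1)]
    return alpha_plus, alpha_minus, b - alpha_plus, pairs

# For the running example 2x - y ≤ 0:
# case_split((2, -1), 0) → (2, -1, -2, [(-1, 1), (0, 0), (1, -1)])
```

Each pair (β, b − β) reads: a·cI = β and a·cF ≤ b − β; the threshold −2 is the case a·cI ≤ b − α+.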

The solutions of a · cI < β+ and a · cI = β can be computed using algorithms IneqZNFA and EqZNFA of Chapter 10. Recall that both algorithms use the lsbf encoding, but it is easy to transform their output into NFAs for the msbf encoding: since the algorithms deliver NFAs with exactly one final state, it suffices to reverse the transitions of the NFA and exchange the initial and final states. The new automaton recognizes a word w iff the old one recognizes its reverse w^−1, and so it recognizes exactly the msbf-encodings.

Example 15.6 Figure 15.1 shows NFAs for the solutions of 2xI − yI ≤ −2 in lsbf (left) and msbf (right) encoding. The NFA on the right is obtained by reversing the transitions and exchanging the initial and final state. Figure 15.2 shows NFAs for the solutions of 2xI − yI = −1, also in lsbf and msbf encoding.

Figure 15.1: NFAs for the solutions of 2x − y ≤ −2 over Z with lsbf (left) and msbf (right) encodings.

15.3.1 A NBA for the Solutions of a · xF ≤ β

We construct a DBA recognizing the solutions of a · xF ≤ β. The algorithm is similar to AFtoNFA in Chapter 10. The states of the DBA are integers. We choose transitions and final states so that the following property holds:

    State q ∈ Z recognizes the encodings of the tuples cF ∈ [0,1]^n such that a · cF ≤ q.        (15.6)
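Reversing an NFA with a single final state, as used above to turn the lsbf automata into msbf ones, takes only a few lines. A sketch with a toy NFA (the toy automaton accepting exactly "ab" is an assumption of this sketch, not one of the automata of Figure 15.1):

```python
def reverse_nfa(delta, q0, qf):
    """Reverse all transitions and exchange the initial and final state."""
    rev = {}
    for (q, a), targets in delta.items():
        for r in targets:
            rev.setdefault((r, a), set()).add(q)
    return rev, qf, q0

def accepts(delta, q0, qf, word):
    """Standard subset simulation of an NFA on a finite word."""
    current = {q0}
    for a in word:
        current = {r for q in current for r in delta.get((q, a), set())}
    return qf in current

# toy NFA accepting exactly the word "ab"
delta = {(0, "a"): {1}, (1, "b"): {2}}
rdelta, r0, rf = reverse_nfa(delta, 0, 2)
# the reversed NFA accepts exactly the reversed word "ba"
```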

Figure 15.2: NFAs for the solutions of 2x − y = −1 over Z with lsbf (left) and msbf (right) encodings.

However, recall that α− ≤ a · cF ≤ α+ holds for every cF ∈ [0,1]^n, and therefore:

  • all states q ≥ α+ accept all tuples of reals in [0,1]^n, and can be merged with the state α+;
  • all states q < α− accept no tuples in [0,1]^n, and can be merged with the state α− − 1.

Calling these two merged states all and none, the possible states of the DBA (not all of them may be reachable from the initial state) are all, none, and {q ∈ Z | α− ≤ q ≤ α+ − 1}. The initial state is β, and all these states but none are final.

Let us now define the set of transitions. Given a state q and a letter ζ ∈ {0,1}^n, let us determine the target state q′ of the unique transition q —ζ→ q′. To compute q′, recall that a word w ∈ ({0,1}^n)ω is accepted from q′ iff the word ζw is accepted from q. So the tuple c′ ∈ [0,1]^n encoded by w and the tuple c ∈ [0,1]^n encoded by ζw are linked by the equation

    c = (1/2) ζ + (1/2) c′        (15.7)

Since c′ is accepted from q′ iff c is accepted from q, to fulfil property 15.6 we must choose the value v of q′ so that a · ((1/2)ζ + (1/2)c′) ≤ q holds iff a · c′ ≤ v holds. We get v = 2q − a · ζ, and then: if v ∈ [α−, α+ − 1], we set q′ = v; if v < α−, we set q′ = none; and if v > α+ − 1, we set q′ = all. So we define the transition function of the DBA as follows:

    δ(q, ζ) = none         if q = none or 2q − a · ζ < α−
    δ(q, ζ) = 2q − a · ζ   if α− ≤ 2q − a · ζ ≤ α+ − 1
    δ(q, ζ) = all          if q = all or α+ − 1 < 2q − a · ζ

Example 15.7 Figure 15.3 shows the DBA for the solutions of 2xF − yF ≤ 1 (the state none has been omitted). Since α+ = 2 and α− = −1, the possible states of the DBA are all, none, and {q ∈ Z | −1 ≤ q ≤ 1}; the initial state is β = 1.
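The transition function, with its clamping to all and none, can be sketched as follows, together with a membership test for ultimately periodic fractional words. The factor 2 in 2q − a·ζ comes from Equation 15.7: reading a letter halves the weight of the remaining word. Since none is the only non-final state and is a sink, a deterministic run is accepting iff it never reaches none; the helper names below are assumptions of this sketch.

```python
def make_delta(a, alpha_minus, alpha_plus):
    """Transition function delta(q, zeta) = 2q - a·zeta, clamped to
    'all' (accepts everything) and 'none' (rejects everything)."""
    def delta(q, zeta):
        if q in ("all", "none"):
            return q
        v = 2 * q - sum(ai * zi for ai, zi in zip(a, zeta))
        if v < alpha_minus:
            return "none"
        if v > alpha_plus - 1:
            return "all"
        return v
    return delta

def accepts_fractional(delta, beta, prefix, loop):
    """Run the DBA from state beta on the ω-word prefix · loop^ω.
    The run is ultimately periodic, so we stop at the first repeated
    (state, loop position) pair."""
    q = beta
    for zeta in prefix:
        q = delta(q, zeta)
    seen, i = set(), 0
    while (q, i) not in seen:
        seen.add((q, i))
        q = delta(q, loop[i])
        i = (i + 1) % len(loop)
    return q != "none"

d = make_delta((2, -1), -1, 2)   # a = (2, -1), i.e. the constraint 2xF - yF ≤ q
# xF = yF = 1 (loop on (1,1)):  2·1 - 1 = 1 ≤ 1, accepted from state 1
# xF = 1, yF = 0 (loop on (1,0)): 2 > 1, rejected from state 1
```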

Let us determine the target states of the transitions leaving state 1. We instantiate the definition of δ(q, ζ) with q = 1 and a = (2, −1), and get

    δ(1, ζ) = 2 − 2ζx + ζy   if −1 ≤ 2 − 2ζx + ζy ≤ 1
    δ(1, ζ) = all            if 1 < 2 − 2ζx + ζy

which leads to

    δ(1, ζ) = all   if ζx = 0 and ζy = 0
    δ(1, ζ) = all   if ζx = 0 and ζy = 1
    δ(1, ζ) = 0     if ζx = 1 and ζy = 0
    δ(1, ζ) = 1     if ζx = 1 and ζy = 1

(the state none cannot be reached from state 1, since 2 − 2ζx + ζy ≥ 0 > α− for every ζ).

Figure 15.3: DBA for the solutions of 2x − y ≤ 1 over [0,1] × [0,1].

Example 15.8 Consider again ϕ = 2x − y ≤ 0. Recall that (x, y) ∈ R^2 is a solution of ϕ iff:

  (i) 2xI − yI ≤ −2, or
  (ii) 2xI − yI = −1 and 2xF − yF ≤ 1, or
  (iii) 2xI − yI = 0 and 2xF − yF ≤ 0.

Figure 15.4 shows at the top a DBA for the pairs x, y satisfying (i). It is easily obtained from the NFA for the solutions of 2xI − yI ≤ −2 shown on the right of Figure 15.1. Recall that, by property 15.6, a state q ∈ Z of the DBA of Figure 15.3 accepts the encodings of the pairs (xF, yF) ∈ [0,1] × [0,1] such that 2xF − yF ≤ q. This allows us to immediately derive the DBAs for 2xF − yF ≤ 0 and 2xF − yF ≤ −1: they are the DBA of Figure 15.3 with 0 or −1, respectively, as initial state.

Figure 15.4: DBA for the real solutions of 2x − y ≤ 0 satisfying (i) (top) and (ii) or (iii) (bottom).

The DBA at the bottom of Figure 15.4 recognizes the pairs x, y ∈ R satisfying (ii) or (iii). To construct it, we "concatenate" the DFA on the right of Figure 15.2 and the DBA of Figure 15.3. The DFA recognizes the integer solutions of 2xI − yI = −1; changing its final state to 0 yields a DFA for the integer solutions of 2xI − yI = 0. The DBA of Figure 15.3 with initial state 1 accepts the pairs of fractional parts satisfying 2xF − yF ≤ 1, which is adequate for (ii), and with initial state 0 it accepts those satisfying 2xF − yF ≤ 0, which is adequate for (iii). So it suffices to link state −1 of the DFA to state 1 of the DBA, and state 0 of the DFA to state 0 of the DBA, by means of transitions reading the ⋆-symbol.
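As a sanity check of the case analysis of Example 15.5, one can compare cases (i)-(iii) against the formula 2x − y ≤ 0 itself on a grid of sample decompositions; the fractional parts below are kept in [0, 1) to stay away from the boundary encodings discussed in that example. This verification script is a sketch, not part of the construction.

```python
from fractions import Fraction
from itertools import product

def cases(xI, xF, yI, yF):
    """Cases (i)-(iii) of the running example phi = 2x - y <= 0."""
    return (2 * xI - yI <= -2
            or (2 * xI - yI == -1 and 2 * xF - yF <= 1)
            or (2 * xI - yI == 0 and 2 * xF - yF <= 0))

fracs = [Fraction(k, 4) for k in range(4)]        # 0, 1/4, 1/2, 3/4
ok = all(
    cases(xI, xF, yI, yF) == (2 * (xI + xF) - (yI + yF) <= 0)
    for xI, yI in product(range(-3, 4), repeat=2)
    for xF, yF in product(fracs, repeat=2)
)
# ok is True: on all sampled decompositions the case split agrees with phi
```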
