
Finding Transitive Approximations of

Directed Graphs
DIPLOMARBEIT (diploma thesis)

for the attainment of the academic degree

Diplom-Informatiker

FRIEDRICH-SCHILLER-UNIVERSITÄT JENA

Fakultät für Mathematik und Informatik

submitted by Mathias Weller

born 24.02.1983 in Königs Wusterhausen

Advisors: Christian Komusiewicz
          Prof. Dr. Rolf Niedermeier
          Johannes Uhlmann

Jena, 11.02.2009
Abstract
In Bioinformatics, the task of hierarchically classifying diseases with
noisy data recently led to studying the Transitivity Editing problem,
which is to change a given digraph by adding and removing a minimum
number of arcs such that the resulting digraph is transitive. We show
that both Transitivity Editing and Transitivity Deletion, which does
not allow the insertion of arcs, are NP-complete even when restricted
to DAGs. We provide polynomial-time executable data reduction rules
that yield an O(k^2)-vertex kernel for general digraphs and an O(k)-vertex
kernel for digraphs of bounded degree. Furthermore, a heuristic approach
and a search tree algorithm are presented. We show an asymptotic running
time of O(2.57^k + n^3) for Transitivity Editing and O(2^k + n^3) for
Transitivity Deletion.

In the field of bioinformatics, the problem of modifying a given
directed graph with as few arc modifications as possible such that the
resulting directed graph is transitive was recently studied in the
context of hierarchical disease classification. If both insertion and
deletion of arcs are allowed, we call this problem Transitivity
Editing. If only the deletion of arcs is allowed, we call it
Transitivity Deletion. In this work, it is shown that Transitivity
Editing and Transitivity Deletion are NP-complete even if the given
digraph is acyclic. We give data reduction rules that allow a
kernelization. The number of vertices in the kernel is linear if the
vertex degree of the given directed graph is bounded by a constant, and
quadratic otherwise. Furthermore, we present two algorithms for solving
the two problems above: a heuristic and a search tree algorithm. We
show that Transitivity Editing can be solved in O(2.57^k + n^3) time and
Transitivity Deletion in O(2^k + n^3) time.
Contents

1 Introduction
2 Preliminaries
   2.1 Basic Definitions and Notations
   2.2 Complexity
   2.3 Transitivity Editing
   2.4 Graph Classes
3 Structure of Optimal Solution Sets
4 Computational Complexity
   4.1 Complexity of Transitivity Editing
   4.2 Complexity of Acyclic Transitivity Editing
5 Polynomial-Time Data Reduction
   5.1 Kernelization for Transitivity Editing
   5.2 Further Data Reduction
      5.2.1 Isolating Peripheral Arcs
      5.2.2 Kernel Consideration
6 Search Tree Algorithm
7 Heuristics
8 Experimental Results
   8.1 Employed Tests
   8.2 Results and Interpretation
9 Conclusion
1 Introduction
Finding and highlighting structure in graph modeled data has been a compu-
tational problem since automated data collection became possible. Machine
learning, data mining, pattern recognition, image analysis, genome assembly,
automatic data reduction, and chip design are examples of fields where such
tasks are common. Examples for such structures range from induced cycles
of even or odd length over paths and vertices of special degree to complex
subgraphs such as complete bipartite graphs of certain size. In molecular
diagnostics, the task of finding hierarchical disease classifications based on
noisy data was recently considered by Jacob et al. [JJK+08]: A group of
patients that share a disease is analyzed for the possibility of hierarchically
classifying the disease in a scheme of sub-diseases, based on molecular
characteristics [HBB+06]. A threshold is applied to deduce that a certain disease
is a sub-disease of some other: If the ratio of patients with some feature B
that also exhibit feature A is beyond a given threshold, then (A, B) is an arc
in the disease hierarchy. Naturally, such measurements and classifications
are subject to errors that may result in noisy data. Although the relation
of being a sub-disease of some other disease should logically be transitive,
this noise may cause inconsistencies in the resulting data set. For instance,
a disease A is found to be a sub-disease of B which is in turn a sub-disease
of C, but the measured data does not indicate that A is a sub-disease of C.
To be able to work with the resulting data, we wish to eliminate as much
noise as possible from the data set, that is, we want to find a consistent dis-
ease hierarchy that is closest to the measured data. When considering the
data as a directed graph, we insert and delete arcs from it until transitivity
is achieved. Obviously, we must assume the error to be small, otherwise, we
may reconstruct almost any hierarchical structure from the data.
The central problem in this work is the Transitivity Editing prob-
lem, which is to find the closest transitive digraph to a given digraph. It
is highly related to the Cluster Editing problem, which is to find the
closest transitive undirected graph for a given undirected graph. It has
been shown that Cluster Editing is NP-complete [KM86]. We show
that the Transitivity Editing problem is also NP-complete and remains
NP-complete even when restricted to acyclic digraphs (DAGs) or digraphs
of maximum degree four. Another related problem is the Comparabil-
ity Editing problem, which asks whether a given undirected graph can
be transformed by a limited number of operations such that the remaining
graph can be transitively ordered. Comparability Editing has also been
shown to be NP-complete [NSS01]. In this work, we only briefly address
this problem.
Bearing in mind that, since the measurement error is small, the edit-
ing distance to the closest transitive digraph is small, a fixed-parameter
approach is promising. In parameterized complexity, problems are ana-
lyzed with regard to parameters other than the input size. Often, it can be
shown that problems become efficiently solvable when considering a param-
eter that is small in comparison to the input size. In this context, Böcker
et al. [BBK09] have shown that Transitivity Editing is fixed-parameter
tractable by providing a search tree algorithm that runs in O(3^k · n^3) time
with n denoting the number of vertices in the input graph and k denoting
the minimum number of arc modifications. They also presented a very
fast Integer Linear Programming (ILP) implementation of Transi-
tivity Editing based on the ILP implementation of Cluster Editing.
However, the existence of a polynomial-size problem kernel was posed as
an open problem. By constructing a problem kernel that contains O(k^2)
vertices and improving the branching vector of the search tree algorithm,
we show that Transitivity Editing can be solved in O(2.57^k + n^3) time,
improving the previous result by Böcker et al. [BBK09].
Apart from the exact search tree algorithm, we present a heuristic that
computes a transitive digraph that is relatively close but not necessarily
optimal in terms of edit distance. However, the algorithm can provide an
upper bound for the edit distance of the given digraph and, although possibly
more inaccurate, runs asymptotically faster than any known exact algorithm
and a previous heuristic algorithm by Jacob et al. [JJK+08]. As a novelty,
we study the problem of Transitivity Deletion, which asks whether the
given digraph can be turned transitive with a given number of arc deletions.
We show its NP-completeness and present a search tree algorithm that runs
in O(2^k + n^3) time based on a kernelization that leaves a problem kernel
of O(k^2) vertices. We only briefly address the problem of Transitivity
Completion, which is to calculate the transitive closure of a given digraph.
2 Preliminaries
In the following, we give a survey of definitions and graph-theoretic facts
needed in later sections. Complexity results and techniques are introduced
and a collection of classes of digraphs and problems including the Transi-
tivity Editing problem and some of its variants are presented.

2.1 Basic Definitions and Notations


A directed graph or digraph is a pair (V, A) with A ⊆ V × V. The set V
contains the vertices of the digraph, while A contains the arcs. The
symmetric difference of two sets of arcs A and A′ is A∆A′ := (A ∪ A′)\(A ∩ A′).
The power set of a set V of vertices is P(V) := {V′ | V′ ⊆ V}. A digraph is
called simple if there is no v ∈ V with (v, v) ∈ A, that is, the relation A
is irreflexive. In this work, we only consider simple digraphs. Let D = (V, A)
be a digraph and u ∈ V, then pred_A(u) := {v ∈ V | (v, u) ∈ A} denotes the set
of predecessors of u with respect to D, while succ_A(u) := {v ∈ V | (u, v) ∈ A}
denotes its successors. The vertices in pred_A(u) ∪ succ_A(u) are said to be
adjacent to u. These operators are also meaningful for sets of vertices:

    pred_A(V′) := (⋃_{v∈V′} pred_A(v)) \ V′,    succ_A(V′) := (⋃_{v∈V′} succ_A(v)) \ V′.

Furthermore, indeg_A(u) := |pred_A(u)| denotes the indegree of the vertex u
in D and outdeg_A(u) := |succ_A(u)| denotes the outdegree of the vertex u
in D. Note that

    Σ_{u∈V} indeg_A(u) = Σ_{u∈V} outdeg_A(u) = |A|.

Let u ∈ V. If indeg_A(u) = 0, then u is called a source, and if outdeg_A(u) = 0,
then u is called a sink. The degree of u is defined as deg_A(u) := indeg_A(u) +
outdeg_A(u). Finally, max_{v∈V} deg_A(v) is called the maximum degree of D.
An arc (u, v) is said to be incoming to v, outgoing of u, and incident
to both u and v. The arc (u, v)^{-1} := (v, u) is said to be the opposite arc
of (u, v). Furthermore, the arc (u, v) is adjacent to the arc (v, w). Let P =
(v_0, v_1, . . . , v_{m−1}) be a sequence of vertices of V. If

    (v_i, v_{i+1}) ∈ A for all 0 ≤ i < m − 1,

then P is called a path of length m in D. Note that a path of length m ≥ 3
with v_0 = v_{m−1} is called a (directed) cycle. A vertex w is reachable from a
vertex u in D = (V, A) if there is a path from u to w in D:

    u →_D w :⇔ there is a path P = (v_0, . . . , v_{m−1}) in D with v_0 = u and v_{m−1} = w.

Furthermore, the set of all vertices that are reachable from u in D is denoted
by

    reach_D(u) := {w | u →_D w}.

Note that each vertex is reachable from itself. For a vertex set V′ ⊆ V, we
call D[V′] := (V′, A ∩ (V′ × V′)) the subgraph of D that is induced by V′.
Furthermore, for some set Z ⊆ V and some vertex z ∈ V, the terms D − Z
and D − z are abbreviations for D[V\Z] and D[V\{z}], respectively.
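The definitions above translate directly into code. The following sketch (the helper names pred_a, succ_a, indeg_a, and outdeg_a are ours, not taken from this thesis) represents a digraph by its vertex set V and an arc set A of ordered pairs, and checks the degree-sum identity stated above:

```python
# A simple digraph as a vertex set V and an arc set A of ordered pairs.
# The helpers mirror pred_A, succ_A, indeg_A, and outdeg_A from the text.

def pred_a(A, u):
    """Predecessors of u: all v with (v, u) in A."""
    return {v for (v, w) in A if w == u}

def succ_a(A, u):
    """Successors of u: all w with (u, w) in A."""
    return {w for (v, w) in A if v == u}

def indeg_a(A, u):
    return len(pred_a(A, u))

def outdeg_a(A, u):
    return len(succ_a(A, u))

# Example: the path a -> b -> c.
V = {"a", "b", "c"}
A = {("a", "b"), ("b", "c")}

# Summing indegrees (or outdegrees) over V counts every arc exactly once.
assert sum(indeg_a(A, u) for u in V) == len(A)
assert sum(outdeg_a(A, u) for u in V) == len(A)
```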

P3s and Diamonds. In a directed graph D = (V, A), an induced P3
in D is a triple (u, v, w) of distinct vertices such that (u, v) ∈ A, (v, w) ∈ A,
and (u, w) ∉ A. Note that, since we do not consider P3s that are not induced,
we will simply refer to induced P3s as P3s. We say that the P3 (u, v, w)
contains the arcs (u, v) and (v, w) and the vertices u, v, and w. In this
context, the vertex v is called the middle of this P3. Two P3s (u, v, w)
and (x, y, z) are called (arc-)disjoint if

    {(u, v), (v, w), (u, w)} ∩ {(x, y), (y, z), (x, z)} = ∅.

For example, if u, v, w, x, z are pairwise distinct, then (u, v, w) and (x, u, z)
are disjoint, while for any vertex y, (u, v, w) and (u, y, w) are not, since
both triples involve the pair (u, w).
An important structural element is the “diamond”. In Section 3, we will
see that its absence in a given digraph simplifies the Transitivity Editing
problem. This enables us to improve the branching vector of the search tree
algorithm that is presented in Section 6. Furthermore, it allows for easier
analysis of the gadget construction in the NP-completeness proof presented
in Section 4. A (directed) diamond in a digraph D is a triple (u, Z, v) ∈
V × P(V) × V with |Z| = 2, (u, z), (z, v) ∈ A for all z ∈ Z, and (u, v) ∉ A;
that is, a diamond consists of two P3s that share their first and last vertex.
In this context, Z is called the belt of the diamond, and u and v are the head
and the tail, respectively. If D does not contain a diamond, then it is said
to be (directed-)diamond-free. Since we do not consider any diamond structures
other than directed diamonds, we will omit the prefix.

Connectivity. As opposed to undirected graphs, there are two types of
connectivity for directed graphs. Let G(D) = (V, E) denote the underlying
undirected graph that is obtained from the directed graph D = (V, A) by
removing the direction of all arcs, that is,

    {u, v} ∈ E ⇔ ((u, v) ∈ A ∨ (v, u) ∈ A).
A subgraph D′ = D[V′] of D is called weakly connected if G(D′) is a
connected graph. It is called strongly connected, or strong, if for each
pair (u, v) ∈ V′ × V′, there is a path from u to v and a path from v to u
in D′. The maximal strongly connected subgraphs of D are its strong
components. A digraph D = (V, A) that contains all possible arcs, that is,
A = V × V, is called a complete digraph.

2.2 Complexity
In this section, we give an introduction to the terms and ideas that are
related to combinatorially hard problems like Transitivity Editing. In
particular, we give a brief overview of NP-completeness and parameterized
complexity as well as kernelization and approximation.

NP-completeness. In computer science, an algorithm is called determin-


istic if each step is determined only by prior steps and the input data. A
deterministic algorithm is called efficient if its asymptotic running time is
bounded by a polynomial in the size of the input data. The set of problems
that are solvable efficiently is denoted by P, while NP denotes the set of
problems whose solutions are efficiently verifiable. Let A and B be prob-
lems in NP and let f : I_A → I_B be a function, with I_A denoting the set of all
inputs to A and I_B denoting the set of all inputs to B. Then f is called a
reduction from A to B if for any input d ∈ I_A, it holds that d ∈ A ⇔ f(d) ∈ B
and the computation of f(d) is deterministic and efficient. If such a function
exists, then A is called reducible to B. The question whether d ∈ A can then be
answered by applying the reduction f to d and testing whether f(d) ∈ B.
Note that if f(d) ∈ B can be determined efficiently, so can d ∈ A. Also
note that the binary reducible-relation is transitive, meaning that if A is re-
ducible to B and B is reducible to some problem C, then A is also reducible
to C. A problem Q is called NP-hard (or “intractable”), if all problems
in NP can be reduced to it, and NP-complete if Q is NP-hard and in NP.
Hence, if Q was solvable efficiently, all problems in NP would be. It is yet
unknown whether it is possible to find efficient algorithms for NP-complete
problems. This is called the P vs. NP problem.

Parameterized Complexity. Although being intractable, numerous NP-


hard problems occur in practice and need to be solved. Hence, in many
applications, it may be of high interest to ask whether these problems have
deterministic algorithms that are exponential only with respect to a param-


eter that may be “small” in comparison to the input size. Parameterized
complexity theory is a recent approach to cope with these problems by pro-
viding a framework for a refined analysis [DF99, FG06, Nie06]. As opposed
to measuring the complexity of a problem only in terms of the input size, as
classical complexity theory does, the parameterized complexity of a problem
may depend on multiple characteristics of the input. Thus the complexity
of a problem can be explored from different points of view. A canonical ap-
proach is, for example, to analyze the complexity of an algorithm in terms
of the size of the input and the output. Other popular parameter types of
graph algorithms in particular are path- and treewidth, vertex-cover num-
ber, maximum degree, etc.

Fixed-Parameter Tractability. In parameterized complexity, a prob-


lem is said to be in FPT or fixed-parameter tractable with respect to some
parameter k if the problem can be solved in f (k)·poly(n) time, with poly(n)
being a polynomial in n and f (k) being an arbitrary, but computable (likely
superpolynomial) function in k that is independent of n. For instance, the
NP-complete Vertex Cover problem is:

Vertex Cover:
Input: An undirected graph G = (V, E) and an integer k ≥ 0.
Question: Is there some set C ⊆ V such that each edge in E
has at least one endpoint in C and |C| ≤ k?

Vertex Cover can be solved in O(1.2738^k + kn) time [CKX06], where the
parameter k is a bound on the maximum size of the vertex cover set we are
looking for and n is the number of vertices of the given graph (throughout
this work, n always refers to the number of vertices in the input graph).
The best known “non-parameterized” solution for Vertex Cover is due
to Robson [Rob86, Rob01]. He showed that Independent Set, and thus
Vertex Cover, can be solved in O(1.19^n) time. However, for k ≤ 0.71n,
the above-mentioned fixed-parameter solution turns out to be better.
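To illustrate why a running time of the form f(k) · poly(n) is achievable here, the following sketch implements the classic bounded search tree for Vertex Cover: pick any uncovered edge and branch on which of its two endpoints joins the cover. This is a minimal O(2^k)-branching illustration of fixed-parameter tractability, not the refined algorithm of [CKX06]; the function name has_vertex_cover is ours.

```python
def has_vertex_cover(edges, k):
    """Bounded search tree for Vertex Cover: some endpoint of any
    uncovered edge {u, v} must be in the cover, so branch on taking
    u or taking v, decrementing the budget k each time.  The search
    tree has size O(2^k), independent of the graph's size."""
    uncovered = next(iter(edges), None)
    if uncovered is None:
        return True          # every edge is covered
    if k == 0:
        return False         # budget exhausted but edges remain
    u, v = uncovered
    rest_u = [e for e in edges if u not in e]   # take u into the cover
    rest_v = [e for e in edges if v not in e]   # take v into the cover
    return has_vertex_cover(rest_u, k - 1) or has_vertex_cover(rest_v, k - 1)

# A triangle needs two vertices to cover all three edges.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
assert not has_vertex_cover(triangle, 1)
assert has_vertex_cover(triangle, 2)
```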

Kernelization. A practical approach that may help to solve NP-hard


problems is polynomial-time preprocessing, which aims at transforming the
input by repeatedly applying some efficient (polynomial-time executable)
rules [GN07]. A kernelization is a polynomial-time algorithm that transforms
a given instance I with parameter k of a problem Q into a new instance I′
with parameter k′ ≤ k of Q such that the original instance I is a yes-instance
with parameter k if and only if the new instance I′ is a yes-instance with
parameter k′, and |I′| ≤ g(k) for a function g. The instance I′
is called the problem kernel [DF99, FG06, Nie06]. Kernelization enables us
to improve the running time of search tree algorithms. Suppose we have a
search tree algorithm for a problem in FPT that runs on a search
tree of size O(α^k) and a kernelization that reduces the input to a kernel of
polynomial size q(k) in polynomial time P(|I|). If we apply the kernelization
algorithm whenever |I| > c · q(k) for some constant c, then the running
time of the search tree algorithm can be bounded by O(α^k + P(|I|)) instead
of O(α^k · poly(n)). This technique is known as interleaving [NR00].

Approximation. If we are satisfied with having a solution that is close
to optimal, we can sometimes find efficient algorithms for its calculation. A
minimization problem A is in PTAS (admits a polynomial-time approximation
scheme) if there is an algorithm for A that, for any constant ε > 0,
computes a solution that is within a factor of 1 + ε of optimal and runs
in polynomial time. For instance, if S is the solution of the approximation
algorithm and S_opt is an optimal solution, then |S| ≤ (1 + ε)|S_opt| (and
analogously, |S| ≥ (1 − ε)|S_opt| for maximization problems). If there is an
algorithm that produces a solution that is always within a constant factor
of optimal, then the problem is in APX (has a constant-factor approximation).
Analogously to NP-hardness, a problem is called APX-hard if there is an
approximation-preserving reduction from every problem in APX to that
problem. Note that PTAS ⊆ APX [ACK+99].

2.3 Transitivity Editing


This section gives an introduction to transitivity in digraphs and the prob-
lem of Transitivity Editing and its variants Transitivity Deletion,
Transitivity Completion, and Transitivity Vertex Deletion. Fi-
nally, some results about the transitive closure are presented.

Transitive Digraphs. The set A of arcs of a directed graph D = (V, A)


can be considered as a relation on V × V . If this relation is transitive, then
the digraph is called transitive.

Definition 2.1. A digraph D = (V, A) is called transitive if

∀u,v,w∈V ((u, v) ∈ A ∧ (v, w) ∈ A) ⇒ (u, w) ∈ A.


As previously done with Cluster Editing [GGHN05], we characterize


transitivity via a forbidden subgraph.

Lemma 2.2. A digraph D = (V, A) is transitive iff it does not contain a P3 .

Proof. The following statements are equivalent:

    D contains a P3
    ⇔ ∃u,v,w∈V: (u, v) ∈ A ∧ (v, w) ∈ A ∧ (u, w) ∉ A
    ⇔ ∃u,v,w∈V: ¬(¬((u, v) ∈ A ∧ (v, w) ∈ A) ∨ (u, w) ∈ A)

and since a ⇒ b is equivalent to ¬a ∨ b,

    ⇔ ¬(∀u,v,w∈V: ((u, v) ∈ A ∧ (v, w) ∈ A) ⇒ (u, w) ∈ A)
    ⇔ D is not transitive

Lemma 2.2 allows us to use the terms “P3-free” and “transitive”
synonymously. Furthermore, in a transitive digraph, the arcs from a vertex u
to all vertices v that are reachable from u are present, since otherwise there
would be some vertex w with (u, w, v) being a P3. Recall that a digraph is
strong if all of its vertices are reachable from any of its vertices. This implies
that, if a transitive digraph is strong, then it is a complete digraph.
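Lemma 2.2 yields a straightforward transitivity test: search for an induced P3 and declare the digraph transitive iff none exists. A minimal sketch (the function names find_p3 and is_transitive are ours), scanning all pairs of adjacent arcs:

```python
def find_p3(A):
    """Return some induced P3 (u, v, w) of a digraph with arc set A,
    or None if there is none.  By Lemma 2.2, the digraph is
    transitive iff this returns None."""
    for (u, v) in A:
        for (x, w) in A:
            # (u, v) and (x, w) form a P3 iff they share the middle
            # vertex v, the endpoints differ, and the shortcut is missing.
            if x == v and u != w and (u, w) not in A:
                return (u, v, w)
    return None

def is_transitive(A):
    return find_p3(A) is None

# a -> b -> c without the arc (a, c) is the forbidden subgraph ...
assert find_p3({("a", "b"), ("b", "c")}) == ("a", "b", "c")
# ... and inserting the shortcut arc makes the digraph transitive.
assert is_transitive({("a", "b"), ("b", "c"), ("a", "c")})
```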

Transitivity Editing. In applied sciences, sometimes a collection of data


that is known to describe a transitive relation may contain intransitive sub-
structures due to measurement errors or inaccuracy. An example is hierar-
chical disease classification based on noisy data: For a group of patients that
share a disease, we have to assign each patient to a well-defined subgroup in
a hierarchical classification scheme of sub-diseases, based on molecular char-
acteristics of the patient. The task is to deduce the hierarchical structure in
the integrated noisy data sets by an automated approach [BBK09, HBB+ 06].
The problem of Transitivity Editing is finding a transitive relation that
is closest to the measured data.

Transitivity Editing:
Input: A directed graph D = (V, A) and an integer k ≥ 0.
Question: Is there a directed graph D′ = (V, A′) that is transitive
and |A∆A′| ≤ k?
Figure 1: An example for the Transitivity Editing problem: The
digraph D on the left is not transitive. However, by deleting three of its
arcs and inserting another, it can be made transitive. Hence, (D, 4) is a
yes-instance of Transitivity Editing.

In this context, the set S := A∆A′ is called a solution set and contains all
arcs that are inserted into or deleted from D. Several interesting properties
of solution sets are described in Section 3. An example for the Transitivity
Editing problem is shown in Figure 1. When considering graph editing
problems, it is also interesting to consider versions of the problem that are
restricted to insertion or deletion, respectively.

Transitivity Deletion:
Input: A directed graph D = (V, A) and an integer k ≥ 0.
Question: Is there a directed graph D′ = (V, A′) that is transitive
with A′ ⊆ A and |A\A′| ≤ k?

Note that the Transitivity Completion problem, which is defined anal-


ogously, is equal to finding the transitive closure of a digraph, which is
discussed later in this section.
Another way to turn a given digraph transitive is to simply ignore certain
subsets of data resulting in an induced subgraph of D.

Transitivity Vertex Deletion:
Input: A directed graph D = (V, A) and an integer k ≥ 0.
Question: Is there a set V′ ⊆ V such that D[V′] is transitive
and |V\V′| ≤ k?

Solution Sets. As mentioned before, if D = (V, A) is a digraph and D′ =
(V, A′) is transitive, then S := A∆A′ is called a solution set for D. If not
explicitly stated otherwise, all solution sets in this work relate to the
Transitivity Editing problem. The set S_DEL := S ∩ A contains all arcs that are
deleted from D, while S_INS := S\A contains all arcs that are inserted into D.
The arcs in S_DEL and S_INS may sometimes be called delete operations or
insert operations, respectively. Note that, obviously, S_INS ∩ S_DEL = ∅.
Applying S to D results in (V, A∆S). Note that all operations of S can be
applied in arbitrary order, as long as all operations in S are applied.
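Because a solution set acts by symmetric difference, applying it is a single set operation, and the split into S_DEL and S_INS falls out directly; order-independence is immediate since symmetric difference is commutative and associative. A minimal sketch (the name apply_solution is ours):

```python
def apply_solution(A, S):
    """Apply a solution set S to the arc set A: the result is the
    symmetric difference A ∆ S (set XOR in Python).  Arcs of S that
    lie in A are the delete operations; the remaining arcs of S are
    the insert operations."""
    return A ^ S

# Example: delete the arc (b, c) and insert the arc (a, c).
A = {("a", "b"), ("b", "c")}
S = {("b", "c"), ("a", "c")}
s_del, s_ins = S & A, S - A
assert s_del == {("b", "c")} and s_ins == {("a", "c")}
assert s_del & s_ins == set()          # S_DEL and S_INS are disjoint
assert apply_solution(A, S) == {("a", "b"), ("a", "c")}
```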

Definition 2.3. Let D be a digraph. A solution set S for D is called optimal
if there is no solution set S′ for D with |S′| < |S|.

Note that there may still be many different optimal solution sets for a
single digraph. We discuss optimal solution sets in detail in Section 3.

Transitive Closure. To obtain a transitive digraph from an arbitrary


digraph D, one may repeatedly find a P3 (u, v, w) and insert (u, w) into D
until it is transitive.

Definition 2.4. The transitive closure of a given digraph D = (V, A) is the
digraph D^+ := (V, A^+) with

    ∀u,v∈V: (u, v) ∈ A^+ ⇔ v ∈ reach_D(u),

meaning that there is an arc from u to v in D^+ iff v is reachable from u
in D.

Obviously, A ⊆ A^+ and, thus, the transitive closure of a strong digraph
is always a complete digraph. In computer science, the construction of the
transitive closure is used as a preprocessing step to answer reachability
queries in O(1) time. The transitive closure can be constructed using Boolean
matrix multiplication [Mun71]. Using the currently asymptotically best matrix
multiplication algorithm yields an overall running time of O(n^2.376) [CW90].
Calculating the transitive closure is obviously a method to recognize transitive
digraphs. However, it is yet unknown whether transitive digraphs can be
recognized faster than by calculating their transitive closure [MS90]. Note
that, for a digraph D, the smallest digraph D′ with D^+ = D′^+ is called the
transitive reduction of D.
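As a concrete (if asymptotically slower) alternative to Boolean matrix multiplication, the transitive closure can be computed by a breadth-first search from every vertex in O(n · (n + |A|)) time. A sketch (function name ours) under the convention that (u, v) enters A^+ only if v is reachable from u via at least one arc, which keeps loop-free inputs loop-free:

```python
from collections import deque

def transitive_closure(V, A):
    """Compute A+ by a BFS from every start vertex s: (s, v) enters
    the closure iff v is reachable from s via at least one arc."""
    succ = {u: set() for u in V}
    for (u, v) in A:
        succ[u].add(v)
    closure = set()
    for s in V:
        queue, seen = deque(succ[s]), set(succ[s])
        while queue:
            v = queue.popleft()
            closure.add((s, v))
            for w in succ[v] - seen:
                seen.add(w)
                queue.append(w)
    return closure

# The closure of the path a -> b -> c adds exactly the shortcut (a, c).
V = {"a", "b", "c"}
A = {("a", "b"), ("b", "c")}
assert transitive_closure(V, A) == A | {("a", "c")}
```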

2.4 Graph Classes


In this section, we present a number of graph classes that have interesting
properties with regard to Transitivity Editing or Transitivity Dele-
tion. Most of the presented graphs are directed.
Acyclic Digraphs. As mentioned before, a cycle is a path that starts
and ends at the same vertex. If D does not contain a cycle, it is said to
be a directed acyclic graph, or DAG for short. Note that every DAG has at
least one source and at least one sink [BJG08]. The problem of turning a
digraph acyclic with the minimum number of arc deletions is very prominent
in theoretical computer science.

Feedback Arc Set:
Input: A directed graph D = (V, A) and an integer k ≥ 0.
Question: Is there a DAG D′ = (V, A′) such that |A∆A′| ≤ k?

A complexity overview of this problem is given later in this section.

Lemma 2.5. If D = (V, A) is a digraph with {a | a^{-1} ∈ A} ∩ A = ∅ and S
is an optimal solution set for D with S ⊆ A, then D′ = (V, A\S) is a DAG.

Proof. Suppose there was a directed cycle C = (V_C, A_C) with |V_C| ≥ 3 in D′.
Since D′ is transitive, we know that D′[V_C] is a complete digraph. Hence,
for all arcs (a, b) ∈ A_C, we know that (b, a) ∈ A_C. However, since S ⊆ A, we
know that neither (a, b) nor (b, a) is inserted. Hence, both arcs must also
be in A, contradicting {a | a^{-1} ∈ A} ∩ A = ∅.

Consider a digraph D that satisfies the conditions of Lemma 2.5 and an


optimal solution set S for D. Since applying S to D yields a DAG, we also
know that D can be turned acyclic with |S| arc deletions.

Corollary 2.6. If a digraph D satisfies the conditions of Lemma 2.5, then

∀k∈N (D, k) ∈ Transitivity Deletion ⇒ (D, k) ∈ Feedback Arc Set.

On the other hand, not every DAG is transitive; moreover, we will see
that Transitivity Editing on DAGs is NP-complete. Figure 2 shows that the
converse of Corollary 2.6 does not hold: there is a digraph D and a k ∈ N
with (D, k) ∈ Feedback Arc Set but (D, k) ∉ Transitivity Deletion.

Tournaments. Another class of directed graphs that have interesting


properties with regard to transitivity are tournaments.

Definition 2.7. A tournament is a directed graph D = (V, A) with

    ∀u,v∈V, u ≠ v: (u, v) ∈ A ⇔ (v, u) ∉ A.


Figure 2: A digraph D that can be turned acyclic by removing the
vertical arc. However, there is no arc whose removal alone is enough to
turn D transitive. Hence, (D, 1) ∈ Feedback Arc Set and (D, 1) ∉
Transitivity Deletion.

A tournament can be obtained by orienting all edges of a complete
undirected graph and can be viewed as the result of a round-robin competition
without draws: Each vertex represents a participant, and the arc connecting
two participants is oriented towards the winner of their direct comparison.

Lemma 2.8. A tournament is transitive iff it contains no cycle.

Proof. We show that a tournament D = (V, A) contains a P3 (u, v, w) iff it
contains a cycle C.
“⇒”: Obviously, (u, v) ∈ A, (v, w) ∈ A, and (u, w) ∉ A. However,
since D is a tournament, we know that (w, u) ∈ A and thus C = (u, v, w, u)
is a directed cycle in D.
“⇐”: Let C = (v_0, v_1, . . . , v_{m−1}, v_0). Suppose D does not contain a P3;
then, since (v_0, v_1) ∈ A and (v_1, v_2) ∈ A, we know that (v_0, v_2) ∈ A.
Analogously, (v_0, v_i) ∈ A for each i < m. However, since C is a cycle,
(v_0, v_{m−1}) ∈ A and (v_{m−1}, v_0) ∈ A, a contradiction to D being a
tournament.

Corollary 2.9. If D is a tournament and a DAG, then D is transitive.
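Lemma 2.8 and Corollary 2.9 can be checked mechanically on small tournaments. The sketch below combines the P3 test from Lemma 2.2 with Kahn's source-removal algorithm for acyclicity; both helper names are ours:

```python
def p3_free(A):
    """Transitivity test via Lemma 2.2: no induced P3 may exist."""
    return not any(x == v and u != w and (u, w) not in A
                   for (u, v) in A for (x, w) in A)

def is_acyclic(V, A):
    """Kahn's algorithm: a digraph is acyclic iff repeatedly
    removing sources (vertices of indegree 0) consumes all of V."""
    indeg = {u: 0 for u in V}
    for (_, v) in A:
        indeg[v] += 1
    sources = [u for u in V if indeg[u] == 0]
    removed = 0
    while sources:
        u = sources.pop()
        removed += 1
        for (x, y) in A:
            if x == u:
                indeg[y] -= 1
                if indeg[y] == 0:
                    sources.append(y)
    return removed == len(V)

# The cyclic tournament on three vertices contains a P3 (Lemma 2.8) ...
C3 = {("a", "b"), ("b", "c"), ("c", "a")}
assert not is_acyclic({"a", "b", "c"}, C3) and not p3_free(C3)
# ... while the acyclic tournament is transitive (Corollary 2.9).
T3 = {("a", "b"), ("b", "c"), ("a", "c")}
assert is_acyclic({"a", "b", "c"}, T3) and p3_free(T3)
```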

Since in a tournament (a, b) ∈ A implies (b, a) ∉ A, the condition of
Lemma 2.5 holds, and Corollary 2.6 implies that, when restricted to
tournaments, Transitivity Deletion ⊆ Feedback Arc Set. It has been shown
that the Feedback Arc Set problem is APX-hard [Kan92]. However, while
still being NP-hard [CTY07], Feedback Arc Set on tournaments admits a
polynomial-time approximation scheme [KMS07]. The problem was also
considered from a parameterized point of view: Feedback Arc Set on
tournaments can be solved in O(2.42^k · n^2.38) time [RS06] and admits
an O(k^2)-vertex kernel [DGH+06].

Comparability Graphs.

Definition 2.10. Let G = (V, E) be an undirected graph. An orienta-


tion of the edges of G is a function dir : E → V × V with dir({u, v}) ∈
{(u, v), (v, u)}.
Comparability graphs, also known as transitively orientable graphs or
partially orderable graphs, are undirected graphs for which an orientation dir
exists such that D = (V, A) with A = {dir(e) | e ∈ E} is transitive.

It has been shown that comparability graphs can be recognized in O(δ ·


|E|) time [Gol77] with δ denoting the maximum degree of the vertices in V .
The problem of turning a given undirected graph into a comparability graph
is:

Comparability Editing:
Input: An undirected graph G = (V, E) and an integer k ≥ 0.
Question: Is there an undirected graph G′ = (V, E′) that is a
comparability graph and |E∆E′| ≤ k?

The Comparability Editing problem is NP-complete and NP-hard to


approximate within a factor of 18/17 [NSS01].

3 Structure of Optimal Solution Sets


In this section, we introduce some interesting properties of optimal solu-
tion sets regarding the Transitivity Editing problem (see Section 2.3).
In particular, there are results concerning the use of insert operations: For
instance, diamond-free digraphs have optimal solution sets that do not em-
ploy arc insertions. We also show that for each insertion (a, b) done by an
optimal solution set, there is a path from a to b other than (a, b) in the
resulting transitive digraph. Finally, it is explained why sources and sinks
are preserved by applying optimal solution sets.
Although the solution sets for a digraph are dependent on the digraph
structure as a whole, we can tell by the presence of a certain local feature,
whether arc insertions are necessary for a solution set to be optimal. This
feature is the diamond structure introduced in Section 2.1. The
following lemma shows that arc deletions preserve the property of being
diamond-free.

Lemma 3.1. Let D = (V, A) be a diamond-free directed graph and let S be
an optimal solution set for D. Then D_DEL := (V, A∆S_DEL) is diamond-free.

Proof. Suppose D_DEL contains a diamond consisting of the two P3s (u, x, v)
and (u, y, v). The arcs (u, x), (x, v), (u, y), and (y, v) of D_DEL are also
arcs of D; hence, since D is diamond-free, (u, v) ∈ A and thus (u, v) ∈ S_DEL.
Since S_INS cannot delete (u, x) or (x, v), both arcs are present in the final
transitive digraph, which forces (u, v) ∈ S_INS. This contradicts
S_INS ∩ S_DEL = ∅ (see Section 2.3).

With the following lemma, we are able to show that in order to solve
Transitivity Editing on a diamond-free digraph, it is optimal to
perform arc deletions only. This helps us improve the running time of our
algorithms on diamond-free digraphs.

Lemma 3.2. If a given directed graph D = (V, A) does not contain a dia-
mond, then there is an optimal solution set S for D that does not insert an
arc, that is, S = SDEL .

Proof. Let S 0 be an optimal solution set for D. By Lemma 3.1, we can apply
all delete operations of a given solution set and still maintain a diamond-free
digraph. Hence, we assume D to be diamond-free and the solution set S 0 to
only contain insert operations. We now construct S from S 0 :

S := {(a, w) | ∃b∈V (a, b) ∈ S 0 ∧ (a, w) ∈ A ∧ (w, b) ∈ A}.




Since D does not contain a diamond, for each pair (a, b), there is at most
one w meeting the criteria (a, w) ∈ A and (w, b) ∈ A. Hence, for each arc
in S 0 there is at most one arc in S and hence |S| ≤ |S 0 |.
Let D0 := (V, A0 ) with A0 := A∆S. We now show that S is a solution
set for D by proving that D0 is transitive: Assume there is a P3 p = (x, y, z)
in D0 . Since S ⊆ A (that is, S contains only delete operations), we know
that (x, y) ∈ A and (y, z) ∈ A and, since S 0 is a solution set for D, we know
that p is not a P3 in (V, A∆S 0 ), implying either (x, z) ∈ S 0 or (x, z) ∈ S.
However, (x, z) 6∈ S 0 , because otherwise (x, y) ∈ S, contradicting p being
a P3 in D0 . Hence, (x, z) ∈ A and (x, z) ∈ S. By definition of S, this implies
that there is a v ∈ V with (z, v) ∈ A and (x, v) ∈ S 0 . Also, (y, v) 6∈ A, since
otherwise, (x, z, v) and (x, y, v) would form a diamond in D. Hence, q =
(y, z, v) is a P3 in D. Like p, also q cannot be a P3 in (V, A∆S 0 ). However, S 0
does only contain insert operations, which implies (y, v) ∈ S 0 . Since (y, z) ∈
A and (z, v) ∈ A, this implies (y, z) ∈ S, contradicting p being a P3 in D0 .
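The construction of S from S 0 in this proof translates directly into code. The following Python sketch (our own illustration; the function name is hypothetical) converts an insert-only solution set for a diamond-free digraph into a delete-only one of at most the same size:

```python
def deletion_solution(arcs, s_prime):
    """Given the arc set A of a diamond-free digraph and an insert-only
    solution set s_prime, build the delete-only set
    S = {(a, w) | exists b: (a, b) in s_prime, (a, w) in A, (w, b) in A}."""
    arcs = set(arcs)
    s = set()
    for a, b in s_prime:
        for w in {w for (x, w) in arcs if x == a}:
            if (w, b) in arcs:
                # diamond-freeness guarantees at most one such w per (a, b)
                s.add((a, w))
    return s

print(deletion_solution({(0, 1), (1, 2)}, {(0, 2)}))  # {(0, 1)}
```

For the single P3 on arcs {(0, 1), (1, 2)} with S 0 = {(0, 2)}, the sketch returns {(0, 1)}: instead of inserting the shortcut, the first arc of the P3 is deleted, and both edited digraphs are transitive.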

In the following, we specialize this to the fact that arc insertions only
need to take place if their endpoints take part in a diamond. To prepare
this, the following lemma shows that there are optimal solution sets that
do not increase the number of paths between two vertices if these vertices
are not head and tail of a diamond.
Lemma 3.3. Let D = (V, A) be a digraph and u, v, w ∈ V . Furthermore,
let predA (v) ∩ succA (u) ⊆ {w}. Then there is an optimal solution set S ∗
for D such that if there is a path of length ≥ 3 from u to v in (V, A∆S ∗ ),
then it is (u, w, v).
Proof. Let S 0 denote an optimal solution set for D. Furthermore, for all S ⊆
V × V , let

P(S) := ⋃_{m≥3} {(u = x0 , x1 , . . . , xm−1 = v) ∈ V m | ∀0≤i≤m−2 (xi , xi+1 ) ∈ A∆S}

denote the set of all paths of length ≥ 3 in A∆S from u to v and let P0 (S) :=
P(S)\{(u, w, v)}. Note that, since there is no diamond (u, . . . , v), for each
path (x0 , . . . , xm−1 ) ∈ P0 (S 0 ), we know that (u, x1 ) ∈ SINS 0 ∨ (x1 , v) ∈ SINS 0
and (u, xm−2 ) ∈ SINS 0 ∨ (xm−2 , v) ∈ SINS 0 . We refer to this fact as
observation 1. In the following, we show that there is also an optimal solution
set S ∗ for D such that P0 (S ∗ ) = ∅. Let

S ∗ := (S 0 ∪ S 0+ )\S 0−


with

S 0+ := ⋃_{(x0 ,...,xm−1 )∈P0 (S 0 )} A ∩ {(u, x1 ), (xm−2 , v)}

and

S 0− := ⋃_{(x0 ,...,xm−1 )∈P0 (S 0 )} SINS 0 ∩ {(u, x1 ), (u, xm−2 ), (x1 , v), (xm−2 , v)}.

Obviously, S 0 ∩ S 0+ = ∅ and S 0− ⊆ S 0 . Hence, we need only show |S 0− | ≥
|S 0+ | to prove that |S ∗ | ≤ |S 0 |. Note that, for each (x0 , . . . , xm−1 ) ∈ P0 (S 0 ):
If (u, x1 ) ∈ A, then observation 1 implies (x1 , v) ∈ SINS 0 . Also, if (xm−2 , v) ∈
A, then (u, xm−2 ) ∈ SINS 0 . Hence, for each arc in S 0+ there is at least one
arc in S 0− and thus |S 0− | ≥ |S 0+ |.


We now show that S ∗ is a solution set for D. For the sake of contradiction
we assume there is a P3 (p, q, r) in D0 = (V, A∆S ∗ ). Hence, we know that

(p, q) ∈ A∆S ∗ ∧ (q, r) ∈ A∆S ∗ ∧ (p, r) 6∈ A∆S ∗ .



Note that, since S 0+ ⊆ A and S 0− ∩A = ∅, it is apparent that A∆S ∗ ⊆ A∆S 0 ,


and hence (p, q) ∈ A∆S 0 and (q, r) ∈ A∆S 0 . Since S 0 is a solution set for D,
it is clear that (p, r) ∈ A∆S 0 and hence, (p, r) ∈ S 0+ ∪ S 0− .
Case 1: (p, r) = (u, x1 ) for some path (u, x1 , . . . , xm−2 , v).
This implies that (p, q, x1 , . . . , xm−2 , v) is also a path in A∆S 0 . Assum-
ing (p, q) ∈ A\S 0 implies, by construction of S ∗ , that (p, q) ∈ S ∗ and
thus (p, q) 6∈ A∆S ∗ . On the other hand, the assumption that (p, q) ∈ SINS 0
implies (p, q) 6∈ S ∗ and thus (p, q) 6∈ A∆S ∗ . Hence (p, q, r) is not a P3
in (V, A∆S ∗ ).
Case 2: (p, r) = (xm−2 , v) for some path (u, x1 , . . . , xm−2 , v).
This implies that (u, x1 , . . . , xm−2 , q, r) is also a path in A∆S 0 . Assum-
ing (q, r) ∈ A\S 0 implies, by construction of S ∗ , that (q, r) ∈ S ∗ and
thus (q, r) 6∈ A∆S ∗ . On the other hand, if we assume (q, r) ∈ SINS 0 , then
we know that (q, r) 6∈ S ∗ and thus (q, r) 6∈ A∆S ∗ . Hence (xm−2 , q, r) is not
a P3 in (V, A∆S ∗ ).
Case 3: (p, r) = (x1 , v) for some path (u, x1 , . . . , xm−2 , v).
This is analogous to Case 2.
Case 4: (p, r) = (u, xm−2 ) for some path (u, x1 , . . . , xm−2 , v).
This is analogous to Case 1.
Note that, by construction of S ∗ , it is clear that P0 (S ∗ ) = ∅ which im-
plies P(S ∗ ) ⊆ {(u, w, v)}, that is, if there is a path of length ≥ 3 from u to v
in (V, A∆S ∗ ), then it is (u, w, v).

Lemma 3.4. Let D = (V, A) be a digraph and let (u, v) ∈ (V × V )\A.


If predA (v) ∩ succA (u) ⊆ {w} for some w ∈ V , then there is an optimal
solution set S for D that does not contain (u, v).

Proof. By Lemma 3.3 there is an optimal solution set S 0 for D such that
the digraph (V, A∆S 0 ) contains at most one path of length ≥ 3 from u
to v. If there is no such path in (V, A∆S 0 ), then S 0 \{(u, v)} is obviously
an optimal solution set that does not contain (u, v). If there is such a
path, then Lemma 3.3 implies that this path is (u, w, v). We can assume
that (u, v) ∈ SINS 0 . In the following, we show that

S := (S 0 ∪ {(u, w)})\{(u, v)}

is an optimal solution set for D that does not contain (u, v). Obviously, |S| ≤
|S 0 |, hence, we need only show that S is a solution set for D. Suppose there
is a P3 p = (x, y, z) in (V, A∆S). Since we remove an arc insertion and add
an arc removal, it is clear that (x, z) ∈ S 0 ∆S.

Case 1: (x, z) ∈ S 0 \S.


Obviously, (x, z) = (u, v). By construction of S 0 , the path (u, w, v) is the
only path of length ≥ 3 from u to v in (V, A∆S 0 ). Thus y = w. However, since (u, w) ∈
SDEL , it is clear that (u, w, v) = (x, y, z) cannot be a P3 in (V, A∆S).
Case 2: (x, z) ∈ S\S 0 .
Obviously, (x, z) = (u, w). Since S and S 0 differ only in (u, v) and (u, w),
it is clear that (u, y) ∈ A∆S 0 and (y, w) ∈ A∆S 0 . However, (u, y, w, v) is
a path from u to v in (V, A∆S 0 ) that is different from (u, w, v), contradict-
ing (u, w, v) being the only path of length ≥ 3 from u to v in (V, A∆S 0 ).

Moreover, for all pairs of vertices (u, v) 6∈ A we can prove that, if there
is no directed path from u to v after applying all arc deletions of an optimal
solution set, then this solution set does not contain (u, v).

Lemma 3.5. Let D = (V, A) be a digraph and S be an optimal solution set


for D. If (a, b) ∈ SINS , then there is a directed path starting at a and ending
at b in D0 = (V, A\SDEL ).

Proof. Let A0 := A\SDEL . Obviously, SINS is an optimal solution set for D0 .


Suppose there is no path from a to b in D0 . Then there is a partition of V
into
Va := reachD0 (a) and Vb := V \Va ,
with a ∈ Va , b ∈ Vb , and (Va × Vb ) ∩ A0 = ∅, since if there was some u ∈ Va
and some w ∈ Vb with (u, w) ∈ A0 , then, by definition, w ∈ Va , a contradiction.
Note that SINS 0 := SINS \(Va × Vb ) is a solution set for D0 , since all P3 s that could
have been destroyed by SINS ∩ (Va × Vb ) would contain at least one arc
in Va × Vb . Since (a, b) 6∈ SINS 0 , we know that |SINS 0 | < |SINS |, contradicting
the optimality of S.

Lemma 3.5 implies that, if applying an optimal solution set to a di-
graph D creates a cycle by inserting an arc, then D already contained a
longer cycle. Hence, optimal solution sets do not create cycles in
acyclic digraphs.
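The set reachD0 (a) used in the proof of Lemma 3.5 is plain forward reachability, computable by breadth-first search. A minimal sketch (our own illustration, not from the thesis):

```python
from collections import deque

def reach(vertices, arcs, a):
    """Vertices reachable from a (including a itself) via directed paths."""
    succ = {v: set() for v in vertices}
    for u, w in arcs:
        succ[u].add(w)
    seen, queue = {a}, deque([a])
    while queue:
        u = queue.popleft()
        for w in succ[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen

# In the proof: no arc of A' leaves Va = reach(...), so an insertion (a, b)
# with b outside Va could be dropped from an optimal solution set.
print(reach(range(4), {(0, 1), (1, 2), (3, 0)}, 0))  # {0, 1, 2}
```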

Corollary 3.6. Given a DAG D, all transitive digraphs of minimum editing


distance to D are DAGs.

Consider a source r in D. If r is not a source in (V, A∆S), then there is
an arc insertion (a, r) ∈ SINS . By Lemma 3.5, this requires a directed path
from a to r in (V, A\SDEL ); however, no path in D ends in a source.

Corollary 3.7. Applying optimal solution sets preserves sources and sinks.

Apart from preserving sources and sinks, optimal solution sets also do
not delete arcs from any source to any sink.

Lemma 3.8. Let D = (V, A) be a digraph and VSRC and VSNK be the sets
of all sources and sinks in D, respectively. If S is an optimal solution set
for D, then SDEL ∩ (VSRC × VSNK ) = ∅.

Proof. Assume x is a source in D, y is a sink in D, and (x, y) ∈ SDEL . We


show that S 0 := S\{(x, y)} is also a solution set for D: Assume that not
removing (x, y) causes a P3 p = (u, v, w) in (V, A∆S 0 ). Hence, either (x, y) =
(u, v) or (x, y) = (v, w). However, by Corollary 3.7, we know that x is still
a source and y is still a sink in (V, A∆S 0 ) and thus v 6= x and v 6= y,
contradicting (x, y) = (u, v) and (x, y) = (v, w).

4 Computational Complexity
In this section, we prove the NP-completeness of Transitivity Editing
and Transitivity Deletion (see Section 2.3). Although it has been
stated that the NP-completeness of Transitivity Editing had been pre-
viously shown [JJK+ 08], the cited source ([NSS01]) does not prove the NP-
completeness of Transitivity Editing but the NP-completeness of Com-
parability Editing. Motivated by the lack of a completeness result, we
show that both Transitivity Editing and Transitivity Deletion are
NP-complete even when restricted to DAGs (see Section 4.2).

4.1 Complexity of Transitivity Editing


In this section, we will show that Transitivity Editing is NP-complete.
To this end, we need to show that a transitive digraph can be recognized
in polynomial time and that the Transitivity Editing problem is NP-
hard. The hardness is derived by a reduction from Positive-Not-all-
equal-3SAT, an NP-complete variant of 3SAT. As opposed to 3SAT, a
clause of a Not-All-Equal-3SAT instance is satisfied only if its literals do
not all receive the same truth value; in particular, a clause evaluates to
false if all of its variables are assigned true. The Positive-Not-all-equal-
3SAT problem differs from the Not-All-Equal-3SAT problem in that all
variables appear non-negated in the given formula.

Positive-Not-all-equal-3SAT:
Input: A Boolean formula ϕ in n variables x0 , . . . , xn−1 which
is a conjunction of m clauses Ci , each consisting of three positive
literals.
Question: Is there a truth assignment to all n variables such


that for each clause Ci , exactly one or two of its variables are
assigned true, that is, there is no clause for which the truth values
of its variables are all equal?
In the following, Positive-Not-all-equal-3SAT will be referred to as
Positive-NAE-3SAT.
Example 4.1. The formula
ϕ := (x1 , x2 , x3 ) ∧ (x2 , x3 , x4 ) ∧ (x1 , x2 , x4 ) ∧ (x1 , x3 , x4 )
is a conjunction of the four clauses C0 = (x1 , x2 , x3 ), C1 = (x2 , x3 , x4 ), C2 =
(x1 , x2 , x4 ), and C3 = (x1 , x3 , x4 ). Note that all variables occur non-negated.
The formula ϕ is a yes-instance of Positive-NAE-3SAT. Consider the
following assignment to the variables of ϕ:
β(x1 ) := true β(x2 ) := true β(x3 ) := false β(x4 ) := false.
Obviously, β(C0 ) = ¬(β(x1 ) = β(x2 ) = β(x3 )) = true. Analogously, C1 , C2
and C3 evaluate to true.
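The not-all-equal condition of Example 4.1 is easy to check mechanically. The following Python sketch (our own illustration) verifies an assignment for a Positive-NAE-3SAT instance given as a list of variable triples:

```python
def is_nae_satisfying(clauses, beta):
    """True iff in every clause the three variables do not all receive
    the same truth value under the assignment beta (so the set of values
    occurring in a clause has size exactly 2)."""
    return all(len({beta[x] for x in clause}) == 2 for clause in clauses)

# The formula and assignment from Example 4.1.
phi = [("x1", "x2", "x3"), ("x2", "x3", "x4"),
       ("x1", "x2", "x4"), ("x1", "x3", "x4")]
beta = {"x1": True, "x2": True, "x3": False, "x4": False}
print(is_nae_satisfying(phi, beta))  # True
```

Assigning all four variables the same truth value would make every clause evaluate to false, and the function would return False.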
Theorem 4.2 ([KT02]). Positive-NAE-3SAT is NP-complete.
In the following, we show that Positive-NAE-3SAT can be reduced to
Transitivity Editing in polynomial time.
To this end, we construct an instance of the Transitivity Editing
problem from a given instance of the Positive-NAE-3SAT problem in
polynomial time: For each of the n Boolean variables, we construct a di-
rected cycle of length 8m, with m being the number of clauses in the given
formula ϕ. These cycles will be referred to as variable cycles and are de-
scribed in the following. First of all, let i ⊕ j := i + j mod 8m for all i and j.
For each variable xk , we construct the variable cycle (Vk , Ak ) with

Vk := ⋃_{i=0}^{m−1} ⋃_{j=0}^{7} {v^k_{8i+j}}    and    Ak := ⋃_{i=0}^{m−1} ⋃_{j=0}^{7} {(v^k_{8i+j} , v^k_{8i⊕j⊕1} )}.

Each variable cycle has a subpath of eight vertices for each of the m clauses.
As we will see, each clause may cause the fifth vertex of the corresponding
subpath to be connected to other variable cycles, if xk is one of the variables
of this clause. The collection of all variable cycles is then referred to by (V, A)
with
V := ⋃_{k=0}^{n−1} Vk    and    A := ⋃_{k=0}^{n−1} Ak .
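The variable cycles can be generated mechanically from this definition. A small Python sketch (our own; representing vertex v^k_j as the tuple (k, j) is an assumption made for readability):

```python
def variable_cycle(k, m):
    """Arc set of the variable cycle of x_k: a directed cycle
    v^k_0 -> v^k_1 -> ... -> v^k_{8m-1} -> v^k_0 of length 8m,
    with index arithmetic modulo 8m (the operation ⊕ of the text)."""
    n = 8 * m
    return {((k, j), (k, (j + 1) % n)) for j in range(n)}

arcs = variable_cycle(0, 2)           # m = 2 clauses
print(len(arcs))                      # 16, i.e., a cycle of length 8m
print(((0, 15), (0, 0)) in arcs)      # the cycle closes: True
```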

Figure 3: A clause cycle connecting three variable cycles as constructed by


the reduction.

In the following, we refer to the arcs (v^k_0 , v^k_1 ), (v^k_2 , v^k_3 ), . . . , (v^k_{8m−2} , v^k_{8m−1} ) as
even arcs and all other arcs in the variable cycle as odd arcs. Furthermore,
for each of the m clauses in ϕ, we construct a directed cycle of length three
between the variable cycles of its three variables as shown in Figure 3. These
will be referred to as clause cycles. In particular, for each clause Ci =
(xi0 , xi1 , xi2 ), we construct the following clause cycle:

A0i := {(v^{i0}_{8i+4} , v^{i1}_{8i+4} ), (v^{i1}_{8i+4} , v^{i2}_{8i+4} ), (v^{i2}_{8i+4} , v^{i0}_{8i+4} )}.

The set of all arcs in the clause cycles is denoted by

A0 := ⋃_{i=0}^{m−1} A0i .

Note that we do not need any vertices other than those in V . Finally,
let D := (V, A ∪ A0 ) denote the resulting digraph.
In order to show the correctness of the reduction, we need the following
lemmas.
Lemma 4.3. In order to turn a directed cycle of even length ≥ 4 transitive
without inserting an arc, it is optimal to delete every second arc. Moreover,
this is the only optimal way to do so.
Proof. Let C = (VC , AC ) denote a directed cycle of length l = 2 · l0 . Sup-
pose S is an optimal solution set for C that does not delete every second
arc. Note that |S| ≤ l0 , since otherwise the set containing every second
arc of C is a solution set that is smaller than S, contradicting the opti-
mality of S. Consider all pairs of adjacent arcs (a, b), (b, c) ∈ AC . Obvi-
ously, (a, c) 6∈ AC . Since S is a solution set for C, we know that (a, b) ∈ S
or (b, c) ∈ S. However, since S does not delete every second arc, there
is some pair of arcs (a0 , b0 ), (b0 , c0 ) that are both in S. Obviously, P :=
(VC , AC \{(a0 , b0 ), (b0 , c0 )}) is a path of l − 2 = 2(l0 − 1) arcs. Hence, there
are l0 − 1 disjoint P3 s in P . Since S\{(a0 , b0 ), (b0 , c0 )} must be a solution set
for P , we know that |S|−2 ≥ l0 −1 and thus |S| ≥ l0 +1, which contradicts S
being an optimal solution set for C.
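Lemma 4.3 can be sanity-checked by brute force on a small instance: enumerate all small deletion sets for a directed cycle of length six and confirm that only the two alternating deletion sets of size l0 = 3 yield a transitive digraph. A Python sketch (our own illustration, not from the thesis):

```python
from itertools import combinations

def is_transitive(arcs):
    """True iff for every pair of composable arcs the shortcut exists."""
    arcs = set(arcs)
    return all((x, z) in arcs
               for (x, y) in arcs for (w, z) in arcs
               if y == w and x != z)

cycle = {(j, (j + 1) % 6) for j in range(6)}   # directed cycle of length 6

# No deletion set of size < 3 suffices ...
small = [s for r in range(3) for s in combinations(cycle, r)
         if is_transitive(cycle - set(s))]
# ... and among the size-3 sets, exactly the two alternating ones work.
optima = [set(s) for s in combinations(cycle, 3)
          if is_transitive(cycle - set(s))]
print(len(small), len(optima))  # 0 2
```

The two surviving deletion sets are {(0, 1), (2, 3), (4, 5)} and {(1, 2), (3, 4), (5, 0)}, i.e., the even and the odd arcs.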

Note that there are two ways to delete every second arc in a variable
cycle. Either delete all odd arcs or all even arcs. These two optimal so-
lutions will represent the truth value of the corresponding variable. If the
variable cycle for xk is turned transitive by the deletion of all even arcs, xk
is considered to be assigned true, otherwise false.
Consider a clause cycle. Obviously, a cycle of length three can be turned
transitive with two arc deletions. However, this imposes an asymmetry on
the clause cycle that leaves a remaining P3 if all even arcs of all three
variable cycles are deleted or all odd arcs are. Hence, in this case, an
additional arc deletion is required.

Lemma 4.4. For each clause Ci , if a solution set S to the induced subgraph

D[Vi0 ∪ Vi1 ∪ Vi2 ∪ {v^{i0}_{8i+4} , v^{i1}_{8i+4} , v^{i2}_{8i+4} }]

contains all or none of the arcs

(v^{i0}_{8i+4} , v^{i0}_{8i+5} ), (v^{i1}_{8i+4} , v^{i1}_{8i+5} ), and (v^{i2}_{8i+4} , v^{i2}_{8i+5} ),

then S contains at least 3 · 4m + 3 arcs: 4m arcs for each variable cycle and 3
for the clause cycle.

Proof. By Lemma 4.3, 4m operations are required to turn each variable


cycle transitive. If the variable cycles adjoin to the clause cycle in the
same manner, then, as shown in Figure 4, three operations are required,
while otherwise, it is possible to turn the clause cycles transitive with two
operations (see Figure 5). Since the structure is highly symmetric, these
examples represent all possible situations, thus proving the claim.

Theorem 4.5. Transitivity Editing is NP-complete, even if the maxi-


mum degree is bounded by 4 (indegree 2 and outdegree 2).

Figure 4: If all variable cycles adjoin to the clause cycle in the same way,
that is either all even arcs of all variables are deleted (left image), or all
odd arcs of all variables are deleted (right image), then the structure can
neither be turned transitive by removing two arcs, nor by removing an arc
and inserting its opposite arc. Always three operations are required. Bold
arcs symbolize membership in A0 . Dashed arcs symbolize deletions.

Figure 5: If the variable cycles adjoin to the clause cycle in different ways
(the left image shows that all odd arcs of the variable cycle of xi1 are deleted
and all even arcs of the variable cycles of the other two variables are deleted,
the right image shows the opposite), then the cycles can be turned transitive
by removing two arcs. Removing an arc and inserting its opposite does not
yield transitive subgraphs. Bold arcs symbolize membership in A0 . Dashed
arcs symbolize deletions.

Proof. Obviously, one can verify that a digraph is transitive in polynomial
time by calculating its transitive closure (see Section 2.3 on Page 12). This
implies that Transitivity Editing is in NP. We now show that it is also
NP-hard by reducing from Positive-NAE-3SAT. Let D = (V, A ∪ A0 ) be
a digraph constructed as described from the given instance of Positive-
NAE-3SAT (see Page 22). We show that (D, 2m + 4mn) ∈ Transitivity
Editing iff there is a satisfying assignment to the variables of the Positive-
NAE-3SAT instance.
“⇐”: Suppose there is a satisfying assignment β to the variables of a
Positive-NAE-3SAT instance. Then we can construct a transitive digraph
by editing D in the following way: First, for each variable xk , we remove all
odd arcs of its variable cycle if β(xk ) = true and all even arcs, otherwise.
All in all, we remove 4m arcs for each of the n variable cycles, which is a
total of 4mn operations.
Furthermore, for each clause Ci = (xi0 , xi1 , xi2 ), the clause cycle is edited
in the following way: Since β is a satisfying assignment to all variables of
the Positive-NAE-3SAT instance, we know that β(xi0 ), β(xi1 ), and β(xi2 )
are not all equal. Hence, according to Lemma 4.4 it suffices to delete two
of the three arcs of the clause cycle. This requires 2m operations in total.
Altogether, turning D transitive is possible with 2m + 4mn operations and
hence (D, 2m + 4mn) ∈ Transitivity Editing.
“⇒”: Suppose (D, 2m + 4mn) ∈ Transitivity Editing. Hence, a
solution set S for D exists such that |S| ≤ 2m + 4mn. Let Â := (A ∪ A0 )∆S
and D̂ := (V, Â). Obviously, D̂ is transitive. Since D is diamond-free, by
Lemma 3.2, we can assume S ⊆ A ∪ A0 . By Lemma 4.4, turning all m clause
cycles transitive requires at least 2m operations. Hence, |S ∩ A0 | = 2m and
thus |S ∩ A| = 4mn. Since all variable cycles are disjoint and no variable
cycle can be turned transitive with fewer than 4m arc deletions, we also
know that for all xk in the given formula, |S ∩ Ak | = 4m and thus, by
Lemma 4.3, every second arc of the variable cycle of xk is deleted. Hence, for
each variable xk , either

∀j (v^k_{2j} , v^k_{2j⊕1} ) ∈ Â    or    ∀j (v^k_{2j⊕1} , v^k_{2j⊕2} ) ∈ Â,

which represents assigning true or false to xk , respectively:

β(xk ) := true if (v^k_0 , v^k_1 ) ∈ Â, and false otherwise.
We now show that β is a satisfying assignment to the n variables of the
Positive-NAE-3SAT instance. Suppose β was not a satisfying assignment,
then there is some clause Ci = (xi0 , xi1 , xi2 ) with β(xi0 ) = β(xi1 ) = β(xi2 ).
By Lemma 4.4, turning the corresponding clause cycle transitive would re-
quire three operations, contradicting (D, 2m+4mn) ∈ Transitivity Edit-
ing.
Since Transitivity Editing is in NP and also NP-hard, the NP-
completeness follows.
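The membership test used in the proof, checking transitivity via the transitive closure, can be sketched as follows (our own illustration using a Floyd–Warshall-style closure; not code from the thesis):

```python
def transitive_closure(vertices, arcs):
    """Pairs (u, v) connected by a directed path of one or more arcs."""
    vertices = list(vertices)
    reach = {(u, v): (u, v) in arcs for u in vertices for v in vertices}
    for w in vertices:
        for u in vertices:
            for v in vertices:
                if reach[(u, w)] and reach[(w, v)]:
                    reach[(u, v)] = True
    return {p for p, r in reach.items() if r}

def is_transitive_digraph(vertices, arcs):
    """A loop-free digraph is transitive iff it already equals its
    transitive closure (loops are dropped; the construction in this
    section never produces them)."""
    closure = {(u, v) for (u, v) in transitive_closure(vertices, arcs)
               if u != v}
    return closure == set(arcs)

print(is_transitive_digraph(range(3), {(0, 1), (1, 2)}))          # False
print(is_transitive_digraph(range(3), {(0, 1), (1, 2), (0, 2)}))  # True
```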

In the above proof, we never employ arc insertions, which implies that the
same reduction also proves that Transitivity Deletion is NP-complete.

Corollary 4.6. Transitivity Deletion is NP-complete, even if the max-


imum degree is bounded by 4 (indegree 2 and outdegree 2).

4.2 Complexity of Acyclic Transitivity Editing


Having established the NP-completeness of Transitivity Editing in the
previous section, it is of great interest whether restricting the input in some
way may allow for polynomial-time computation of optimal solution sets.
A canonical restriction is to disallow cycles in the input graph. Problems
like Disjoint Paths [YZL07] and Longest Path [KMR97, BPSS01], which
are NP-complete on general graphs, have been shown to be polynomial-time
solvable on DAGs. However, our attempts to find polynomial-time algo-
rithms for Transitivity Editing on DAGs failed. In the following, we show that Transitivity Editing
is NP-complete on DAGs by providing a reduction from Positive-NAE-
3SAT with an approach similar to the one described in Section 4.1, except
that the variable and clause cycles are replaced by acyclic gadgets. Given an
instance of Positive-NAE-3SAT as defined in Section 4.1 on Page 20, we
construct a DAG D = (V ∪ V 0 , A ∪ A0 ) as follows: For each of the n Boolean
variables, we construct a variable gadget (see Figure 6) that has exactly
two ways of being turned transitive using at most 40m + 5 operations, which
represent assigning true and false to the variable, respectively. For each xk ,
we construct the vertex set
Vk := {v^k_1 , v^k_5 } ∪ ⋃_{j=0}^{8m} {v^k_{0,j} , v^k_{6,j} } ∪ ⋃_{j=0}^{3m−1} {v^k_{2,j} , v^k_{3,j} , v^k_{4,j} } ∪ ⋃_{j=0}^{13m+1} {a^k_j , b^k_j , c^k_j }

and the arc set


Ak := ⋃_{j=0}^{3m−1} A^upper_{k,j} ∪ ⋃_{j=0}^{13m+1} A^lower_{k,j} ∪ ⋃_{j=0}^{8m} A^outer_{k,j} ,

Figure 6: The variable gadget of xk . The bold arcs show potential docking
arcs (see Figures 8 and 9), while the additional paths via a^k_j , b^k_j , and c^k_j
ensure that optimally turning this structure transitive requires the deletion
of either (v^k_{0,j} , v^k_1 ) for each 0 ≤ j ≤ 8m or (v^k_5 , v^k_{6,j} ) for each 0 ≤ j ≤ 8m.
with

A^upper_{k,j} := {(v^k_1 , v^k_{2,j} ), (v^k_{2,j} , v^k_{3,j} ), (v^k_{3,j} , v^k_{4,j} ), (v^k_{4,j} , v^k_5 )},

A^lower_{k,j} := {(v^k_1 , a^k_j ), (a^k_j , b^k_j ), (b^k_j , c^k_j ), (c^k_j , v^k_5 )}, and

A^outer_{k,j} := {(v^k_{0,j} , v^k_1 ), (v^k_5 , v^k_{6,j} )}.

The collection of all variable gadgets is then referred to as

V := ⋃_{k=0}^{n−1} Vk    and    A := ⋃_{k=0}^{n−1} Ak .

Note that (V, A) is acyclic and diamond-free. The following arc-disjoint P3 s
are contained in each variable gadget (Vk , Ak ):

1. (v^k_{0,j} , v^k_1 , v^k_{2,j} ), (v^k_{2,j} , v^k_{3,j} , v^k_{4,j} ), (v^k_{4,j} , v^k_5 , v^k_{6,j} ) for all 0 ≤ j < 3m

2. (v^k_{0,3m+j} , v^k_1 , a^k_j ), (a^k_j , b^k_j , c^k_j ), (c^k_j , v^k_5 , v^k_{6,3m+j} ) for all 0 ≤ j ≤ 5m

3. (v^k_1 , a^k_j , b^k_j ), (b^k_j , c^k_j , v^k_5 ) for all 5m < j ≤ 13m + 1

All in all, there are 3 · 3m + 3 · (5m + 1) + 2 · (8m + 1) = 40m + 5 disjoint P3 s
in each variable gadget.

Observation 4.7. For each variable gadget, at least 40m + 5 operations are
required to turn it transitive.

For convenience, we introduce the following notation.

Definition 4.8. For each variable xk , the vertices v1k and v5k are odd ver-
tices. All other vertices in V are even vertices if they are adjacent to an odd
vertex and odd vertices if they are adjacent to an even vertex. We refer to
an arc (u, v) as odd arc if u is odd, otherwise (u, v) is called even arc.

Furthermore, we construct the following clause gadgets: For each clause Ci


in the formula ϕ, we construct three gadget parts such that one of them can
be turned transitive with exactly four operations iff one of the variables of Ci
is assigned a different truth value than the other two. For each part p of the
clause gadget of clause Ci = (xi0 , xi1 , xi2 ), we construct the vertex set

V 0i,p := {u^i_{p,0} , w^i_{p,0} , u^i_{p,1} , w^i_{p,1} }.

Figure 7: Part 0 of the clause gadget of clause C2 = (x3 , x4 , x7 ). Notice that
this gadget part docks over (v^3_{3,6} , v^3_{4,6} ) instead of (v^3_{2,6} , v^3_{3,6} ).

The two vertices u^i_{p,0} and u^i_{p,1} are then connected to the variable gadgets,
depending on p:

A0i,p,r := {(v^{ir}_{3,3i+p} , u^i_{p,0} ), (v^{ir}_{4,3i+p} , u^i_{p,1} )} if p = r, and
A0i,p,r := {(v^{ir}_{2,3i+p} , u^i_{p,0} ), (v^{ir}_{3,3i+p} , u^i_{p,1} )} if p 6= r.

Each structure is completed by

A0i,p := {(u^i_{p,0} , w^i_{p,0} ), (u^i_{p,1} , w^i_{p,1} )}.

Hence, for each clause Ci , we have

A0i := ⋃_{p=0}^{2} ( A0i,p ∪ ⋃_{r=0}^{2} A0i,p,r )

and the collection of all clause gadgets is then denoted by

V 0 := ⋃_{i=0}^{m−1} ⋃_{p=0}^{2} V 0i,p    and    A0 := ⋃_{i=0}^{m−1} A0i .

With the construction of these clause gadgets, we need fewer operations if one
of the variable gadgets is edited in a different manner than the other two,
which will correspond to one of the variables of Ci being assigned a differ-
ent truth value than the other two. This is achieved by choosing different
“docking arcs” for each clause gadget part: The three docking arcs of the
gadget part p of the clause gadget of Ci = (xi0 , xi1 , xi2 ) are (see Figure 7)

∀0≤r<3 αir ,3i+p := (predD (u^i_{p,0} ) × predD (u^i_{p,1} )) ∩ Air .

Furthermore, for each 0 ≤ r < 3, the arc γir ,3i+p denotes the arc in Air
that is incoming to the vertex that αir ,3i+p is outgoing from. Note that D
is acyclic and diamond-free. Hence, we can assume that there is an optimal
solution set for D that contains only arc deletions. By the construction of
the clause gadgets it is clear that each gadget part requires at least two
operations to be turned transitive.

Observation 4.9. For each clause gadget, at least six operations are re-
quired to turn it transitive, independent of deletions in the variable gadgets.

Furthermore, by the construction of A0i,p,r we can see that each part of a
clause gadget docks over (v^{ir}_{3,3i+p} , v^{ir}_{4,3i+p} ), which is an odd arc, if p = r, and
over (v^{ir}_{2,3i+p} , v^{ir}_{3,3i+p} ), which is an even arc, otherwise.

Observation 4.10. All three parts of each clause gadget dock over one odd
arc and two even arcs.

In order to prove the NP-completeness, we need the following lemmas.


First, for each variable gadget to be able to be turned transitive without
using too many operations, we can show that either all arcs that are incoming
to v1k or all arcs that are outgoing from v5k must be deleted, but not both.

Lemma 4.11. Let S be a solution set for D = (V ∪ V 0 , A ∪ A0 ) as defined


above with |S| ≤ n · (40m + 5) + 14m. Furthermore, for each k, let Sk :=
S ∩ Ak . Then for each k, either

∀0≤r≤8m (v^k_{0,r} , v^k_1 ) ∈ Sk    or    ∀0≤r≤8m (v^k_5 , v^k_{6,r} ) ∈ Sk .

Proof. Let D0 := (V ∪ V 0 , (A ∪ A0 )∆S). Since the variable gadgets do not


contain diamonds, by Lemma 3.2, we can assume that Sk contains only arc
deletions. First, we show that at most one of the two statements is true.
For the sake of contradiction, we assume

∀0≤j≤8m {(v^k_{0,j} , v^k_1 ), (v^k_5 , v^k_{6,j} )} ⊆ Sk .

This leaves 2 · (16m + 2) disjoint P3 s in the center of the gadget. Hence, 2 ·


(8m + 1) + 2 · (16m + 2) = 48m + 6 operations would be required for this
gadget. By Observations 4.7 and 4.9, we know that |S| ≥ (n − 1) · (40m +


5) + 48m + 6 + 6m = n · (40m + 5) + 14m + 1, a contradiction.
Next, we show that at least one of the two statements is true. For the
sake of contradiction, we assume

∃j (v^k_{0,j} , v^k_1 ) 6∈ Sk    and    ∃j (v^k_5 , v^k_{6,j} ) 6∈ Sk .
If there was some 0 ≤ r ≤ 13m + 1 with (v^k_1 , a^k_r ) 6∈ Sk , then there exists
a P3 (v^k_{0,j} , v^k_1 , a^k_r ) in D0 . Hence, (v^k_1 , a^k_r ) ∈ Sk for each 0 ≤ r ≤ 13m + 1.
Likewise, (v^k_1 , v^k_{2,r} ) ∈ Sk for each 0 ≤ r < 3m. The symmetry of the
construction then implies that

∀0≤j<3m {(v^k_1 , v^k_{2,j} ), (v^k_{4,j} , v^k_5 )} ⊆ Sk

and

∀0≤j≤13m+1 {(v^k_1 , a^k_j ), (c^k_j , v^k_5 )} ⊆ Sk ,
requiring already 2 · (16m + 2) arc deletions in Sk and leaving 16m + 2
disjoint P3 s in the center of the gadget. Hence, 2 · (16m + 2) + 16m + 2 =
48m + 6 operations would be required for this gadget. Again, we know
that |S| ≥ (n − 1) · (40m + 5) + 48m + 6 + 6m = n · (40m + 5) + 14m + 1, a
contradiction.

Since it cannot be that all (v^k_{0,j} , v^k_1 ) and all (v^k_5 , v^k_{6,j} ) are deleted, it is clear that there
is either some (v^k_{0,j} , v^k_1 ) or some (v^k_5 , v^k_{6,j} ) that is not deleted. Hence, either
all (v^k_1 , v^k_{2,j} ) or all (v^k_{4,j} , v^k_5 ) must be deleted.

Corollary 4.12. Let Sk be as defined in Lemma 4.11. For each k,


∀0≤j<3m (v^k_1 , v^k_{2,j} ) ∈ Sk and ∀0≤j≤13m+1 (v^k_1 , a^k_j ) ∈ Sk

or

∀0≤j<3m (v^k_{4,j} , v^k_5 ) ∈ Sk and ∀0≤j≤13m+1 (c^k_j , v^k_5 ) ∈ Sk .
Second, we can show that for each clause gadget to be turned transitive
without using too many operations, there must be a part p such that the
arcs of p are either all deleted or all not deleted.
Lemma 4.13. Let D = (V ∪V 0 , A∪A0 ) be as described above and let S be an
optimal solution set for D with S = SDEL . Let Ci = (xi0 , xi1 , xi2 ) be a clause
in ϕ and p be a part of its clause gadget. Furthermore, assume that for each 0 ≤ r < 3
exactly one of the arcs αir ,3i+p and γir ,3i+p is in S. If

∀0≤r<3 αir ,3i+p ∈ S or ∀0≤r<3 γir ,3i+p ∈ S

then |S ∩ A0i,p | = 4. Otherwise, |S ∩ A0i,p | = 5.


Proof. Suppose the premise is true, that is, p is a gadget part for which
all αir ,3i+p or all γir ,3i+p are deleted. Without loss of generality we assume
all αir ,3i+p to be deleted. Since for each 0 ≤ r < 3 exactly one of the
arcs αir ,3i+p and γir ,3i+p is deleted, we know that all γir ,3i+p are not deleted.
Figure 8 shows that it is possible to turn part p of the clause gadget cor-
responding to clause Ci transitive with four operations. As we can also see
in Figure 8, there are four disjoint P3 s in (V ∪ V 0 , (A ∪ A0 )\(S\A0ir ,3i+p )).
Hence, four arc deletions are also required.
Suppose the premise is false, that is, p is a gadget part for which there
is some αis ,3i+p that is not deleted and there is also some γit ,3i+p that is not
deleted. Figure 9 shows that it is possible to turn part p of the clause gadget
corresponding to clause Ci transitive with five operations. As we can also
see in Figure 9, there are five disjoint P3 s in (V ∪ V 0 , (A ∪ A0 )\(S\A0ir ,3i+p )).
Hence, at least five arc deletions are required.

By construction of the clause gadgets, two of the docking arcs of each


part are even while the third one is odd. If in all variable gadgets all odd or
all even arcs are deleted and there is some part that can be turned transitive
with four arc deletions, then either all its docking arcs are deleted or all its
docking arcs are not deleted. However, if there is some other part of the
same gadget that can be turned transitive with four operations, then, since
it is different from the first, this part must dock over one even and two odd
docking arcs.
Lemma 4.14. Let D = (V ∪ V 0 , A ∪ A0 ) be as described above and let S
be an optimal solution set for D with S = SDEL . Let Ci = (xi0 , xi1 , xi2 ) be
a clause in ϕ such that S contains exactly all odd or all even arcs of the
variable gadgets of xi0 ,xi1 , and xi2 . For each 0 ≤ p < 3 and each 0 ≤ r < 3
let exactly one of the arcs αir ,3i+p and γir ,3i+p be in S. If
∃0≤p<3 ∀0≤r<3 αir ,3i+p ∈ S or ∃0≤p<3 ∀0≤r<3 γir ,3i+p ∈ S
then |S ∩ A0i | = 14. Otherwise, |S ∩ A0i | = 15.
Proof. Suppose the premise is false, that is, for each 0 ≤ p < 3 there is
some 0 ≤ q < 3 such that αiq ,3i+p 6∈ S and there is some 0 ≤ r < 3 such
that γir ,3i+p 6∈ S. Then, by Lemma 4.13, each of the three parts requires
five arc deletions, so all three parts require a total of 15 arc deletions.
Suppose the premise is true, that is, there is a gadget part p0 such that all
its docking arcs are deleted or all its docking arcs are not deleted. Without
loss of generality, we assume that all its docking arcs are deleted, that is,
∀0≤r<3 αir ,3i+p0 ∈ S.
4 COMPUTATIONAL COMPLEXITY 33

Figure 8: The first part of the clause gadget of clause C2 = (x3, x4, x7) with all or no docking arcs being deleted. Dashed lines indicate arcs that are deleted. Here, four arc deletions suffice. This situation corresponds to β(x3) ≠ β(x4) = β(x7) for an assignment β of all variables.

Figure 9: The first part of the clause gadget of clause C2 = (x3, x4, x7) with some, but not all docking arcs being deleted. Dashed lines indicate arcs that are deleted. Here, at least five arc deletions are required. This situation corresponds to β(x3) = β(x4) = β(x7) for an assignment β of all variables.

By Lemma 4.13, four operations are required to turn p_0 transitive. Obviously, there can be no second part p_1 of the same gadget such that all docking arcs of p_1 are deleted. Suppose none of the docking arcs of part p_1 is deleted. Then

∀_{0≤r<3} γ_{i_r,3i+p_1} ∈ S,

and, since the arcs in the variable gadgets are deleted alternately, p_1 would have to dock over one even arc and two odd arcs, a contradiction to Observation 4.10.

Theorem 4.15. Transitivity Editing is NP-complete, even when restricted to DAGs.

Proof. We show the following: the given instance of Positive-NAE-3SAT is a yes-instance iff (D, n · (40m + 5) + 14m), with D constructed as described above, is a yes-instance of Transitivity Editing.
“⇒”: If the given instance of Positive-NAE-3SAT is a yes-instance
then all clauses of the given formula ϕ evaluate to true under some assign-
ment β for all variables of ϕ. Hence, there is no clause whose variables are
all assigned the same truth value. By Lemma 4.11, all variable gadgets can
be turned transitive with 40m + 5 operations each, by either deleting all
even arcs or all odd arcs: If β(xk ) = true then all even arcs of the variable

gadget of xk are deleted, otherwise all odd arcs are. Since β is a satisfy-
ing assignment, there is no clause whose variables are assigned the same
truth value. Thus, each clause gadget docks to at least one variable gad-
get whose odd arcs are deleted and at least one variable gadget whose even
arcs are deleted. Hence, there is exactly one part of each clause gadget for
which the docking arcs are either all deleted or all not deleted. Lemma 4.14
describes that, under these circumstances, we can turn each clause gadget
transitive with 14 arc deletions and thus, we can turn D transitive with a
total of n · (40m + 5) + 14m arc deletions.
“⇐”: Since (D, n · (40m + 5) + 14m) is a yes-instance, there is some solution set S to D that is optimal and |S| ≤ n · (40m + 5) + 14m. Furthermore, since D does not contain a diamond, by Lemma 3.2, we can assume that S contains only arc deletions. Let D′ := (V ∪ V′, (A ∪ A′)∆S). Let x_k be a variable of ϕ, let C_i denote a clause, and let p denote a gadget part of the clause gadget of C_i. By Lemma 4.11, either all (v^k_{0,r}, v^k_1) or all (v^k_5, v^k_{6,r}) with 0 ≤ r ≤ 8m are deleted. Without loss of generality, we assume that all (v^k_{0,r}, v^k_1) ∈ S and all (v^k_5, v^k_{6,r}) ∉ S for 0 ≤ r ≤ 8m.
In the following, we show that under these circumstances all even arcs of the variable gadget of x_k are deleted. Since i and p were chosen arbitrarily, it suffices to show that we can modify S such that it is an optimal solution set and

S ∩ A^{upper}_{k,3i+p} = {(v^k_{2,3i+p}, v^k_{3,3i+p}), (v^k_{4,3i+p}, v^k_5)}. (1)
Obviously, (v^k_{4,3i+p}, v^k_5) must be in S, since otherwise (v^k_{4,3i+p}, v^k_5, v^k_{6,0}) is a P3 in (V, A∆S). Furthermore, we may assume that there are more than two arcs in S ∩ A^{upper}_{k,3i+p}, since otherwise the remaining P4 (v^k_1, v^k_{2,3i+p}, v^k_{3,3i+p}, v^k_{4,3i+p}) can only be destroyed with a single arc deletion, namely by deleting (v^k_{2,3i+p}, v^k_{3,3i+p}), implying that S already satisfies (1). Furthermore, we assume that

|S ∩ A^{upper}_{k,3i+p}| ≠ 4,

since otherwise we can remove one of these four arcs from S without creating a P3 in D′, contradicting the optimality of S. Hence, it is clear that

|S ∩ A^{upper}_{k,3i+p}| = 3

and thus exactly one of the arcs in A^{upper}_{k,3i+p} is not in S. Recall that the docking arc α_{k,3i+p} of part p of the clause gadget of clause C_i is either the arc (v^k_{2,3i+p}, v^k_{3,3i+p}) or the arc (v^k_{3,3i+p}, v^k_{4,3i+p}). In the following, we show that S can be modified without creating a P3 in (V, A∆S) such that (1) holds. To this end, we consider the following six cases:

Figure 10: An illustration of how S can be modified. In this example, (v^k_{2,3i+p}, v^k_{3,3i+p}) is a docking arc that is not deleted. The scissors symbolize how two arc deletions can be altered in order for the solution set to satisfy (1). This situation corresponds to Case 1.3 in the proof of Theorem 4.15.

Case 1: α_{k,3i+p} = (v^k_{2,3i+p}, v^k_{3,3i+p}).
Case 1.1: (v^k_1, v^k_{2,3i+p}) ∉ S.
Obviously, S\{(v^k_{3,3i+p}, v^k_{4,3i+p})} is a solution set to D, thus contradicting the optimality of S.
Case 1.2: (v^k_{3,3i+p}, v^k_{4,3i+p}) ∉ S.
We can remove (v^k_1, v^k_{2,3i+p}) from S and add (v^k_{2,3i+p}, u^i_{p,0}) to S without creating a P3 in D′. Hence, S′ = (S\{(v^k_1, v^k_{2,3i+p})}) ∪ {(v^k_{2,3i+p}, u^i_{p,0})} is a solution set for D that satisfies (1).
Case 1.3: (v^k_{2,3i+p}, v^k_{3,3i+p}) ∉ S.
We can remove (v^k_{3,3i+p}, v^k_{4,3i+p}) from S and add (v^k_{2,3i+p}, v^k_{3,3i+p}) to S without creating a P3 in D′. Thereafter, Case 1.2 applies (see Figure 10).
Case 2: α_{k,3i+p} = (v^k_{3,3i+p}, v^k_{4,3i+p}).
Case 2.1: (v^k_{3,3i+p}, v^k_{4,3i+p}) ∉ S.
Since all arcs that are incoming to v^k_1 are in S, it is clear that S\{(v^k_1, v^k_{2,3i+p})} is a solution set to D, thus contradicting the optimality of S.
Case 2.2: (v^k_1, v^k_{2,3i+p}) ∉ S.
We can remove (v^k_{3,3i+p}, v^k_{4,3i+p}) from S and add (v^k_{4,3i+p}, u^i_{p,1}) to S without creating a P3 in D′. Hence, S′ = (S\{(v^k_{3,3i+p}, v^k_{4,3i+p})}) ∪ {(v^k_{4,3i+p}, u^i_{p,1})} is a solution set for D that satisfies (1).
Case 2.3: (v^k_{2,3i+p}, v^k_{3,3i+p}) ∉ S.
We can remove (v^k_1, v^k_{2,3i+p}) from S and add (v^k_{2,3i+p}, v^k_{3,3i+p}) to S without creating a P3 in D′. Thereafter, Case 2.2 applies (see Figure 11).
So far, we have shown that if

∀_{0≤r≤8m} (v^k_{0,r}, v^k_1) ∈ S,

Figure 11: An illustration of how S can be modified. In this example, (v^k_{3,3i+p}, v^k_{4,3i+p}) is a docking arc and (v^k_{2,3i+p}, v^k_{3,3i+p}) is not deleted. The scissors symbolize how two arc deletions can be altered in order for the solution set to satisfy (1). This situation corresponds to Case 2.3 in the proof of Theorem 4.15.

then all even arcs of the variable gadget of x_k are deleted. By analogy, it can be shown that if

∀_{0≤r≤8m} (v^k_5, v^k_{6,r}) ∈ S,

then all odd arcs of the variable gadget of x_k are deleted. By Lemma 4.11, we can assume that for each variable gadget, either all even arcs or all odd arcs are in S. Since |S| ≤ n · (40m + 5) + 14m, Observation 4.7 and Lemma 4.14 imply that for each 0 ≤ i < m,

|S ∩ A′_i| = 14. (2)

In the following, we construct a satisfying assignment β for the variables of the given formula ϕ from S:

β(x_k) := true if (v^k_{0,0}, v^k_1) ∈ S, and β(x_k) := false otherwise.

By Lemma 4.14 and Equation (2), we know that for each clause gadget, there is some part such that all its docking arcs are deleted or none of them is. By the construction of the clause gadgets, this implies that the truth values of the three variable gadgets of each clause gadget cannot all be equal. Hence, β is a satisfying assignment for the variables of ϕ.
All in all, the given instance of Positive-NAE-3SAT is a yes-instance iff (D, n · (40m + 5) + 14m) is a yes-instance of Transitivity Editing. The theorem follows.

In the proof, we never employ arc insertions, which implies that it can also be used to prove that Transitivity Deletion is NP-complete on DAGs.

Input: An instance (D = (V, A), k) of the Transitivity Editing problem and a set of rules R.
Output: An instance (D∗, k∗) that is reduced with respect to each rule in R.
1 if R ≠ ∅ then
2   Let current rule ∈ R;
3   repeat
4     Reduce (D, k) with respect to the rules R\{current rule};
5     Apply current rule to (D, k);
6   until no more changes were made to (D, k);
7 end
8 return (D, k);

Algorithm 1: The algorithm for the exhaustive application of a set of rules. Notice the recursive call in line 4. Since each rule application decreases k or |V|, at most k + |V| applications are performed. If each rule can be applied in polynomial time, then the whole algorithm runs in polynomial time.

Corollary 4.16. Transitivity Deletion is NP-complete, even if restricted to DAGs.

5 Polynomial-Time Data Reduction


In this section, we describe a kernelization for the Transitivity Editing
problem on general digraphs and digraphs of bounded degree. Since the
data reduction rules we present detect local characteristics only and none of
them deals with cycles, we do not expect any different result when applying
them to DAGs. Furthermore, additional data reduction rules are given that
are provably correct but do not improve the asymptotic kernel size.
In accordance with Section 2.2, a kernelization for Transitivity Editing is a polynomial-time algorithm that transforms a given instance (D, k) into a new instance (D∗, k∗) with k∗ ≤ k such that (D, k) is a yes-instance if and only if (D∗, k∗) is a yes-instance and the size of (D∗, k∗) is at most g(k) for some function g. A kernelization can be given by providing a set of reduction rules that transform the input digraph and the parameter k. The kernelization consists of the exhaustive application of each rule (see Algorithm 1).
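The exhaustive-application scheme of Algorithm 1 can also be written iteratively. The following Python sketch is only an illustration under assumed interfaces (each rule is a hypothetical function mapping (D, k) to (D, k, applied)); it is not the thesis's implementation.

```python
def reduce_exhaustively(D, k, rules):
    """Apply every rule from `rules` until none of them changes (D, k).

    D: digraph as a dict mapping each vertex to the set of its successors.
    k: the editing parameter.
    rules: functions (D, k) -> (D, k, applied), where `applied` is True
           iff the rule changed the instance.
    """
    changed = True
    while changed:
        changed = False
        for rule in rules:
            D, k, applied = rule(D, k)
            if applied:
                # Restart: an earlier rule may have become applicable again.
                changed = True
                break
    return D, k
```

Since every rule application decreases k or |V|, the loop terminates after at most k + |V| applications, mirroring the argument given for Algorithm 1.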
Definition 5.1. Let (D, k) be an instance of the Transitivity Editing problem and let R be a reduction rule. Applying R to (D, k) means to change D and k according to R. The instance that is obtained by applying rule R to (D, k) is denoted by R(D, k). The rule R is said to be sound if

(D, k) ∈ Transitivity Editing ⇔ R(D, k) ∈ Transitivity Editing.

The instance (D, k) is said to be reduced with respect to rule R if R(D, k) = (D, k).

5.1 Kernelization for Transitivity Editing


In the following, we describe a kernelization for the Transitivity Editing problem. We are able to show a kernel of O(k^2) vertices for the general problem and a kernel of O(k) vertices for Transitivity Editing on digraphs of bounded degree.

Rule 5.2. Given an instance (D, k) of the Transitivity Editing problem with D = (V, A). If there is a vertex u ∈ V that does not take part in any P3 in D, then remove u and all arcs that are incident to it.

Lemma 5.3. Rule 5.2 is sound.

Proof. In order to prove the lemma, we construct a sequence of operations that form an optimal solution set. Then we prove that, if at some point in the successive application there is a vertex that does not take part in a P3, then this vertex does not take part in any P3 at any later point in the successive application. Thus, removing this vertex does not modify the sequence of operations.
Let (D, k) with D = (V, A) denote the given instance and let S denote an optimal solution set for D with m := |S| ≤ k and D′ := (V, A∆S). Let Q be a search-tree algorithm that finds a P3 in the digraph and destroys it by inserting or removing an arc. Furthermore, Q returns the shortest sequence of digraphs (D = D_0, D_1, ..., D_m = D′) with D_i := (V, A_i) and a sequence of operations F_1, ..., F_m with F_i := A_{i−1}∆A_i for each 1 ≤ i ≤ m. We prove the following: for each i > 0, if a vertex u ∈ V does not take part in any P3 in D_{i−1}, then it does not take part in any P3 in D_i. Hence, by induction, if u does not take part in any P3 in D_0, then it does not take part in any P3 in any D_j, j > 0. Thus, D and D − u yield the same sequence of operations F_1, ..., F_m and thus (D, k) ∈ Transitivity Editing ⇔ (D − u, k) ∈ Transitivity Editing.
In the following, we show the contraposition of the claim: for each i > 0, if a vertex u ∈ V takes part in a P3 p in D_i, then it takes part in a P3 q in D_{i−1}. Let F_i = {(a, b)}. Since Q only removes or inserts (a, b) to destroy a P3, we know that there is a P3 r in D_{i−1} that contains both a and b. Hence, if u = a or u = b, then q = r and thus u takes part in q. Otherwise, we consider the following cases.
Case 1: (a, b) is inserted.
Clearly, there is a vertex v ∈ V such that (a, v, b) is a P3 in D_{i−1}; hence, if u = v, then q = (a, v, b). Furthermore, if p ≠ (a, b, u) and p ≠ (u, a, b), then q = p. Otherwise, p = (a, b, u) or p = (u, a, b); since both cases are analogous, we assume that p = (a, b, u). Obviously, (a, u) ∉ A_i, and since (a, u) ∉ F_i, we know that (a, u) ∉ A_{i−1}. If (v, u) ∈ A_{i−1}, then q = (a, v, u); otherwise q = (v, b, u).
Case 2: (a, b) is deleted.
Clearly, there is a vertex v ∈ V such that either (a, b, v) or (v, a, b) is a P3 in D_{i−1}; hence, if u = v, then q = (a, b, v) or q = (v, a, b). Since both cases are analogous, we assume that (a, b, v) is a P3 in D_{i−1}. Furthermore, if p ≠ (a, u, b), then q = p; otherwise p = (a, u, b). If (u, v) ∈ A_{i−1}, then q = (a, u, v); otherwise q = (u, b, v).

Lemma 5.4. Let D = (V, A) be a digraph and let n := |V|. Rule 5.2 can be implemented to run in O(n^3) time.
Proof. First, we show how to determine in O(n^2) time whether a given vertex v takes part in a P3 in (V, A). A vertex v takes part in a P3 if one of the following conditions is true:
1. succ_A(succ_A(v)) \ succ_A(v) ≠ ∅.
2. succ_A(v) \ succ_A(u) ≠ ∅ for some u ∈ pred_A(v).
3. pred_A(pred_A(v)) \ pred_A(v) ≠ ∅.
Since the difference of two sets of size O(n) each can be calculated in O(n) time, and the sets succ_A(succ_A(v)) and pred_A(pred_A(v)) can be computed in O(n^2) time, determining whether a given vertex takes part in a P3 can be done in O(n^2) time. Second, we show that it is possible to remove a vertex v from the digraph (V, A) in O(n^2) time. For each vertex, a maximum of O(n) arcs has to be removed, and each arc removal can be done in O(n) time. All in all, it follows that Rule 5.2 can be implemented to run in O(n^3) time.
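Following the three conditions above, Rule 5.2 can be sketched in Python as follows. This is an illustrative sketch, not code from the thesis; the digraph is assumed to be a dict mapping each vertex to its set of successors, and self-loops are assumed absent.

```python
def in_some_p3(succ, v):
    """Check whether v takes part in a P3: three distinct vertices a, b, c
    with arcs (a, b) and (b, c) present but (a, c) absent."""
    # Build predecessor sets from the successor representation.
    pred = {u: set() for u in succ}
    for u, ws in succ.items():
        for w in ws:
            pred[w].add(u)
    # v is the first vertex of a P3 (v, u, w): (v, w) is missing.
    if any(succ[u] - succ[v] - {v} for u in succ[v]):
        return True
    # v is the middle vertex of a P3 (u, v, w): (u, w) is missing.
    if any(succ[v] - succ[u] - {u} for u in pred[v]):
        return True
    # v is the last vertex of a P3 (u, w, v): (u, v) is missing.
    if any(pred[w] - pred[v] - {v} for w in pred[v]):
        return True
    return False

def apply_rule_52(succ):
    """Remove every vertex that takes part in no P3 (Rule 5.2)."""
    keep = {v for v in succ if in_some_p3(succ, v)}
    return {v: succ[v] & keep for v in keep}
```

Removing a vertex that is on no P3 cannot create new P3s, so all such vertices can be removed in one sweep.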

Surprisingly, this fairly simple rule is sufficient to show a linear upper bound on the number of vertices remaining in the kernel if the maximum degree of the given digraph is fixed. More precisely, we can show that, if the maximum degree of D is a constant with regard to n and k, then we obtain a kernel that has O(k) vertices by applying Rule 5.2.

Theorem 5.5. Let D be a digraph that is reduced with respect to Rule 5.2
and let δ denote the maximum degree of D. If (D, k) is a yes-instance of
Transitivity Editing, then D contains at most 2k · (δ + 1) vertices.

Proof. Let S be an optimal solution set for D. Obviously, there is no vertex in (V, A∆S) that is contained in any P3. For each operation (u, v) ∈ S, let

R_{(u,v)} := pred_A(u) ∪ succ_A(v) ∪ {u, v} if (u, v) ∈ S_DEL, and
R_{(u,v)} := (pred_A(v) ∩ succ_A(u)) ∪ {u, v} otherwise,

denote the set of vertices in V that are affected by modifying the arc (u, v). In the following, we prove that

∪_{(u,v)∈S} R_{(u,v)} = V

and

Σ_{(u,v)∈S} |R_{(u,v)}| ≤ 2k(δ + 1).

Suppose there is some w ∈ V such that w ∉ R_{(u,v)} for all (u, v) ∈ S. Since D is reduced with respect to Rule 5.2, there is a P3 p in D that contains w.
Case 1: p = (a, b, w).
Since S is a solution set for D, we know that (a, b) ∈ S, (b, w) ∈ S, or (a, w) ∈ S. Obviously, if (a, w) ∈ S or (b, w) ∈ S, then w ∈ R_{(a,w)} or w ∈ R_{(b,w)}, respectively. Hence, (a, b) ∈ S. However, we know that (b, w) ∈ A and thus w ∈ succ_A(b). Since (a, b) ∈ A, this implies w ∈ R_{(a,b)}.
Case 2: p = (w, a, b).
This case is completely analogous to Case 1.
Case 3: p = (a, w, b).
Since S is a solution set for D, we know that (a, b) ∈ S, (w, b) ∈ S, or (a, w) ∈ S. Obviously, if (a, w) ∈ S or (w, b) ∈ S, then w ∈ R_{(a,w)} or w ∈ R_{(w,b)}, respectively. Hence, (a, b) ∈ S. However, we know that (a, w) ∈ A and (w, b) ∈ A and thus w ∈ pred_A(b) ∩ succ_A(a). Since (a, b) ∉ A, this implies w ∈ R_{(a,b)}.
It remains to show that

Σ_{(u,v)∈S} |R_{(u,v)}| ≤ 2k(δ + 1).

Figure 12: Examples for the application of Rule 5.6. Left: if (u, v) ∉ A and |Z| > k, then insert (u, v) into D. Right: if (u, v) ∈ A and |Zu| + |Zv| > k, then delete (u, v) from D. Note that x ∉ Zu and y ∉ Zv.

Obviously, since the maximum degree of D is δ, we know that |pred_A(u)| ≤ δ and |succ_A(u)| ≤ δ for all u ∈ V. Hence, for each (u, v) ∈ S,

|R_{(u,v)}| ≤ δ + δ + 2 if (u, v) ∈ A, and |R_{(u,v)}| ≤ δ + 2 otherwise,

and thus |R_{(u,v)}| ≤ 2(δ + 1) for each (u, v) ∈ S. Since |S| ≤ k, this implies

Σ_{(u,v)∈S} |R_{(u,v)}| ≤ 2k(δ + 1).

All in all, this implies

|V| = |∪_{(u,v)∈S} R_{(u,v)}| ≤ Σ_{(u,v)∈S} |R_{(u,v)}| ≤ 2k(δ + 1).

Theorem 5.5 encourages the thought that the complexity of the problem
is partially related to the degree of the given digraph.
The following reduction rule follows an idea of Gramm et al. [GGHN05] for the Cluster Editing problem: if there is some arc (a, b) in the given digraph such that any solution set that does not modify (a, b) must contain more than k other arc modifications, then (a, b) has to be modified in order for the solution set to contain at most k modifications. An example for the rule can be found in Figure 12.

Rule 5.6. Given an instance (D, k) with D = (V, A).

1. Let (u, v) ∈ (V × V)\A and

Z := succ_A(u) ∩ pred_A(v).

If |Z| > k, then add (u, v) to A and decrease k by one.

2. Let (u, v) ∈ A,

Zu := pred_A(u)\(pred_A(v) ∪ {v}), and
Zv := succ_A(v)\(succ_A(u) ∪ {u}).

If |Zu| + |Zv| > k, then delete (u, v) from A and decrease k by one.
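A single pass of Rule 5.6 can be sketched as follows. This is an illustrative Python sketch under the same dict-of-successors representation (an assumption, not the thesis's code); it returns after the first modification so that the predecessor sets stay consistent with the successor sets.

```python
def apply_rule_56(succ, pred, k):
    """Try to apply Rule 5.6 once; return (k, arc) where arc is the
    modified pair, or (k, None) if the rule does not apply."""
    for u in succ:
        for v in succ:
            if u == v:
                continue
            if v not in succ[u]:
                # Inserting (u, v) destroys the P3 (u, w, v) for each w in Z.
                Z = succ[u] & pred[v]
                if len(Z) > k:
                    succ[u].add(v)
                    pred[v].add(u)
                    return k - 1, (u, v)
            else:
                # Keeping (u, v) would force more than k other modifications.
                Zu = pred[u] - pred[v] - {v}
                Zv = succ[v] - succ[u] - {u}
                if len(Zu) + len(Zv) > k:
                    succ[u].discard(v)
                    pred[v].discard(u)
                    return k - 1, (u, v)
    return k, None
```

Each pair is handled with a constant number of O(n)-time set operations, matching the O(n^3) bound of Lemma 5.9 when iterated over all pairs.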

In order to prove the correctness, we need the following lemma.

Lemma 5.7. Rule 5.6 causes an arc insertion or an arc deletion iff this
operation destroys more than k P3 s in D.

Proof. “⇒”: Suppose that the application of Rule 5.6 to D causes an arc insertion. Then there is a pair (u, v) ∈ V × V with (u, v) ∉ A and |Z| > k for Z := succ_A(u) ∩ pred_A(v). For each w ∈ Z, this operation destroys the P3 (u, w, v) in D.
If the application of Rule 5.6 to D causes an arc deletion, then there is a pair (u, v) ∈ A with |Zu| + |Zv| > k for Zu := pred_A(u)\(pred_A(v) ∪ {v}) and Zv := succ_A(v)\(succ_A(u) ∪ {u}). For each w ∈ Zu and each z ∈ Zv, this operation destroys the P3s (w, u, v) and (u, v, z) in D. Thus, more than k P3s in D are destroyed in total.
“⇐”: Suppose the insertion of the arc (u, v) destroys more than k P3s in D. Hence, (u, v) ∉ A and the set Z := {w ∈ V | (u, w, v) is a P3 in D} satisfies |Z| > k. Hence,

∀_{w∈Z} (u, w) ∈ A ∧ (w, v) ∈ A,

and thus

∀_{w∈Z} w ∈ succ_A(u) ∧ w ∈ pred_A(v),

which implies

Z ⊆ succ_A(u) ∩ pred_A(v).

Since |Z| > k, Rule 5.6 applies.

Suppose the deletion of the arc (u, v) destroys more than k P3s in D. Hence, (u, v) ∈ A and the sets Zu := {w ∈ V | (w, u, v) is a P3 in D} and Zv := {w ∈ V | (u, v, w) is a P3 in D} satisfy |Zu| + |Zv| > k. Hence,

∀_{w∈Zu} (w, u) ∈ A ∧ (w, v) ∉ A ∧ w ≠ v,
∀_{w∈Zv} (v, w) ∈ A ∧ (u, w) ∉ A ∧ w ≠ u,

and thus

∀_{w∈Zu} w ∈ pred_A(u) ∧ w ∉ pred_A(v) ∪ {v},
∀_{w∈Zv} w ∈ succ_A(v) ∧ w ∉ succ_A(u) ∪ {u},

which implies

Zu ⊆ pred_A(u)\(pred_A(v) ∪ {v}),
Zv ⊆ succ_A(v)\(succ_A(u) ∪ {u}).

Since |Zu| + |Zv| > k, Rule 5.6 applies.

Lemma 5.8. Rule 5.6 is sound.

Proof. Let (D∗, k − 1) with D∗ = (V, A∗) denote the instance that is obtained by applying Rule 5.6 to the given instance (D, k) with D = (V, A). Furthermore, let {(a, b)} = A∆A∗, that is, applying Rule 5.6 modifies the arc (a, b).
“⇐”: Suppose (D∗, k − 1) is a yes-instance of Transitivity Editing. Let S∗ denote a solution set for D∗ with |S∗| ≤ k − 1. Obviously, S := S∗ ∪ {(a, b)} is a solution set for D and |S| = |S∗| + 1 ≤ k.
“⇒”: Let (D, k) be a yes-instance of Transitivity Editing. In the following, we show that every solution set S for D with |S| ≤ k contains (a, b). For the sake of contradiction, assume that there is a solution set S for D with |S| ≤ k and (a, b) ∉ S. Lemma 5.7 implies that modifying (a, b) destroys more than k different P3s in D. Let p_0, ..., p_m with m ≥ k denote these P3s. For each 0 ≤ i ≤ m, the P3 p_i must contain a and b in order to be destroyed by modifying (a, b). Furthermore, p_i must contain a third vertex c_i that is different from a and b. Obviously, all c_i must be pairwise different, since otherwise the P3s would not all be different. Since S is a solution set for D, it must contain, for each 0 ≤ i ≤ m, one of the arcs (a, c_i), (b, c_i), (c_i, a), (c_i, b), and (a, b). However, (a, b) ∉ S and thus |S| ≥ m + 1 ≥ k + 1, contradicting |S| ≤ k.

Lemma 5.9. Let D = (V, A) be a digraph and let n := |V|. Rule 5.6 can be executed in O(n^3) time.

Proof. We show that, given a pair of vertices, we can execute Rule 5.6 in O(n) time. Let u, v ∈ V. If (u, v) ∉ A, then we need to calculate succ_A(u) ∩ pred_A(v), which can be done in O(n) time. If (u, v) ∈ A, then we need to calculate pred_A(u)\(pred_A(v) ∪ {v}) and succ_A(v)\(succ_A(u) ∪ {u}), which can also be done in O(n) time. We can then determine whether the sizes of these sets exceed k in constant time. Obviously, inserting or deleting (u, v) can be done in O(n) time as well. Since there are O(n^2) pairs of vertices, Rule 5.6 can be executed in O(n^3) time.

With Rules 5.2 and 5.6 established, we look at the size of the remaining instance. In the following, we show a kernel of O(k^2) vertices.

Theorem 5.10. Transitivity Editing admits a problem kernel containing at most k(k + 2) vertices.

Proof. Assume that there is a digraph D = (V, A) with |V| > k(k + 2) such that D is reduced with respect to Rules 5.2 and 5.6 and it is possible to turn D transitive by applying at most k operations. Let D′ = (V, A′) denote the transitive digraph obtained by the application of these k operations and let S := A∆A′ denote the solution set. Consider the following disjoint decomposition of V:

Y := {v ∈ V | ∃_{u∈V} (u, v) ∈ S ∨ (v, u) ∈ S},
X := V\Y.

Note that all vertices in X are adjacent to at least one vertex in Y because D is reduced with respect to Rule 5.2. Also note that in order to destroy a P3 p in D, the solution set S must contain an arc that is incident to two of the vertices of p; hence, for each P3 p in D, at most one of the vertices of p is in X.
Since D can be turned transitive with at most k operations, we know that |S| ≤ k and consequently |Y| ≤ 2k. Obviously, |V| = |X| + |Y|; hence the assumption that |V| > k(k + 2) implies |X| > k^2. With the above observation, it follows that there are more than k^2 P3s in D.
For each operation (a, b) ∈ S, let

Z_{(a,b)} := {p | p is a P3 in D that is destroyed by modifying (a, b)}.

Since D is reduced with respect to Rule 5.6, Lemma 5.7 implies that |Z_{(a,b)}| ≤ k for each (a, b) ∈ S. Since there are more than k^2 P3s in D, but |S| ≤ k, we know that there is a P3 q in D with

∀_{(a,b)∈S} q ∉ Z_{(a,b)}

and thus

q ∉ ∪_{(a,b)∈S} Z_{(a,b)}.

Hence, q is not destroyed by S, contradicting S being a solution set.

Note that the existence of a polynomial-size problem kernel was an open question of Böcker et al. [BBK09].

A lower bound for k. An important topic, not only for kernelization but also for the implementation of the search-tree algorithm, is to find a lower bound for the solution size. In the following, we present a train of thought that enables us to find a lower bound for the number of deletions needed to turn a given digraph transitive: the main idea is that two P3s that do not interfere cannot be destroyed by a single operation and, hence, if a given digraph can be turned transitive with at most k operations, then it cannot contain more than k disjoint P3s. In this context, recall that two P3s are called disjoint if they share at most one vertex (see Section 2.1 on page 6).

Definition 5.11. Let D be a digraph and let Γ be the set of all P3s in D. The undirected graph C_D = (Γ, E) with

E := {{p, q} | p and q are not disjoint}

is called the P3-conflict graph of D.
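The P3-conflict graph can be built directly from Definition 5.11. The following Python sketch (an illustration with an assumed dict-of-successors representation, not code from the thesis) enumerates all P3s and connects two of them whenever they share at least two vertices:

```python
from itertools import combinations

def all_p3s(succ):
    """All P3s (u, v, w): arcs (u, v) and (v, w) present, (u, w) absent."""
    return [(u, v, w)
            for u in succ for v in succ[u] for w in succ[v]
            if w != u and w not in succ[u]]

def conflict_graph(succ):
    """Vertices are the P3s of D; {p, q} is an edge iff p and q are
    not disjoint, i.e. they share at least two vertices."""
    p3s = all_p3s(succ)
    edges = {frozenset((p, q)) for p, q in combinations(p3s, 2)
             if len(set(p) & set(q)) >= 2}
    return p3s, edges
```

Since a digraph on n vertices has O(n^3) P3s, this naive pairwise comparison takes O(n^6) time in the worst case, which is why Algorithm 2 below avoids constructing the conflict graph explicitly.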

Theorem 5.12. Let D = (V, A) be a digraph and let C_D be its P3-conflict graph. Let I denote an independent set of C_D. The size of each solution set S for D is bounded by

|S| ≥ |I|.

Proof. Suppose there is an optimal solution set S′ for D with |S′| < |I|. Consider some operation (a, b) ∈ S′ that destroys a P3 p = (u, v, w) in I. Since p is destroyed by the operation, we know that (a, b) ∈ {(u, v), (v, w), (u, w)}. However, since I is an independent set, we also know that all P3s represented by vertices in I are pairwise disjoint. Hence,

∀_{q∈I\{p}} q = (x, y, z) ⇒ (a, b) ∉ {(x, y), (y, z), (x, z)},

and thus there is no q ∈ I\{p} that can be destroyed by the operation (a, b). Since (a, b) was chosen arbitrarily, no operation in S′ can destroy more than one P3 represented by a vertex in I; as all |I| of these P3s must be destroyed, this contradicts |S′| < |I|, and the theorem follows.

The lower bound on the size of solution sets can be used when traversing the search tree: if a subtree is discovered in which a digraph D must be turned transitive with at most k operations, but D is known to require more than k operations, then we can skip over the subtree in the search-tree algorithm, thus saving time. Furthermore, the lower bound can be used in conjunction with Rule 5.6:

Rule 5.13. Given an instance (D, k) with D = (V, A), let I denote an independent set of the P3-conflict graph of D.

1. Let (u, v) ∈ (V × V)\A and let

Z := succ_A(u) ∩ pred_A(v).

Furthermore, let

R := {(u, w, v) | w ∈ Z}

denote the set of P3s that are destroyed by inserting (u, v). If |Z| > k − |I\R|, then add (u, v) to A and decrease k by one.

2. Let (u, v) ∈ A,

Zu := pred_A(u)\(pred_A(v) ∪ {v}), and
Zv := succ_A(v)\(succ_A(u) ∪ {u}).

Furthermore, let

Ru := {(w, u, v) | w ∈ Zu},
Rv := {(u, v, w) | w ∈ Zv}, and
R := Ru ∪ Rv.

If |Zu| + |Zv| > k − |I\R|, then delete (u, v) from A and decrease k by one.

Obviously, modifying (u, v) can only destroy those P3s that are in R. Hence, if (u, v) is not modified, we have to modify more than (k − |I\R|) + |I\R| = k arcs in order to turn D transitive. Apart from that, the proof of soundness is analogous to the proof of Lemma 5.8 and is therefore omitted.

Lemma 5.14. Rule 5.13 is sound.



Input: A directed graph D = (V, A)
Output: A large independent set of the P3-conflict graph of D
1 forall pairs of vertices (u, v) do
2   ai_clique[(u, v)] := (pred_A(u)\pred_A(v)) × {u} × {v} ∪ {u} × {v} × (succ_A(v)\succ_A(u));
3   if |ai_clique[(u, v)]| ≥ 2 then
4     forall P3 p ∈ ai_clique[(u, v)] do
5       ai_clique_count[p] := ai_clique_count[p] + 1;
6       ai_clique_sizes[p] := ai_clique_sizes[p] + |ai_clique[(u, v)]|;
7     end
8   end
9 end
10 P := get_all_P3(D);
/* sort P by ai_clique_count and ai_clique_sizes, biased by ai_clique_count. */
11 sort(P, ai_clique_count, ai_clique_sizes);
12 forbidden_arcs := ∅;
13 disjoint_P3 := ∅;
14 forall p ∈ P in ascending order do
15   if none of the three arcs of p is in forbidden_arcs then
16     insert p into disjoint_P3;
17     insert all three arcs of p into forbidden_arcs;
18   end
19 end
20 return disjoint_P3;

Algorithm 2: The algorithm for finding a large independent set of the P3-conflict graph of the given digraph D. Its worst-case running time is O(n^3 log n).

Since the size of any independent set of C_D is a lower bound for the size of a solution set for D, the best lower bound is, of course, given by a maximum independent set of C_D. However, since computing a maximum independent set is NP-hard, we limit ourselves to finding a fairly good independent set. This can be done by calculating a maximal matching M in C_D¹ (all vertices of C_D that are not matched by M form an independent set). In practice, it appears that the lower bound for the size of the solution sets can be improved significantly by not just picking any maximal matching of C_D but finding a small maximal matching, yielding a larger independent set. By conventional means, finding a maximal matching in the P3-conflict graph may take O(n^6) time, since there may be O(n^3) P3s in the given digraph. In the following, we present an approach that computes a large independent set of the P3-conflict graph C_D in O(n^3 log n) time (see Algorithm 2).
There are two key observations: first, each pair of vertices (u, v) ∈ V^2 induces a clique in C_D. The vertices of this clique are all P3s that contain both u and v, since these P3s cannot be pairwise disjoint. We refer to these cliques as arc-induced cliques. Second, each P3 is contained in at most three arc-induced cliques. For each pair of vertices, its clique can be found in O(n) time. Since there are O(n^2) pairs of vertices, we can find all arc-induced cliques in the P3-conflict graph in O(n^3) time. The next step is to sort all P3s by the number of arc-induced cliques of size at least two they are contained in and by the sum of the sizes of all arc-induced cliques they are contained in, biased by the former. This can be done in O(n^3 log n) time. In the last step, we consider the P3s one by one in ascending order and insert their arcs into a set of forbidden arcs. If one of the arcs of a P3 is already in the set of forbidden arcs, then we simply discard it; otherwise, it is added to the set of disjoint P3s. Since there are up to n^3 P3s in the given digraph, this step can take up to O(n^3 log n) time. Thus, the overall running time is bounded by O(n^3 log n).

Lemma 5.15. Given a digraph D, Algorithm 2 returns an independent set of the P3-conflict graph C_D and runs in O(n^3 log n) time.

Proof. Let I denote the set that is returned by Algorithm 2 and suppose that I is not an independent set of C_D. Then there are two P3s p and q in I that are adjacent in C_D and thus, by the definition of C_D, not disjoint. Without loss of generality, we assume that p succeeds q in the set P after it has been sorted in line 11. Then, however, all arcs of q are inserted into the set of forbidden arcs when q is added to the output set, and since p and q are not disjoint, p cannot be inserted into the resulting set, a contradiction.

¹Note that the smaller the maximal matching, the larger the independent set. Unfortunately, the MinMax-Matching problem is NP-hard [YG80].
It remains to show that the algorithm runs in O(n^3 log n) time. As sketched above, processing the first loop takes at most O(n^3) time, since there are at most n^2 pairs of vertices and, for each pair, there are O(n) P3s that contain the pair. The set of all P3s in D can be calculated and sorted in O(n^3 log n) time because there cannot be more than n^3 different P3s in D. Since |P| ∈ O(n^3) and the insertion of p into the set of disjoint P3s may take up to O(log n) time, the final loop runs in O(n^3 log n) time. All in all, the running time of the algorithm does not exceed O(n^3 log n).
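A simplified version of Algorithm 2 can be sketched in Python as follows. This illustration (with an assumed dict-of-successors representation) omits the arc-induced-clique ordering and simply processes the P3s in input order; since every editing operation affects a single ordered pair, P3s whose arc triples (including the missing transitive arc) are pairwise disjoint each require a separate operation, so the size of the returned set is a lower bound on the solution size.

```python
def disjoint_p3_lower_bound(succ):
    """Greedily collect P3s with pairwise disjoint arc triples; the number
    of collected P3s lower-bounds the number of editing operations."""
    p3s = [(u, v, w)
           for u in succ for v in succ[u] for w in succ[v]
           if w != u and w not in succ[u]]
    forbidden = set()   # ordered pairs already "used" by a chosen P3
    chosen = []
    for (u, v, w) in p3s:
        arcs = {(u, v), (v, w), (u, w)}
        if not arcs & forbidden:
            chosen.append((u, v, w))
            forbidden |= arcs
    return chosen
```

Algorithm 2 additionally sorts the P3s by the arc-induced-clique statistics before this greedy pass, which tends to produce larger sets and hence better lower bounds.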

5.2 Further Data Reduction

In this section, we present rules for data reduction that are provably correct but, as we show in the final paragraph of this section, do not provide an asymptotically smaller bound on the kernel size than the one established in Section 5.1. In fact, using only the rules considered in this section does not provide a bound on the instance size at all.

5.2.1 Isolating Peripheral Arcs

The idea is that arc deletions that are not incident to a source or a sink can potentially destroy P3s “in two directions” and are thus more effective than arc deletions that are incident to a source or a sink. Consider, for example, a set of sources R that are predecessors of a vertex v and let T denote the successors of v, with R and T being disjoint. For simplicity, let all vertices in R be of degree one (see Figure 13). There are several P3s in this structure, each of the form (r, v, t) with r ∈ R and t ∈ T. These P3s can be destroyed by either inserting (r, t), deleting (r, v), or deleting (v, t). However, if the indegree of v is greater than its outdegree, then deleting (v, t) is at least as good as any other solution, because deleting (v, t) potentially also destroys every P3 (v, t, x) with x ∈ V. Note that, in this example, Lemma 3.4 implies that inserting arcs cannot yield a better solution. A generalization of this idea is implemented in the following rule.

Rule 5.16. Let D = (V, A) be a digraph and v some vertex that is not part
of the belt of a diamond. Let V_SRC denote the set of all sources in D, let

    R := pred_A(v) ∩ V_SRC ∩ {r | ∃u∈succ_A(v) : (r, u) ∉ A},

and let

    T := succ_A(v) ∩ {t | ∃r∈R : (r, t) ∉ A}.

Furthermore, for each T′ ⊆ T , let

    R_{T′} := R ∩ ⋂_{t∈T′} pred_A(t)

denote the set of all vertices in R that are predecessors of all vertices in T′.
If |T′| + |R_{T′}| ≤ |R| for all T′ ⊆ T , then delete all arcs in {v} × T and
modify k accordingly.

Figure 13: Rule 5.16: If all vertices in R are sources and we know that
deletion is optimal and the indegree of v is greater than its outdegree, then
deleting all arcs leaving v is at least as good as any other solution.

Note that it is unclear whether checking |T′| + |R_{T′}| ≤ |R| for all T′ ⊆
T is possible in polynomial time. However, it is possible to determine in
polynomial time whether a condition that implies |T′| + |R_{T′}| ≤ |R| for
all T′ ⊆ T is true.

Lemma 5.17. Let D, R, T , R_{T′}, and v be as in Rule 5.16. If

    |T| + max_{t∈T} |R_{{t}}| ≤ |R|                                  (3)

then |T′| + |R_{T′}| ≤ |R| for all T′ ⊆ T .

Proof. Let T′ ⊆ T be chosen arbitrarily. We show that

    |T′| + |R_{T′}| ≤ |T| + max_{t∈T} |R_{{t}}|.

Obviously, |T′| ≤ |T|. Furthermore,

    |R_{T′}| ≤ max_{t∈T′} |R_{{t}}| ≤ max_{t∈T} |R_{{t}}|.

The following rule is a special case of Rule 5.16 that can be executed in
polynomial time.
Rule 5.18. Let D = (V, A) be a digraph and v some vertex that is not part
of the belt of a diamond. Let V_SRC denote the set of all sources in D, let

    R := pred_A(v) ∩ V_SRC ∩ {r | ∃u∈succ_A(v) : (r, u) ∉ A},

and let

    T := succ_A(v) ∩ {t | ∃r∈R : (r, t) ∉ A}.

Furthermore, for each t ∈ T , let R_t := R ∩ pred_A(t). If |T| + max_{t∈T} |R_t| ≤
|R|, then delete all arcs in {v} × T and modify k accordingly.
By Lemma 5.17, it is clear that the preconditions of Rule 5.18 imply the
preconditions of Rule 5.16. Thus, if Rule 5.16 is correct, then Rule 5.18 is
also correct.
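As an illustration, the precondition of Rule 5.18 can be checked by a direct computation of R, T , and the sets R_t. The following Python sketch uses hypothetical names and a plain arc-set representation; it does not check the diamond-belt precondition and is meant only to mirror the set definitions above:

```python
def rule_518_applicable(vertices, arcs, v):
    """Check the counting precondition of Rule 5.18 for a vertex v.

    `arcs` is a set of pairs (u, w); a source is a vertex without
    incoming arcs.  Returns the arcs {v} x T that the rule would
    delete, or None.  (Sketch: the diamond-belt precondition of the
    rule is not verified here.)
    """
    pred_v = {u for (u, w) in arcs if w == v}
    succ_v = {w for (u, w) in arcs if u == v}
    sources = {x for x in vertices if all(w != x for (u, w) in arcs)}
    # R: source predecessors of v that miss an arc to some successor of v
    R = {r for r in pred_v & sources
         if any((r, u) not in arcs for u in succ_v)}
    # T: successors of v that miss an arc from some r in R
    T = {t for t in succ_v if any((r, t) not in arcs for r in R)}
    if not T:
        return None
    # R_t: vertices of R that are predecessors of t
    max_rt = max(len({r for r in R if (r, t) in arcs}) for t in T)
    if len(T) + max_rt <= len(R):
        return {(v, t) for t in T}
    return None
```

On the star-like instance of Figure 13 (many degree-one sources feeding v, one successor t1), the rule fires and returns the arc (v, t1) for deletion.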
It is not hard to see that Rule 5.16 can be modified such that it works
with sinks instead of sources. All lemmas in this section hold for both
Rules 5.16 and 5.19, but the proofs for Rule 5.19 are omitted since they are
completely analogous.
Rule 5.19. Let D = (V, A) be a digraph and v some vertex that is not part
of the belt of a diamond. Let V_SNK denote the set of all sinks in D, let

    R := succ_A(v) ∩ V_SNK ∩ {r | ∃u∈pred_A(v) : (u, r) ∉ A},

and let

    T := pred_A(v) ∩ {t | ∃r∈R : (t, r) ∉ A}.

Furthermore, for each t ∈ T , let R_t := R ∩ succ_A(t). If |T| + max_{t∈T} |R_t| ≤
|R|, then delete all arcs in T × {v} and modify k accordingly.
Before proving the correctness of the reduction rules, we need the fol-
lowing lemma.
Lemma 5.20. Let D, R, T , R_{T′}, and v be as described in Rule 5.16. Let S
be an optimal solution set for D that does not contain any arc insertions
between R and T . Furthermore, let |T′| + |R_{T′}| ≤ |R| for all T′ ⊆ T . Then

    |({v} × T)\S| ≤ |(R × {v}) ∩ S|.

Proof. Let T_S := succ_{A\S}(v) ∩ T denote the set of vertices of T that are
successors of v in (V, A\S). Obviously, for each t ∈ T_S , there is some r ∈
R\R_{T_S} such that (r, v, t) is a P3 in D. Since S is a solution set for D that
does not delete any arc (v, t) with t ∈ T_S , we know that

    (R\R_{T_S}) × {v} ⊆ S.

Hence,

    |(R × {v}) ∩ S| ≥ |((R\R_{T_S}) × {v}) ∩ S|
                   = |(R\R_{T_S}) × {v}|
                   = |R\R_{T_S}|
                   = |R| − |R_{T_S}|
                   ≥ |T_S|
                   = |({v} × T)\S|.

Lemma 5.21. For Transitivity Editing, Rule 5.16 is sound.

Proof. Let D, R, T , and v be as described in Rule 5.16. Furthermore,
let X := succ_A(v)\T . The construction of these sets implies the following
observations:

1. ∀t∈T ∃r∈R : (r, t) ∉ A

2. ∀r∈R ∃t∈T : (r, t) ∉ A

3. ∀x∈X ∀r∈R : (r, x) ∈ A

Note that the third observation is equivalent to T and X being
disjoint. The precondition of Rule 5.16 that v is not part of the belt of a
diamond will be referred to as diamond constraint. It implies

    (X × T) ∩ A = ∅.                                              (4)

Likewise, ∀r∈R : ((succ_A(r) ∩ T) × (T \ succ_A(r))) ∩ A = ∅.


In the following, we show that there is always an optimal solution set
for D that deletes all arcs (v, t) with t ∈ T . Let S denote an optimal solution
set for D and let D′ := (V, A∆S). By Lemma 3.3, we can assume that for
all vertices r ∈ R and t ∈ T , if there is a path of length ≥ 3 from r to t in D′,
then it is (r, v, t). Let X⁺ := succ_{A∆S}(v) ∩ X and X⁻ := X\X⁺. Note

that (X⁺ × X⁻) ∩ (A∆S) = ∅, since if there were x⁺ ∈ X⁺ and x⁻ ∈ X⁻
with (x⁺, x⁻) ∈ A∆S, then (v, x⁺, x⁻) would be a P3 in (V, A∆S). Also,
for any set Z ⊆ V , let

    reach*_{D′}(Z) := (⋃_{z∈Z} reach_{D′}(z)) \ (T ∪ X ∪ {v})

denote the set of all vertices that are not in X ∪ T ∪ {v} but are reachable
from some z ∈ Z in D′. Let X_Z := {x ∈ X⁺ | reach*_{D′}({x}) ∩ Z ≠ ∅} denote
the set of all vertices in X⁺ from which a vertex in Z can be reached. There
are a number of interesting facts to notice about this modified reachability
function:

1. reach*_{D′}(·) is linear, since for each u ∈ V , X1 ⊆ V , and X2 ⊆ V ,

   u ∈ reach*_{D′}(X1 ∪ X2) ⇔ (∃x∈X1 : u ∈ reach_{D′}(x) ∨
                                ∃x∈X2 : u ∈ reach_{D′}(x)) ∧
                               u ∉ X ∪ T ∪ {v}
                             ⇔ u ∈ reach*_{D′}(X1) ∨ u ∈ reach*_{D′}(X2)
                             ⇔ u ∈ reach*_{D′}(X1) ∪ reach*_{D′}(X2)

2. For all y ∈ reach*_{D′}(X⁺), we know that (v, y) ∈ S_INS, since other-
   wise, (v, x, y) with x ∈ X⁺ would be a P3 in D′.

3. For all Z ⊆ reach*_{D′}(X⁺), it is clear that |Z| ≤ |X_Z|, since other-
   wise, we could obtain a solution set that is smaller than S by removing
   all (v, z) ∈ S_INS with z ∈ Z and adding all (v, x) ∈ A\S with x ∈ X_Z,
   contradicting the optimality of S.

Let Y := X⁺ ∪ reach*_{D′}(X⁺) ∪ {v}. Note that (Y × X⁻) ∩ (A∆S) = ∅, since
if there was some y ∈ Y and some x ∈ X⁻ with (y, x) ∈ A∆S, then (v, y, x)
would be a P3 in (V, A∆S). Figure 14 gives an overview of the situation
with most of the defined sets.
We now construct a solution set S′ for D from S by replacing all arc
deletions in R × {v} with arc deletions in {v} × T . To avoid the creation of
additional P3 s of the form (r, v, y), we have to make sure that R × Y ⊆ A∆S′.
In order for S′ to be at most as large as S, we also remove all arc insertions
from R to T . This, however, forces us to remove all arcs in Y × T as well. To
this end, consider the following arc sets:

    S⁺_DEL := (Y × T) ∩ (A\S)

Figure 14: An overview of most of the sets defined in the proof of
Lemma 5.21. Not all arcs are drawn here.

    S⁺_INS := (R × Y) \ (A ∪ S)

The set S⁺ := S⁺_DEL ∪ S⁺_INS then contains all arc modifications that are added
to those in S. Furthermore, consider the following sets.

    S⁻_DEL := (R × Y) ∩ S_DEL

    S⁻_INS := (((R ∪ Y) × T) ∩ S_INS) ∪ (({v} × (V\Y)) ∩ S_INS)

The set S⁻ := S⁻_DEL ∪ S⁻_INS then contains all arc modifications that are re-
moved from those in S. In this context, note that the sets S⁺_DEL, S⁺_INS, S⁻_DEL,
and S⁻_INS are pairwise disjoint. Furthermore, S⁻ ⊆ S and S⁺ ∩ S = ∅. Fi-
nally, the set S′ is constructed by

    S′ := (S ∪ S⁺)\S⁻.

The following observations are direct consequences of the construction of S′:

    (R × X) ∩ (A∆S) ⊆ (R × X) ∩ (A∆S′),                           (5)

    S⁺_INS ∪ S⁻_DEL ⊆ R × Y ⊆ A∆S′,                               (6)

    (Y × (T ∪ X⁻)) ∩ (A∆S′) = ∅, and                              (7)

    ∀z∈Y : succ_{A∆S′}(z) ⊆ Y .                                    (8)


In the following, we show that S′ is a solution set for D. For the sake of
contradiction, we assume that there is a P3 (a, b, c) in (V, A∆S′). Obviously,

    (a, b) ∈ A∆S′ ∧ (b, c) ∈ A∆S′ ∧ (a, c) ∉ A∆S′.

Since (a, b, c) is not a P3 in D′, one of the following must be true:

1. (a, b) ∈ S⁺_INS ∪ S⁻_DEL

2. (b, c) ∈ S⁺_INS ∪ S⁻_DEL

3. (a, c) ∈ S⁺_DEL ∪ S⁻_INS
Case 1: (a, b) ∈ S⁺_INS ∪ S⁻_DEL.
By (6), we know that (a, b) ∈ R × Y . However, by (8), it is clear that c ∈
Y and thus, (6) implies (a, c) ∈ A∆S′, contradicting (a, b, c) being a P3
in (V, A∆S′).
Case 2: (b, c) ∈ S⁺_INS ∪ S⁻_DEL.
By (6), we know that (b, c) ∈ R × Y . Corollary 3.7 implies that all vertices
in R are sources in D′. Furthermore, since S′ does not insert arcs that are
incoming to any vertex in R, it is clear that all vertices in R are also sources
in (V, A∆S′) and thus pred_{A∆S′}(b) = ∅, contradicting (a, b, c) being a P3
in (V, A∆S′).
Case 3: (a, c) ∈ S⁺_DEL ∪ S⁻_INS.
Case 3.1: (a, c) ∈ (Y × T) ∩ (A∆S).
By (8), we know that b ∈ Y . However, (7) implies (b, c) ∉ A∆S′, contradict-
ing (a, b, c) being a P3 in (V, A∆S′).
Case 3.2: (a, c) ∈ ({v} × (V\Y)) ∩ S_INS.
By (8), we know that b ∈ Y . Likewise, this implies c ∈ Y , which, however,
contradicts c ∈ V\Y .
Case 3.3: (a, c) ∈ (R × T) ∩ S_INS.
Recall that S was constructed in such a way that for all r ∈ R and t ∈ T ,
if there is a path of length ≥ 3 from r to t in (V, A∆S), then it is (r, v, t).
Hence, if there is a path of length ≥ 3 from a to c in (V, A∆S), then it
is (a, v, c). However, we know that S⁻_DEL ∪ S⁺_INS ⊆ R × Y . Thus, (7) implies
that also in (V, A∆S′), the only path from a to c that contains at least
two arcs is (a, v, c), which implies b = v. Since v ∈ Y , it is also clear
that (v, c) ∉ A∆S′, contradicting (a, b, c) being a P3 in (V, A∆S′).
All in all, each of the cases leads to a contradiction and hence, our
assumption that (a, b, c) is a P3 in (V, A∆S 0 ) is proved wrong. Thus, S 0 is
indeed a solution set for D.

Figure 15: Visualization of the sets X^t_INS and Y_t in the proof of Lemma 5.21.
Dotted arcs are inserted by S.

To complete the proof, we need to show that S is at least as large as S′.
It is sufficient to show |S⁺| ≤ |S⁻|. In particular, we show the following
inequations:

    |({v} × T) ∩ S⁺_DEL| ≤ |(R × {v}) ∩ S⁻_DEL|                    (9)
    |S⁺_INS| ≤ |(R × (Y\{v})) ∩ S⁻_DEL|
    |((Y\{v}) × T) ∩ S⁺_DEL| ≤ |S⁻_INS|

Note that (9) follows directly from Lemma 5.20. For the following proofs,
recall that |Z| ≤ |X_Z| for each Z ⊆ reach*_{D′}(X⁺), and that the func-
tion reach*_{D′}(·) is linear.
In the following, we show that

    |((Y\{v}) × T) ∩ S⁺_DEL| ≤ |S⁻_INS|.

By (4), we know that ((Y\{v}) × T) ∩ S⁺_DEL = (reach*_{D′}(X⁺) × T) ∩ S⁺_DEL.
Thus, it suffices to show that

    |(reach*_{D′}(X⁺) × T) ∩ S⁺_DEL| ≤ |(X⁺ × T) ∩ S_INS|.         (10)

To this end, we show that, for each t ∈ T , the solution set S contains
at least as many arc insertions from X⁺ to t as there are arcs from reach*_{D′}(X⁺)
to t. For each t ∈ T , consider its predecessors in D′. Especially, consider
those predecessors of t that are either in X⁺ or in reach*_{D′}(X⁺). More
formally, these two sets are denoted by X^t_INS := pred_{A∆S}(t) ∩ X⁺ and Y_t :=
pred_{A∆S}(t) ∩ reach*_{D′}(X⁺), respectively. See Figure 15 for a visualization.

Figure 16: Visualization of the sets X^r_DEL and Y^r in the proof of Lemma 5.21.
Dashed arcs are deleted by S.


Since D′ is transitive, we know that for all t ∈ T and all x ∈ X⁺:

    x ∈ X_{Y_t} ⇒ ∃y∈reach*_{D′}({x}) : y ∈ Y_t
                ⇒ ∃y∈reach*_{D′}({x}) : (y, t) ∈ A∆S
                ⇒ (x, t) ∈ S_INS
                ⇒ x ∈ X^t_INS.

Thus, for all t ∈ T , it is clear that X_{Y_t} ⊆ X^t_INS, which implies |Y_t| ≤
|X_{Y_t}| ≤ |X^t_INS| and thus |Y_t × {t}| ≤ |X^t_INS × {t}|. By the definition of X^t_INS and Y_t,
it is clear that Y_t × {t} = (reach*_{D′}(X⁺) × {t}) ∩ (A∆S) and X^t_INS × {t} =
(X⁺ × {t}) ∩ S_INS. This implies that

    ∀t∈T : |(reach*_{D′}(X⁺) × {t}) ∩ (A∆S)| ≤ |(X⁺ × {t}) ∩ S_INS|.

Obviously, all sets on the left-hand side are pairwise disjoint and so are all
sets on the right-hand side. Thus, we know the sizes of their respective
unions. It is not hard to see that (10) follows.
In the following, we show that

    |S⁺_INS| ≤ |(R × (Y\{v})) ∩ S⁻_DEL|.

By definition of R and X⁺, we know that R × (X⁺ ∪ {v}) ⊆ A, which
implies (R × Y) ∩ S⁺_INS = (R × reach*_{D′}(X⁺)) ∩ S⁺_INS. Thus, it suffices to
show that

    |(R × reach*_{D′}(X⁺)) ∩ S⁺_INS| ≤ |(R × X⁺) ∩ S_DEL|.         (11)

To this end, we show for each r ∈ R that S contains at least as many arc
deletions from r to X⁺ as insertions are needed to complete {r} × reach*_{D′}(X⁺)
in D′. For each r ∈ R, consider those vertices of X⁺ that are no longer
successors of r in D′ and those vertices of reach*_{D′}(X⁺) that are not
successors of r in D′. More formally, these two sets are denoted by X^r_DEL :=
X⁺ \ succ_{A∆S}(r) and Y^r := reach*_{D′}(X⁺) \ succ_{A∆S}(r), respectively.
See Figure 16 for a visualization. Since D′ is
transitive, we know that for all r ∈ R and all x ∈ X⁺:

    x ∉ X^r_DEL ⇒ (r, x) ∈ A∆S
                ⇒ ∀y∈reach*_{D′}({x}) : (r, y) ∈ A∆S
                ⇒ ∀y∈reach*_{D′}({x}) : y ∉ Y^r
                ⇒ x ∉ X_{Y^r}.

Thus, for all r ∈ R, it is clear that X_{Y^r} ⊆ X^r_DEL, which implies |Y^r| ≤
|X_{Y^r}| ≤ |X^r_DEL| and thus |{r} × Y^r| ≤ |{r} × X^r_DEL|. By definition of X^r_DEL
and Y^r, it is clear that {r} × Y^r = ({r} × reach*_{D′}(X⁺)) \ (A∆S) and {r} ×
X^r_DEL = ({r} × X⁺) ∩ S_DEL. This implies that

    ∀r∈R : |({r} × reach*_{D′}(X⁺)) \ (A∆S)| ≤ |({r} × X⁺) ∩ S_DEL|.

Obviously, all sets on the left-hand side are pairwise disjoint and so are all
sets on the right-hand side. Thus, we know the sizes of their respective
unions. It is not hard to see that (11) follows.
Altogether, we have proved that S′ is also an optimal solution set for D
and that S′ contains all (v, t) ∈ A with t ∈ T . Thus, there is an optimal
solution set for D that deletes all arcs in {v} × T . The correctness of Rule 5.16
follows.

Although the application of Rule 5.16 causes only arc deletions, it is not
obvious that it can be applied to Transitivity Deletion as well. The
proof presented above allows arc insertions in the original solution set S and
relies on arc insertions in the constructed solution S′. Hence, the following
proof is needed to apply Rule 5.16 to instances of Transitivity Deletion.

Lemma 5.22. Rule 5.16 is sound for Transitivity Deletion.



Proof. Let S denote an optimal solution set for D with S = S_DEL and
let D′ := (V, A\S). As in the proof for Lemma 5.21, let X := succ_A(v)\T
and X⁺ := succ_{A\S}(v). Note that, in this case, we know that

    succ_{A\S}(X⁺) ⊆ X⁺,                                          (12)

since, if there was some u ∈ succ_{A\S}(X⁺)\X⁺, then there would be some x ∈
X⁺ such that (v, x, u) is a P3 in (V, A\S). Finally, let Y := X⁺ ∪ {v}.
In the following, we show that S′ := (S ∪ S⁺)\S⁻ with

    S⁻ := (R × Y) ∩ S

and

    S⁺ := {v} × T

is an optimal solution set for D that deletes all arcs in {v} × T . In this
context, note that

    R × Y ⊆ A\S′.                                                 (13)
Suppose S′ was not a solution set for D. Then there is some P3 (a, b, c)
in (V, A\S′) that is not in (V, A\S), and hence (a, b) ∈ S⁻, (b, c) ∈ S⁻,
or (a, c) ∈ S⁺.
Case 1: (a, b) ∈ S⁻.
By (12), it is obvious that c ∈ X⁺. Then, however, (13) contradicts (a, b, c)
being a P3 in (V, A\S′).
Case 2: (b, c) ∈ S⁻.
Since R is a set of sources and S′ contains only arc deletions, it is clear
that b is a source in (V, A\S′), contradicting (a, b, c) being a P3 in (V, A\S′).
Case 3: (a, c) ∈ S⁺.
Since a = v, it is clear that b ∈ X⁺. This implies that there is some r ∈ R
with (r, b) ∈ A and thus, (r, a, c) and (r, b, c) are P3 s in D, contradicting the
diamond constraint.
Suppose S′ was not optimal, that is, |S′| > |S|. Since S⁻ ⊆ S
and S′ = (S ∪ S⁺)\S⁻, this implies |S⁺\S| > |S⁻|. However, by Lemma 5.20, we know that

    |S⁻| ≥ |(R × {v}) ∩ S| ≥ |({v} × T)\S| = |S⁺\S|,

contradicting |S⁺\S| > |S⁻|.

When looking at the proof of Lemma 5.22, we discover that we make


only limited use of the fact that all vertices in R are sources. In fact, the
proof remains almost the same if, instead of being sources, we require for
all r ∈ R that r is not the middle of a P3 in D.

Definition 5.23. For a digraph D = (V, A), the set of all P3 middles is
denoted by
MD := {b | ∃a,c∈V (a, b, c) is a P3 in D}.
Lemma 5.24. Let D = (V, A) be a digraph and a ∈ V . The vertex a is not
in M_D iff

    pred_A(a) × succ_A(a) ⊆ A.                                    (14)

Proof. Let x ∈ pred_A(a) and y ∈ succ_A(a), that is, (x, a, y) is a path in D.
Then (x, a, y) is a P3 in D if and only if (x, y) ∉ A. Hence, a ∈ M_D if and
only if there are x ∈ pred_A(a) and y ∈ succ_A(a) with (x, y) ∉ A, that is, if
and only if a does not satisfy (14).

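Lemma 5.24 yields a direct polynomial-time test for membership in M_D. The following sketch (hypothetical helper names; a digraph is a plain set of arc pairs) computes the P3 middles once from Definition 5.23 and once via criterion (14), so the two can be cross-checked:

```python
from itertools import permutations

def p3_middles(vertices, arcs):
    """All vertices b that are the middle of some P3 (a, b, c)."""
    return {b for a, b, c in permutations(vertices, 3)
            if (a, b) in arcs and (b, c) in arcs and (a, c) not in arcs}

def non_middles_by_lemma(vertices, arcs):
    """Vertices a with pred(a) x succ(a) a subset of A (criterion (14));
    pairs with x == y are skipped, matching the distinctness of P3 s."""
    out = set()
    for a in vertices:
        preds = {u for (u, w) in arcs if w == a}
        succs = {w for (u, w) in arcs if u == a}
        if all((x, y) in arcs for x in preds for y in succs if x != y):
            out.add(a)
    return out
```

By Lemma 5.24, `vertices - p3_middles(...)` and `non_middles_by_lemma(...)` should always coincide; note that the second test avoids enumerating triples.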
Obviously, no source or sink is in M_D and thus, replacing V_SRC by the
complement of M_D, we can expect a larger set R for the application of
Rule 5.16. Hence, it is more likely to find some vertex v that the rule is
applicable to.
Lemma 5.25. Rule 5.16 is sound for Transitivity Deletion, even with

    R := pred_A(v) ∩ (V\M_D) ∩ {r | ∃u∈succ_A(v) : (r, u) ∉ A}.

Proof. Let all notations except R and S′ be as in the proof for Lemma 5.22.
In the following, we show that S′ := (S ∪ S⁺)\S⁻ with

    S⁻ := ((R ∪ pred_{A\S}(R)) × Y) ∩ S

and

    S⁺ := {v} × T

is an optimal solution set for D that deletes all arcs in {v} × T . By con-
struction of S′ and (12), it is clear that

    succ_{A\S′}(Y) ⊆ Y .                                           (15)

Furthermore, since R ∩ M_D = ∅, it is clear that pred_A(R) × Y ⊆ A and thus,

    (R ∪ pred_{A\S}(R)) × Y ⊆ A\S′.                                (16)

As in the proof for Lemma 5.22, the fact that |S′| ≤ |S| follows directly
from Lemma 5.20. For the sake of contradiction, we assume that (a, b, c) is
a P3 in (V, A\S′) but not in (V, A\S).

Case 1: (a, b) ∈ S⁻.
Obviously, b ∈ Y and thus, (15) implies c ∈ Y . Hence, by (16), we know
that (a, c) ∈ A\S′, a contradiction.
Case 2: (b, c) ∈ S⁻.
Case 2.1: b ∈ R.
Since (a, b) ∉ S⁻, it is clear that (a, b) ∈ A\S. Thus, a ∈ pred_{A\S}(R) and
by (16) we know that (a, c) ∈ A\S′, a contradiction.
Case 2.2: b ∈ pred_{A\S}(R).
Then, however, there must be some r ∈ R with (b, r) ∈ A\S. Since (a, b) ∉
S⁻, it is clear that (a, r) ∈ A\S, implying a ∈ pred_{A\S}(R) and thus, by (16),
it is clear that (a, c) ∈ A\S′, a contradiction.
Case 3: (a, c) ∈ S⁺.
Obviously, a = v and c ∈ T . Hence, (15) implies b ∈ Y and thus, c ∈ Y ,
contradicting c ∈ T .

5.2.2 Kernel Consideration


Although providing further data reduction, Rules 5.18 and 5.19 do not pro-
vide an asymptotic improvement for the kernel size for either Transitiv-
ity Editing or Transitivity Deletion, even when modified according
to Lemma 5.25. In the following, we present a construction that is reduced
with respect to Rules 5.2, 5.6, 5.18 and 5.19 and contains O(k 2 ) vertices.
First, we need a base to which graph modules can connect:

    V_b := {a, b}        A_b := {(a, b), (b, a)}

Furthermore, for a given set of vertices Q_i, we construct a graph module as
follows:

    V^m_i := Q_i ∪ {u_i, v_i, w_i}        A^m_i := ⋃_{q∈Q_i} {(v_i, q)} ∪ {(u_i, v_i), (v_i, w_i)}

Each module connects to the base by connecting a and b to all vertices
in Q_i ∪ {v_i}. The complete structure is then described by (see Figure 17)

    V := V_b ∪ ⋃_{0≤i<d} V^m_i        A := A_b ∪ ⋃_{0≤i<d} A^m_i,

with d denoting the number of modules and all Qi being disjoint. We


find that the construction is optimally turned transitive by deleting (ui , vi )
and (vi , wi ) for each module i and deleting the base arc (a, b). Since there
are d := (k − 1)/2 modules, it is clear that k modifications are required

Figure 17: A digraph that is reduced with respect to all presented rules.
Note that d = (k − 1)/2 and |Q_i| = k − 1 for all modules i. Its size is O(k²).

in total. Rule 5.19 does not apply to v_i since in this case T = {b, u_i},
R = Q_i ∪ {w_i}, and max_{t∈T} R_t = Q_i. Hence,

    |T| + max_{t∈T} |R_t| = |{b, u_i}| + |Q_i| = k + 1 > k = |R|.

Rule 5.18 does not apply to (V, A) since the only sources in this construction
are the vertices u_i. However, since in this case T = Q_i ∪ {w_i} and R = {u_i},
it is clear that

    |T| + max_{t∈T} |R_t| = |Q_i| + 1 = k > 1 = |R|.

Furthermore, for each module i, the P3 s (a, b, vi ), (b, vi , wi ), and (ui , vi , qi )


for all qi ∈ Qi contain all vertices of the module and a and b. Thus, there is
no vertex that is not contained in a P3 and hence, Rule 5.2 does not apply.
Since the construction is diamond-free, it suffices to consider all arcs in A
in order to prove that Rule 5.6 does not apply. The construction implies

    succ_A(b) \ (succ_A(a) ∪ {a}) = ⋃_{0≤i<d} {v_i}.

Since this set contains at most (k − 1)/2 vertices, Rule 5.6 does not apply
to (a, b). Furthermore, the indegrees of a and b are 1 and the indegree of
each vi is 2. Thus, for each module i, Rule 5.6 does not apply to any arc
in {a, b, vi } × Qi and (vi , wi ). For each module i, it is clear that

succA (vi ) \(succA (b) ∪ {b}) = {wi }



and thus, Rule 5.6 does not apply to (b, v_i). Finally, Rule 5.6 does not apply
to (u_i, v_i) for each module i, since u_i is a source and |succ_A(v_i)| = k. All
in all, the construction is reduced with respect to the mentioned rules. In
the following, we consider the size of the construction. With d = (k − 1)/2
and |Q_i| = k − 1 for all modules i, we can calculate the number of vertices:

    |V| = |V_b| + Σ_{i=0}^{d−1} |V^m_i|
        = 2 + Σ_{i=0}^{d−1} (|Q_i| + 3)
        = 2 + d · (k − 1 + 3)
        = 2 + ((k − 1)/2) · (k + 2)
        = (k² + k + 2)/2.
2
Hence, the construction, which is reduced with respect to all presented rules,
contains O(k²) vertices (see Figure 18) and thus, the kernel size does not
improve asymptotically. However, since the rules do provide additional data
reduction in polynomial time, it is, for practical purposes, justified to apply
them to the given instance.
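For illustration, the construction can be generated programmatically. The sketch below uses hypothetical names and draws the connection arcs from a and b, as described above; it reproduces the vertex count (k² + k + 2)/2:

```python
def build_reduced_instance(k):
    """Build the digraph from Section 5.2.2 for odd k >= 3: a base
    {a, b} with arcs (a, b) and (b, a), and d = (k-1)/2 modules, each
    with |Q_i| = k-1 and the vertices u_i, v_i, w_i.  (Sketch only:
    vertex names and the direction of the connection arcs from a and b
    are assumptions of this illustration.)"""
    assert k % 2 == 1 and k >= 3
    d = (k - 1) // 2
    vertices = {'a', 'b'}
    arcs = {('a', 'b'), ('b', 'a')}
    for i in range(d):
        Q = {f'q{i}_{j}' for j in range(k - 1)}
        u, v, w = f'u{i}', f'v{i}', f'w{i}'
        vertices |= Q | {u, v, w}
        arcs |= {(v, q) for q in Q} | {(u, v), (v, w)}
        for base in ('a', 'b'):            # connect the base to the module
            arcs |= {(base, x) for x in Q | {v}}
    return vertices, arcs

v, a = build_reduced_instance(5)           # the instance of Figure 18
print(len(v))                              # 16
```

For k = 5 this matches the figure: 2 base vertices plus 2 modules of 7 vertices each, i.e. (25 + 5 + 2)/2 = 16.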

6 Search Tree Algorithm


In this section, basic search tree algorithms are presented for both Transi-
tivity Editing and Transitivity Deletion. We show that it is possible
to modify these algorithms to achieve total running times of O(2.57^k + n³)
and O(2^k + n³), respectively.
A straightforward algorithm that finds an optimal solution set for a given
digraph is the search tree algorithm presented as Algorithm 3. It branches
on each P3 (u, v, w) in the digraph, trying to destroy it by either deleting
(u, v), deleting (v, w), or inserting (u, w). Obviously, it is unreasonable
to insert previously deleted arcs or to remove previously inserted arcs. To
avoid this, we introduce marks for the arcs: an arc can be marked perma-
nent or forbidden in order to prohibit its deletion or insertion, respectively.
Hence, whenever an arc is modified in a node B of the search tree, it is
not subject to modification in the subtree rooted in B and thus, the
depth of the search tree is at most k, implying that the worst-case running
time of Algorithm 3 is 3^k · poly(n). Note that to solve the Transitivity

Figure 18: A solution of size k = 5 to an instance of the described construc-
tion with d = (k − 1)/2 = 2 and |Q_0| = |Q_1| = k − 1 = 4. Note that the
digraph has |V| = (k² + k + 2)/2 = 16 vertices.

Input: A directed graph D = (V, A) and an integer k.
Output: Is it possible to turn D transitive with at most k operations?

 1  if k < 0 then
 2      return no
 3  end
 4  if there is a P3 p = (u, v, w) in D then
 5      foreach non-permanent a ∈ {(u, v), (v, w)} do
 6          if (V, remove(A, a)) can be turned transitive with ≤ k − 1 operations then
 7              return yes
 8          end
 9          mark a permanent
10      end
11      if (u, w) is not forbidden and (V, add(A, (u, w))) can be turned
        transitive with ≤ k − 1 operations then
12          return yes
13      end
14      return no
15  else
16      return yes
17  end

Algorithm 3: The search tree algorithm destroys a P3 and recursively
asks whether the remaining digraph can be turned transitive with
the remaining number of operations. The functions remove and add
remove or add an arc to A and mark the arc forbidden or permanent,
respectively. Note that (u, v) is marked to cut subbranches in the
second and third branch that have already been considered in the first
branch.

Input: A directed graph D = (V, A) and an integer k.
Output: Is it possible to turn D transitive with at most k operations?

 1  if k < 0 then
 2      return no
 3  end
 4  if there is a diamond (u, v1, v2, w) in D then
 5      foreach non-permanent a ∈ {(u, v1), (v1, w)} do
 6          foreach non-permanent b ∈ {(u, v2), (v2, w)} do
 7              if (V, remove(A, {a, b})) can be turned transitive
                with ≤ k − 2 operations then
 8                  return yes
 9              end
10          end
11          mark a permanent
12      end
13      if (u, w) is not forbidden and (V, add(A, (u, w))) can be turned
        transitive with ≤ k − 1 operations then
14          return yes
15      end
16      return no
17  else
18      use the Transitivity Deletion algorithm and return its result
19  end

Algorithm 4: The modified search tree algorithm destroys a diamond
and recursively asks whether the remaining digraph can be turned
transitive with the remaining number of operations. If there are no
diamonds left in D, then the search tree algorithm for Transitivity
Deletion is used to complete the search.

Deletion problem, the check whether (V, A ∪ {(u, w)}) can be turned tran-
sitive with ≤ k − 1 operations may be omitted. In this case, the worst-case
running time is 2^k · poly(n). Let us have a closer look at the polynomial
factor. Finding a P3 in a given digraph can take O(n³) time. However, if
we know which arc has been modified, we can calculate a list of P3 s that
contain this arc in O(n) steps, since each arc can be contained in at most n P3 s.
The idea is to maintain a set of the P3 s in D while branching. After each arc
modification, this set must be updated. By the above observation and the
fact that set insertions may take logarithmic time, each update takes O(n log n)
time. Also, the initial calculation of the P3 -set takes O(n³) time.
However, the task of finding a P3 can then be solved in constant time. All
in all, Transitivity Deletion can be solved in O(2^k · n log n + n³) time
and Transitivity Editing can be solved in O(3^k · n log n + n³) time.
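A minimal recursive sketch of the deletion-only search tree may make the branching and the permanent marks concrete. The function name is hypothetical, digraphs are plain sets of arc pairs, and a P3 is found naively in each node instead of maintaining the sorted P3 set described above:

```python
def transitivity_deletion(arcs, k, permanent=frozenset()):
    """Sketch of the search tree for Transitivity Deletion (Algorithm 3
    without the insertion branch): can `arcs` be made transitive by at
    most k arc deletions?  `permanent` holds arcs that must not be
    deleted in the current subtree."""
    if k < 0:
        return False
    # find some P3 (u, v, w): arcs (u, v) and (v, w) exist, (u, w) does not
    p3 = next((((u, v), (v, w))
               for (u, v) in arcs for (x, w) in arcs
               if x == v and w != u and (u, w) not in arcs), None)
    if p3 is None:              # no P3 left: the digraph is transitive
        return True
    marks = set(permanent)
    for arc in p3:              # branch: delete (u, v), then delete (v, w)
        if arc not in marks:
            if transitivity_deletion(arcs - {arc}, k - 1, frozenset(marks)):
                return True
            marks.add(arc)      # this branch failed: mark the arc permanent
    return False
```

For example, the arc set {(u, x), (x, w), (u, y), (y, w)} of a diamond without the arc (u, w) needs two deletions, and the sketch reports exactly that.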
In order to improve the running time of Algorithm 3, remember that
Lemma 3.4 implies that we only need to consider inserting an arc if we
encounter a diamond. This helps us decrease the branching number. The
modified search tree algorithm that is presented as Algorithm 4 traverses the
search tree in the following way: upon finding a diamond d = (u, {x, y}, v)
in the given digraph D = (V, A), the algorithm recursively asks whether

1. (V, A\{(u, x), (u, y)}) can be turned transitive with ≤ k − 2 operations,

2. (V, A\{(u, x), (y, v)}) can be turned transitive with ≤ k − 2 operations,

3. (V, A\{(x, v), (u, y)}) can be turned transitive with ≤ k − 2 operations,

4. (V, A\{(x, v), (y, v)}) can be turned transitive with ≤ k − 2 operations,

5. (V, A ∪ {(u, v)}) can be turned transitive with ≤ k − 1 operations.

If there are no diamonds in the input graph, then the straightforward search
tree implementation for Transitivity Deletion is used to solve the prob-
lem. Recall that this implementation runs in O(2^k · n³) time. Since all
possible ways of destroying an encountered diamond are considered by Algo-
rithm 4 and Lemma 3.4 implies the correctness of using the straightforward
search tree implementation for Transitivity Deletion once all diamonds
are destroyed, it is clear that Algorithm 4 is correct.
In the following, we consider the running time of Algorithm 4. By the
above enumeration, we conclude that the branching vector is (2, 2, 2, 2, 1).
This leads to a branching number of about 2.5616. Thus, Algorithm 4
takes, in the worst case, 2.57^k · poly(n) time to find an optimal solution set
for the given digraph. By intersecting the predecessors of each vertex with
the successors of each vertex, one can find a diamond in asymptotically the
same time as finding a P3 . Thus, the polynomial factor is O(n³).
However, as before, we can keep a set of heads and tails of diamonds in D
that can be updated in O(n² log n) steps. Again, the initial calculation of
the set of diamonds can be done in O(n³) time. With this modification,
we arrive at a total running time of O(2.57^k · n² log n + n³) for solving the
Transitivity Editing problem with the modified search tree algorithm.
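The branching number can be recomputed from the branching vector: the vector (2, 2, 2, 2, 1) corresponds to the recurrence T(k) = T(k−1) + 4·T(k−2), whose characteristic polynomial x² = x + 4 has the positive root (1 + √17)/2. A quick sanity check in Python:

```python
import math

# Branching vector (2, 2, 2, 2, 1): four branches decrease the
# parameter k by 2 and one branch decreases it by 1, giving the
# recurrence T(k) = T(k-1) + 4*T(k-2).  The branching number is the
# positive root of the characteristic polynomial x^2 = x + 4.
branching_number = (1 + math.sqrt(17)) / 2
print(round(branching_number, 4))  # 2.5616
```

Since (1 + √17)/2 < 2.57, the worst-case bound of 2.57^k · poly(n) stated above follows.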
Finally, we can apply the technique of interleaving (see Section 2.2) to
both search tree algorithms, resulting in running times of O(2.57^k + n³)
and O(2^k + n³) for solving Transitivity Editing and Transitivity Dele-
tion, respectively.

Theorem 6.1. Transitivity Editing and Transitivity Deletion can
be solved in O(2.57^k + n³) and O(2^k + n³) time, respectively.

Forbidden and Permanent Arcs. Obviously, marked arcs provide in-
formation about the structure of a digraph. In the following, we present
rules that allow for the calculation of additional arc marks from previous
ones. Although they do not reduce the size of the instance, the rules still
provide structural information that helps to cut branches in the search tree.

Lemma 6.2. Let D = (V, A) be a digraph and a, b, c ∈ V . Furthermore,
let F and P denote the sets of all arcs of D that are forbidden or permanent,
respectively. Let S denote a solution set for D that does not modify any arc
in F ∪ P .

1. If (a, b) ∈ P and (b, c) ∈ P , then S does not delete (a, c).

2. If (a, b) ∈ P and (a, c) ∈ F , then S does not insert (b, c).

3. If (b, c) ∈ P and (a, c) ∈ F , then S does not insert (a, b).

Proof. Since the proofs for the three parts of the lemma are analogous,
we only show the first part: if (a, b) ∈ P and (b, c) ∈ P , then there is no
solution set for D that deletes (a, c) without modifying an arc in F ∪ P .
Clearly, any solution set for D that deletes (a, c) must also delete
either (a, b) or (b, c) to destroy the P3 (a, b, c). Thus, since S modifies
neither (a, b) nor (b, c), it cannot delete (a, c).

We use Lemma 6.2 to modify the algorithm that marks an arc in the
digraph such that, whenever an arc is marked, the conditions of Lemma 6.2
are checked and further marks are established accordingly.
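The three parts of Lemma 6.2 can be applied exhaustively as a plain scan over vertex triples. The sketch below (hypothetical function name; marks are plain sets of arc pairs) returns the derived information rather than modifying any algorithm state:

```python
from itertools import permutations

def propagate_marks(vertices, permanent, forbidden):
    """Sketch of applying Lemma 6.2: derive, from the current marks,
    arcs that no solution respecting the marks may delete resp. insert.
    `permanent` and `forbidden` are sets of arc pairs."""
    cannot_delete, cannot_insert = set(), set()
    for a, b, c in permutations(vertices, 3):
        if (a, b) in permanent and (b, c) in permanent:
            cannot_delete.add((a, c))       # part 1 of Lemma 6.2
        if (a, b) in permanent and (a, c) in forbidden:
            cannot_insert.add((b, c))       # part 2
        if (b, c) in permanent and (a, c) in forbidden:
            cannot_insert.add((a, b))       # part 3
    return cannot_delete, cannot_insert
```

In a search tree, such derived marks could themselves be recorded as permanent or forbidden, which may trigger further rounds of propagation.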

Recall that sources and sinks are preserved by applying optimal solution
sets. In the following, we show that certain arcs between sources and sinks
can be marked.

Lemma 6.3. Let D = (V, A) be a digraph, let r be a source in D, and let s


be a sink in D. If (r, s) ∈ A, then there is no optimal solution set for D that
deletes (r, s). Likewise, if (s, r) 6∈ A, then there is no optimal solution set
for D that inserts (s, r).

Proof. Since the proofs for both parts of the lemma are analogous, it suffices
to prove the first part. For the sake of contradiction, assume that there is
an optimal solution set S for D that deletes (r, s). By Corollary 3.7, r is
a source in (V, A∆S) and s is a sink in (V, A∆S). Thus, it is obvious that
undoing the deletion of (r, s) cannot create a P3 and thus, S\{(r, s)} is also
a solution set for D, contradicting the optimality of S.

Edit connected components individually. Although not being a re-
duction rule, the following idea reduces the running time of the search tree
algorithm by basically replacing a subtree of depth d with two subtrees of
depths d1 and d2 with d1 + d2 = d. For a branching number greater than one,
this reduces the number of nodes in the search tree and thus the running
time.

Lemma 6.4. If D is a digraph containing the two weakly connected compo-
nents D1 = (V1, A1) and D2 = (V2, A2) and S1 and S2 are optimal solution
sets for D1 and D2, respectively, then S := S1 ∪ S2 is an optimal solution
set for D.

Proof. Obviously, optimal solution sets do not insert arcs between the two
components and thus, the union of both partial solutions is optimal for D if
the partial solutions are optimal for D1 and D2, respectively.

It is not hard to see that if a given digraph D = (V, A) has more than
one weakly connected component, then we can split the digraph and edit
the components individually. If restricted to Transitivity Deletion, this
idea can be used to split a digraph even earlier.
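Splitting off the weakly connected components is a standard computation; a small union-find sketch (hypothetical name, digraphs as plain sets of arc pairs) that ignores arc directions:

```python
def weakly_connected_components(vertices, arcs):
    """Split a digraph into its weakly connected components so that
    each component can be edited individually (Lemma 6.4).
    Union-find sketch with path halving."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, w in arcs:                        # arc direction is ignored
        parent[find(u)] = find(w)
    comps = {}
    for v in vertices:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())
```

Each returned vertex set induces one subinstance; by Lemma 6.4, the union of the subinstances' optimal solution sets is optimal for the whole digraph.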

Lemma 6.5. Consider the Transitivity Deletion problem. Let D =
(V, A) be a digraph and let V_SRC and V_SNK denote the sets of its sources and
sinks, respectively. Let D′ := D − (V_SRC ∪ V_SNK) = (V′, A′) contain at least
two weakly connected components, one of which is D1′ = (V1′, A1′). Further-
more, let D2′ = (V2′, A2′) with V2′ := V′\V1′ and A2′ := A′\A1′. Let S1 ⊆ A
and S2 ⊆ A denote optimal solution sets for D1 := D − V2′ = (V1, A1)
and D2 := D − V1′ = (V2, A2), respectively. Then S := S1 ∪ S2 is an optimal
solution set for D.

Proof. Suppose that S is not an optimal solution set for D, that is, S is no
solution set for D or S is not optimal. If S is no solution set for D, then
there is a P3 p = (u, v, w) in (V, A\S). Since S1 and S2 are optimal solution
sets for D1 and D2, respectively, p cannot be entirely contained in D1 or D2.
Without loss of generality, let u ∈ V1\V2 and w ∈ V2\V1. Since D'_1 and D'_2
are different weakly connected components, it is clear that v ∈ V1 ∩ V2 =
V_SRC ∪ V_SNK. Hence, v is a sink or a source in D and, by Corollary 3.7, v is
a sink or source in both (V1, A1\S1) and (V2, A2\S2). Hence, v is also a sink
or source in (V, A\S), a contradiction to (u, v, w) being a P3 in (V, A\S).
If S is not optimal, then there is an optimal solution set S' for D with |S'| <
|S|. Obviously, S'_1 := S' ∩ (V1 × V1) and S'_2 := S' ∩ (V2 × V2) are solution sets for D1
and D2, respectively. Since S1 and S2 are optimal, we know that |S1| ≤ |S'_1|
and |S2| ≤ |S'_2|. However, since S' is optimal, Lemma 3.8 implies that
there are no arcs between sources and sinks in S' and thus S'_1 ∩ S'_2 = ∅.
Hence, |S| ≤ |S1| + |S2| ≤ |S'_1| + |S'_2| = |S'|, contradicting |S'| < |S|.

Lemma 6.5 enables us to split a digraph that is weakly connected. Although the sum of the numbers of vertices of the processed components is greater than
the number of vertices in the original digraph (due to the sources and sinks
that are duplicated into each component), the total number of P3s, and thus the sum of the heights
of the search trees, remains the same. A complete search tree algorithm that
takes into account preprocessing, interleaving, and search tree splitting is presented in Algorithm 5. This algorithm was implemented for the experiments
in Section 8.

7 Heuristics
In practice, one may be forced to find solution sets for Transitivity Editing faster than the search tree algorithm allows. This
can be achieved by forgoing the optimality of the computed solution. In the
following, we introduce a heuristic for Transitivity Editing. A heuristic
is an algorithm that either does not produce provably optimal solutions or
is not provably efficient. The presented heuristic may compute suboptimal
solutions, but we conjecture that its running time is polynomial for all inputs. The basic idea of the presented heuristic is to assign a rank to every
pair of vertices of the given digraph D and then to greedily insert or remove

Input: An instance (D = (V, A), k) of the Transitivity Editing
problem.
Output: true if (D, k) ∈ Transitivity Editing, otherwise false.
1 if c · k² < |V| then
2   Employ Algorithm 1 to produce the reduced instance (D', k');
3   if D' can be split into D1 and D2 according to Lemma 6.4 or 6.5
    then
4     k1 := 0;
5     while k1 ≤ k and (D1, k1) ∉ Transitivity Editing do
6       k1 := k1 + 1
7     end
8     if (D2, k − k1) ∈ Transitivity Editing then
9       return true;
10    else
11      return false;
12    end
13  end
14 end
15 Employ Algorithm 4 to branch
15 Employ Algorithm 4 to branch

Algorithm 5: The search tree algorithm that includes preprocessing,


interleaving, and search tree splitting. Note that c is some constant
that can be chosen arbitrarily. Furthermore, the recursive calls in
Algorithm 4 are replaced with calls to Algorithm 5.

Input: A directed graph D = (V, A).
Output: rnk[(u, v)] for all (u, v) ∈ V × V.
1 foreach (u, v) ∈ V × V do
2   norm := |(pred_A(u)\{v}) \ pred_A(v)| +
        |(succ_A(v)\{u}) \ succ_A(u)| − |succ_A(u) ∩ pred_A(v)|;
3   rnk[(u, v)] := sgn_{u,v} · norm;
4 end

Algorithm 6: The algorithm used to calculate the initial ranks of all
arcs. Herein, sgn_{u,v} = 1 if (u, v) ∈ A, and sgn_{u,v} = −1 otherwise.

arcs based on their rank. Here, the rank of an arc represents its potential
to destroy P3s in D. By adding the arc of maximum rank to the solution
set, we hope to destroy as many P3s as possible in each step. Note that this
arc is not necessarily in D.
Definition 7.1. For each pair (u, v) ∈ V × V, the rank of (u, v) is the
number of P3s in D that are destroyed by modifying (u, v) minus the number
of P3s that are created if (u, v) is modified.
All ranks are initially computed by Algorithm 6 and stored in an array.
Lemma 7.2. Algorithm 6 is correct, that is, after calling Algorithm 6, for
each (u, v) ∈ V × V, the value of rnk[(u, v)] is equal to the rank of (u, v) as
defined in Definition 7.1.
Proof. Obviously, for any arc a, each P3 that is created by deleting a if a ∈ A
is destroyed by inserting a if a ∉ A. This is accounted for in line 3
of the algorithm. Thus, the proof works analogously for (u, v) ∉ A and it
suffices to show the lemma for (u, v) ∈ A. Let r denote the number of P3s
in D that are destroyed by deleting (u, v) and let s denote the number of P3s
that are created by deleting (u, v). We prove the following: after running
Algorithm 6, the rank of the arc (u, v) is rnk[(u, v)] = r − s. Since (u, v) ∈ A,
it is clear that

s = |succ_A(u) ∩ pred_A(v)|

and

r = |(pred_A(u)\{v}) \ pred_A(v)| + |(succ_A(v)\{u}) \ succ_A(u)|.

Since sgn_{u,v} = 1, it is clear that rnk[(u, v)] = r − s.

Lemma 7.3. Algorithm 6 runs in O(n³) time.



Figure 19: If the arc (u, v) of some digraph (V, A) is modified, then we
need to update the ranks of the arcs that are drawn for the three classes of
vertices that are represented by x, y, and z.

Proof. Obviously, there are n² pairs of vertices (u, v). For each pair, three
set differences are calculated, each over sets of size O(n). This can
be done in linear time per pair, resulting in a worst-case running time of O(n³) for
Algorithm 6.
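The formula of line 2 of Algorithm 6 can be cross-checked against a brute-force count of destroyed and created P3s; a small sketch (the function names are ours, not from the thesis):

```python
import itertools

def p3_set(V, A):
    """All P3s of (V, A): ordered triples (a, b, c) with arcs (a, b), (b, c)
    and the 'shortcut' (a, c) missing."""
    return {(a, b, c) for a, b, c in itertools.permutations(V, 3)
            if (a, b) in A and (b, c) in A and (a, c) not in A}

def rank_brute(V, A, u, v):
    """Definition 7.1: P3s destroyed minus P3s created by modifying (u, v)."""
    B = set(A) ^ {(u, v)}               # delete if present, insert otherwise
    before, after = p3_set(V, A), p3_set(V, B)
    return len(before - after) - len(after - before)

def rank_formula(V, A, u, v):
    """The closed formula evaluated in line 2 of Algorithm 6."""
    pred = lambda x: {y for y in V if (y, x) in A}
    succ = lambda x: {y for y in V if (x, y) in A}
    norm = (len((pred(u) - {v}) - pred(v))
            + len((succ(v) - {u}) - succ(u))
            - len(succ(u) & pred(v)))
    return norm if (u, v) in A else -norm
```

For example, on the path 1 → 2 → 3, inserting (1, 3) destroys the only P3, so both functions assign the pair (1, 3) rank 1.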

To avoid recalculating the ranks of all vertex pairs after modifying a
pair (u, v), we only update the ranks locally. Figure 19 illustrates that there
are three classes of rank updates, which differ in their relation to u and v.
All vertices w ∈ V\{u, v} are processed three times. First, they are
considered as being in class x, that is, the ranks of (w, u) and (w, v) are
updated. Second, they are considered as being in class y, that is, the ranks
of (u, w) and (v, w) are updated. Finally, they are considered as being in
class z, that is, the ranks of (u, w) and (w, v) are updated. Thus, a total
of 6(|V| − 2) updates have to be performed after each modification. Each affected
arc may have to have its rank adjusted in a different way, depending on
whether it is in A or not. To describe this dependence, we introduce a group
of functions defined in Table 1.

Example 7.4. Consider Figure 19. What would be the effect on the rank
of the arc (x, v) if the arc (u, v) were deleted? Prior to the deletion, the removal
of (x, v) would have caused the P3 (x, u, v). After the deletion of (u, v),
this is no longer possible and, hence, the rank of (x, v) should increase.
The corresponding function value is update^1_1((x, u), (x, v)) = 1. If (u, v)

update^0_1(a, b)              update^0_2(a, b)              update^0_3(a, b)
a ∈ A, b ∈ A:  0              a ∈ A, b ∈ A:  1              a ∈ A, b ∈ A:  1
a ∈ A, b ∉ A: −1              a ∈ A, b ∉ A:  0              a ∈ A, b ∉ A:  0
a ∉ A, b ∈ A:  0              a ∉ A, b ∈ A: −1              a ∉ A, b ∈ A: −1
a ∉ A, b ∉ A:  1              a ∉ A, b ∉ A:  0              a ∉ A, b ∉ A:  0

update^1_1(a, b)              update^1_2(a, b)              update^1_3(a, b)
a ∈ A, b ∈ A:  1              a ∈ A, b ∈ A:  0              a ∈ A, b ∈ A:  1
a ∈ A, b ∉ A: −1              a ∈ A, b ∉ A:  0              a ∈ A, b ∉ A: −1
a ∉ A, b ∈ A:  0              a ∉ A, b ∈ A: −1              a ∉ A, b ∈ A:  0
a ∉ A, b ∉ A:  0              a ∉ A, b ∉ A:  1              a ∉ A, b ∉ A:  0

Table 1: The function group update^0(a, b) provides information on how
to alter the arc rank of a if it is affected by the deletion of (u, v). Likewise, update^1(a, b) provides information on how to alter the arc rank
of b. If (u, v) is inserted, the same information is provided by the negation −update(a, b). The subscripts 1, 2, and 3 indicate the class of the
vertex that is being considered (see Figure 19).
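In an implementation, Table 1 is naturally encoded as lookup tables indexed by the membership of a and b in A; a sketch (the dictionary layout and the function name are our choices):

```python
# Table 1 as nested dictionaries: outer key is the vertex class (1, 2, 3),
# inner key is (a in A, b in A). UPDATE0 gives the adjustment for the rank
# of a, UPDATE1 the adjustment for the rank of b, after a DELETION of (u, v);
# for an insertion the values are negated (multiplication by sgn_{u,v}).
UPDATE0 = {
    1: {(True, True): 0, (True, False): -1, (False, True): 0, (False, False): 1},
    2: {(True, True): 1, (True, False): 0, (False, True): -1, (False, False): 0},
    3: {(True, True): 1, (True, False): 0, (False, True): -1, (False, False): 0},
}
UPDATE1 = {
    1: {(True, True): 1, (True, False): -1, (False, True): 0, (False, False): 0},
    2: {(True, True): 0, (True, False): 0, (False, True): -1, (False, False): 1},
    3: {(True, True): 1, (True, False): -1, (False, True): 0, (False, False): 0},
}

def update(table, cls, a, b, A):
    """Look up update^0_cls(a, b) (table=UPDATE0) or update^1_cls(a, b)
    (table=UPDATE1) for the arc set A."""
    return table[cls][(a in A, b in A)]
```

In the situation of Example 7.4, with (x, u) and (x, v) both in A, `update(UPDATE1, 1, (x, u), (x, v), A)` yields 1, matching the value stated there.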

Input: A directed graph D = (V, A), an arc (u, v) that has been
modified, and the current maximum of all ranks.
Output: Updated ranks for all (x, y) ∈ V × V and the new
maximum of all ranks.
1 rnk[(u, v)] := −rnk[(u, v)];
2 foreach w ∈ V\{u, v} do
3   rnk[(u, w)] := rnk[(u, w)] + sgn_{u,v} · (update^0_3((u, w), (w, v)) +
        update^0_2((u, w), (v, w)));
4   rnk[(w, u)] := rnk[(w, u)] + sgn_{u,v} · update^0_1((w, u), (w, v));
5   rnk[(w, v)] := rnk[(w, v)] + sgn_{u,v} · (update^1_1((w, u), (w, v)) +
        update^1_3((u, w), (w, v)));
6   rnk[(v, w)] := rnk[(v, w)] + sgn_{u,v} · update^1_2((u, w), (v, w));
7   update the maximum rank if necessary;
8 end

Algorithm 7: The algorithm used to update the ranks of all arcs
affected by the modification of (u, v).

were to be inserted into A, then the removal of (x, v) would cause the P3 (x, u, v)
after the insertion, which was not possible prior to the insertion. Thus,
the rank must decrease. This is implemented by multiplying the value
of update^1_1((x, u), (x, v)) by sgn_{u,v}.
Algorithm 7 describes how the update procedure is implemented, and
Algorithm 8 uses it to greedily compute a solution set.
Lemma 7.5. Let D = (V, A) be a digraph and, for all (a, b) ∈ V × V,
let rnk[(a, b)] equal the rank of (a, b) as defined in Definition 7.1. If the
arc (u, v) is modified, then Algorithm 7 updates rnk[(a, b)] such that it still
equals the rank of (a, b).
Proof. Let D = (V, A) be a digraph, (a, b) be a pair of vertices, and (u, v)
be the pair of vertices with maximum rank in D. Let D' denote the digraph
that results from modifying (u, v) in D. Let r and r' denote the number
of P3s in D and D', respectively, that are destroyed by modifying (a, b).
Analogously, let s and s' denote the number of P3s that are created by
modifying (a, b). We prove the following: if rnk[(a, b)] = r − s before calling
Algorithm 7, then rnk[(a, b)] = r' − s' afterwards. First, note that arcs that
are not incident to u or v are not affected by the modification of (u, v).
Obviously, if (a, b) = (u, v), then r' − s' = −(r − s), since all P3s that are
destroyed by modifying (u, v) would be created by modifying (u, v) again,

Input: A directed graph D = (V, A).
Output: A (not necessarily optimal) solution set S for D.
1 S := ∅;
2 employ Algorithm 6 to calculate all initial ranks;
3 maxarc := argmax_{(x,y)∈V×V} rnk[(x, y)];
4 while rnk[maxarc] > 0 do
5   S := S ∪ {maxarc};
6   employ Algorithm 7 to maintain the ranks and update maxarc;
7 end

Algorithm 8: Greedy algorithm to calculate a solution set for a given
digraph.

and vice versa. Without loss of generality, we assume that the modification
made to (u, v) is an arc deletion, that is, sgn_{u,v} = 1.
In the following, let w denote some vertex in V\{u, v}. We consider the
rank of (a, b) = (u, w) and show exemplarily that line 3 of Algorithm 7 is
correct by proving

update^0_3((u, w), (w, v)) + update^0_2((u, w), (v, w)) = (r' − s') − (r − s).

Lines 4–6 can be verified analogously. The arc (u, w) can act as the two
arcs (u, y) and (u, z) in Figure 19. As shown in the middle and right columns of Table 1, the corresponding update values are update^0_2((u, w), (v, w))
and update^0_3((u, w), (w, v)). Without loss of generality, let (u, w) ∈ A. We
consider the following cases:
Case 1: (v, w) ∈ A, (w, v) ∈ A.
The removal of (u, w) creates the P3 (u, v, w) in D but not in D'. Furthermore, the removal of (u, w) destroys the P3 (u, w, v) in D', which does not
exist in D. Hence, r' = r + 1 and s' = s − 1 and thus (r' − s') − (r − s) = 2.
Since update^0_3((u, w), (w, v)) + update^0_2((u, w), (v, w)) = 2, the correctness
follows.
Case 2: (v, w) ∈ A, (w, v) ∉ A.
The removal of (u, w) creates the P3 (u, v, w) in D but not in D'. Hence, r' =
r and s' = s − 1 and thus (r' − s') − (r − s) = 1. Since update^0_3((u, w), (w, v)) +
update^0_2((u, w), (v, w)) = 1, the correctness follows.
Case 3: (v, w) ∉ A, (w, v) ∈ A.
The removal of (u, w) destroys the P3 (u, w, v) in D', which does not exist
in D. Hence, r' = r + 1 and s' = s and thus (r' − s') − (r − s) = 1.
Since update^0_3((u, w), (w, v)) + update^0_2((u, w), (v, w)) = 1, the correctness
follows.

Case 4: (v, w) ∉ A, (w, v) ∉ A.
The removal of (u, w) destroys and creates exactly the same P3s in D as
in D'. This implies r' = r and s' = s and thus (r' − s') − (r − s) = 0. Since in this
case update^0_3((u, w), (w, v)) + update^0_2((u, w), (v, w)) = 0, the correctness
follows.
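The table-driven update can also be checked mechanically against a full recomputation of all ranks. The following sketch stores ranks in a plain dictionary and encodes Table 1 as lookup tables; the data layout and names are ours, not from the thesis implementation.

```python
def ranks(V, A):
    """All ranks, recomputed from scratch via the formula of Algorithm 6."""
    def rank(u, v):
        pred = lambda x: {y for y in V if (y, x) in A}
        succ = lambda x: {y for y in V if (x, y) in A}
        norm = (len((pred(u) - {v}) - pred(v))
                + len((succ(v) - {u}) - succ(u))
                - len(succ(u) & pred(v)))
        return norm if (u, v) in A else -norm
    return {(u, v): rank(u, v) for u in V for v in V if u != v}

# Table 1, keyed by (a in A, b in A); U0 adjusts rnk[a], U1 adjusts rnk[b].
U0 = {1: {(1, 1): 0, (1, 0): -1, (0, 1): 0, (0, 0): 1},
      2: {(1, 1): 1, (1, 0): 0, (0, 1): -1, (0, 0): 0},
      3: {(1, 1): 1, (1, 0): 0, (0, 1): -1, (0, 0): 0}}
U1 = {1: {(1, 1): 1, (1, 0): -1, (0, 1): 0, (0, 0): 0},
      2: {(1, 1): 0, (1, 0): 0, (0, 1): -1, (0, 0): 1},
      3: {(1, 1): 1, (1, 0): -1, (0, 1): 0, (0, 0): 0}}

def apply_update(rnk, V, A, u, v):
    """Algorithm 7: adjust the ranks in rnk after modifying (u, v).
    A is the arc set BEFORE the modification."""
    sgn = 1 if (u, v) in A else -1
    key = lambda a, b: (int(a in A), int(b in A))
    rnk[(u, v)] = -rnk[(u, v)]
    for w in V - {u, v}:
        rnk[(u, w)] += sgn * (U0[3][key((u, w), (w, v))]
                              + U0[2][key((u, w), (v, w))])
        rnk[(w, u)] += sgn * U0[1][key((w, u), (w, v))]
        rnk[(w, v)] += sgn * (U1[1][key((w, u), (w, v))]
                              + U1[3][key((u, w), (w, v))])
        rnk[(v, w)] += sgn * U1[2][key((u, w), (v, w))]
```

After `apply_update`, `rnk` coincides with the ranks recomputed from scratch on the modified digraph, as Lemma 7.5 asserts.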

In the following, we argue that, if Algorithm 8 terminates when run on a
digraph D, then it returns a solution set for D. Obviously, Algorithm 8 does
not terminate as long as there is an arc with rank > 0. Thus, it would be
sufficient to show for any digraph D that if there is no arc of rank > 0 in D,
then D is transitive. Although we were unable to prove this, we strongly
conjecture that it is true.

Conjecture 7.6. If there is no pair of vertices (u, v) with rnk [(u, v)] > 0
in a digraph D, then there is no P3 in D.

If this conjecture can be proven, then it is easy to show that Algorithm 8
computes a solution set S for D in O(n⁴) time. If, on the other hand,
Conjecture 7.6 is false, then it is still possible to use Algorithm 8 as a
heuristic preprocessing for Transitivity Editing. By doing so, one can
guarantee an upper bound on the maximum rank in the preprocessed digraph.
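Algorithm 8 can be sketched compactly as follows. For brevity, this version recomputes all ranks from scratch in every round instead of using the incremental updates of Algorithm 7, and it carries a step limit because termination is, as discussed above, only conjectured; the function names are ours.

```python
import itertools

def rank(V, A, u, v):
    """Rank of (u, v) (Definition 7.1), via the formula of Algorithm 6."""
    pred = lambda x: {y for y in V if (y, x) in A}
    succ = lambda x: {y for y in V if (x, y) in A}
    norm = (len((pred(u) - {v}) - pred(v))
            + len((succ(v) - {u}) - succ(u))
            - len(succ(u) & pred(v)))
    return norm if (u, v) in A else -norm

def has_p3(V, A):
    """True if some ordered triple (a, b, c) forms a P3 in (V, A)."""
    return any((a, b) in A and (b, c) in A and (a, c) not in A
               for a, b, c in itertools.permutations(V, 3))

def greedy_solution(V, A, max_steps=10_000):
    """Algorithm 8: repeatedly modify a pair of maximum positive rank.
    The step limit is ours; termination is only conjectured."""
    A, S = set(A), set()
    for _ in range(max_steps):
        pairs = [(u, v) for u in V for v in V if u != v]
        best = max(pairs, key=lambda p: rank(V, A, *p))
        if rank(V, A, *best) <= 0:
            break
        S ^= {best}   # a pair modified twice cancels out of the solution set
        A ^= {best}   # apply the modification to the digraph
    return S
```

On the path 1 → 2 → 3, the heuristic performs exactly one modification (deleting one of the two arcs or inserting the shortcut (1, 3)), which is optimal.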

8 Experimental Results
In the course of this work, experiments were carried out to provide a practical
point of view on the presented algorithms. In this section, we report
the results of various test runs of the implementations of the two algorithms
for solving Transitivity Editing that are described in Section 6 and
Section 7, respectively. We included preprocessing Rules 5.2 and 5.13 in
the implementation of Algorithm 5 in order to reduce the number of vertices
in the input digraph to O(k²) (see Section 5.1). Furthermore, Rules 5.18
and 5.19 were implemented. For the calculation of a lower bound on the size
of the solution set needed for Rule 5.13, Algorithm 2 (see Page 48) was used.
In the following, we refer to the resulting FPT algorithm simply as "the
FPT algorithm". We explain the tests that were run with the algorithms
and present and interpret their results. All tests were run on a single core
of the multi-core system described in Table 2.

[...]
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q9300 @2.50GHz
stepping : 7
cpu MHz : 2497.000
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
[...]
bogomips : 4982.43
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual

Table 2: Partial content of "/proc/cpuinfo" of the test system. Although
the system possesses four cores, the tests were serialized, that is, only one
core was used for each test.

8.1 Employed Tests


In the following, we explain which tests were run and what we hoped to
accomplish by running them. Apart from the artificial data presented below, we
used a biological data set supplied by the bioinformatics department of the
University of Regensburg. It consists of 17 vertices and can be solved with
four arc insertions. Both the heuristic and the FPT algorithm took less
than 0.01 seconds to solve this instance. The following tests generate digraphs with a fixed number of vertices or a fixed upper bound on the size of
an optimal solution set, respectively, and compute a solution set by running the
heuristic and the FPT algorithm separately.

Fixed n Test. For the Fixed n Test, we generated digraphs D_n from a
given number of vertices n and a given arc probability p, such that for each
vertex u, every other vertex has probability p of being a successor of u
in D_n. We then measured the number of seconds an implementation took
to calculate a solution set for each of these digraphs. The main purpose
of this test is to get an idea of how larger solution sets influence the running times of the implementations. We hope to derive an estimate of the
exponential part of the average running time.

Fixed k Test. For the Fixed k Test, we generated digraphs D_k from a
given integer k such that (D_k, k) ∈ Transitivity Editing. For various
numbers n of vertices, this is accomplished by generating a random digraph D, calculating its transitive closure D⁺, and randomly modifying k
arcs. This way, we ensure that the size of each optimal solution set does not
exceed k. By choosing n ≫ k, we let the polynomial part of the average
running time dominate the exponential part. Thus, we hope to get an idea
of how the polynomial part of the average running time is related to the
input size.
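A sketch of the generation of D_k. For simplicity, the random digraph is drawn as a DAG (arcs only from smaller to larger vertices), so that its transitive closure is transitive without self-loops; the thesis does not prescribe this restriction, so it is our simplification. The toggled pairs are returned as well so that the construction can be checked.

```python
import random

def transitive_closure(V, A):
    """Warshall-style closure: add (u, w) whenever (u, v) and (v, w) exist."""
    C = set(A)
    for v in V:                       # v acts as the intermediate vertex
        for u in V:
            for w in V:
                if u != w and (u, v) in C and (v, w) in C:
                    C.add((u, w))
    return C

def fixed_k_instance(n, p, k, seed=None):
    """Fixed k Test input: transitive closure of a random DAG with k arcs
    toggled, so an optimal solution set has size at most k."""
    rng = random.Random(seed)
    V = list(range(n))
    A = {(u, v) for u in V for v in V if u < v and rng.random() < p}
    A = transitive_closure(V, A)
    toggled = rng.sample([(u, v) for u in V for v in V if u != v], k)
    for arc in toggled:
        A ^= {arc}    # delete the arc if present, insert it otherwise
    return set(V), A, toggled
```

Undoing the k toggles restores the transitive closure, so the generated instance is always solvable with at most k modifications.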

8.2 Results and Interpretation


Quality of the Heuristic Algorithm. Independently of the other results, the statistics on the performance of the heuristic algorithm are important. In all test runs, the size of the optimal solution set calculated by the
FPT algorithm and the size of the solution set given by the heuristic were
analyzed. The error of the heuristic versus the size of the optimal solution set
is plotted in Figure 20. It is apparent that, in our experiments, the heuristic
never delivered a result larger than 11/9 of the size of the optimal solution set.
Figure 20: The absolute error that the heuristic algorithm made, that
is, ||S_OPT| − |S_heur||, versus |S_OPT|. The data was taken from all tests.
Note that the error does not exceed 23%.

Figure 21: Time in seconds that the FPT algorithm took to find an optimal
solution set for a given digraph versus the number of vertices in this digraph
for different optimal solution set sizes (k ∈ {10, 15, 20, 25, 30}).

For k ≪ n, the heuristic algorithm always calculated an optimal solution
set. In 43% of the other runs, the solution set provided by the heuristic
algorithm was optimal.

Fixed n Test. The results of our experiments are not surprising in that
they agree with the theoretical considerations. Figures 21 and 22 illustrate that,
if we choose n ≫ k, the running times of both algorithms approach a
polynomial in n. Note that the results do not differ significantly for different optimal solution set sizes. This may be due to our method of testing,
since it is likely that almost the complete optimal solution set was found
by the polynomial-time preprocessing preceding the branching in the FPT
algorithm. However, as can be seen from the different scales of the time axes
in the diagrams, the preprocessing algorithm is slower than the heuristic
by a factor of about 100. This may, however, be influenced by calling Algorithm 2 (see Page 48) for lower-bounding the size of the solution set, which

Figure 22: Time in seconds that the heuristic algorithm took to find a
solution set to a given digraph versus the number of vertices in this digraph
for different optimal solution set sizes (k ∈ {10, 15, 20, 25, 30}).

Figure 23: Time in seconds that the FPT algorithm took to find a solution
set for a given digraph, on a logarithmic scale, versus the size of the optimal
solution sets for different input digraph sizes (number of vertices, n ∈
{10, 15, 20}). The bold line marked f(x) is the graph of the function f(x) :=
0.001 · 1.3^x.

takes O(n³ log n) time every time the solution set grows. Hence, we expect
the graphs to diverge for larger solution set sizes.

Fixed k Test. Figure 23 shows, on a logarithmic scale, the time that the FPT algorithm took to
compute an optimal solution for a given input digraph. Due to
the polynomial summand, we do not expect a straight line in the diagram
but rather a graph that is a bit bumpy in the proximity of the origin and
approaches a line asymptotically. The bold line in the diagram is drawn
for comparison. It is the graph of the function f(x) := 0.001 · 1.3^x. This
suggests that, in practice, the FPT algorithm runs much faster than the
worst-case bound of 2.57^k would suggest.
The implementation of the heuristic algorithm (Algorithm 8) never took
more than 0.04 seconds for n ≤ 20. Unfortunately, this is too close to the
measurement inaccuracy to provide any meaningful results.

Comparing the Results. It is hard to compare the results of our experiments with the ones presented by Böcker et al. [BBK09]. First, we were
unable to find a description of the employed hardware. Second, the tests
performed by Böcker et al. are presented differently in that the running
time is plotted versus the arc probability for different numbers of vertices.
This may be problematic, since the number of arcs in an optimal solution
set may still vary a great deal even if the number of vertices and the arc
probability are fixed. Finally, Böcker et al. used a complete digraph instead of
calculating the transitive closure of a random digraph for use in a Fixed k
Test, which increases the likelihood that the preprocessing rules apply
to the digraph.

9 Conclusion
In the course of this work, we considered Transitivity Editing and
some related problems. While Transitivity Completion is solvable in
O(n^2.376) time, we have seen that both Transitivity Editing and Transitivity Deletion are NP-complete, even when restricted to DAGs or
digraphs of maximum degree four. We have shown that both problems admit
a problem kernel containing at most k(k + 1) vertices and that this kernel
can be computed in O(n³) time. Furthermore, we presented FPT algorithms for Transitivity Editing and Transitivity Deletion that run
in O(2.57^k + n³) and O(2^k + n³) time, respectively. We also presented a
heuristic algorithm for solving Transitivity Editing. Finally, we performed various experiments testing how the running times depend on both
the input size and the solution size in practice.
Although we did not succeed in finding a better than O(k²)-vertex kernel for Transitivity Editing and Transitivity Deletion, it may be
possible to find an O(k)-vertex kernel in the future. Some sort of crown-type reduction rule [FLRS07] may be powerful enough to achieve this. It
would also be interesting to determine whether a polynomial-size kernel can
be computed in less than cubic time. Furthermore, a more detailed analysis of the running time of Rule 5.16 is desirable. It also remains to prove
Conjecture 7.6, thus showing that the provided heuristic algorithm
always returns a solution set and runs in O(n⁴) time. Finally, the problem
Transitivity Vertex Deletion is yet to be analyzed.

References
[ACK+99] Giorgio Ausiello, Pierluigi Crescenzi, Viggo Kann, Alberto
Marchetti-Spaccamela, Giorgio Gambosi, and Marco Protasi.
Complexity and Approximation: Combinatorial Optimization
Problems and Their Approximability Properties. Springer, January 1999. 9

[BBK09] Sebastian Böcker, Sebastian Briesemeister, and Gunnar W.


Klau. On optimal comparability editing with applications to
molecular diagnostics. BMC Bioinformatics, 10(Suppl 1):S61,
2009. Proc. of Asia-Pacific Bioinformatics Conference (APBC
2009). 4, 10, 46, 85

[BJG08] Jørgen Bang-Jensen and Gregory Gutin. Digraphs: Theory, Algorithms and Applications, 2nd Edition. Springer, 2008. 13

[BPSS01] Michael A. Bender, Giridhar Pemmasani, Steven Skiena, and


Pavel Sumazin. Finding least common ancestors in directed
acyclic graphs. In Proceedings of the 12th ACM-SIAM Sym-
posium on Discrete Algorithms (SODA01), pages 845–854, 2001.
26

[CKX06] Jianer Chen, Iyad A. Kanj, and Ge Xia. Improved parameterized


upper bounds for Vertex Cover. In Mathematical Foundations of
Computer Science (MFCS06), volume 4162 of Lecture Notes in
Computer Science, pages 238–249. Springer, 2006. 8

[CTY07] Pierre Charbit, Stéphan Thomassé, and Anders Yeo. The Min-
imum Feedback Arc Set problem is NP-hard for tournaments.
Combinatorics, Probability & Computing, 16(1):1–4, 2007. 14

[CW90] Don Coppersmith and Shmuel Winograd. Matrix multiplication


via arithmetic progressions. Journal of Symbolic Computation,
9(3):251–280, 1990. 12

[DF99] Rodney G. Downey and Michael R. Fellows. Parameterized Com-


plexity. Springer, 1999. 8, 9

[DGH+ 06] Michael Dom, Jiong Guo, Falk Hüffner, Rolf Niedermeier, and
Anke Truß. Fixed-parameter tractability results for feedback set
problems in tournaments. In Proceedings of the 6th Conference

on Algorithms and Complexity (CIAC06), number 3998 in Lec-


ture Notes in Computer Science, pages 320–331. Springer, May
2006. 15

[FG06] Jörg Flum and Martin Grohe. Parameterized Complexity Theory.


Springer, 2006. 8, 9

[FLRS07] Michael R. Fellows, Michael A. Langston, Frances A. Rosamond,


and Peter Shaw. Efficient parameterized preprocessing for Clus-
ter Editing. In Fundamentals of Computation Theory, 16th In-
ternational Symposium (FCT07), volume 4639 of Lecture Notes
in Computer Science, pages 312–321. Springer, 2007. 85

[GGHN05] Jens Gramm, Jiong Guo, Falk Hüffner, and Rolf Niedermeier.
Graph-modeled data clustering: fixed-parameter algorithms for
clique generation. Theory of Computing Systems, 38(4):373–392,
July 2005. 10, 42

[GN07] Jiong Guo and Rolf Niedermeier. Invitation to data reduction


and problem kernelization. ACM SIGACT News, 38(1):31–45,
2007. 8

[Gol77] Martin C. Golumbic. The complexity of comparability graph


recognition and coloring. Computing, 18(3):199–208, 1977. 15

[HBB+ 06] Michael Hummel, Stefan Bentink, Hilmar Berger, Wolfram Klap-
per, Swen Wessendorf, Thomas F.E. Barth, Heinz-Wolfram
Bernd, Sergio B. Cogliatti, Judith Dierlamm, Alfred C. Feller,
Martin-Leo Hansmann, Eugenia Haralambieva, Lana Harder,
Dirk Hasenclever, Michael Kühn, Dido Lenze, Peter Lichter,
Jose Ignacio Martin-Subero, Peter Möller, Hans-Konrad Müller-
Hermelink, German Ott, Reza M. Parwaresch, Christiane Pott,
Andreas Rosenwald, Maciej Rosolowski, Carsten Schwaenen,
Benjamin Stürzenhofecker, Monika Szczepanowski, Heiko Traut-
mann, Hans-Heinrich Wacker, Rainer Spang, Markus Loeffler,
Lorenz Trümper, Harald Stein, and Reiner Siebert. A bi-
ologic definition of Burkitt’s lymphoma from transcriptional
and genomic profiling. New England Journal of Medicine,
354(23):2419–2430, June 2006. 3, 10

[JJK+ 08] Juby Jacob, Marcel Jentsch, Dennis Kostka, Stefan Bentink,
and Rainer Spang. Detecting hierarchical structure in molec-

ular characteristics of disease using transitive approximations of


directed graphs. Bioinformatics, 24(7):995–1001, 2008. 3, 4, 20
[Kan92] Viggo Kann. On the Approximability of NP-complete Optimiza-
tion Problems. PhD thesis, Royal Institute of Technology Stock-
holm, 1992. 14
[KM86] Mirko Křivánek and Jaroslav Morávek. NP-hard problems in
hierarchical-tree clustering. Acta Informatica, 23(3):311–323,
1986. 3
[KMR97] David R. Karger, Rajeev Motwani, and Gurumurthy D. Ramku-
mar. On approximating the longest path in a graph. Algorith-
mica, 18(1):82–98, 1997. 26
[KMS07] Claire Kenyon-Mathieu and Warren Schudy. How to rank with
few errors. In Proceedings of the 39th Annual ACM Symposium
on Theory of Computing (STOC07), pages 95–103. ACM, 2007.
15
[KT02] Jan Kratochvíl and Zsolt Tuza. On the complexity of bicoloring
clique hypergraphs of graphs. Journal of Algorithms, 45(1):40–
54, 2002. 21
[MS90] Tze-Heng Ma and Jeremy Spinrad. Avoiding matrix multiplication. In Proceedings of the 16th International Workshop on
Graph-Theoretic Concepts in Computer Science (WG90), volume 484 of Lecture Notes in Computer Science, pages 61–71.
Springer, 1990. 12
[Mun71] James I. Munro. Efficient determination of the transitive closure
of a directed graph. Information Processing Letters, 1(2):56–58,
1971. 12
[Nie06] Rolf Niedermeier. Invitation to Fixed-Parameter Algorithms.
Oxford University Press, 2006. 8, 9
[NR00] Rolf Niedermeier and Peter Rossmanith. A general method to
speed up fixed-parameter-tractable algorithms. Information Pro-
cessing Letters, 73(3-4):125–129, 2000. 9
[NSS01] Assaf Natanzon, Ron Shamir, and Roded Sharan. Complexity
classification of some edge modification problems. Discrete Ap-
plied Mathematics, 113(1):109–128, 2001. 3, 15, 20

[Rob86] John Michael Robson. Algorithms for maximum independent


sets. Journal of Algorithms, 7(3):425–440, 1986. 8

[Rob01] John Michael Robson. Finding a maximum independent set in
time O(2^{n/4})? Technical report, Université Bordeaux 1, Département d'Informatique, 2001. 8

[RS06] Venkatesh Raman and Saket Saurabh. Parameterized algorithms


for feedback set problems and their duals in tournaments. The-
oretical Computer Science, 351(3):446–458, 2006. 15

[YG80] Mihalis Yannakakis and Fanica Gavril. Edge dominating sets


in graphs. SIAM Journal on Applied Mathematics, 38:364–372,
1980. 49

[YZL07] Bing Yang, Si-Qing Zheng, and Enyue Lu. Finding two disjoint
paths in a network with MinSum-MinMin objective function. In
Proceedings of the 2007 International Conference on Foundations
of Computer Science (FCS2007), pages 356–361. CSREA Press,
2007. 26
