Smoothed Analysis: An Attempt to Explain the Behavior of Algorithms in Practice∗

Daniel A. Spielman
Department of Computer Science
Yale University
spielman@cs.yale.edu

Shang-Hua Teng†
Department of Computer Science
Boston University
shanghua.teng@gmail.com
∗This material is based upon work supported by the National Science Foundation under Grants No. CCR-0325630 and CCF-0707522. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Because of CACM's strict constraints on bibliography, we have had to cut down the citations in this article. We will post a version of the article with a more complete bibliography on our webpage.

†Affiliation after the summer of 2009: Department of Computer Science, University of Southern California.

ABSTRACT

Many algorithms and heuristics work well on real data, despite having poor complexity under the standard worst-case measure. Smoothed analysis [36] is a step towards a theory that explains the behavior of algorithms in practice. It is based on the assumption that inputs to algorithms are subject to random perturbation and modification in their formation. A concrete example of such a smoothed analysis is a proof that the simplex algorithm for linear programming usually runs in polynomial time when its input is subject to modeling or measurement noise.

1. MODELING REAL DATA

“My experiences also strongly confirmed my previous opinion that the best theory is inspired by practice and the best practice is inspired by theory.”
[Donald E. Knuth: “Theory and Practice”, Theoretical Computer Science, 90 (1), 1–15, 1991.]

Algorithms are high-level descriptions of how computational tasks are performed. Engineers and experimentalists design and implement algorithms, and generally consider them a success if they work in practice. However, an algorithm that works well in one practical domain might perform poorly in another. Theorists also design and analyze algorithms, with the goal of providing provable guarantees about their performance. The traditional goal of theoretical computer science is to prove that an algorithm performs well in the worst case: if one can prove that an algorithm performs well in the worst case, then one can be confident that it will work well in every domain. However, there are many algorithms that work well in practice that do not work well in the worst case. Smoothed analysis provides a theoretical framework for explaining why some of these algorithms do work well in practice.

The performance of an algorithm is usually measured by its running time, expressed as a function of the input size of the problem it solves. The performance profiles of algorithms across the landscape of input instances can differ greatly and can be quite irregular. Some algorithms run in time linear in the input size on all instances, some take quadratic or higher-order polynomial time, while some may take an exponential amount of time on some instances.

Traditionally, the complexity of an algorithm is measured by its worst-case performance. If a single input instance triggers an exponential run time, the algorithm is called an exponential-time algorithm. A polynomial-time algorithm is one that takes polynomial time on all instances. While polynomial-time algorithms are usually viewed as being efficient, we clearly prefer those whose run time is a polynomial of low degree, especially those that run in nearly linear time.

It would be wonderful if every algorithm that ran quickly in practice were a polynomial-time algorithm. As this is not always the case, the worst-case framework is often the source of discrepancy between the theoretical evaluation of an algorithm and its practical performance.

It is commonly believed that practical inputs are usually more favorable than worst-case instances. For example, it is known that the special case of the Knapsack problem in which one must determine whether a set of n numbers can be divided into two groups of equal sum does not have a polynomial-time algorithm, unless NP is equal to P. Shortly before he passed away, Tim Russert of NBC's “Meet the Press” commented that the 2008 election could end in a tie between the Democratic and the Republican candidates. In other words, he solved a 51-item Knapsack problem¹ by hand within a reasonable amount of time, and most likely without using the pseudo-polynomial-time dynamic-programming algorithm for Knapsack!

¹In presidential elections in the United States, each of the 50 states and the District of Columbia is allocated a number of electors. All but the states of Maine and Nebraska use a winner-take-all system, with the candidate winning the majority of the votes in a state being awarded all of that state's electors. The winner of the election is the candidate who is awarded the most electors. Due to the exceptional behavior of Maine and Nebraska, the problem of whether the general election could end in a tie is not a perfect Knapsack problem. But one can still efficiently formulate it as one.
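For readers who would rather check the arithmetic than match Russert's feat by hand, the pseudo-polynomial dynamic program is only a few lines. The following sketch is our illustration, not part of the article: it treats the idealized version of the question, in which a tie is possible exactly when some subset of the electoral-vote counts sums to half of their total.

    def tie_possible(votes):
        # Classic pseudo-polynomial subset-sum dynamic program:
        # can the numbers be split into two groups of equal sum?
        # Runs in O(n * total) time, fast here only because the
        # vote counts themselves are small.
        total = sum(votes)
        if total % 2:  # an odd total can never split evenly
            return False
        target = total // 2
        reachable = {0}  # subset sums achievable so far
        for v in votes:
            reachable |= {s + v for s in reachable if s + v <= target}
        return target in reachable

    # Tiny illustrative instance (not real electoral data):
    print(tie_possible([3, 5, 8, 4, 4]))  # True: {3, 5, 4} vs {8, 4}

With the 51 actual electoral-vote counts (which sum to 538), the same call decides the idealized tie question instantly; as the footnote notes, Maine and Nebraska make the real question only approximately a Knapsack instance.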
In our field, the simplex algorithm is the classic example of an algorithm that is known to perform well in practice but has poor worst-case complexity. The simplex algorithm solves a linear program, for example, of the form

    max c^T x subject to Ax ≤ b,    (1)

where A is an m × n matrix, b is an m-place vector, and c is an n-place vector. In the worst case, the simplex algorithm takes exponential time [25].
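To make the form (1) concrete, here is a small instance solved with an off-the-shelf LP routine. This is our sketch, not code from the article; the particular A, b, and c are invented for illustration. We use SciPy's linprog, which minimizes by convention, so we negate c to maximize; requesting the "highs-ds" method runs a dual-simplex solver, a modern descendant of the algorithm discussed here.

    import numpy as np
    from scipy.optimize import linprog

    # A toy instance of (1): maximize c^T x subject to A x <= b.
    A = np.array([[1.0, 1.0],
                  [2.0, 1.0]])      # m = 2 constraints, n = 2 variables
    b = np.array([4.0, 6.0])
    c = np.array([3.0, 2.0])

    # linprog minimizes, so maximize c^T x by minimizing (-c)^T x.
    res = linprog(-c, A_ub=A, b_ub=b,
                  bounds=[(None, None)] * 2,   # x is free, as in (1)
                  method="highs-ds")           # dual-simplex backend
    print(res.x, -res.fun)  # optimal vertex (2, 2) with value 10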
Developing rigorous mathematical theories that explain the observed performance of practical algorithms and heuristics has become an increasingly important task in Theoretical Computer Science. However, modeling observed data and practical problem instances is a challenging task, as insightfully pointed out in the 1999 “Challenges for Theory of Computing” Report for an NSF-Sponsored Workshop on Research in Theoretical Computer Science².

“While theoretical work on models of computation and methods for analyzing algorithms has had enormous payoff, we are not done. In many situations, simple algorithms do well. Take for example the Simplex algorithm for linear programming, or the success of simulated annealing of certain supposedly intractable problems. We don't understand why! It is apparent that worst-case analysis does not provide useful insights on the performance of algorithms and heuristics and our models of computation need to be further developed and refined. Theoreticians are investing increasingly in careful experimental work leading to identification of important new questions in algorithms area. Developing means for predicting the performance of algorithms and heuristics on real data and on real computers is a grand challenge in algorithms”.

²Available at http://sigact.acm.org/

Needless to say, there are a multitude of algorithms beyond simplex and simulated annealing whose performance in practice is not well explained by worst-case analysis. We hope that theoretical explanations will be found for the success in practice of many of these algorithms, and that these theories will catalyze better algorithm design.

2. THE BEHAVIOR OF ALGORITHMS

When A is an algorithm for solving problem P, we let T_A[x] denote the running time of algorithm A on an input instance x. If the input domain Ω has only one input instance x, then we can use the instance-based measures T_A1[x] and T_A2[x] to decide which of two algorithms A1 and A2 more efficiently solves P. If Ω has two instances x and y, then the instance-based measure of an algorithm A defines a two-dimensional vector (T_A[x], T_A[y]). It could be the case that T_A1[x] < T_A2[x] but T_A1[y] > T_A2[y]. Then, strictly speaking, these two algorithms are not comparable. Usually, the input domain is much more complex, both in theory and in practice. The instance-based complexity measure T_A[·] defines an |Ω|-dimensional vector when Ω is finite. In general, it can be viewed as a function from Ω to R_+. But it is unwieldy. To compare two algorithms, we require a more concise complexity measure.

An input domain Ω is usually viewed as the union of a family of subdomains {Ω_1, . . . , Ω_n, . . .}, where Ω_n represents all instances in Ω of size n. For example, in sorting, Ω_n is the set of all tuples of n elements; in graph algorithms, Ω_n is the set of all graphs with n vertices; and in computational geometry, we often have Ω_n = R^n. In order to succinctly express the performance of an algorithm A, for each Ω_n one defines a scalar T_A(n) that summarizes the instance-based complexity measure T_A[·] of A over Ω_n. One often further simplifies this expression by using big-O or big-Θ notation to express T_A(n) asymptotically.

2.1 Traditional Analyses

It is understandable that different approaches to summarizing the performance of an algorithm over Ω_n can lead to very different evaluations of that algorithm. In Theoretical Computer Science, the most commonly used measures are the worst-case measure and the average-case measures.

The worst-case measure is defined as

    WC_A(n) = max_{x ∈ Ω_n} T_A[x].

The average-case measures have more parameters. In each average-case measure, one first determines a distribution of inputs and then measures the expected performance of an algorithm assuming inputs are drawn from this distribution. Supposing S provides a distribution over each Ω_n, the average-case measure according to S is

    Ave^S_A(n) = E_{x ∈_S Ω_n} [ T_A[x] ],

where we use x ∈_S Ω_n to indicate that x is randomly chosen from Ω_n according to distribution S.
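Both summaries are easy to mirror empirically. The sketch below is our illustration, not from the article: it estimates WC_A(n) and Ave^S_A(n) for insertion sort on inputs of size n, taking S to be the uniform distribution over permutations. Since maximizing over all of Ω_n is infeasible, we plug in the reversed list, a known worst-case instance for insertion sort.

    import random
    import time

    def insertion_sort(a):
        a = list(a)
        for i in range(1, len(a)):
            j = i
            while j > 0 and a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
        return a

    def T(alg, x):
        # Instance-based measure T_A[x]: measured running time on x.
        start = time.perf_counter()
        alg(x)
        return time.perf_counter() - start

    n, trials = 1000, 50
    # Worst-case measure: a known worst instance stands in for the max.
    wc = T(insertion_sort, list(range(n, 0, -1)))
    # Average-case measure under S = uniformly random permutations.
    ave = sum(T(insertion_sort, random.sample(range(n), n))
              for _ in range(trials)) / trials
    print(f"empirical WC ~ {wc:.4f}s, Ave ~ {ave:.4f}s")

Even here the two summaries diverge: a uniformly random permutation has about half as many inversions as the reversed list, so the average is roughly half the worst case, and for many algorithms the gap is far more dramatic.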
2.2 Critique of Traditional Analyses

Low worst-case complexity is the gold standard for an algorithm. When low, the worst-case complexity provides an absolute guarantee on the performance of an algorithm, no matter which input it is given. Algorithms with good worst-case performance have been developed for a great number of problems.

However, there are many problems that need to be solved in practice for which we do not know algorithms with good worst-case performance. Instead, scientists and engineers typically use heuristic algorithms to solve these problems. Many of these algorithms work well in practice, in spite of having a poor, sometimes exponential, worst-case running time. Practitioners justify the use of these heuristics by observing that worst-case instances are usually not “typical” and rarely occur in practice. Worst-case analysis can be too pessimistic. This theory-practice gap is not limited to heuristics with exponential complexity. Many polynomial-time algorithms, such as interior-point methods for linear programming and the conjugate gradient algorithm for solving linear equations, are often much faster than their worst-case bounds would suggest. In addition, heuristics are often used to speed up the practical performance of implementations that are based on algorithms with polynomial worst-case complexity. These heuristics might in fact worsen the worst-case performance, or make the worst-case complexity difficult to analyze.
Average-case analysis was introduced to overcome this difficulty. In average-case analysis, one measures the expected running time of an algorithm on some distribution of inputs. While one would ideally choose the distribution of inputs that occur in practice, this is difficult, as it is rare that one can determine or cleanly express these distributions, and the distributions can vary greatly between one application and another. Instead, average-case analyses have employed distributions with concise mathematical descriptions, such as Gaussian random vectors, uniform {0, 1} vectors, and Erdős–Rényi random graphs.

The drawback of using such distributions is that the inputs actually encountered in practice may bear very little resemblance to the inputs that are likely to be generated by such distributions. For example, one can see what a random image looks like by disconnecting most TV sets from their antennas, at which point they display “static”. These random images do not resemble actual television shows. More abstractly, Erdős–Rényi random graph models are often used in average-case analyses of graph algorithms. The Erdős–Rényi distribution G(n, p) produces a random graph by including every possible edge in the graph independently with probability p. While the average degree of a graph chosen from G(n, 6/(n − 1)) is approximately six, such a graph [...]

[...] very well be governed by some “blueprints” of the government and industrial giants, but it is still “perturbed” by the involvements of smaller Internet Service Providers.

In these examples, the inputs usually are neither completely random nor completely arbitrary. At a high level, each input is generated from a two-stage model: in the first stage, an instance is generated, and in the second stage, the instance from the first stage is slightly perturbed. The perturbed instance is the input to the algorithm.

In smoothed analysis, we assume that an input to an algorithm is subject to a slight random perturbation. The smoothed measure of an algorithm on an input instance is its expected performance over the perturbations of that instance. We define the smoothed complexity of an algorithm to be the maximum smoothed measure over input instances.
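These definitions are directly computable in miniature. The following sketch is ours, not from the article: it estimates the smoothed measure of an instance by averaging a cost function over Gaussian perturbations, and takes a maximum over a finite set of instances as a stand-in for the maximum over Ω_n. The cost function is an invented toy, the number of decimal digits a solver might lose on a nearly degenerate input, chosen because its worst case is enormous while its smoothed measure is modest.

    import numpy as np

    def smoothed_measure(cost, x, sigma, trials=200, rng=None):
        # Estimate E_g[ cost(x + sigma * g) ] with g ~ N(0, I):
        # the expected cost over Gaussian perturbations of x.
        rng = rng or np.random.default_rng(0)
        return np.mean([cost(x + sigma * rng.standard_normal(x.shape))
                        for _ in range(trials)])

    def smoothed_complexity(cost, instances, sigma):
        # Max over the given instances of their smoothed measures
        # (a finite stand-in for the max over all of Omega_n).
        return max(smoothed_measure(cost, x, sigma) for x in instances)

    def cost(x):
        # Toy cost: log10 of a condition-number-like ratio, i.e.,
        # digits of precision lost; huge when an entry is near zero.
        ratio = np.abs(x).max() / max(np.abs(x).min(), 1e-300)
        return float(np.log10(ratio))

    instances = [np.ones(10), np.linspace(0.0, 1.0, 10)]
    print(smoothed_complexity(cost, instances, sigma=0.1))

The second instance contains an exact zero, so its unperturbed cost is about 300; after a perturbation of magnitude 0.1 the expected cost is a small constant. This contrast between a terrible isolated instance and a benign neighborhood around it is precisely the phenomenon smoothed analysis is designed to capture.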
For concreteness, consider the case Ω_n = R^n, which is a common input domain in computational geometry, scientific computing, and optimization. For these continuous inputs and applications, the family of Gaussian distributions provides a natural model of noise or perturbation.

Recall that a univariate Gaussian distribution with mean 0 and standard deviation σ has density

    (1/(√(2π) σ)) e^{−x²/(2σ²)}.
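Smoothed analysis perturbs n-dimensional instances, so it is the multivariate form of this density that is actually used. As a standard fact (added here for reference, not text quoted from the article): a Gaussian perturbation of an instance x̄ ∈ R^n with standard deviation σ draws the input x with density

    % Density of x = \bar{x} + g with g ~ N(0, \sigma^2 I_n),
    % the product of n copies of the univariate density above:
    \[
      \mu_{\bar{x},\sigma}(x)
        = \frac{1}{\left(\sqrt{2\pi}\,\sigma\right)^{n}}
          \exp\!\left(-\frac{\lVert x-\bar{x}\rVert^{2}}{2\sigma^{2}}\right),
    \]

which is the univariate density applied independently to each coordinate.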