5B (2007), 72-76
Abstract
This paper presents a characterization of algorithm performance, and an analysis of it, aimed at finding explanations through causal models. It shows the systematic development of a causal model for the performance of the Threshold Accepting algorithm when it solves the bin packing problem. The causal model was built from observations of the factors that influence the algorithm's behavior and from an analysis of the solution space of the solved instances. A detailed development of the model and its interpretation are presented. This kind of modeling could offer elements for a better understanding of algorithm performance and, as a consequence, for the redesign of algorithms.
[…] characterize the algorithm with specific instances; for this reason it is mainly used with approximation algorithms. Research on methodologies for computational experimentation is growing rapidly, and its objective is to promote experiments that are important, correct, replicable and knowledge-producing [1]. No matter which method is selected to characterize the algorithms, the goal is to understand how performance depends on the set of factors that influence it. The acquired understanding may lead to better predictions of algorithm performance in new situations and to the discovery of improved algorithms.

Explaining Algorithm Performance

The explanation of an observed behavior is reached through a transition between three stages: description, prediction and causality [2]. For explanations of the behavior of approximation algorithms, this transition has mainly followed the approaches below. Early works focused on studying performance by demonstrating an algorithm's superiority on a set of standard instances [3]. Later works incorporated a characterization of the algorithm and some observations about its performance [4]. Subsequent works added the possibility of prediction based on a characterization of the algorithm and the problem [5]. Recently, some works have tried to create more complete and complex models that explain which characteristics affect performance and how they are related [6]. This paper follows the latest approach, contributing to the explanation of performance through the generation of causal models.

Creation of Causal Models for Algorithm Performance

A causal model is a generalized representation of knowledge, obtained by finding dependences in the data that imply cause-effect relations [6]. One of the most common representations for these relations is a directed acyclic graph. Generating causal models is nontrivial; Chickering proved that this problem is NP-hard [7].

Causal modeling generally has four stages: specification, estimation, interpretation and evaluation [8]. The first indicates which variables are causes and which are effects; the second determines the strength of the causal relations found; in the third the results are analyzed and interpreted; and in the fourth the model is tested to assess its accuracy.

Different methods are used in the estimation phase, depending on how the relations were represented during the specification stage. If they are represented by a graph, the most common way to determine their magnitudes is to find the probability distributions of the given graph [9].

In the interpretation phase, the most important relations, those with the highest magnitude, are analyzed. The final explanations are given by the researcher, who analyzes the meaning of each variable and their interactions in the context of previous knowledge of the problem domain.

Related Works

Esposito presented a work that uses inductive learning algorithms and causal inference techniques to discover causal rules for attributes of relational databases [10]. The result of this work was a system called CAUDISCO, whose process consists of two phases: inferring the causal structure of the data by studying the associated conditional independences, and generating a set of rules for the relevant dependences using the C4.5 classification algorithm [11].

Lemeire and Dirkx applied causal modeling to the analysis of parallel algorithm performance in order to detect the causes of anomalies in the communication process [12]. The causal model was generated by applying the Tetrad software to variables related to performance. The obtained causal rules were given informally.

Maes presented a paradigm for treating multi-agent causal models [13]. The investigation addresses the effect of variables obtained by combining observational data with some theoretical assumptions. The authors presented an algorithm for identifying causal effects in contexts in which an agent does not have complete access to the whole domain.

The investigations mentioned above focus on the study of algorithm performance, but they do not present a complete treatment that includes modeling the characteristics that affect performance, evaluating the acquired knowledge, and formally explaining the observed behavior. Esposito's work, although not focused on algorithm analysis, evaluates the knowledge and incorporates explanations. We have therefore considered combining these proposals to enrich algorithm performance analysis.

Causal Model of the Threshold Accepting Algorithm (TA)

The procedure used to build a causal model for the threshold accepting (TA) approximation algorithm [14], applied to the solution of the bin packing problem, is presented next. The procedure incorporates the main ideas of Cohen and Spirtes [2, 15].

Step 1: Specification of the Causal Model

Identification of Explanatory Variables

Variables derived from measuring the parameters of the TA algorithm, and others referring to the solution space, were analyzed to identify those that had some effect on the algorithm performance. For this purpose, questions about the relation between problem
74 Pérez Ortega J., et al.
characteristics and algorithm performance were formulated: Is there a difference in the number of feasible solutions in the solution space for the instances on which TA performs better? How variable is the space of feasible solutions for these instances? Is there a difference for the instances on which TA performs worse? Are these factors related to the algorithm performance?

The TA algorithm was analyzed to identify which aspects could be measured during execution and whether they could answer the questions outlined above. A sample of the solution space associated with the instances was also analyzed to identify the aspects that could be measured. In this way the following variables were created, the first five corresponding to the algorithm execution and the remainder to the solution space of the problem:
1) Ti, average of the initial temperature;
2) Tf, average of the final temperature;
3) Tav, average of the number of temperatures;
4) Nfs, number of times the algorithm stops without finding a solution;
5) Br, execution number in which TA obtained the best solution;
6) Fs, number of feasible solutions;
7) Vo, variance of the fitness values of 100 random feasible solutions.

The variables related to TA execution were measured over 30 executions of the algorithm on each instance of a set of 1226 standard bin packing instances taken from the OR-Library [16]; each instance has as parameters the number of items n and their associated weights. The […] solution, and it indicates the distance from the solution given […]

[…] analyses, so four levels were established for each explanatory variable.

Graphical Analysis

In the graphical analysis, the distributions of the proposed explanatory variables with respect to the response variable performance were examined to identify whether each variable showed differences in frequency distribution between its levels. The result of this analysis showed that variables Fs, Vo, Tav and Nfs have clear differences with respect to the levels of the variable performance (1: win, 0: lost), while for variables Ti, Tf and Mc the distributions for both levels were almost the same; for this reason they were discarded from further analysis.

Statistical Analysis

The relations of the explanatory variables Fs, Vo, Tav and Nfs with the variable ratio, taken as the response variable, were analyzed. Figure 1a shows the relation between the levels of the analyzed variables and the response variable; the variables that appear related to performance are Fs and Tav. Figure 1b shows the interactions of all the variables between their respective levels; there appears to be evidence of relations between the values of Tav and Fs, as well as between Nfs and Fs.

[Figure: Main Effects Plot (data means) for ratio; panels Fs, Vo]
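The discretization into four levels per explanatory variable and the per-level means summarized in a main effects plot for ratio can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' procedure: the paper does not say how the four levels were cut, so quartile cut points are assumed, and the helper names (`four_levels`, `main_effects`) and the sample values are hypothetical, not the paper's data.

```python
from statistics import mean, quantiles


def four_levels(values):
    """Map each value to a level 1..4 using quartile cut points (assumed scheme)."""
    cuts = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3

    def level(v):
        # Level = 1 + number of cut points strictly below v.
        return 1 + sum(v > c for c in cuts)

    return [level(v) for v in values]


def main_effects(levels, response):
    """Mean response at each level of one factor (the 'data means' of a main effects plot)."""
    groups = {}
    for lvl, y in zip(levels, response):
        groups.setdefault(lvl, []).append(y)
    return {lvl: mean(ys) for lvl, ys in sorted(groups.items())}


# Hypothetical per-instance measurements (illustrative only).
fs = [0.0, 0.1, 0.3, 0.5, 0.7, 0.8, 0.95, 1.0]
ratio = [1.00, 1.01, 1.02, 1.02, 1.04, 1.05, 1.07, 1.08]

fs_levels = four_levels(fs)
effects = main_effects(fs_levels, ratio)
print(fs_levels)
print({lvl: round(m, 4) for lvl, m in effects.items()})
```

With heavily tied data (for example, many instances sharing the same Fs value) quartile cut points can coincide and leave some levels empty; fixed cut points chosen by inspection would be an alternative.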
An analysis of variance (MANOVA) was carried out using the MINITAB software. The dependent variable was ratio and the independent variables were Fs, Vo, Tav and Nfs. The tested hypothesis was "variables Fs, Vo, Tav and Nfs are closely related to ratio", and it was accepted with a confidence level of 95%. The residuals were analyzed: they appeared to fit a normal distribution closely, and the constant variance assumption was rejected by a Levene test [8]. However, since the ANOVA test is very robust to deviations from the normality and constant variance assumptions, the F test is only slightly affected in a fixed effects model for balanced designs [8], which is the case here.

Step 2: Estimation of Causal Order

The causal model was constructed using the PC algorithm [15] from the TETRAD causal inference software [19] with a confidence level of 95%. The data used to obtain the model were the variables Fs, Vo, Tav, Nfs and performance, in discretized form. Fig. 2 shows that variables Fs, Vo and Tav have a direct relation with the performance node, which confirms the assumptions made in the graphical and statistical analyses.

[…]pendence) allow us to differentiate between the variable values associated with the values of the performance node. The obtained relations with the greatest probability and support are: P(performance = 1 | Fs = 4, Vo = 1, Tav = 1), P(performance = 1 | Fs = 1, Vo = 3, Tav = 4) and P(performance = 2 | Fs = 4, Vo = 1, Tav = 1).

Model Interpretation

The causal relations with the greatest values of conditional probability and support were interpreted; the following explanations were inferred from these relations.

P(performance = 1 | Fs = 1, Vo = 2 or 3, Tav = 4). The TA algorithm wins if the number of random feasible solutions in a representative sample of the problem's solution space is small (0), the variability among the fitness function values in a sample of 100 random feasible solutions is between 0.292 and 1.685, and the number of temperatures during execution is large (23-57).

P(performance = 2 | Fs = 4, Vo = 1, Tav = 1). The TA algorithm loses if the number of random feasible solutions in a representative sample of the problem's solution space is large (70.978%-100%), there is little variability among the fitness function values in a sample of 100 random feasible solutions (0.012-0.092), and the number of temperatures during execution is small (2-13).

The obtained explanations indicate that a particular combination of values of the variables Fs, Vo and Tav characterizes an instance as won or lost against the greedy algorithm. We observed that the approximation algorithm wins over the greedy algorithm when the search for solutions is intensified because feasible solutions are not easy to find. Regarding why the TA algorithm loses, we observed that the solution values are too similar, generating a flat trajectory in which the greedy algorithm gains the advantage.
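Ranking relations by conditional probability and support can be illustrated by direct counting over the discretized table of instances. This is a minimal sketch under stated assumptions: "support" is taken to be the fraction of instances that satisfy the condition (the paper does not define it), the helper `rule_stats` and the records are hypothetical, and the level codes follow the relations quoted in the interpretation (performance = 1 for a win, 2 for a loss).

```python
def rule_stats(records, target, value, condition):
    """Estimate P(target = value | condition) and the support of the condition.

    records: list of dicts holding discretized variables for each instance.
    condition: dict mapping a variable name to the set of allowed levels.
    Support is assumed here to be the fraction of records satisfying the condition.
    """
    matching = [r for r in records
                if all(r[var] in levels for var, levels in condition.items())]
    if not matching:
        # No instance satisfies the condition: the probability is undefined, report zeros.
        return 0.0, 0.0
    hits = sum(1 for r in matching if r[target] == value)
    return hits / len(matching), len(matching) / len(records)


# Hypothetical discretized instances (illustrative only, not the paper's data).
records = [
    {"Fs": 1, "Vo": 2, "Tav": 4, "performance": 1},
    {"Fs": 1, "Vo": 3, "Tav": 4, "performance": 1},
    {"Fs": 1, "Vo": 3, "Tav": 4, "performance": 2},
    {"Fs": 4, "Vo": 1, "Tav": 1, "performance": 2},
    {"Fs": 4, "Vo": 1, "Tav": 1, "performance": 2},
    {"Fs": 2, "Vo": 2, "Tav": 3, "performance": 1},
]

# P(performance = 1 | Fs = 1, Vo = 2 or 3, Tav = 4)
p_win, s_win = rule_stats(records, "performance", 1,
                          {"Fs": {1}, "Vo": {2, 3}, "Tav": {4}})
# P(performance = 2 | Fs = 4, Vo = 1, Tav = 1)
p_lose, s_lose = rule_stats(records, "performance", 2,
                            {"Fs": {4}, "Vo": {1}, "Tav": {1}})
print(f"win rule:  P={p_win:.2f}, support={s_win:.2f}")
print(f"lose rule: P={p_lose:.2f}, support={s_lose:.2f}")
```

Counting like this only estimates the conditional probabilities once the causal structure is fixed; it does not by itself establish that Fs, Vo and Tav are causes of performance, which is the role of the PC search.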
Conclusions

[…] performance causal model of a set of approximation algorithms is being built, with the objective of contrasting their performance under different configurations.

Acknowledgements

This research was supported in part by CONACYT and DGEST.

References

1. MCGEOCH C., Experimental Analysis of Algorithms, in: Pardalos, Romeijn H.E. (eds.), Handbook of Global Optimization, 2, 489, 2002.
2. COHEN P., Empirical Methods for Artificial Intelligence, The MIT Press, Cambridge, Massachusetts, London, England, 1995.
3. HOOKER J., Needed: An Empirical Science of Algorithms, Operations Research, 42, 1994.
4. HOOS H.H., Stochastic Local Search - Methods, Models, Applications, PhD Thesis, Department of Computer Science, Darmstadt University of Technology, Germany, November 1998.
5. PÉREZ J., PAZOS R., FRAUSTO J., RODRÍGUEZ G., ROMERO D., CRUZ L., A Statistical Approach for Algorithm Selection, WEA, pp. 417-431, 2004.
6. LEMEIRE J., DIRKX E., Causal Models for Performance Analysis, 4th PA3CT Symposium, Edegem, Belgium, 2004.
7. CHICKERING D., A Transformational Characterization of Equivalent Bayesian Network Structures, 11th Conference on Uncertainty in AI, San Francisco, pp. 87-98, 1995.
8. MONTGOMERY D., Diseño y Análisis de Experimentos (Design and Analysis of Experiments), Limusa Wiley, 2nd edition, 2004.
9. HECKERMAN D., A Bayesian Approach to Learning Causal Networks, Technical Report MSR-TR-95-04, Microsoft Research, Advanced Technology Division, Microsoft Corporation, 1995.
10. ESPOSITO F., MALERBA D., RIPA V., SEMERARO G., Discovering Causal Rules in Relational Databases, Applied Artificial Intelligence, 11, 71, 1997.
11. QUINLAN J., C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, Calif., 1993.
12. LEMEIRE J., DIRKX E., Causal Models for Performance Analysis, 4th PA3CT Symposium, Edegem, Belgium, 2004.
13. MAES S., MEGANCK S., MANDERICK B., Identification of Causal Effects in Multi-agent Causal Models, Proceedings Artificial Intelligence and Applications, 2005.
14. PÉREZ J., PAZOS R., FRAUSTO J., RODRÍGUEZ G., CRUZ L., FRAIRE H., Comparison and Selection of Exact and Heuristic Algorithms, Lecture Notes in Computer Science, Vol. 3045, Springer Verlag, Berlin Heidelberg New York, pp. 415-424, 2004.
15. SPIRTES P., GLYMOUR C., SCHEINES R., Causation, Prediction, and Search, MIT Press, 2nd edition, 2001.
16. BEASLEY J. E., OR-Library, Brunel University, http://people.brunel.ac.uk/~mastjjb/jeb/orlib/binpackinfo.html, 2006.
17. FALKENAUER E., A Hybrid Grouping Genetic Algorithm for Bin Packing, Journal of Heuristics, 2, 5, 1996.
18. MICHALEWICZ Z., FOGEL D. B., How to Solve It: Modern Heuristics, Springer Verlag, 1999.
19. Carnegie Mellon University, Open Learning Initiative (OLI), http://www.cmu.edu/oli/