Adam Breindel
Department of Classics, Brown University
May 1998
technique for stemmatic analysis; and preliminary results from an application of this
technique.
The first interdisciplinary observation is that the methods and purpose of
stemmatics overlap substantially with the methods and purpose of the biological
sub-discipline of cladistic analysis. While this fact is rarely emphasized or exploited, it is not
a new discovery, and its history will be discussed. The second interdisciplinary
observation is that computer software which has been developed for biologists in order
to solve problems in cladistic analysis now offers us the possibility of advances in the
construction of textual stemmata, through a non-traditional use of traditional
manuscript collations.
The analytic technique contained herein – which does not appear ever to have
been attempted heretofore – is the application of an existing cladistic analysis software
package to the stemmatic analysis of a manuscript collation. The use of this technique to
analyze part of the Sallustian corpus is thoroughly documented in this study.
Preliminary results indicate that the technique produces a stemma nearly identical to
Breindel 2
that published by L.D. Reynolds, the editor of the Oxford text. Hence, this method
appears to offer an effective new approach to evaluating the relationships among extant
versions of a text.
The interplay between the disciplines of biological systematics, genetics, and
textual criticism, which makes this paper possible, has a somewhat Byzantine history
spanning the last thirty years. I ask the reader to consider with charity my exposition of
this history. For it seems that the relative uniqueness of this paper demands an
Background
Manuscript Tradition of Juvenal.” 2 In this study, Griffith applied methods of numerical
explains in a similar article the following year, 3 he had in turn learned from biologist
Griffith describes the biological advances which he exploits in analyzing the texts
of Juvenal:
1
I have found it necessary in the course of this paper to refer to some technical aspects of systematics and
genetics. I have attempted to restrict to an elementary level the familiarity required with these disciplines,
in order to make this work accessible to a broad audience. Nonetheless, readers seeking an introductory
exposition may find appropriate sections of the following textbooks useful:
Gamblin, Linda and Gail Vines, eds. (1991) The Evolution of Life, Oxford, chapter 3.
Maxson, Linda R. and Charles H. Daugherty (1992) Genetics: A Human Perspective,
Dubuque, Iowa, chapters 8, 10.
Minkoff, Eli C. (1983) Evolutionary Biology, Reading, Mass., chapter 22.
2
Griffith, John G. (1968) “A Taxonomic Study of the Manuscript Tradition of Juvenal,” Museum Helveticum
25:101-38.
3
Griffith, John G. (1969) “Numerical Taxonomy and Some Primary Manuscripts of the Gospels,” Journal of
Scientists have long been aware of the limitations of the traditional
methods of classifying specimens; biologists in particular have laboured
under this handicap. Within the last 10-15 years considerable advances
have been made, largely because techniques developed for computer use
have enabled specialists in this activity, who style themselves numerical
taxonomists, to sift with speed and precision large masses of
unpromisingly heterogeneous material, and thereby to isolate groups or
‘taxa’ of related specimens, on the basis of which further inquiry may be
conducted. ... 4
Thus, Griffith identifies a requirement which textual criticism has in common with
distinctions among vast amounts of similarity. The numeric taxonomy methods appear,
expresses the hope that we might find associations between specimens by evaluating
large amounts of data with machine assistance. In light of the existing resources,
though, he remarks that “for a textual critic operating with only a few thousand lines of
text it is simply not worth the trouble of programming the data for machine-processing...” 5
procedure was, first, extraordinarily laborious: for the fourteen Gospel manuscripts
analyzed in his article of 1969, up to fifty-six manual recording acts were required for
every variant among one or more of the manuscripts. Thus he was constrained to look
at only small samples of the data. Moreover, if he had had access to more data, he may
4
Griffith, op. cit. 1968, pp. 113-14.
5
Ibid.
Griffith’s procedure (and, in all fairness, the biological methods with which he worked)
had a more troublesome limitation in that they resulted only in associations of objects.
Griffith could assert the distribution of manuscripts into various sub-groups with
statistically-argued accuracy, but the mere grouping of the manuscripts does not seem
to have accomplished much. His methods said nothing about the genealogical
relationships of the manuscripts. For example, if manuscripts A, B, and C are found to
be in a single taxon, we have only formalized their external similarity. As useful as such
formalization might be, little is indicated about the genealogical relationships likely to
Thus, Griffith succeeded in bringing numerical taxonomy into the arena of
textual criticism, but the biological approach upon which he depended was not
ambitious enough to describe the relationships among the specimens and so his textual
Criticism and Editorial Technique. 6 In this work, West explains that computers might
theoretically hold some promise for stemma construction, because, under the best
possible circumstances, building a stemma demands only simple logic. Such a stemma
is, however, skeptical about the idea and holds out some theoretical reservations:
regard to whether they were right or wrong; but this scheme would be
capable of suspension from any point [i.e., the scheme could not
distinguish the subarchetypes] ... The correct orientation could only be
determined by evaluating the quality of the variants, which no machine is
capable of doing. 7
West’s objections will be considered in detail later, as they are important to the
present investigation. But it is worth noting for now that even if West had wanted to
test a computerized construction of a stemma, there would have been obstacles to his
progress.
First, there would not have been readily available technology for his purpose.
But more importantly, outside of theoretical computer science or mathematical graph
theory, there had not been practical research on automating the construction of
stemmata when the data for the specimens is inconsistent or underdetermined. That is, if the
variants in a set of manuscripts were completely compatible with a unique stemma, we
would need only make the right inferences to generate it. In reality, though, it is usual for
every possible stemma to conflict with at least one locus in the manuscripts;
inconsistencies, we find a multitude of possible stemmata. These stemmata we must
distinguish on the basis of some criterion capable of evaluating the likelihood that each
Thus, a variety of difficult problems, theoretical and computational, inhere in the
task of mechanically constructing a stemma – and they are not problems which
classicists were likely to attack on their own. Fortuitously, however, development had
simultaneously been taking place within the biological disciplines of taxonomy and
7
West, op. cit., pp. 71-2.
systematics so as to motivate biologists to attempt these same problems. For derivation
of the evolutionary relationships of a group of extant specimens was a key part of the
Biologist Willi Hennig had begun to develop and advocate a strictly phylogenetic
approach to arranging organisms. 8 Hennig’s view, that the evolutionary relationships of
organisms formed the best foundation for classifying and systematizing them, was and
remains the object of debate. 9 Parts of his theory, however, seem to have been adopted
The cladistic approach seems intuitively obvious, and G.D.C. Griffiths (along
with many defenders) insisted that it alone had the advantage of relying on objective
fact about the organisms in question (rather than deploying the organisms into classes
invented by humans). Griffiths writes, “[Hennig’s method] provides the only
theoretically sound basis for achieving an objective equivalence between the taxa
intuitively obvious can also be deceptively fallacious, and cladistics does have a
disingenuous side. It is worth pointing out two objections to the system here, largely so
that the reader may see that they do not apply to a textual application of the theory.
8
Hennig might be called the father of modern cladistics; his work was developed and debated in various
publications including (1950) Grundzüge einer Theorie der Phylogenetischen Systematik, Berlin.
(1966) Phylogenetic Systematics, Urbana, Illinois.
(1971) “Zur Situation der biologischen Systematik,” Erlanger Forschungen, R. Siewing ed.,
Erlangen.
9
For views on the early intellectual positions in the debates, see Ernst Mayr (1976) Evolution and the
Diversity of Life: Selected Essays, Cambridge, Mass., pp. 435-41.
10
Griffiths, G.D.C. (1972) “The Phylogenetic Classification of Diptera Cyclorrhapha with Special
Reference to the Structure of the Male Postabdomen,” W. Junk, N.V., The Hague.
determining the level of descent at which class divisions should be made. We are only
shown that, having made a choice, we are bound to include and exclude certain
specimens.
Second, given three organisms, A, B, and C, suppose that A and B are similar in
form, while C differs greatly from both A and B. Suppose further that A and C are
closer evolutionarily to one another than either is to B. In this situation – which is not
and C all together, or else to class A and C together against B (Figure 1).
Figure 1
Neither of these options appeals to our intuition the way that the system at first did. For
A and B appear to form a group as against C, and yet this is precisely the classification
These two objections, while having much practical import for the classifying of
that classification as our own production); second, we have no sympathy for similarity
of appearance between manuscripts if we have hard evidence that they are unrelated in
origin (since it is the origin that is the object of the textual quest).
prompted research into the creation and evaluation of stemmata (or cladograms) from
incomplete and incompatible data. The cladistic approach depends for a starting point
on determining the evolutionary relationships of the specimens – and these
relationships must be assembled from lists of variations among the specimens. Hence,
in a sense, biologists set to work on the problems which had stood in front of Martin
West.
But debate about the philosophical underpinnings of the cladistic methodology
Michigan classicist and zoologist H. Don Cameron, due to cladistics’ evident similarity
to established techniques in traditional (i.e., not mechanical) textual criticism. 11 Cameron
and Norman I. Platnick describe the debate, and situate themselves in it, thus:
11
Platnick, Norman I. and H. Don Cameron (1977) “Cladistic Methods in Textual, Linguistic, and
The fields referred to are ... textual criticism ... and ... linguistic
reconstruction. 12
Cameron and Platnick, writing for an audience of biologists, next summarize the
between biological and textual stemmatics – which Cameron and Platnick view as
subordinate to an overarching similarity – are described in moderate detail. 14 The paper
serves also to indicate that these scholars can recognize and make precise the
correspondence between stemma construction and cladistic analysis.
In a conference concluded in 1983, Cameron again presented his view of textual
criticism. The conference had been organized to investigate the biological and cladistic
metaphor in other intellectual fields. 15 Cameron treated stemmatics, but he did not
discuss stemmata as a metaphor from biology, since, as he points out, the stemmatic
methods as used in both fields “were developed by classical scholars systematically in
the nineteenth century and ... the origins of the method can be found as early as the
sixteenth century...” 16 Beyond merely recounting the techniques of Maas, Cameron
explores the distinction – as far as it impacts his cladistics-stemmatics comparison –
between “vertical” or uncontaminated traditions and “horizontal” transmissions, those
12
Platnick, op. cit., p. 380.
13
Maas, P. (1958) Textual Criticism, Oxford; Platnick, op. cit., pp. 381-3.
14
Platnick, op. cit., p. 384.
15
Biological Metaphor Outside Biology (1982) and Interdisciplinary Round-Table on Cladistics and Other
Graph Theoretical Representations (1983) symposia at the University of Pennsylvania. Proceedings in
Hoenigswald, Henry M. and Linda F. Wiener, eds. (1987) Biological Metaphor and Cladistic Classification,
Philadelphia.
16
Cameron, H.D. (1987) “The Upside-Down Cladogram: Problems in Manuscript Affiliation,” in
“full of Byzantine, and even ancient, editing and conjecture.” 17 In the latter cases,
“cladistic methods give little aid.” But in the former, he concludes:
[V]ertical transmission and uncontaminated text tradition make the
mechanical application of cladistic methods to reconstruct a single
archetype a workable and successful method, with a claim to being
scientific... 18
Thus, Cameron argues that, at least in a vertical textual tradition, we ought to be able to
use methods from cladistics to derive a stemma and even an archetype.
At this point, the next move for a textual critic might have appeared obvious:
mate West’s insight about mechanical production of stemmata with Cameron’s insight
that cladistics provides the theoretical and algorithmic underpinning for West’s
operation. That is, use cladistic techniques to attack thorny problems of textual
transmission. It is unclear why this approach was not exploited in the 1980s. We might,
In the 1980s, three further developments came about which made the project
sequencing: 20 it became possible to put genetic material from various species into an
automated process and receive, as output, essentially a collation showing every genetic
difference between the samples. 21 More abundant data was now available with which
17
Cameron, op. cit., p. 238.
18
Ibid.
19
It is important to note that none of these three developments sprang fully formed from the head of Zeus
in the 1980s. It is convenient to describe them here, as their confluence seems to have changed the research
environment at the time, but research on DNA sequencing, parsimony algorithms, and of course
computers had a long prior history.
20
In particular, the development of polymerase chain reaction (PCR) duplication of DNA segments.
21
That is, in the sequenced strands of DNA.
The second development of this time period was the availability of computers
sophisticated enough to compare and evaluate the thousands or tens of thousands of
possible cladograms (stemmata) which might result from comparing large numbers of
species. That is, computers allowed biologists to overcome that challenge which Maas
had identified for textual critics, when he observed that a large number of specimens or
witnesses would produce an astronomical number of possible stemmata. 22
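To illustrate the growth Maas describes, the Python sketch below counts only the narrowest class of candidate stemmata: fully bifurcating trees whose tips are the extant witnesses, with no lost intermediaries and no witness sitting at a branching point. Maas’s figure of 250 for four witnesses covers a broader range of stemma types, so these numbers understate the problem and do not reproduce his count. The function names are illustrative only.

```python
from math import prod


def unrooted_binary_trees(n: int) -> int:
    """Count unrooted, fully bifurcating trees with n labelled tips.

    The count is the double factorial (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5).
    """
    if n < 3:
        raise ValueError("need at least three tips")
    return prod(range(1, 2 * n - 4, 2))


def rooted_binary_trees(n: int) -> int:
    """Count rooted, fully bifurcating trees with n labelled tips: (2n - 3)!!."""
    if n < 2:
        raise ValueError("need at least two tips")
    return prod(range(1, 2 * n - 2, 2))


for n in (4, 9, 11, 20):
    print(n, unrooted_binary_trees(n), rooted_binary_trees(n))
```

For the nine manuscripts eventually analyzed in this study, even this narrow count gives 135,135 unrooted and 2,027,025 rooted trees. By comparison, the nine input rows can be arranged in 9! = 362,880 orders.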
The last prerequisite development was the arrival of software systems that could bring large quantities
of data (whether from DNA or elsewhere) together with the computers. Software to
compute likely stemmata involves, at its core, algorithms which have been topics in
computer science and mathematics for a half-century or more. Hence, strictly speaking,
appropriate software had probably been “in development” in research universities and
corporate labs for some time. But the early 1980s saw the release of packages designed
specifically for cladistics, tailored to the needs of practicing biologists, and ready to run
stemma for the textual tradition of Sallust’s De Coniuratione Catilinae using one such
software package, the freely distributable Phylogeny Inference Package (henceforth
PHYLIP). 23
22
Maas, op. cit., p. 47: “If we have four witnesses, the number of possible types of stemma amounts to 250,
Before proceeding to describe the method and outcome of the experiment, it is
appropriate to consider two technical objections which textual critics have put forward
The first objection is M.L. West’s, quoted above. West correctly pointed out
that any stemma derived by algorithm would be an unoriented stemma (or, as the
cladists say, an ‘unrooted cladogram’). 24 That is, the algorithm could determine the
branchings of the stemma but could not ascertain which branching belongs “at the
top” (in practice, this amounts to identifying the nodes representing the subarchetypes).
An unrooted cladogram (Figure 2) can represent several distinct rooted versions (Figure
3). Each rooted cladogram can, in turn, represent several distinct possible phylogenies
(Figure 4). 25
Figure 3. Rooted cladograms. Each of these five rooted cladograms is consistent with
the unrooted cladogram above (Figure 2). By postulating the first branching in the
descent, the known relationships specify the remainder of the tree. Note, however, that
the lengths of branches, and the specimens which might lie on the nodes of the tree, are
not indicated.
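West’s point can be made concrete. An unrooted bifurcating tree with n tips has 2n - 3 branches, and placing the root on each branch in turn gives a different rooted cladogram. For four tips that yields five, matching the count in Figure 3. The sketch below uses a hypothetical four-tip tree with tips A to D, because the labels of Figure 2 are not reproduced in this text. The helper names `edges`, `subtree`, and `rootings` are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical four-tip unrooted tree: tips A-D, internal nodes x and y,
# i.e. the split AB | CD.  The labels are illustrative, not those of Figure 2.
UNROOTED: Dict[str, List[str]] = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
    "x": ["A", "B", "y"], "y": ["C", "D", "x"],
}


def edges(tree: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """Every undirected branch exactly once, as a sorted pair of node names."""
    return sorted({tuple(sorted((u, v))) for u, nbrs in tree.items() for v in nbrs})


def subtree(tree: Dict[str, List[str]], node: str, parent: Optional[str]) -> str:
    """Newick-style string for `node`, with `parent` taken as the direction of the root."""
    children = [c for c in tree[node] if c != parent]
    if not children:
        return node  # a tip, i.e. an extant manuscript
    return "(" + ",".join(subtree(tree, c, node) for c in children) + ")"


def rootings(tree: Dict[str, List[str]]) -> List[str]:
    """One rooted tree per branch, obtained by placing the root on that branch."""
    return ["(" + subtree(tree, u, v) + "," + subtree(tree, v, u) + ")" for u, v in edges(tree)]


for rooted in rootings(UNROOTED):
    print(rooted)
```

All five rooted strings describe the same unrooted branching pattern. Choosing among them is exactly the orientation step that West assigns to the critic’s judgment.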
24
West, op. cit., pp. 71-2.
25
Humphries, C.J. and P.H. Williams (1994) “Cladograms and Trees in Biodiversity,” Models in Phylogeny
Reconstruction, Robert W. Scotland, Darrell J. Siebert, and David M. Williams, eds., Oxford, pp. 336-7.
Figure 4. Phylogenetic Trees. All four of these phylogenetic trees are compatible with a
single cladogram above (Figure 3.ii). Note that schemata involving direct descent are
included.
West’s objection is legitimate. It should not, though, prevent us from pursuing
automated stemma construction, for several reasons. First, the unrooted cladogram is, if
accurate, a great advance over no stemma and an even greater advance over an
incorrect stemma. Second, it may in many cases be tolerably easy to properly root the
cladogram, thus producing a traditional stemma, based on our knowledge of the dates
and locales of origin for the various manuscripts. Third, computer methods are
particularly useful in the frequent circumstance that the collation is not uniquely
compatible with any single proposed stemma. In such cases, we shall be happy to have
The second objection is one advanced by Roger David Dawe in studies of the
traditions of Aeschylus and Sophocles. 26 Dawe’s contention is that there is so much
horizontal transmission in the traditions for these authors, as indicated by numerous
true readings appearing in dependent manuscripts though absent in other manuscripts,
as to invalidate the stemmatic approach. 27 Dawe confronts the methodology of Pasquali
26
Dawe, R.D. (1964) The Collation and Investigation of Manuscripts of Aeschylus, Cambridge,
and (1973) Studies on the Text of Sophocles, 2 vols., Leiden.
27
Cameron, op. cit., p. 237.
– and consequently confronts my method, which derives partly through Pasquali, Maas,
and West – at least in the case of individual authors such as Aeschylus. He writes:
We believe that the fact of unique preservation has been demonstrated [in
the Aeschylean case]; consequently the fault must lie with the theory of
descent, and we conclude that the ... stemma does not after all represent,
even in the simplest form, the true character of the tradition. ...
Cameron summarizes the problems which Dawe’s assertion poses to any method such
If there are no archetypes or stemmata, and if true readings are uniquely
preserved in any manuscript regardless of its stemmatic position, we are
then thrown back to a procedure of evaluating readings which is unaided
by considerations of outgroup comparison, reconstruction of an
archetype, or to push the concept to its logical conclusion, without the
consideration of manuscript authority of any kind. 29
Dawe’s assertion to hold true in certain specific textual traditions. But we need not
suppose that any particular number of such traditions invalidates the deductive
stemmatic method in general. Hence, in the absence of any argument against stemmatic
representation of the Sallustian tradition, we can proceed to analyze it via the cladistic
approach.
Experimental Procedure
28
Dawe, op. cit. 1964, pp. 157-8.
29
Cameron, op. cit., pp. 237-8.
In this study, the manuscripts containing the De Coniuratione Catilinae and the De
Bello Iugurthino were examined, as these two works are found together in one set of
the following method. Eleven manuscripts were selected from those included in L.D.
Siglum   Manuscript
A        Parisinus 16025
B        Basileensis
C        Parisinus 6085
D        Parisinus 10195
F        Hauniensis Fabricianus
H        Berolinensis Phillippsianus 1902
K        Vaticanus Palatinus 887
N        Vaticanus Palatinus 889
P        Parisinus 16024
Q        Parisinus 5748
V        Vaticanus 3864 (Florilegium Vaticanum)
Table 1
Beginning at Catilina 1.1, the first 300 loci were selected which contain variants in
one or more of the above eleven manuscripts. 30 The adapted collation was then formed
by listing, for each locus, the groups of manuscripts which exhibited the same reading.
Table 2
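The step of forming the adapted collation, that is, grouping the sigla that agree at each locus, can be sketched briefly. The Python below assumes a hypothetical representation in which each locus is a mapping from siglum to reading. The readings `r1` and `r2` are placeholders, not Sallust’s text, and the handling of lacunae or illegible passages is left out.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_reading(readings: Dict[str, str]) -> List[List[str]]:
    """Group manuscript sigla that share the same reading at one locus.

    `readings` maps siglum -> reading.  Groups come out in order of first
    appearance, and sigla keep their input order within each group.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for siglum, reading in readings.items():
        groups[reading].append(siglum)
    return list(groups.values())


# Hypothetical locus: two variant readings, "r1" and "r2", among five sigla.
locus = {"A": "r1", "B": "r2", "C": "r2", "D": "r1", "P": "r1"}
print(group_by_reading(locus))  # [['A', 'D', 'P'], ['B', 'C']]
```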
30
To be more precise, in keeping with the biological metaphor, only the latest markings in the
manuscripts were collated. Thus, as markings superseded by later corrections were ignored, loci containing variants in earlier
hands are not included in the 300. The selected loci do, however, include every variant in the last hand (at
each locus) of the appropriate manuscript from Catilina 1.1 to 52.35.
To analyze the collation, the DNAPARS component of the PHYLIP package was to be
employed, because it is the only component of PHYLIP which can process multi-state
program which compares DNA base sequences for a set of specimens and evaluates
A parsimony criterion favors arrangements of the specimens which require the
fewest character state changes in the course of the specimens’ evolution. For example, a
to one possessing ACT and, thereafter, requires the specimen possessing ACT to give
rise to one possessing the sequence AAA again would not be favored. This proposed
phylogeny requires two bases to change state (AA to CT) and later to change again
(back to AA), involving four base changes overall. Instead, a parsimony criterion might
favor an arrangement where one specimen featuring the AAA sequence gives rise to the
other with the AAA sequence, and the latter gives rise to that possessing the ACT
character state changes overall, and is thus more parsimonious than the former.
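The arithmetic of the example can be checked directly. The sketch below counts character-state changes along a single chain of ancestor and descendant, treating each position of the three-letter sequences as an independent character. It covers only the linear chains of the example. DNAPARS itself scores branching trees with inferred ancestral nodes, which this sketch does not attempt.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def changes_along_lineage(lineage: Sequence[str]) -> int:
    """Total character-state changes along a chain ancestor -> ... -> descendant."""
    return sum(hamming(parent, child) for parent, child in zip(lineage, lineage[1:]))


print(changes_along_lineage(["AAA", "ACT", "AAA"]))  # 4: out to ACT and back again
print(changes_along_lineage(["AAA", "AAA", "ACT"]))  # 2: one pair of changes, once
```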
Further assumptions involved in the parsimony method, and differing views
In order to evaluate the collation using DNAPARS, the collation data had to be
converted from the form illustrated in Table 2 to a form wherein manuscripts grouped
32
This phylogeny “might” be favored because one can observe other possible phylogenies with only two
character state changes. Such phylogenies would be equally parsimonious with the one given, and hence
would be judged equally desirable by a parsimony criterion.
33
“DNAPARS – DNA Parsimony Program” (documentation) in Felsenstein, op. cit.
by a shared reading were each assigned a particular DNA base abbreviation (A, C, G, T,
or “-”, which indicates a fifth state to DNAPARS). The DNA base label assigned to a
Each row of the collation would yield one DNA base label for each manuscript;
thus the 300 loci in the collation would produce a 300-base “DNA strand” for each of
the eleven manuscripts. The creation and data entry of these 3,300 base labels was
beyond what could easily be accomplished manually. To perform the task, a custom
application program (MSS2DNA) was written which allows the entry of the collation in
table form, performs the translation to sequences of DNA base labels for the various
manuscripts, and places the results on the Microsoft Windows clipboard (Figure 5). 34
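The translation that MSS2DNA performs can be sketched in a few lines. This is not the MSS2DNA source. The function name `collation_to_sequences` is hypothetical, and the sketch assumes that labels are assigned in the order the groups are listed at each locus (first group A, second C, and so on). A parsimony count only asks whether two manuscripts share a state, so which letter a group receives does not affect the result, provided the assignment is consistent within a locus. The five available states do, however, cap each locus at five distinct readings.

```python
from typing import Dict, List

STATES = "ACGT-"  # the five states DNAPARS distinguishes, per the text


def collation_to_sequences(collation: List[List[List[str]]]) -> Dict[str, str]:
    """Turn a grouped collation into one pseudo-DNA string per manuscript.

    `collation` has one entry per locus; each entry lists groups of sigla that
    share a reading.  ASSUMPTION: labels are assigned in the order the groups are
    listed (first group 'A', second 'C', third 'G', fourth 'T', fifth '-').
    """
    if not collation:
        return {}
    sigla = sorted({s for group in collation[0] for s in group})
    seqs: Dict[str, List[str]] = {s: [] for s in sigla}
    for i, groups in enumerate(collation):
        if len(groups) > len(STATES):
            raise ValueError(f"locus {i}: {len(groups)} readings, but only {len(STATES)} states")
        if sorted(s for group in groups for s in group) != sigla:
            raise ValueError(f"locus {i}: every manuscript must appear exactly once")
        for label, group in zip(STATES, groups):
            for s in group:
                seqs[s].append(label)
    return {s: "".join(labels) for s, labels in seqs.items()}


# Three hypothetical loci for five sigla.
example = [
    [["A", "D", "P"], ["B", "C"]],
    [["A", "B", "C", "D"], ["P"]],
    [["A"], ["B"], ["C", "D", "P"]],
]
print(collation_to_sequences(example))
# {'A': 'AAA', 'B': 'CAC', 'C': 'CAG', 'D': 'AAG', 'P': 'ACG'}
```

With 300 loci and eleven manuscripts, the same routine would emit the 3,300 labels mentioned above.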
From the clipboard, the DNA data for the various manuscripts was assembled
with a text editor into the file format required by DNAPARS, as documented by
Felsenstein. 35 In order to facilitate comparison to Reynolds’ stemmatic work on the
Sallust manuscripts, and because they represent only parts of the text, data for
manuscripts V (a florilegium) and Q were removed from the data file, leaving the nine
manuscripts for which Reynolds had published a stemma. In removing V and Q, some
27 (i.e., 9%) of the loci were rendered irrelevant, although they remain in the set. 36
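The assembly step, and the effect of dropping V and Q described in footnote 36, can also be sketched. The following Python assumes PHYLIP’s sequential format as given in Felsenstein’s documentation: a first line with the number of manuscripts and the number of sites, then one row per manuscript, with the name in a fixed ten-character field followed by its sequence (the ten-character names in Figure 6 reflect this field). The sequences in the usage example are hypothetical. `constant_sites` counts the loci that, like the 27 noted above, become invariant once the excluded witnesses are removed. The author assembled the real file by hand, and these helper names are illustrative.

```python
from typing import Dict, Iterable


def drop_manuscripts(seqs: Dict[str, str], names: Iterable[str]) -> Dict[str, str]:
    """Return a copy of `seqs` without the named manuscripts."""
    excluded = set(names)
    return {name: s for name, s in seqs.items() if name not in excluded}


def constant_sites(seqs: Dict[str, str]) -> int:
    """Count sites at which every manuscript present has the same state."""
    return sum(len(set(column)) == 1 for column in zip(*seqs.values()))


def to_phylip(seqs: Dict[str, str]) -> str:
    """Format sequences in PHYLIP sequential format with 10-character name fields."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    rows = [f"{len(seqs):5d}{lengths.pop():6d}"]
    rows += [f"{name[:10]:<10}{s}" for name, s in seqs.items()]
    return "\n".join(rows) + "\n"


# Hypothetical three-site sequences; Q and V stand in for the excluded witnesses.
seqs = {"A": "AAC", "B": "ACC", "P": "AAC", "Q": "ACG", "V": "CAC"}
kept = drop_manuscripts(seqs, ["Q", "V"])
print(constant_sites(seqs), constant_sites(kept))  # 0 2
print(to_phylip(kept), end="")
```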
34
This program, while not elegant, is publicly available (with source code) so that others may
independently conduct investigations or repeat and verify the present investigation. The program,
MSS2DNA, runs on 32-bit Microsoft Windows platforms (Windows 95, Windows 98, Windows NT) and
may be downloaded in archived (ZIP) format at http://homer.bus.miami.edu/~adbreind/mss2dna.zip
35
“Molecular Sequence Programs” in Felsenstein, op. cit.
36
These data points represent loci at which only Q and/or V differed from the consensus of remaining
manuscripts. These sites can be identified from Appendix B, in the table marked “steps in each site,” as
sites where the table shows 0 steps. That is, the remaining manuscripts show consensus at the site, so no
character state changes are required for any phylogenetic arrangement of the manuscripts.
The completed DNAPARS file appears in this report as “Appendix A: Infile.”
The DNAPARS program was then run, using this file as its data source. 37
DNAPARS produced the output file which appears in this report as “Appendix B:
Outfile,” and which includes the preliminary phylogenetic tree (Figure 6). DNAPARS
was then run on the input data several more times in order that other possible most
parsimonious trees might be discovered. No other most parsimonious trees were found.
37
The 386-Windows precompiled PHYLIP executables were used throughout. The program options
selected for DNAPARS were all defaults with the following exceptions: Randomize order was selected,
with a seed of 69 (=4*17+1) and 100 permutations of the input rows; terminal type was set to (none); input
sequences interleaved was set to No; and all printing options for the output were selected.
+--F.Hauniens
+--8
+--7 +--D.Par10195
! !
+--6 +-----H.Beroline
! !
+--------5 +--------K.VatP_887
! !
! +-----------N.VatP_889
+--4
! ! +--C.Par_6085
! ! +--3
--1 +--------------2 +--B.Basileen
! !
! +-----A.Par16025
!
+-----------------------P.Par16024
Figure 6
In order that the output from this program might be compared to Reynolds’
published stemma for Sallust, and in recognition of Reynolds’ judgments about the
quality of the textual variants, the tree was re-oriented using PHYLIP’s RETREE
because they had been collected in Reynolds’ presentation of the Sallust stemma, the
node representing their common ancestor was selected for the outgroup (or
the genealogical relationships inferred between the manuscripts by DNAPARS. 38 The
38
Re-orientation in effect asserts likely positions for the subarchetypes. As described above, West had
indicated that such a step would be required, and that it should be conducted using a critic’s evaluation
of the variants.
Session.” 39 The session also produced as output a new tree file. This tree file was used as
Figure 7
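RETREE is interactive, but the effect of re-orientation on the tree can be sketched. Below, the topology of Figure 6 is written as an undirected adjacency list (numbered keys are the internal nodes printed in the figure) and printed in Newick-style notation from two vantage points: from node 1, as DNAPARS drew it, and from node 4. Node 4 is chosen here purely for illustration, because the Analysis section associates it with Reynolds’ archetype. Re-hanging the tree changes its orientation but not the branching pattern that connects the manuscripts. The helper `tips_below` also confirms that the DNAPARS tree contains no group consisting of N and K alone, which is one of the differences from Reynolds’ stemma discussed below. This is not RETREE’s code, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# Figure 6 as an undirected adjacency list: numbered keys are the internal
# nodes printed by DNAPARS, single letters are the manuscript sigla.
FIG6: Dict[str, List[str]] = {
    "1": ["4", "P"], "4": ["1", "5", "2"], "5": ["4", "6", "N"],
    "6": ["5", "7", "K"], "7": ["6", "8", "H"], "8": ["7", "F", "D"],
    "2": ["4", "3", "A"], "3": ["2", "C", "B"],
    "P": ["1"], "N": ["5"], "K": ["6"], "H": ["7"],
    "F": ["8"], "D": ["8"], "A": ["2"], "C": ["3"], "B": ["3"],
}


def newick(tree: Dict[str, List[str]], node: str, parent: Optional[str] = None) -> str:
    """Newick-style string for the tree hung from internal `node`.

    Nodes left with a single child (such as node 1 once it is no longer the
    root) are suppressed, so only the branching pattern is printed.
    """
    children = [c for c in tree[node] if c != parent]
    if not children:
        return node
    if len(children) == 1:
        return newick(tree, children[0], node)
    return "(" + ",".join(newick(tree, c, node) for c in children) + ")"


def tips_below(tree: Dict[str, List[str]], node: str, parent: Optional[str] = None,
               groups: Optional[Set[FrozenSet[str]]] = None) -> FrozenSet[str]:
    """Return the sigla under `node`; add every group of two or more to `groups`."""
    children = [c for c in tree[node] if c != parent]
    if not children:
        return frozenset([node])
    tips = frozenset().union(*(tips_below(tree, c, node, groups) for c in children))
    if groups is not None and len(tips) > 1:
        groups.add(tips)
    return tips


print(newick(FIG6, "1"))  # as DNAPARS drew it
print(newick(FIG6, "4"))  # the same tree hung from node 4
groups: Set[FrozenSet[str]] = set()
tips_below(FIG6, "1", groups=groups)
print(frozenset({"N", "K"}) in groups)  # False: no N-K group in this tree
```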
For the sake of comparison, Reynolds’ stemma is reproduced (Figure 8). 40
Figure 8
As a comparison of the computer-generated tree with Reynolds’ tree
(Figures 7 and 8) shows, the two are nearly identical modulo inversion. There are, however, two
39
The program options selected for RETREE were all defaults with the following exception: “no graphics”
was selected.
40
Reynolds, L.D., ed. (1991) C. Sallusti Crispi: Catilina, Iugurtha, Historiarum Fragmenta Selecta, Appendix
Sallustiana, Oxford, p. xi.
differences. First, Reynolds associates N and K more closely with each other than with
Reynolds associates A more closely with P than with B or C, while DNAPARS indicated
no such closer affiliation. This latter distinction can in fact be attributed to differences in
the text being collated, rather than to differences between the analyses of Reynolds and
Analysis
Since several hundred rearrangements of the order of the “DNA strands”
produced no further most parsimonious trees, it seems reasonable to suppose that the
manuscript collation data specify a unique most parsimonious tree. 41 The existence of a
unique most parsimonious tree is itself an indication that the present method may be
productive, as it obviates the need for a human to insert prejudices into the analysis, by
selecting one cladogram from a list of many. The similarity of the results derived
through Reynolds’ analysis to those derived through the parsimony analysis can, in
This similarity is further strengthened when we account for one of the two
indicated differences between the stemmata. As described above (see n. 30), in keeping
with the metaphor of biological evolution, only the latest extant markings (corrections,
not including deletions) on each manuscript were collated. Thus, where the first and
41
This supposition is based on Felsenstein’s implicit assumption that a relatively small number of
rearrangements of the input data ought to yield multiple most parsimonious trees if they exist. Such an
assertion seems mathematically suspect, considering the large number of possible permutations of, say,
nine manuscripts (over 360,000). On this matter, however, I defer to Felsenstein’s knowledge as a
specialist.
second hand s of A d iffered , the second hand w as read for the collation instead of the
first. Reynold s naturally constructs his stem m a ind icating the position of the original A
text. But he notes that “Secund a m anu s (A 2) librum lectionibu s instruxit ex aliquo stirpis
[= B, C] cod ice petitis.” That is, w here read ings exist in A 2, they com e from the B-C
m anuscript to d escend both from an ancestor of P and also from a closer ancestor of B
and C. To test this hypothesis, w e w ould m erely need to m od ify the collation to reflect
only A-A 1 read ings, and then see w here DN APARS places the m anuscript.
Having taken the discrepancies into account, it seems that both the human and the machine-assisted analyses derive their results from the same underlying pattern among the manuscript readings. This study, then, preliminarily suggests that the parsimony analysis technique could substantively advance knowledge of textual transmission.
Furthermore, the parsimony analysis can indicate the readings likely to appear in the archetype and subarchetypes, in order that they most efficiently give rise to the
the scope of this study. But ambitious readers should note that Appendix B to this paper (i.e., the DNAPARS output) provides the readings likely to appear at various nodes in the cladogram for every locus studied. On Reynolds’ view of the transmission, the archetype (his ) ought to bear the readings given for node 4.
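Assigning readings to the archetype and subarchetypes amounts to reconstructing states at the internal nodes of a fixed tree. As one illustration, the sketch below applies Fitch's small-parsimony algorithm, a standard method for this task, to the topology printed in Appendix B, with internal nodes numbered as there. DNAPARS's own procedure and its "maybe" bookkeeping are not reproduced, and the readings at the single locus are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple, Union

# Internal nodes are (node number, left subtree, right subtree);
# leaves are manuscript names.
Tree = Union[str, Tuple[int, "Tree", "Tree"]]

# Topology of the DNAPARS tree in Appendix B, node numbers as printed there.
TREE: Tree = (1,
              (4,
               (5,
                (6,
                 (7, (8, "F.Hauniens", "D.Par10195"), "H.Beroline"),
                 "K.VatP_887"),
                "N.VatP_889"),
               (2, (3, "C.Par_6085", "B.Basileen"), "A.Par16025")),
              "P.Par16024")


def fitch_up(tree: Tree, leaf_states: Dict[str, str],
             sets: Dict[int, Set[str]]) -> Tuple[Set[str], int]:
    """Bottom-up pass: return the candidate-state set for this subtree and
    the minimum number of changes inside it; store each internal set."""
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    node, left, right = tree
    lset, lcost = fitch_up(left, leaf_states, sets)
    rset, rcost = fitch_up(right, leaf_states, sets)
    common = lset & rset
    if common:
        sets[node] = common
        return common, lcost + rcost
    sets[node] = lset | rset
    return sets[node], lcost + rcost + 1  # children disagree: one change needed


def fitch_down(tree: Tree, parent_state: Optional[str],
               sets: Dict[int, Set[str]], chosen: Dict[int, str]) -> None:
    """Top-down pass: keep the parent's state where the node's set allows it,
    otherwise take the alphabetically first state in the set. This yields
    one most-parsimonious assignment."""
    if isinstance(tree, str):
        return
    node, left, right = tree
    options = sets[node]
    state = parent_state if parent_state in options else min(options)
    chosen[node] = state
    fitch_down(left, state, sets, chosen)
    fitch_down(right, state, sets, chosen)


# Hypothetical readings at one locus, coded as letters as in the infile.
column = {"F.Hauniens": "A", "D.Par10195": "A", "H.Beroline": "A",
          "K.VatP_887": "C", "N.VatP_889": "C", "C.Par_6085": "C",
          "B.Basileen": "C", "A.Par16025": "C", "P.Par16024": "C"}

sets: Dict[int, Set[str]] = {}
root_set, changes = fitch_up(TREE, column, sets)
chosen: Dict[int, str] = {}
fitch_down(TREE, None, sets, chosen)
print(changes)               # 1
print(chosen[4], chosen[7])  # C A
```

With these readings the single change falls on the branch leading to node 7, so node 4, the node the text identifies with Reynolds' archetype, receives C.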
Future Research
The future presents a number of immediate challenges and possibilities for the cladistic analysis of texts using parsimony techniques. The obvious methods through which the procedure may be tested include examining a variety of texts, as well as using full collations – in place of collations built from apparatus critici – so as to avoid dependence on one editor’s opinion of what may be viable manuscript readings.⁴²
If positive results are obtained, parsimony analysis might be deployed to assist the textual critic in determining the relationships of texts, and in reconstructing
existing dogma about traditions which have not been recently examined.⁴³ In the classroom, the use of graphical interactive parsimony programs, which allow one to
inconsistencies thus fostered, may facilitate integration of stemmatics into the standard classics curriculum.⁴⁴ Lastly, literary theorists may wish to ponder the existence of deeper metaphors connecting the enzymes and mutations of DNA replication with the corresponding verbal agents and scribal errors giving rise to many of our textual variants.
42 “Readings which must quite certainly be eliminated have no place under the text,” writes Maas (p. 23), thus giving editors license to omit even from the app. crit. those readings deemed eliminanda.

43 We may suppose that parsimony analysis will be effective in evaluating relationships between manuscripts of texts in modern, as well as ancient, languages.

44 MacClade (distributed by Sinauer Associates) is one such program. Many candidates which might be useful for heavy-duty analysis as well as pedagogy are described by Felsenstein at http://evolution.genetics.washington.edu/phylip/software.html
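The arithmetic in footnote 41 is easy to check: nine manuscripts can be fed to the program in 9! = 362,880 input orders. For comparison, the sketch below also counts the distinct bifurcating trees on nine labelled manuscripts, using the standard formulas (2n - 3)!! for rooted and (2n - 5)!! for unrooted trees. These tree counts are added here for context and are not part of the original argument.

```python
from math import factorial


def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ...; taken as 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n: int) -> int:
    """Rooted, fully bifurcating trees with n labelled tips: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


def unrooted_trees(n: int) -> int:
    """Unrooted, fully bifurcating trees with n labelled tips (n >= 3): (2n - 5)!!."""
    return double_factorial(2 * n - 5)


n = 9  # manuscripts in the Appendix A infile
print(f"{factorial(n):,} input orders")         # 362,880 input orders
print(f"{unrooted_trees(n):,} unrooted trees")  # 135,135 unrooted trees
print(f"{rooted_trees(n):,} rooted trees")      # 2,027,025 rooted trees
```

At this size an exhaustive or branch-and-bound search over all trees would likely be computationally modest, which would allow the uniqueness of the most parsimonious tree to be checked directly rather than inferred from rearranging the input order.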
Appendix A: Infile
9 300
P.Par16024AAACCCCCCCAATACCCCCCAACGCCCACACCCAACCACCACCCCCACGACGGAAACCCCCCGCCCCCCACCACAC
CACACCCCCCACCCAACCCATACCACCAAACACACGCCCCCGCCACACGACGGCACACACCCCACACAACAC-
CTCACCCACAACCCCCCCCACCACCCAACAAACCGCCCAAACCCACACCCCCACCACCCAACCCCCACCCCACACCCCCACAAACC
CCCCACCTACCCCCCCCCACCAGCCCCACCAACCAAACCAACCCCCGAACCACCCCCACCCCCCA
A.Par16025CCACACCCCACAGCCCCCCCACCGCACCCCCCACCCCCCCCCCCCCCCATCGAACCCGCCACGCCCCCCGCAACCC
CCCCCCCCCCCACACTCCCCACCACCCACACCCCCGACCCAAACCAAAAACGGCCCCCCCCCCACACCCCCCCCAC-
CCACCCAAAACACCCACCCCCCCACACCCAACCAACAAAACCACCCCCCCCAACACCCCCCACCACCCCCCCACACCCCCCAAACA
ACCCACCCCCACCCCCCCGCCCCACCCACCACCCCACACCCCGACCCCACCCCGACACCGC
B.BasileenAACCCCCCCAAATCACCCCCAGCGCCCACACAACCCCCACACCCCCGCCCCGGAACCCCCCCCCACCACCCCCCCC
CCCCCCCCCCCACCATACCCAACGAACAAACCCCCGAACCACACACAACCCGGACACCGCCCCACACCCCCCACCACCCACCCCCA
ACCCCCACACCCCACCAGACAACCACCCCCCCCACCCCCCCCAACCCCCCCCCCCCAACCCCCCCCCCCCCCCCGCCCACCCCCCC
CACCACCCCGCACCACCCACCGCCCCACCCCCCGACCCCCCCCCCACACCGA
C.Par_6085AAACCCCCCAAAACACCCCCAGCGCACCCCCAACCCCCCCCCCCCCGCACCGGACCCCCCACACACCACTCAACCC
CACACCCCCCCACACTCCCCTACGCCCACACCCCCGCACCACACAAAACCCGGCCACCCCCCCACACCCCCCCCCACCCACCCCCA
ACCCCCCCCCCCCCCCAGCCACCCACCCCCAACACCCCCCCCACCACCCCCCACCCCAACACACCCCCCCCCCCGCCCCCCCCCCC
CACCCCCCCGCCCCACCCACCACCCCACCCCCCGACCCCACCCCGACACCGC
N.VatP_889CAACCCAACCCACACCACAAACCGCCCCACCCCCCAAAACACCCCCAAGGCAGCCCCACACAAACCCCCACCACCC
CCCCCAACCCCACCCGCCCCACAAAGAACCCACCCGCCCCCGCCCCCAGCAGGCGGCCGACACCACCCCCCAGCGCCCCCCCCCCC
AACCCCCCCCCCCCCCAGACAGCACACACCCAACACCCCCCCAACACCCCCACACCCCACCCCCCCCCCCAACCACAACAACCACC
CCCCCCCCAGCCCCCCCAACCACCCCCCCCCCCCCCCACCCCCCGCAGCCGC
K.VatP_887ACCCCCCACCCCTCCAACCAAGCGCCCCCCCCCCCAAAACACCACAAAGGAGGCCCCCCCACGACCCCCGCCGCCC
CCCCCCCCCACCCCGGACCCCACAAGACCCACCCCACACACGCCACCCGCAGAGGGCCCACACCACCCACCATCCCTCCACCCCCC
ACCACACACCCCCAACAGCCAGCCCCCCCCCAACCCCACCACAACCCACAAACCCCCCCCCCCCCCCCCCCACCACCCAACACACC
CCCACCCAAGCCCCACCACCCCCACCCCCCCCCGCCCACCACCCACCACCGC
H.BerolineAACCCACCACACTCCCCCCACGACCCCCACCCCCCCACACGCCCCAAAGCCGGACCCCACCGGCCCCACCACCCCC
CCCCCCCCCACACCCTACCCGCCTAGCCCAACCCACCCACCGCCCCCAGCCGGACTCCCCAACCCCCCGCCATCCC-
CCACCCCCAACCACACCCCACCACAAGACAGACCCCCCCCCCACCCCCCAAGCCCAAAAACAGACCAACCCCCACCCCCCCCCCCC
CAC-GCCCCCCCACCCCCACACCCCCACCAGCCCCACCCACCGCCCCCCCCCAACCTCCGC
D.Par10195CACACACCACCATCCACACAATCACCACCCACCCCCAAACGACAAAACGGACCCCCCCCCCAGACAAACACCCACC
CCCCACAACACCCCTCAAACGCCAAGCCCCACACAACCAACGCCCCCCGCACCGGGCCACAAACCCCCCCCAGCCCGCCCCACCCA
CCCCAACAACAACGAAAGCAAGCCCCCCCCCACCCAACCCAACCCAAAAAACCGACCAACCACCACCCCCCCCCCCCCACGCCCAC
CCCACCACCCCACACCCACCAGCAAACACCCAAGCCCCCCCACCCCCACCAC
F.HauniensCACACACCCACATCCAAACCAGCAACACCCACCCCCAACAAAACAAACGGACCCCCATCCCACACAAACACACACA
CCCCACAACCCACCTGAACCGGCAAGCCCCCCACCACAAACGCACCCCGCCAGTGTCCGCACACCCCACCCAGCCCGCCCCACCCA
CACAAACACCAAAGAAAGCAAGCCCCACCCCACGCAACACAACCCCAAAACCCCAACACCCACCACCCCCCCCCCACCACGCCCAC
CACAACACCCCACCCCCACCAGCCAAGACCAAAGCCACCCCACCCCC-ACAC
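The infile above has the layout DNAPARS reads: a first line giving the number of manuscripts and of characters (here 9 and 300), then each manuscript's name in a ten-character field followed by its coded readings. A minimal sketch of writing such a file from a collation follows, assuming the readings have already been coded one letter per locus; the function name `to_phylip` and the toy data are hypothetical. Writing each sequence on a single line should keep the file valid whether DNAPARS is set to expect interleaved or sequential input.

```python
from typing import Dict

NAME_WIDTH = 10  # PHYLIP takes the first ten characters of a line as the name


def to_phylip(collation: Dict[str, str]) -> str:
    """Render {manuscript name: coded readings} as a PHYLIP infile, one line
    per manuscript after the 'taxa characters' header."""
    lengths = {len(seq) for seq in collation.values()}
    if len(lengths) != 1:
        raise ValueError("every manuscript needs the same number of characters")
    nchar = lengths.pop()
    lines = [f"{len(collation)} {nchar}"]
    for name, seq in collation.items():
        if len(name) > NAME_WIDTH:
            raise ValueError(f"name longer than {NAME_WIDTH} characters: {name!r}")
        lines.append(f"{name:<{NAME_WIDTH}}{seq}")  # pad the name with blanks
    return "\n".join(lines) + "\n"


# Hypothetical four-locus collation; "X" shows how a short name is padded.
toy = {"P.Par16024": "AACA", "A.Par16025": "CCAC", "X": "ACCA"}
print(to_phylip(toy), end="")
# 3 4
# P.Par16024AACA
# A.Par16025CCAC
# X         ACCA
```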
Appendix B: Outfile
Name Sequences
---- ---------
                       +--F.Hauniens
                    +--8
                 +--7  +--D.Par10195
                 !  !
              +--6  +-----H.Beroline
              !  !
     +--------5  +--------K.VatP_887
     !        !
     !        +-----------N.VatP_889
  +--4
  !  !                 +--C.Par_6085
  !  !              +--3
--1  +--------------2  +--B.Basileen
  !                 !
  !                 +-----A.Par16025
  !
  +-----------------------P.Par16024
1                       CGAMCCMCCC CCRCCMCCSM
1   4           maybe  ...C..C... .....A..GC
4   5           yes    ..C..M.... ..........
5   6           maybe  .......... ..A.......
6   7           maybe  .....C.... ..........
7   8           yes    A........A ..C.....A.
8   F.Hauniens  yes    ....A..... .....-A...
8   D.Par10195  no     .......... ..........
7   H.Beroline  yes    .......... .A...T....
6   K.VatP_887  yes    .....A..A. ..........
5   N.VatP_889  yes    .C...A.... ..G.AG....
4   2           yes    .......M.. ..GA......
2   3           no     .......... ..........
3   C.Par_6085  maybe  .......A.. ..........
3   B.Basileen  yes    .......C.. ..C......A
2   A.Par16025  maybe  .......A.. ..........
1   P.Par16024  maybe  ...A..A... ..A..C..CA
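In DNAPARS's printout of states at internal nodes, each row names a branch by its lower and upper node, says whether any steps occur on it (yes, no or maybe), and gives the states at the upper node, with a dot meaning the same state as in the node below it on the tree; the top row gives the states at the root, node 1. Letters such as M, R and S are IUPAC ambiguity codes (M = A or C, R = A or G, S = C or G), which DNAPARS uses where the state at a node is not uniquely determined. The sketch below expands the dotted rows into full state strings for the twenty sites reproduced here; the helpers `expand` and `ambiguous_sites` are hypothetical and not part of PHYLIP.

```python
from typing import Dict, List, Tuple

# IUPAC codes occurring in this excerpt, mapped to the states they stand for.
IUPAC = {"A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "-": {"-"},
         "M": {"A", "C"}, "R": {"A", "G"}, "S": {"C", "G"}}

ROOT = ("1", "CGAMCCMCCC CCRCCMCCSM")

# (from, to, any steps?, states at upper node), copied from the table above.
ROWS: List[Tuple[str, str, str, str]] = [
    ("1", "4", "maybe", "...C..C... .....A..GC"),
    ("4", "5", "yes", "..C..M.... .........."),
    ("5", "6", "maybe", ".......... ..A......."),
    ("6", "7", "maybe", ".....C.... .........."),
    ("7", "8", "yes", "A........A ..C.....A."),
    ("8", "F.Hauniens", "yes", "....A..... .....-A..."),
    ("8", "D.Par10195", "no", ".......... .........."),
    ("7", "H.Beroline", "yes", ".......... .A...T...."),
    ("6", "K.VatP_887", "yes", ".....A..A. .........."),
    ("5", "N.VatP_889", "yes", ".C...A.... ..G.AG...."),
    ("4", "2", "yes", ".......M.. ..GA......"),
    ("2", "3", "no", ".......... .........."),
    ("3", "C.Par_6085", "maybe", ".......A.. .........."),
    ("3", "B.Basileen", "yes", ".......C.. ..C......A"),
    ("2", "A.Par16025", "maybe", ".......A.. .........."),
    ("1", "P.Par16024", "maybe", "...A..A... ..A..C..CA"),
]


def expand(root: Tuple[str, str],
           rows: List[Tuple[str, str, str, str]]) -> Dict[str, str]:
    """Replace each '.' with the state of the node below (the 'from' node).
    Rows must reach a node before any row that starts from it."""
    states = {root[0]: root[1].replace(" ", "")}
    for lower, upper, _steps, row in rows:
        below = states[lower]
        row = row.replace(" ", "")
        if len(row) != len(below):
            raise ValueError(f"row for {upper} has {len(row)} sites, expected {len(below)}")
        states[upper] = "".join(b if r == "." else r for b, r in zip(below, row))
    return states


def ambiguous_sites(seq: str) -> List[int]:
    """0-based positions whose code stands for more than one state."""
    return [i for i, code in enumerate(seq) if len(IUPAC.get(code, {code})) > 1]


states = expand(ROOT, ROWS)
print(states["4"])                   # CGACCCCCCCCCRCCACCGC
print(ambiguous_sites(states["1"]))  # [3, 6, 12, 15, 18, 19]
```

Applied to the full outfile, the same expansion would give complete reconstructed readings at node 4 for every locus studied.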
Are these settings correct? (type Y or the letter for one to change)
0
Are these settings correct? (type Y or the letter for one to change)
y
                       ,>>1:F.Hauniens
                    ,>15
                 ,>14  `>>2:D.Par10195
                 !  !
              ,>13  `>>>>>3:H.Beroline
              !  !
     ,>>>>>>>12  `>>>>>>>>4:K.VatP 887
     !        !
     !        `>>>>>>>>>>>5:N.VatP 889
  ,>11
  !  !                 ,>>6:C.Par 6085
  !  !              ,>17
-10  `>>>>>>>>>>>>>16  `>>7:B.Basileen
  !                 !
  !                 `>>>>>8:A.Par16025
  !
  `>>>>>>>>>>>>>>>>>>>>>>>9:P.Par16024
                       ,>>1:F.Hauniens
                    ,>15
                 ,>14  `>>2:D.Par10195
                 !  !
              ,>13  `>>>>>3:H.Beroline
              !  !