
This article appeared in the December 13, 2008, issue of THOROUGHBRED TIMES. To subscribe, call 1-888-499-9090.


by John P. Sparkman
MODERN Thoroughbred pedigree analysis stands firmly on the foundation built by Col. Jean-Joseph Vuillier. A late 19th-century French cavalry officer, Vuillier was the first pedigree researcher to apply anything approaching a scientific method to Thoroughbred pedigrees.
Pedigree research perhaps remains a more arcane subject than necessary, but Vuillier was the first to compile a scientifically sound, experimentally based study, and he published detailed results in his 1902 book Les croisements rationnels dans la Race Pure (loosely translated, Rational Thoroughbred Breeding).
Vuillier based his research on the winners of England's classic races. He compiled 12-generation pedigrees of 654 great horses, counted the occurrences of common names, and correctly applied Galton's law of ancestral heredity to determine the genetic contributions of their most common ancestors.
Vuillier found that after roughly 12 generations, the average contribution of the most influential sires and dams to any pedigree more or less stabilizes. Vuillier's original research identified 21 ancestors of the Thoroughbred who had contributed more than others. He divided them into five series grouped roughly according to birth era (see Table 1). In 1925, Vuillier published a sixth series in a second volume of Les croisements rationnels.
Aga Khan III, who entered English racing in 1921, was so taken with Vuillier's work that he hired him as his private pedigree adviser and forbade further publication. Vuillier died a few years later. His widow, Germaine, maintained and updated the system and supervised the matings at the Aga Khan studs for almost 40 years. An updated and improved version of the system is still in use at the current Aga Khan's studs.
Varola and Roman
In the 1950s and '60s, an Italian journalist and pedigree adviser, Francesco Varola, produced a new interpretation based on Vuillier's ideas. He published his findings in two seminal books, Typology of the Racehorse and Functional Development of the Thoroughbred. Unfortunately, Varola abandoned Vuillier's generational numbers. His invention of aptitudinal categories for his chefs-de-race did, however, recognize the increasing specialization of racehorses in the 20th century.
Varola's method merely counted occurrences of names in pedigrees and then attempted to measure the balance in pedigrees by various indexes. Unfortunately, there was no scientific basis in genetics for Varola's indexes.
In 1981, Daily Racing Form columnist Leon Rasmussen published new research by American chemist and pedigree researcher Steve Roman. Roman attempted to combine the insights of Varola with the more scientific approach of Vuillier. He again assigned mathematically correct numbers to the contributions of horses in Varola's aptitudinal categories, though only through the first four generations of a pedigree.
Like Varola, however, Roman attempted to define indexes and guidelines that he proposed would aid breeders in devising pedigrees. Varola had specifically and adamantly disavowed any connection between his aptitudes and stamina or distance preferences. Roman did just the opposite, defining those contributions purely in terms of distance preferences.
Roman published his own book, Dosage: Pedigree and Performance, in 2002. Roman's formulations remain popular with many breeders, and his current iterations may be found at his website.
It is fair to say the Vuillier system itself has successfully withstood the longest and most rigorous field test of any breeding system in history. Aga Khan III was the most successful and influential European breeder of his era. His grandson, Karim Aga Khan IV, is not far off that mark among his contemporaries, as his magnificent filly Zarkava demonstrated in the 2008 Qatar Prix de l'Arc de Triomphe (Fr-G1).
Other contemporary breeders, however, cannot use the Vuillier system. Aga Khan III privatized it in the 1920s, and the Aga Khan Studs' further research has not been published. Still, Vuillier's original publications are in the public domain. Because his method was simple mathematics, it is easily duplicated and perhaps even improved.
Vuillier was lucky or smart enough to infer correctly that individuals in each generation provide, on average, 50% of the genetic material to the individual they produce in the next generation.
In the 12th generation of any pedigree, there are 4,096 potential ancestors. In Vuillier's system, each of those positions is worth one point. Each position in the 11th generation is worth two points, each in the 10th is worth four, and so on, so that each position in the first generation is worth 2,048 points. Thus, in agreement with Galton's law, each parent in each generation contributes 50% to its offspring.
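Written as formulas, the scale just described looks like this. Here w(g) is a symbol introduced only for illustration: it stands for the points one position in generation g is worth, with the parents counted as generation 1. Every generation carries the same 4,096-point total, and each step back halves the value of a single position.

```latex
\begin{aligned}
w(g) &= \frac{4096}{2^{g}} = 2^{12-g}, \qquad g = 1, 2, \dots, 12,\\
w(1) &= 2048, \qquad w(11) = 2, \qquad w(12) = 1,\\
2^{g}\, w(g) &= 4096 \quad \text{(all positions in generation } g \text{ combined)},\\
\frac{w(g+1)}{w(g)} &= \frac{1}{2} \quad \text{(one generation further back halves the value).}
\end{aligned}
```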
Suppose, for example, that a pedigree carries two crosses of Northern Dancer, one in the fourth generation and one in the fifth. Northern Dancer's contribution to the pedigree would be defined as 256 (4,096 divided by the 16 possible ancestors in the fourth generation) plus 128 (4,096 divided by the 32 possible ancestors in the fifth), or 384. The further back you go in a pedigree, the more duplications there are. Individual appearances are added together according to their generational values.
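The scoring rule can be sketched in a few lines of Python. This is a minimal illustration that assumes the 12-generation scale described above. The function names `position_value` and `dosage` are hypothetical and do not come from any published tool.

```python
# Minimal sketch of Vuillier-style dosage scoring, assuming the 12-generation
# scale described in the text: one position in generation g (parents = 1)
# is worth 2 ** (12 - g) points, so a 12th-generation position is worth 1.

MAX_GENERATION = 12


def position_value(generation: int, max_generation: int = MAX_GENERATION) -> int:
    """Points for a single appearance in the given generation."""
    if not 1 <= generation <= max_generation:
        raise ValueError(f"generation must be between 1 and {max_generation}")
    return 2 ** (max_generation - generation)


def dosage(generations: list[int]) -> int:
    """Add up the points for every appearance of one ancestor in a pedigree."""
    return sum(position_value(g) for g in generations)


# Northern Dancer once in the fourth generation and once in the fifth.
print(dosage([4, 5]))  # 384
```

Duplications simply add to the list. An ancestor appearing three times in the 8th generation, for instance, would be scored as `dosage([8, 8, 8])`, or 3 × 16 = 48 points.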
As a method of evaluating matings, Vuillier's system posits that the best matings are those in which the dosage values of the prospective pedigree come closest to the dosages found in the pedigrees of the best horses. Essentially, Vuillier's system quantifies what most breeders attempt to do through less scientific methods.
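The comparison step might look something like the following hypothetical sketch. The article does not say how "closest" is measured, so the sum of absolute differences used here is an assumption. The target profile is a few of the "All races" averages from Table 2. The two candidate matings and their dosage figures are invented for illustration.

```python
# Hypothetical sketch: rank candidate matings by how close their projected
# ancestor dosages come to a target profile. The distance measure (sum of
# absolute differences) is an assumption, not Vuillier's published rule.

# Target profile: "All races" averages for four ancestors, taken from Table 2.
TARGET = {
    "St. Simon": 347,
    "Northern Dancer": 296,
    "Nearco": 279,
    "Mr. Prospector": 256,
}


def distance(candidate: dict[str, float], target: dict[str, float]) -> float:
    """Sum of absolute differences; an ancestor missing from the candidate counts as 0."""
    return sum(abs(candidate.get(name, 0) - value) for name, value in target.items())


def rank_matings(candidates: dict[str, dict[str, float]], target: dict[str, float]) -> list[str]:
    """Return candidate mating names ordered from closest to farthest."""
    return sorted(candidates, key=lambda name: distance(candidates[name], target))


# Invented dosage profiles for two hypothetical matings.
matings = {
    "Mating A": {"St. Simon": 330, "Northern Dancer": 512, "Nearco": 300, "Mr. Prospector": 128},
    "Mating B": {"St. Simon": 350, "Northern Dancer": 256, "Nearco": 270, "Mr. Prospector": 320},
}
print(rank_matings(matings, TARGET))  # ['Mating B', 'Mating A']
```

Mating B lies 116 points from the target and Mating A lies 382 points away. A weighted or squared distance would be an equally defensible choice and could reorder close cases.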
New approach through research
A modern breeder who wishes to use a system based on Vuillier's faces two problems. The first is how to determine which horses should be used as chefs-de-race (the more evocative term "archetypes" is preferred here). The second is how to incorporate the insights of Varola and Roman and improve the system.
To solve the first problem, the author worked with Simon Morris, proprietor of Australian-based Syntax Software. Morris's widely used pedigree analysis program, TesioPower, already included a function that counted occurrences of all ancestors in a pedigree as far back as 20 generations and noted the generation in which each cross occurred. It was therefore an easy step for Morris to apply the numerical values noted above to each generation and add them together for any given pedigree.
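TesioPower's internals are not described here, so the following is only a hypothetical sketch of the same idea. It walks a pedigree one generation at a time, records the generation of every appearance of every ancestor, and then applies the generational values. The pedigree format (a dictionary mapping each horse to its sire and dam) and the toy horse names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical pedigree store: each horse maps to (sire, dam).
# Horses with no recorded parents are simply absent.
Pedigree = dict[str, tuple[str, str]]


def appearances(horse: str, pedigree: Pedigree, max_generation: int = 12) -> dict[str, list[int]]:
    """Record the generation of every appearance of every ancestor (parents = generation 1)."""
    found: dict[str, list[int]] = defaultdict(list)
    current = [horse]
    for generation in range(1, max_generation + 1):
        parents: list[str] = []
        for h in current:
            if h in pedigree:
                parents.extend(pedigree[h])  # a duplicated horse adds its parents twice
        for ancestor in parents:
            found[ancestor].append(generation)
        current = parents
        if not current:  # no known ancestors further back
            break
    return dict(found)


def dosages(horse: str, pedigree: Pedigree, max_generation: int = 12) -> dict[str, int]:
    """Score each ancestor so that the deepest generation walked is worth 1 point per position."""
    return {
        name: sum(2 ** (max_generation - g) for g in gens)
        for name, gens in appearances(horse, pedigree, max_generation).items()
    }


# Toy pedigree: "Grandsire" appears twice in the second generation.
toy = {
    "Foal": ("Sire", "Dam"),
    "Sire": ("Grandsire", "Granddam 1"),
    "Dam": ("Grandsire", "Granddam 2"),
}
print(dosages("Foal", toy))
```

With the default 12-generation scale, "Sire" and "Dam" score 2,048 each. "Grandsire" also scores 2,048 (two second-generation appearances at 1,024 each), and each granddam scores 1,024. Walking 20 generations, as TesioPower can, would change the scale under this sketch, because the 20th generation would become the 1-point generation.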
With that software tool in hand, the obvious next step was to choose a set of races. Focusing purely on American racing, we chose the following races run since 1984: all Breeders' Cup World Championships races, the Triple Crown races, the Hopeful Stakes (G1), Metropolitan Handicap (G1), Kentucky Oaks (G1), Coaching Club American Oaks (G1), and Alabama Stakes (G1). The goal was to cover the full spectrum of American racing without making the initial research project overly ambitious.
The resulting sample of winners included 318 horses. The research took place before the 2008 Breeders' Cup but after all the other races on the list had been run. We ran each pedigree through the TesioPower program and recorded, in a massive spreadsheet, the dosage numbers for 61 different ancestors derived from each pedigree. Using Vuillier's final St. Simon series as a starting point, we chose the 61 horses by observing which ancestors consistently generated the highest numbers. We also added certain other ancestors who were of interest for reasons that ranged from historical to sentimental.
Three members of Vuillier's final series, St. Simon, Hermit, and Bend Or, were included as a test of the validity of Vuillier's original numbers. Table 2 summarizes the data in various slices. As it shows, the average dosage numbers for those three horses are consistently lower than Vuillier's numbers.
In fact, that was the expected result. Those three horses and the other members of Vuillier's final series are so far removed in time from contemporary Thoroughbreds that they are rapidly disappearing beyond the 12-generation horizon of the research.
Consider the 12-generation pedigree of 2007 Alabama winner Lady Joanne. The name of St. Simon occurs 157 times at various generational distances, but those appearances add up to only 264 points. That is still, by the way, the highest concentration of any ancestor whose name appears beyond the fourth generation of her pedigree. If one extends the pedigree four more generations, however, one finds no fewer than 195 more occurrences of St. Simon, bringing the total St. Simon dosage for Lady Joanne to …
As Table 2 shows, the average dosage for St. Simon in the 318 pedigrees in our sample is 347. As with the individual number for Lady Joanne, that is substantially lower than Vuillier's number. The 12-generation horizon partially explains the difference. It is also true that St. Simon was nowhere near as prominent in American pedigrees in the early 20th century as he was in Europe. Because the pedigrees in the current sample are largely American, his influence is bound to remain lower than it would be in a purely European sample.

A new understanding of dosage
Additional research delineates the current influence of important sires on the American Thoroughbred

Zarkava, winner of the 2008 Prix de l'Arc de Triomphe and recognized as Europe's Horse of the Year at the Cartier Racing Awards, is a product of the Vuillier dosage system used by the Aga Khan's breeding program.

Table 1
Vuillier series of influential horses' dosages

1902 series
Eclipse 568, Herod 750, Highflyer 543
Waxy 258, Orville 252, Buzzard 208, Blacklock 167, Tramp 114, Walton 111, Sorcerer 106, Cato 64
Birdcatcher 288, Touchstone 351, Pocahontas 313, Voltaire 186, Pantaloon 140
Melbourne 184, Bay Middleton 127, Gladiator 93
Stockwell 340, Newminster 295

1925 series
St. Simon 420, Galopin 405, Isonomy 280, Hampton 260, Hermit 235, Bend Or 210
Despite these factors, and despite having been born as long ago as 1881, St. Simon remains the most influential factor in contemporary American pedigrees. It should surprise Thoroughbred aficionados less that Northern Dancer is indeed the most significant 20th-century progenitor of the Thoroughbred, with an average number of 296. He is followed by his grandsire, Nearco (279), by his contemporary sire-line rival Mr. Prospector (256), and by Nearco's son *Nasrullah (243).
The numbers in the first column of Table 2 reflect the pedigrees of all 318 horses. These horses won many of the most important races in North America over a wide variety of distances, on both turf and dirt. As Varola first noted, however, the modern Thoroughbred has in a sense split into different tribes specializing in different aptitudes. Since Varola's day, that specialization has shifted somewhat away from a primarily distance-based specialization, especially as races of two miles or more have essentially disappeared from the calendar. The newer specializations are based more on surface and precocity. The question for modern pedigree research thus becomes: Can a system of pedigree analysis distinguish between the pedigrees of horses whose talents reflect those new specialties?
The columns in Table 2 titled "2yo races," "turf races," "foreign pedigrees," and "American dirt classic distance" answer that question in the affirmative, though necessarily for smaller samples. Those four columns divide the original sample into smaller sub-samples based on the type of races won.
The "2yo races" column, for example, includes winners of the Breeders' Cup Juvenile (G1), Breeders' Cup Juvenile Fillies (G1), Hopeful Stakes, and Breeders' Cup Juvenile Turf. Certain ancestors exert significantly higher influence on the average pedigree of these more precocious two-year-old race winners than on the overall average, as the boldface numbers in that column indicate. They include Bold Ruler (200 against 157 for all races), Mr. Prospector (333 against 256), Native Dancer (228 against 197), and *Nasrullah (275 against 243). Given the historical profiles of those sires, this should not be a surprise.
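Splitting the sample and averaging each sub-sample can be sketched as follows. The winners, their categories, and their dosage figures are all invented for illustration. The sketch also assumes that a winner lacking an ancestor counts as 0 for that ancestor, so every average is taken over the whole sub-sample. This seems consistent with the zeros in Table 2, but it is an assumption.

```python
from statistics import mean

# Hypothetical records: each winner carries a set of race categories and its
# per-ancestor dosage numbers (all values invented for illustration).
winners = [
    {"name": "Winner 1", "categories": {"2yo"}, "dosage": {"Bold Ruler": 256, "Mr. Prospector": 384}},
    {"name": "Winner 2", "categories": {"turf"}, "dosage": {"Northern Dancer": 512}},
    {"name": "Winner 3", "categories": {"2yo", "dirt classic"}, "dosage": {"Bold Ruler": 128, "Mr. Prospector": 256}},
]


def average_dosages(records, category=None):
    """Average each ancestor's dosage over one category's winners (all winners if None).

    A winner whose pedigree lacks an ancestor contributes 0 for that ancestor.
    """
    subset = [r for r in records if category is None or category in r["categories"]]
    ancestors = {name for r in subset for name in r["dosage"]}
    return {name: mean(r["dosage"].get(name, 0) for r in subset) for name in sorted(ancestors)}


print(average_dosages(winners, "2yo"))  # {'Bold Ruler': 192, 'Mr. Prospector': 320}
print(average_dosages(winners))         # every winner, i.e. an "All races"-style column
```

A winner can belong to several categories at once, as Winner 3 does here. This mirrors how one horse can count toward more than one column of Table 2.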
Similarly, it should surprise no one that Northern Dancer's name is far more prominent in two groups, as the "turf races" and "foreign pedigrees" columns show (469 and 407, respectively, against 296 overall). The first group is the winners of major turf races. The second is the smaller subset of winners of races on any surface that were either imported or the offspring of two imported parents (in other words, horses with a wholly foreign pedigree).
Conversely, the name of Seattle Slew does not appear in the pedigree of any turf winner or of any winner with a foreign pedigree. His influence on winners at the American dirt classic distance of 1¼ miles and up, however, is highly significant. Equally interesting, the numbers for America's greatest native stamina line, which runs from Fair Play to Man o' War to War Admiral, rise markedly for winners at dirt classic distances.
Finally, one problem in determining the proper dosages in any system is that the composition of pedigrees can change rapidly. This happens as new forces, particularly exceptional sires of sires, spread their influence through the gene pool. The problem has been greatly exacerbated by the convergence of huge modern mare books and the shuttle-sire concept. Each has worked to reduce the number of stallions at stud and to concentrate breeding on only the most fashionable sires.
The final column in Table 2, headed "last ten years," addresses that problem. That column includes the results for the winners of all races in our sample from 1998 through 2007, a much more contemporary group of pedigrees. Some numbers in that column are significantly higher than the corresponding numbers in the first column, which covers all races; this is particularly true of those in gold boldface type. These designate the horses whose influence on the breed is still rising.
Most notable among those names is Mr. Prospector, whose number for the last ten years (382) is almost 50% higher than his overall number (256) and higher than that of any other ancestor. There is no way of knowing exactly where Mr. Prospector's dosage will settle from a historical viewpoint. It is likely, however, that the number will be higher than the current overall number, at least for American dirt pedigrees. Similarly, the influences of Hail to Reason and Seattle Slew are still on the rise.
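The "almost 50%" figure can be checked directly against Table 2. The short sketch below copies the "All races" and "Last ten years" numbers for a handful of ancestors from the table. It then computes the percentage change as (recent / overall − 1) × 100 and lists the ancestors by their last-ten-years number. It is only a check on the arithmetic; the selection of ancestors is arbitrary.

```python
# Figures copied from Table 2 ("All races" and "Last ten years" columns).
ALL_RACES = {"Mr. Prospector": 256, "Hail to Reason": 94, "Seattle Slew": 120,
             "Northern Dancer": 296, "St. Simon": 347}
LAST_TEN_YEARS = {"Mr. Prospector": 382, "Hail to Reason": 123, "Seattle Slew": 184,
                  "Northern Dancer": 299, "St. Simon": 329}


def percent_change(recent: float, overall: float) -> float:
    """Percentage by which the recent figure exceeds (or trails) the overall one."""
    return (recent / overall - 1) * 100


# List ancestors from the highest last-ten-years number down.
for name in sorted(LAST_TEN_YEARS, key=LAST_TEN_YEARS.get, reverse=True):
    change = percent_change(LAST_TEN_YEARS[name], ALL_RACES[name])
    print(f"{name:16} {ALL_RACES[name]:>4} -> {LAST_TEN_YEARS[name]:>4} ({change:+.1f}%)")
```

Mr. Prospector comes out at +49.2%, matching "almost 50%," and his 382 is the largest figure in the column. Seattle Slew (+53.3%) and Hail to Reason (+30.9%) also rise clearly. St. Simon (−5.2%) and Northern Dancer (+1.0%) hold roughly steady.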
Obviously, more research with larger samples of horses would help refine the initial data in Table 2. Equally obviously, other categories, such as synthetic surfaces and sprints, will be of great interest to breeders in the coming years. European breeders would benefit from data based on their own races, as would breeders in Australia, New Zealand, and South America.
That is one of the beauties of the software now available. Breeders and pedigree researchers can construct their own lists, compile their own data, and reach their own private conclusions.
John P. Sparkman is bloodstock editor.
Table 2
Dosages for sub-groups

Ancestor | All races | 2yo races | Turf races | Foreign pedigrees | American dirt classic distance | Last ten years
Almahmoud | 87 | 80 | 134 | 118 | 74 | 95
Bay Ronald | 114 | 113 | 120 | 129 | 113 | 113
Ben Brush | 81 | 84 | 58 | 44 | 89 | 80
Bend Or | 171 | 169 | 159 | 151 | 176 | 150
Black Toney | 76 | 72 | 66 | 56 | 85 | 74
Blandford | 121 | 118 | 124 | 140 | 124 | 125
*Blenheim II | 157 | 165 | 152 | 144 | 158 | 159
Blue Larkspur | 100 | 106 | 86 | 60 | 106 | 105
Bold Ruler | 157 | 200 | 82 | 34 | 128 | 126
Broomstick | 60 | 63 | 43 | 32 | 66 | 65
Buckpasser | 108 | 97 | 78 | 69 | 152 | 112
*Bull Dog | 114 | 122 | 90 | 62 | 118 | 122
Canterbury Pilgrim | 128 | 123 | 142 | 159 | 125 | 127
Chaucer | 130 | 119 | 149 | 169 | 123 | 128
Commando | 73 | 76 | 55 | 41 | 83 | 72
Count Fleet | 74 | 84 | 66 | 61 | 78 | 77
Discovery | 105 | 115 | 75 | 50 | 103 | 107
Djebel | 38 | 38 | 44 | 69 | 26 | 36
Domino | 81 | 87 | 57 | 40 | 89 | 78
Fair Play | 112 | 118 | 85 | 62 | 125 | 112
Fair Trial | 37 | 28 | 65 | 86 | 27 | 34
Fairway | 42 | 40 | 56 | 90 | 29 | 38
Gainsborough | 139 | 125 | 173 | 192 | 134 | 134
Hail to Reason | 94 | 96 | 111 | 82 | 95 | 123
Hermit | 151 | 143 | 149 | 152 | 149 | 112
Hyperion | 150 | 130 | 188 | 217 | 145 | 139
*La Troienne | 46 | 49 | 29 | 27 | 57 | 47
*Mahmoud | 118 | 114 | 103 | 90 | 136 | 120
Man o' War | 116 | 112 | 89 | 68 | 139 | 113
Mr. Prospector | 256 | 333 | 152 | 59 | 290 | 382
Mumtaz Begum | 139 | 155 | 126 | 124 | 126 | 141
Mumtaz Mahal | 107 | 114 | 95 | 90 | 103 | 109
*Nasrullah | 243 | 275 | 220 | 219 | 220 | 243
Native Dancer | 197 | 228 | 188 | 149 | 205 | 218
Nearco | 279 | 272 | 307 | 306 | 239 | 267
Northern Dancer | 296 | 271 | 469 | 407 | 248 | 299
Peter Pan | 69 | 74 | 59 | 52 | 77 | 69
Phalaris | 178 | 166 | 196 | 215 | 170 | 181
*Pharamond II | 52 | 49 | 47 | 35 | 61 | 58
Pharos | 184 | 172 | 206 | 215 | 171 | 187
Plucky Liege | 118 | 123 | 102 | 81 | 123 | 123
*Princequillo | 151 | 158 | 166 | 105 | 148 | 142
Raise a Native | 190 | 227 | 87 | 44 | 224 | 235
*Ribot | 75 | 40 | 68 | 43 | 94 | 69
Rough'n Tumble | 41 | 34 | 9 | 0 | 54 | 52
*Rough Shod II | 13 | 1 | 37 | 30 | 17 | 15
Seattle Slew | 120 | 117 | 0 | 0 | 124 | 184
Secretariat | 87 | 115 | 36 | 26 | 66 | 102
Selene | 128 | 118 | 137 | 142 | 131 | 125
*Sir Gallahad III | 85 | 89 | 64 | 47 | 94 | 81
Somethingroyal | 67 | 76 | 55 | 48 | 55 | 73
Spearmint | 130 | 134 | 126 | 110 | 130 | 132
St. Simon | 347 | 333 | 370 | 392 | 342 | 329
Sundridge | 117 | 117 | 112 | 110 | 117 | 117
Swynford | 119 | 115 | 131 | 145 | 117 | 119
*Teddy | 190 | 201 | 163 | 149 | 198 | 198
The Tetrarch | 90 | 97 | 87 | 92 | 86 | 89
Tourbillon | 40 | 37 | 44 | 66 | 31 | 40
*Turn-to | 102 | 93 | 98 | 75 | 104 | 114
Ultimus | 53 | 55 | 39 | 25 | 59 | 53
War Admiral | 85 | 79 | 61 | 46 | 106 | 80

Numbers in boldface type indicate sires that exert significantly higher influence on the average pedigree in a category; gold boldface in the last column designates horses whose influence is still rising.