COMPARING CHESS OPENINGS PART 3: QUEEN'S PAWN OPENINGS

JAMAL MUNSHI

ABSTRACT: A dual engine experimental design for comparing chess openings was described in a previous paper (Munshi,
Comparing Chess Openings, 2014). It is used in this paper to study ten chess openings that are initiated with the queen's pawn
move 1. d4. One of the openings is identified as the mainline and the other nine as variations from the mainline. Five of the
variations are found to be benign innovations and the other four are deemed to be failed innovations. The findings are mostly
consistent with expert opinion. The primary purpose of this paper, however, is not these specific findings but rather the further
development and verification of an objective and quantitative methodology for the evaluation of chess openings in general [1].

1. INTRODUCTION

This paper is the third of a series in a study undertaken to develop a generally applicable methodology for
the objective evaluation of chess openings. The proposed methodology uses controlled experiments with
chess engines to compare chess openings. The first paper in this series (Munshi, A Method for Comparing
Chess Openings, 2014) presented a single engine experimental design (SED) to compare ten openings
that are initiated with the King's pawn move 1. e4. It demonstrated that the proposed methodology is able
to discriminate between known strong openings and known weak openings. The advantage of the SED is
that it removes the difference in playing strength from the experiment and isolates the effect of the
opening; but the disadvantage is that the same engine playing both sides of the board may introduce an
engine bias in the data by not playing a sufficiently diverse set of opening variations.

Subsequently, a dual engine design (DED) was proposed to address the issue of engine bias (Munshi,
Comparing Chess Openings, 2014). The second paper showed that there may have been a propensity for
engine bias in the SED and that the engine bias problem is mitigated by the DED which forces the
engines to play a greater number of variations. This paper describes a further test of the DED using a new
set of openings.

The motivation for this study is that conventional methods of evaluating chess openings are inadequate.
Grandmaster opinions are subjective and inconsistent, while the win-loss-draw statistics in opening book
databases are field data that were not taken under controlled conditions and are therefore confounded by
intervening variables that have a greater effect on game outcomes than the opening (Munshi, A Method
for Comparing Chess Openings, 2014). As a result, there are conflicting opinions on the merit of the
different lines in the opening book and these opinions have engendered ongoing debates that have no
satisfactory conclusion. It is proposed that an objective method for evaluating openings will settle these
issues and help to refine the opening book.

[1] Date: May 2014. Key words and phrases: chess openings, chess engines, refutation, methodology, Monte Carlo simulation, numerical methods, probability vector, Euclidean distance, robust statistics, bootstrap. Author affiliation: Professor Emeritus, Sonoma State University, Rohnert Park, CA 94928, munshi@sonoma.edu

2. THEORY

Chess games may be thought of as a stochastic trinomial process driven by an unknown and unobservable
underlying probability vector given by

Equation 1: π = [pw, pb, pd]

where π is a vector with two degrees of freedom, pw is the probability that white will win, pb is the probability that black will win, and pd = 1 - pw - pb is the probability that the game will end in a draw.

The components of the probability vector π are determined by (1) white's first move advantage or FMA,
(2) the general rate of imperfection in the moves or IMP, (3) the difference in playing strength between
the player making white moves and the player making black moves, or DIFF, and (4) the opening
employed (Munshi, A Method for Comparing Chess Openings, 2014). The value of FMA is not known
but we know that it is a universal constant and we suspect that its effect is relatively small. Experiments
designed to measure the effect of the opening must therefore control the values of IMP and DIFF so that
the opening effect can be observed. Our hypothesis is that the choice of opening line played can change π
and that therefore chess engine experiments under these controlled conditions may be used to detect the
effect of openings on the probability vector π.
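To make the trinomial model concrete, the sketch below simulates game outcomes from an assumed probability vector. The values of PW and PB used here are hypothetical placeholders chosen only for illustration; the true components of π are, as noted above, unobservable.

import random

# A minimal simulation of the trinomial process; PW and PB are assumed
# illustrative values, not estimates from this paper's experiments.
PW, PB = 0.10, 0.02        # hypothetical win probabilities for white and black
PD = 1.0 - PW - PB         # pd = 1 - pw - pb, the probability of a draw

def simulate_sample(n_games=300, seed=42):
    """Draw n_games outcomes from pi = [PW, PB, PD]; return (white, black) wins."""
    rng = random.Random(seed)
    white = black = 0
    for _ in range(n_games):
        u = rng.random()
        if u < PW:
            white += 1
        elif u < PW + PB:
            black += 1
        # otherwise the game ends in a draw
    return white, black

print(simulate_sample())   # one simulated 300-game experiment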

In comparing two openings, opening-1 and opening-2, our research question and hypotheses are set up as
follows:

1. Research question: Is π1 = π2?
2. Null hypothesis: Ho: π1 = π2
3. Alternate hypothesis: Ha: π1 ≠ π2

A testable implication of this hypothesis is that if the true (and unknown) population mean results are
plotted in Cartesian coordinates with x=number of wins by white and y=the number of wins by black, and
the Euclidean distance between opening-1 and opening-2 is computed and designated as δ, then we may
write the testable hypotheses as:

Ho: δ=0
Ha: δ≠0

If we fail to reject Ho in this test, we immediately reach the conclusion that the evidence does not show
that the probability vector π is changed by using opening-2 instead of opening-1. If we reject Ho,
however, we know that the probability vector changed but we still don't know the direction of the change.
Further tests are necessary to determine whether the change favors white, whether it favors black, or
whether the change is in a neutral direction and favors neither black nor white.

3. METHODOLOGY

3.1 Baseline and test openings. Well recognized and established opening book databases (Meyer-
Kahlen, 2000) (Jones & Powell, 2014) are used to select the first three moves (first six half moves) from
ten openings that begin with the queen's pawn move 1. d4. The opening sequence in this category most used by grandmasters is the Queen's Indian Defense (Meyer-Kahlen, 2000) [2]; it is identified as the baseline opening, and the other nine opening sequences selected are described as test openings, or innovations. The proposed methodology for comparing chess openings is then used to compare each of
the test openings with the baseline opening. The ten opening sequences selected for this study are shown
in Table 1.

The rarity data shown in the table refer to the frequency of the baseline relative to the test opening
according to the opening database used for this purpose [3] (Meyer-Kahlen, Opening Database, 2000). The
nine test openings shown were selected to include a large rarity range and they are listed in the table from
the most frequently played to the least. The first four test openings listed may be considered to be
commonly used. The next two are not very common, and the last three are rarely played. The test
openings selected are expected to represent a wide spectrum of possibilities in the Queen's pawn game.

ID      Name                      Fixed 3-move sequence             Rarity   Innovator
E12QID  Queen's Indian Defense    1. d4 Nf6 2. c4 e6 3. Nf3 b6      --       --
E43NID  Nimzo Indian Defense      1. d4 Nf6 2. c4 e6 3. Nc3 Bb4     1.0      White
D37QGD  Queen's Gambit Declined   1. d4 Nf6 2. c4 e6 3. Nf3 d5      1.0      Black
E61KID  King's Indian Defense     1. d4 Nf6 2. c4 g6 3. Nc3 Bg7     1.7      Black
E11BID  Bogo Indian Defense       1. d4 Nf6 2. c4 e6 3. Nf3 Bb4+    2.6      Black
E01CAT  Catalan Opening           1. d4 Nf6 2. c4 e6 3. g3 d5       7.1      White
A81DUD  Dutch Defense             1. d4 f5 2. g3 Nf6 3. Bg2 g6      8.1      Black
A52BUG  Budapest Gambit           1. d4 Nf6 2. c4 e5 3. dxe5 Ng4    23.3     Black
A45TVA  Trompovsky Attack         1. d4 Nf6 2. Bg5 Ne4 3. Bf4 c5    25.2     White
A83DSG  Staunton Gambit           1. d4 f5 2. e4 fxe4 3. Nc3 Nf6    108.2    White
Table 1 Baseline and test openings

3.2 Dual engine experimental design. The dual engine design (DED) described in a previous paper
(Munshi, Comparing Chess Openings, 2014) is used to compare each test opening with the baseline using
chess engine experiments. Each experiment consists of 300 games played between two chess engines. The
engines selected are Houdini3Pro and Houdini4Pro (Houdart, 2013), generally regarded as the leaders in
this kind of chess software (Wikipedia, 2014). All engine parameters are set to their default values. In
each experiment, each of the two engines plays 150 games as white and 150 games as black. Every game
of each experiment begins with the six half moves being evaluated. These move sequences are shown in
Table 1. Engine calculations begin with white's fourth move. The engine moves may transpose the opening into a different ECO designation than the one by which it is identified in this paper; these transpositions are noted in the Appendix.

[2] The identification of the "mainline" varies among databases. The selection of the baseline is therefore somewhat arbitrary since any of the first three openings listed could have been used as the mainline.
[3] The rarity values differ among opening databases. They should be taken only in a very approximate sense.

The Deep Shredder chess GUI [4] software (Meyer-Kahlen, Deep Shredder, 2007) is used to set up the
engine matches. The search depth is fixed and set to 21 half moves for both engines, a level at which the
engines are expected to play at the grandmaster level or better (Ferreira, 2013). The very high level of
play is evident in the relatively low percentage of decisive games and a low estimated value of IMP. For
example, in the baseline case, 12% of the games were decisive with the remaining 88% ending in draw.
The estimated value of IMP, the rate of imperfection in the moves, is 2% as measured by the number of
wins by black [5]. Also, a comparison of the playing strength of the two engines [6] under the controlled
experimental conditions of this study shows no evidence of a significant difference in playing strength
(DIFF). These statistics are indicative of a very high level of play in which the effect of the opening is
unlikely to be overcome by move imperfections (IMP) or by the difference in playing strength (DIFF).
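The structure of one such experiment can be written down compactly. The sketch below is only an illustrative encoding of the design described above (300 games, colors alternated so that each engine plays 150 games per color, six fixed half moves, search depth 21); the actual matches were run in the Deep Shredder GUI rather than scripted.

from dataclasses import dataclass
from itertools import cycle

@dataclass
class Experiment:
    opening_id: str
    fixed_half_moves: list      # the six half moves fixed for every game
    games: int = 300
    search_depth: int = 21      # fixed for both engines

def color_schedule(exp):
    """Alternate colors so each engine plays 150 games as white and 150 as black."""
    pairings = cycle([("Houdini3Pro", "Houdini4Pro"),
                      ("Houdini4Pro", "Houdini3Pro")])
    return [next(pairings) for _ in range(exp.games)]

qid = Experiment("E12QID", ["d4", "Nf6", "c4", "e6", "Nf3", "b6"])
schedule = color_schedule(qid)                    # list of (white, black) pairings
assert sum(w == "Houdini3Pro" for w, _ in schedule) == 150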

The relevant data recorded for each experiment are listed below. The opening variability is a count of the number of unique moves made by the engines from the fourth to the tenth move. It serves as a measure of the number of variations computed by the engines during the opening phase of the game (Munshi, Comparing Chess Openings, 2014); a counting sketch follows the list.

1. White: the number of games won by white
2. Black: the number of games won by black
3. OV: the opening variability
4. Transpositions: whether the engine moves changed the opening designation
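The sketch below shows one way to compute OV under this definition; the encoding of games as lists of SAN half moves is an assumption, since the paper does not specify how the count was extracted from the PGN files.

def opening_variability(games, first_move=4, last_move=10):
    """Count unique (ply, move) pairs played between the given full moves.

    games: a list of games, each a list of half moves (SAN strings).
    """
    first_ply = (first_move - 1) * 2      # 0-based ply of white's 4th move
    last_ply = last_move * 2              # ply just past black's 10th move
    unique_moves = set()
    for moves in games:
        for ply in range(first_ply, min(last_ply, len(moves))):
            unique_moves.add((ply, moves[ply]))
    return len(unique_moves)

# Two toy games that share the first three moves and then diverge:
g1 = ["d4", "Nf6", "c4", "e6", "Nf3", "b6", "g3", "Ba6", "b3", "Bb4+",
      "Bd2", "Be7", "Bg2", "c6", "Bc3", "d5", "Ne5", "Nfd7", "Nxd7", "Nxd7"]
g2 = ["d4", "Nf6", "c4", "e6", "Nf3", "b6", "e3", "Bb7", "Bd3", "d5",
      "O-O", "Bd6", "b3", "O-O", "Bb2", "Nbd7", "Nbd2", "Qe7", "Ne5", "c5"]
print(opening_variability([g1, g2]))      # every divergent move counts once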

3.3 Comparing test openings against the baseline. We assume that game outcomes in the baseline
opening are driven by an underlying, unknown, and unobservable probability vector π, and if the opening innovation [7] to be evaluated changes the vector π we will be able to observe the effect of this change in the
data. We then use the data to classify each test opening into one of three categories:

Category A: Successful innovation. The probability vector has been changed in favor of the innovator.
Category C: Benign innovation. The probability vector is either unchanged or it was changed in a neutral direction.
Category F: Failed innovation. The probability vector has been changed in favor of the opponent.

The test is carried out in stages. First we test to see if the Euclidean distance between the baseline opening
result and the test opening result in the population from which our sample was taken is greater than zero.
The hypotheses for this test are:

Ho: δ=0
Ha: δ≠0

[4] Graphical User Interface
[5] See (Munshi, A Method for Comparing Chess Openings, 2014) for a detailed explanation.
[6] The comparison is shown in the Appendix.
[7] The terms "test opening" and "opening innovation" are used interchangeably. It is assumed that any opening sequence that differs from the mainline is an innovation.

We set the probability value for our level of disbelief at α=0.001, as suggested by Valen Johnson, who has studied the relationship between the α level and the irreproducibility of results and found that the higher values of α normally used, such as α=0.05 or α=0.01, can lead to spurious findings (Johnson, 2013). If the probability of observing a sample distance as large or larger [8] than the one being tested (i.e. the p-value) is
greater than α we fail to reject Ho and conclude that it is possible that the observed distance is a result of
sampling variation in a sample of 300 games taken from an unobservable population in which δ=0. In
these cases we can immediately classify the test opening into Category C as a benign innovation because
we have no evidence that the probability vector has been changed by the test opening.

However, if the p-value is less than α, we know that δ≠0 and conclude that the test opening has changed
the probability vector but we are unable to classify the test opening until we determine the direction of the
change. If the direction is well within the first or third quadrant, it is possible that the change is in a
neutral direction and therefore we can classify the opening as Category C, benign innovation. This finding
implies that the effect of the opening was only a change in the probability of decisive games, with pw and pb changing proportionately and neither color gaining an advantage due to the opening innovation.

If the direction is in the second or fourth quadrant, then the relative values of pw and pb have changed and
one color has gained an advantage over the other. In this case we can classify the opening as either
Category A or Category F according to whether the change favors the innovator or the opponent. The
possibilities are shown in Table 2.

Innovator   Quadrant 1   Quadrant 2   Quadrant 3   Quadrant 4
White       C            F            C            A
Black       C            A            C            F
Table 2 Classification according to innovator and direction (δ≠0)

3.4 Monte Carlo simulation. As in the previous papers we use a Monte Carlo numerical technique
to create a simulated sampling distribution from which we derive a measure of variance that we use in our
hypothesis test for distance (Munshi, A Method for Comparing Chess Openings, 2014) (Wikipedia,
2014). The sample data are used to estimate π=[pw,pb,pd] and these estimates are used to generate one
thousand simulated replications of the experiment. For each opening, we compute the squared Euclidean
distance of each simulated game from the mean [9]. Thus we have one thousand squared distances for each
opening. When comparing two openings we have two thousand squared distances from their respective
means. These squared distances are used to estimate what may be termed the "within treatment" variance of distance [10]. This variance serves as a measure of how different the sample results can be when taking samples of 300 games from the same population with a fixed value of π. This measure of variance can then be used to compute the probability of observing distances "between treatments" [11] greater than or
equal to the observed distance if Ho is true and δ=0. This probability serves as the basis of our hypothesis
test.

[8] Since the distance is computed as a square root it can be either positive or negative and therefore this is a two-tailed test. The reference to the magnitude of the distance as being "large or larger" refers to its absolute value.
[9] By definition, the mean is represented by the sample data that were used to estimate π.
[10] In Fisher's terminology each opening is a treatment.
[11] Between a test opening and the baseline opening.
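A minimal sketch of this procedure, as we read it, is shown below using the baseline (30 wins by white, 6 by black) and the Dutch Defense (48 and 3) from Table 3 in the next section; the function names are our own, and the result should fall close to the standard deviation of 6.17 reported for this comparison in Table 5.

import math, random

def simulated_squared_distances(white, black, n_games=300, reps=1000, seed=1):
    """Simulate replications of one experiment and return the squared distance
    of each simulated (white, black) result from the observed sample point."""
    pw, pb = white / n_games, black / n_games    # pi estimated from the sample
    rng = random.Random(seed)
    squares = []
    for _ in range(reps):
        w = b = 0
        for _ in range(n_games):
            u = rng.random()
            if u < pw:
                w += 1
            elif u < pw + pb:
                b += 1
        squares.append((w - white) ** 2 + (b - black) ** 2)
    return squares

# Pool the "within treatment" squared distances of the two openings:
pooled = simulated_squared_distances(30, 6) + simulated_squared_distances(48, 3)
stdev = math.sqrt(sum(pooled) / len(pooled))     # standard deviation of distance
delta = math.hypot(48 - 30, 3 - 6)               # observed distance, 18.25
print(f"stdev ~ {stdev:.2f}, t = {delta / stdev:.2f}")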

4.0 DATA ANALYSIS

The raw data from ten experiments of 300 games each are shown in Table 3. The essential data are the
number of games won by white (White), the number of games won by black (Black), and the opening
variability (OV). The OV data serve as a measure of the number of different variations played by the
engines in the "opening phase" [12] of the game after the first three moves specified and fixed for each experiment are exhausted. These variations often cause the ECO designation [13] to change. All such transpositions are listed in the Appendix along with references to high-profile grandmaster games [14] for
each ECO designation played by the engines. All three thousand games played are available in PGN
format in the online data archive for this paper (Munshi, PGN Files, 2014).

ID Name OV White Black Decisive Draw Pct Draw
E12QID Queen's Indian Defense 999 30 6 36 264 88%
E43NID Nimzo Indian Defense 692 30 7 37 263 88%
D37QGD Queen's Gambit Declined 821 31 3 34 266 89%
E61KID King's Indian Defense 560 68 3 71 229 76%
E11BID Bogo Indian Defense 940 29 10 39 261 87%
E01CAT Catalan Opening 1002 26 5 31 269 90%
A81DUD Dutch Defense 914 48 3 51 249 83%
A52BUG Budapest Gambit 403 63 1 64 236 79%
A45TVA Trompovsky Attack 603 7 7 14 286 95%
A83DSG Staunton Gambit 419 3 20 23 277 92%
Table 3 Observed sample data

4.1 Hypothesis test for distance. We can now use the sample data [15] to compute the Euclidean
distance of each test opening from the baseline opening. The distance may be visualized in Cartesian
coordinates where the x-axis represents the number of white wins and the y-axis represents the number of
black wins in any given simple random sample of 300 games. Each point in this x-y space represents a
sample in our study. In the population of all possible games each point in this space represents a unique
chess game and its probability vector. The data in Table 3 are shown in this format in Figure 1.

[12] Arbitrarily assumed to constitute the first ten moves of the game.
[13] Encyclopedia of Chess Openings.
[14] Tournaments from which these games are taken include the London Classic, World Championship Candidates, Moscow Open, Tata Steel, Tal Memorial, Chigorin Memorial, and the Geneva Chess Masters.
[15] Each experiment of 300 games is considered to be a simple random sample of 300 games taken from a population of an infinite number of games in which all games are driven by the same unobservable probability vector.

[Scatter plot omitted: the ten openings of Table 3 plotted with the number of games won by white (x-axis, 0-80) against the number of games won by black (y-axis, 0-25).]

Figure 1 Sample data in Cartesian coordinates

The Queen's Indian Defense is our baseline opening to which the other openings will be compared. So
what we are interested in is the distance and direction of each of the test openings from the Queen's
Indian. These distances and their directions are visualized more easily of we move the axis of the plot to
the Queen's Indian and set it as the (0,0) point. This visualization of the distance vectors in our study is
shown in Figure 2. An example distance computation is shown in Table 4. All the observed distances are
tabulated in Table 5 along with the estimated standard deviation and the hypothesis tests for distance.

                      White   Black
Dutch Defense         48      3
Queens Indian         30      6
Difference            18      -3
Squared difference    324     9
Squared distance = sum of squared differences = 333
Euclidean distance = square root of squared distance = 18.25
Table 4 Example distance computation
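The same computation in code form, as a check on Table 4 (the function is our own illustration):

import math

def euclidean_distance(a, b):
    """Distance between two (white_wins, black_wins) sample points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

dutch, queens_indian = (48, 3), (30, 6)                    # from Table 3
print(round(euclidean_distance(dutch, queens_indian), 2))  # 18.25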

[Scatter plot omitted: each test opening plotted as a vector from the baseline Queen's Indian Defense, which is placed at the origin (0,0).]

Figure 2 Visualization of the distance and direction of the test openings from the baseline

ID      Name                      distance   stdev [16]   t-value   p-value   Result
E43NID  Nimzo Indian Defense      1.00       5.85         0.171     8.6E-01   --
D37QGD  Queen's Gambit Declined   3.16       5.68         0.557     5.8E-01   --
E61KID  King's Indian Defense     38.12      6.64         5.737     1.1E-08   Reject Ho
E11BID  Bogo Indian Defense       4.12       5.88         0.701     4.8E-01   --
E01CAT  Catalan Opening           4.12       5.61         0.734     4.6E-01   --
A81DUD  Dutch Defense             18.25      6.17         2.956     3.2E-03   --
A52BUG  Budapest Gambit           33.38      6.52         5.120     3.3E-07   Reject Ho
A45TVA  Trompovsky Attack         23.02      4.89         4.712     2.6E-06   Reject Ho
A83DSG  Staunton Gambit           30.41      5.34         5.692     1.4E-08   Reject Ho
Table 5 Hypothesis test for distance [17]
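The t-values in Table 5 are the observed distances divided by the simulated standard deviations, and the reported p-values are consistent with a two-tailed test. A minimal check, using a normal approximation to the simulated sampling distribution (with roughly two thousand simulated replications the normal and t tails are nearly identical), is sketched below; the values printed should land close to the table up to simulation noise.

import math

def two_tailed_p(distance, stdev):
    """t-statistic and two-tailed p-value for Ho: delta = 0."""
    t = distance / stdev
    return t, math.erfc(t / math.sqrt(2))     # P(|Z| >= t) for a standard normal

for name, d, s in [("Nimzo Indian", 1.00, 5.85),
                   ("Dutch", 18.25, 6.17),
                   ("King's Indian", 38.12, 6.64)]:
    t, p = two_tailed_p(d, s)
    print(f"{name}: t = {t:.3f}, p = {p:.1e}")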

The hypothesis tests in Table 5 show that the observed distances of five of the test openings from the
baseline opening are small enough to have been the result of sampling variation. In these cases we do not
reject the null hypothesis Ho that δ=0 and conclude that the evidence does not show that the opening
innovation has changed the probability vector. These test openings are therefore classified as Category C,
benign innovation. The data are consistent with the hypothesis that the probability vector that generates
game outcomes in these test openings is not different from that which generates game outcomes in the
baseline Queen's Indian Defense, that is, π(test opening) = π(baseline opening).

[16] The term stdev refers to the standard deviation of distance and its value is estimated by using a Monte Carlo simulation procedure. The computational details are available in the online data archive for this paper (Munshi, Numerical Analysis, 2014).
[17] The comparison of each test opening with the baseline opening is shown graphically in the Appendix.

In the remaining four test openings, marked in Table 5 as "Reject Ho", we find that the observed sample
distance is too large to be explained by sampling variation alone. In these cases we reject the Ho
hypothesis and conclude that δ≠0 and that therefore the test opening innovation has changed the
probability vector so that π(test opening) ≠π(baseline opening). To classify these openings we must
examine the direction of the change to determine whether the change in π favors the innovator, or whether
it favors the opponent, or whether the change is in a neutral direction and does not favor either party. The
direction information for these four test openings is shown in Table 6.

ID      Name                    distance   angle   quadrant   favors   innovator   Category
E61KID  King's Indian Defense   38.12      356     4          white    black       F
A52BUG  Budapest Gambit         33.38      352     4          white    black       F
A45TVA  Trompovsky Attack       23.02      178     2          black    white       F
A83DSG  Staunton Gambit         30.41      153     2          black    white       F
Table 6 Classification of distant test openings according to direction
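The direction test can be expressed compactly in code. The sketch below derives the angle (up to rounding), quadrant, and Table 2 category for the four distant openings from the raw counts in Table 3; the function names are our own illustration.

import math

def classify(test, baseline, innovator):
    """Angle (degrees), quadrant, and Table 2 category, baseline at the origin."""
    dx, dy = test[0] - baseline[0], test[1] - baseline[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360
    quadrant = int(angle // 90) + 1
    if quadrant in (1, 3):                    # neutral direction
        return round(angle), quadrant, "C"
    favors = "white" if quadrant == 4 else "black"
    return round(angle), quadrant, "A" if favors == innovator else "F"

baseline = (30, 6)                            # Queen's Indian wins, Table 3
for name, point, innovator in [("King's Indian", (68, 3), "black"),
                               ("Budapest", (63, 1), "black"),
                               ("Trompovsky", (7, 7), "white"),
                               ("Staunton", (3, 20), "white")]:
    print(name, classify(point, baseline, innovator))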

What we see in Table 6 is that none of the test openings that changed the π vector gained from the change
and also that none of these changes are in a neutral direction. All of these innovations are detrimental to
the innovator and therefore all of them are classified as Category F, failed innovation. The information in
Table 6 is presented visually in Figure 2 where one can see clearly that the Category F innovations by
white decreased white's chance of winning or increased black's chance of winning or that they did both.
Likewise Figure 2 also shows that the Category F innovations by black decreased black's chance of
winning or increased white's chance of winning or that they did both.

We now summarize our findings in Table 7. The table shows our final classification of all the test
openings in view of the data we collected from our controlled engine experiments and their analysis as
presented above. As noted in the table, none of the openings tested was a successful innovation and none
of the changes in the probability vector occurred in a neutral direction.

ID      Name                      Category   Reason for the classification
E43NID  Nimzo Indian Defense      C          Observed distance explained by sampling variation
D37QGD  Queen's Gambit Declined   C          Observed distance explained by sampling variation
E61KID  King's Indian Defense     F          Probability vector changed in favor of the opponent
E11BID  Bogo Indian Defense       C          Observed distance explained by sampling variation
E01CAT  Catalan Opening           C          Observed distance explained by sampling variation
A81DUD  Dutch Defense             C          Observed distance explained by sampling variation
A52BUG  Budapest Gambit           F          Probability vector changed in favor of the opponent
A45TVA  Trompovsky Attack         F          Probability vector changed in favor of the opponent
A83DSG  Staunton Gambit           F          Probability vector changed in favor of the opponent
NONE    ----                      A          Probability vector changed in favor of the innovator
NONE    ----                      C          Probability vector changed in a neutral direction
Table 7 Summary of findings

5. CONCLUSIONS

Engine experiments carried out under controlled conditions show that the Queen's Indian Defense, chosen
as the baseline opening in this study, may be considered to be neutral and perfect in this category of
openings because none of the nine innovations tested offered any advantage to the innovator [18]. Of the
nine test openings, five were found to be benign innovations as they had no measurable effect on the
underlying and unobservable probability vector that determines chess game outcomes under the baseline
conditions. The observed distances of these openings from the baseline may be explained in terms of
sampling variation [19].

Two members of this group, the Dutch Defense, an innovation by black, and the Catalan Opening, an
innovation by white, are noteworthy in that our findings are inconsistent with their rarity of play and
supportive of their positive evaluation by analysts who have studied these openings (Bologan, 2012)
(Harding, 2010) (Kelley, 2005) (Kelley, Catalan, 2008). The other three members of this group, the Nimzo Indian Defense, the Queen's Gambit Declined, and the Bogo Indian Defense, are universally considered to be strong openings (Dearin, 2005) (Sielecki, 2014) comparable with the Queen's Indian Defense, and our findings, along with their popularity in the opening book, support this view.

In four of the test openings, the evidence indicates that the opening innovation changed the probability
vector. This means that the probability vector that generates game outcomes in these openings is not the
same as that which generates game outcomes in the baseline Queen's Indian Defense. In all four cases the
change in the probability vector goes against the innovator and so they are classified as failed innovations.

The most significant member of this group is the King's Indian Defense, an opening that is at once a
popular line in the opening book database (Jones & Powell, 2014) and also viewed in a positive light by
many analysts (Kelley, Kings Indian, 2008) (Gserper, 2010) (Golubov, 2006). Our experiment shows that
it is a failed innovation by black. Some analysts agree (Semkov, 2009) (Hansen, 2009). The other three
failed innovations are less controversial because our findings are consistent with general opinion and the
opening book. The Staunton Gambit, the Trompovsky Attack, and the Budapest Gambit are played very
rarely in high-level games according to the opening books (Jones & Powell, 2014) and analysts generally tend to project a negative opinion of these innovations (Dzindzichashvili, 2009) (Schiller, 1993) (Prie,
2009).

The motivation for this study is not so much to pass judgment on specific opening lines but rather to
develop and refine an objective methodology for the evaluation of chess openings in general. We
recognize that at a sufficiently high move imperfection rate (IMP) chess games would be decided mostly
by move errors and in those games the disadvantage of the failed opening innovations noted in this study
may not become apparent [20]. Yet the relative merit of opening lines is of great interest to the chess
community and its evaluation has a practical application for designers of opening books.

[18] A comparison of the Queen's Indian Defense with the Sicilian mainline in the Appendix further supports its selection as a neutral and perfect baseline against which other queen's pawn openings may be compared.
[19] The term "sampling variation" refers to the difference among samples taken from the same population.
[20] The move imperfection rate in the baseline opening as measured by the percentage win by black is 13% in grandmaster games (Jones & Powell, 2014) compared with 2% in our engine experiments.

6. APPENDIX

6.1 Comparison of the playing strength of the engines. One of the test opening experiments had a
result very similar to that of the baseline opening. To compare engine strengths at the same sample size
used for comparing openings (n=300), we combine these two experiments into a large sample of 600
games as shown below.

Opening                     Games played   White won   Black won
E12 Queens Indian Defense   300            30          6
E43 Nimzo Indian Defense    300            30          7
Combined E12 + E43          600            60          13

Of the 600 games, each engine played 300 games as white and 300 as black. We now count for each
engine the number of games won as white and the number won as black. These data allow us to set up
the comparison of the engines as follows.

Engine Games played as white Won as White Games played as black Won as Black
Houdini3Pro 300 33 300 4
Houdini4Pro 300 27 300 9

These results, plotted in Cartesian coordinates, show a Euclidean distance between them of 7.8102. As in
the evaluation of openings, we create a simulated sampling distribution and estimate the variation of the
distance that we can expect from one sample to the next. We can now set up a hypothesis test for distance
as follows:

Ho: δ=0
Ha: δ≠0

The data are tabulated below and shown graphically in Figure 3.

Observed distance: 7.8102
Standard deviation of the sampling distribution of distances: 5.8151
The value of the t-statistic for Ho: δ=0: 1.3431
Probability that we would observe a t-value this large or larger if δ=0: 0.1794
Probability value that serves as our threshold of disbelief: 0.001
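A quick check of the arithmetic (the standard deviation of 5.8151 is taken from the simulation described above, not recomputed here):

import math

h3, h4 = (33, 4), (27, 9)     # (won as white, won as black) for each engine
delta = math.hypot(h3[0] - h4[0], h3[1] - h4[1])
t = delta / 5.8151            # stdev of the simulated sampling distribution
print(round(delta, 4), round(t, 4))   # 7.8102 1.3431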

The test shows that the distance observed is one we would expect to observe as sampling error even when the true value of the distance is zero. Therefore we fail to reject the Ho statement that δ=0 and conclude that
the evidence does not show that there is a difference in playing strength between the two engines under
the experimental conditions used in this study.

[Scatter plot omitted: Houdini4Pro -vs- Houdini3Pro, games won as white (x-axis) against games won as black (y-axis).]

Figure 3 Comparison of engine playing strength

6.2 Comparison of two baseline openings. A baseline opening used in a previous study (Munshi,
Comparing Chess Openings, 2014) and the one used in this study are compared. The sample data from the
DED experiments are as follows [21]:

Opening                     Games played   White won   Black won
E12 Queens Indian Defense   300            30          6
B53 Sicilian Defense        300            19          3

The Euclidean distance between these results in Cartesian coordinates is 11.402 and the Queen's Indian
Defense projects at an angle of 14 degrees from the Sicilian Defense. The standard deviation of distance
is estimated using Monte Carlo simulation to be 5.3208. Using the t-distribution we find that the
probability of observing a sample distance of 11.402 or greater under these conditions is 0.0322, much
larger than our threshold of α=0.001. We therefore fail to reject the Ho statement that the true distance δ=0 and that the two
probability vectors are the same. In any event, even if the observed distance were larger and we had
rejected Ho, we would have to consider that the angle lies in the first quadrant and that therefore the large
distance would only indicate a difference in the probability of decisive games and not necessarily a
relative advantage to either white or black. The comparison is shown graphically in Figure 4.

[21] The three-move sequences used in the test are: B53 Sicilian Defense = 1. e4 c5 2. Nf3 d6 3. d4 cxd4 and E12 Queen's Indian Defense = 1. d4 Nf6 2. c4 e6 3. Nf3 b6.

[Scatter plot omitted: B53 Sicilian -vs- E12 QID, number of games won by White (x-axis) against number of games won by Black (y-axis).]

Figure 4 E12 Queen's Indian Defense compared with B53 Sicilian Defense [22]

6.3 Transpositions. The ECO codes used to identify the baseline and test openings apply to the first
three moves that were fixed for each experiment. Engine calculations began with the fourth move, and the engine moves often caused transpositions to different ECO designations. All such transpositions are noted
in Table 8 along with references to recent grandmaster games.

ID      Transpositions                             Grandmaster games
E12QID  E12 Queen's Indian/Petrosian Variation     (Carlsen-Karjakin, 2012)
        E14 Queen's Indian/Classical Variation     (Kramnik-Pelletier, 2013)
        E16 Queen's Indian/Classical Variation     (Gelfand-Gashimov, 2012)
        E43 Nimzo Indian/Nimzowitsch Variation
E43NID  E43 Nimzo Indian/Nimzowitsch Variation     (Radjabov-Leitao, 2001)
        E47 Nimzo Indian/Mainline                  (Gelfand-Grischuk, 2012)
D37QGD  D37 Queen's Gambit Declined                (Topalov-Kramnik, 2014)
E61KID  E70 Kings Indian Defense                   (Carlsen-Radjabov, 2014)
        E71 Kings Indian Defense                   (Aronian-Radjabov, 2013)
        E90 Kings Indian Defense                   (Aronian-Carlsen, 2013)
        E91 Kings Indian Defense                   (Nakamura-Morozevich, 2013)
E11BID  E11 Bogo Indian Defense                    (Caruana-Short, 2013)
        E15 Queens Indian/Classical Variation      (Radjabov-Karjakin, 2012)
        E16 Queens Indian/Classical Variation      (Bratteteig-Pleninger, 2012)
E01CAT  E01 Catalan Opening                        (Kramnik-Leko, 2012)
        E04 Catalan Opening                        (Andreikin-Mamedyarov, 2014)
        E06 Catalan Opening                        (Caruana-Karjakin, 2012)
        E11 Bogo Indian Defense
        E16 Queen's Indian/Classical Variation
A81DUD  A81 Dutch Defense                          (Cramling-Kosteniuk, 2000)
        A87 Dutch/Leningrad Variation              (Gupta-Carr, 2013)
        A88 Dutch/Leningrad Variation              (Aronian-Nakamura, 2012)
        A89 Dutch/Leningrad Variation              (Phillips-Moloney, 2012)
A52BUG  A52 Budapest Gambit                        (Gelfand-Rapport, 2014)
A45TVA  A45 Trompovsky Attack                      (Rapport-Aronian, 2014)
Table 8 Transpositions

[22] In all such graphs the baseline is shown in blue with square markers and the test opening is shown in red with diamond-shaped markers.
6.4 Graphical depiction of Monte Carlo simulation results

[Nine scatter plots omitted. Each compares the baseline E12 QID with one test opening (E43 NID, D37 QGD, E61 KID, E11 BID, E01 CAT, A81 DUD, A52 BUG, A45 TVA, and A83 DSG), plotting the number of games won by White on the x-axis against the number of games won by Black on the y-axis.]

7. REFERENCES

Andreikin-Mamedyarov. (2014). World Chess Championship Candidates. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1751553

Aronian-Carlsen. (2013). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1704703

Aronian-Nakamura. (2012). Tal Memorial. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1669174

Aronian-Radjabov. (2013). World Championship Candidates. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1714072

Bologan, V. (2012). The Powerful Catalan. New in Chess.

Bratteteig-Pleninger. (2012). London Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1700479

Carlsen-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1654399

Carlsen-Radjabov. (2014). Gashimov Memorial. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1753385

Caruana-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1744284

Caruana-Short. (2013). London Chess Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1741071

Cramling-Kosteniuk. (2000). WCC. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1258901

Dearin, E. (2005). Play the Nimzo Indian. Everyman Chess.

Dzindzichashvili, R. (2009). Budapest gambit. Retrieved 2014, from YouTube: http://www.youtube.com/watch?v=-ShFKpTkL9Q

Ferreira, D. (2013). The impact of search depth on chess playing strength. Retrieved 2014, from Instituto Superior Tecnico: http://web.ist.utl.pt/diogo.ferreira/papers/ferreira13impact.pdf

Gelfand-Gashimov. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1654442

Gelfand-Grischuk. (2012). World Rapid Championship. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1671296

Gelfand-Rapport. (2014). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1744267

Golubov, M. (2006). Understanding the King's Gambit. Gambit Publications.

Gserper, G. (2010). King's Indian Defense. Retrieved 2014, from chess.com: http://www.chess.com/article/view/openings-for-tactical-players-kings-indian-defense

Gupta-Carr. (2013). London Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1740547

Hansen, C. (2009). Checkpoint. Retrieved 2014, from ChessCafe.com: http://www.chesscafe.com/text/hansen125.pdf

Harding, T. (2010). Play the Dutch. Retrieved 2014, from chesscafe.com: http://www.chesscafe.com/text/kibitz175.pdf

Houdart, R. (2013). Houdini Chess. Retrieved 2014, from cruxis.com: http://www.cruxis.com/chess/houdini.htm

Johnson, V. E. (2013, November). Revised Standards for Statistical Evidence. Retrieved December 2013, from Proceedings of the National Academy of Sciences: http://www.pnas.org/content/110/48/19313.full

Jones, R., & Powell, D. (2014). Game Database. Retrieved February 2014, from chesstempo.com: http://chesstempo.com/game-database.html

Kelley, D. (2005). Dutch. Retrieved 2014, from chessopenings.com: http://chessopenings.com/dutch/

Kelley, D. (2008). Catalan. Retrieved 2014, from chessopenings.com: http://chessopenings.com/catalan/

Kelley, D. (2008). Kings Indian. Retrieved 2014, from chessopenings.com: http://chessopenings.com/kings+indian/

Kramnik-Leko. (2012). Dortmund. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1672566

Kramnik-Pelletier. (2013). Geneva Chess Masters. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1722084

Meyer-Kahlen, S. (2000). Opening Database. Retrieved January 2014, from Shredder Chess: http://www.shredderchess.com/online-chess/online-databases/opening-database.html

Meyer-Kahlen, S. (2007). Deep Shredder. Retrieved 2014, from Shredderchess.com: http://www.shredderchess.com/chess-software/deep-shredder12.html

Munshi, J. (2014). A Method for Comparing Chess Openings. Retrieved 2014, from SSRN: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2415203

Munshi, J. (2014). Comparing Chess Openings. Retrieved 2014, from SSRN: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2427542

Munshi, J. (2014). Numerical Analysis. Retrieved 2014, from Dropbox: https://www.dropbox.com/sh/c7ze8c64ukpf525/AADAuuDx_mxXU6BVH-6GoF2ka

Munshi, J. (2014). PGN Files. Retrieved 2014, from Dropbox: https://www.dropbox.com/sh/nj14ur3cucew5xo/AABaz_LViWNpRTSaBKlUWmj5a

Nakamura-Morozevich. (2013). FIDE Grand Prix Zug. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1716006

Phillips-Moloney. (2012). London Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1700347

Prie, E. (2009). d-pawn specials. Retrieved 2014, from chesspublishing.com: http://www.chesspublishing.com/content/8/jul09.htm

Radjabov-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1654273

Radjabov-Leitao. (2001). E43 Nimzo Indian. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1242308

Rapport-Aronian. (2014). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1744275

Schiller, E. (1993). How to Play Against the Staunton Gambit. Chess Digest.

Semkov, S. (2009). Kill KID Vol. 1. New in Chess.

Sielecki, C. (2014). Nimzo and Bogo Indian. Everyman Chess.

Topalov-Kramnik. (2014). World Chess Championship Candidates. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1751385

Wikipedia. (2014). Houdini Chess. Retrieved 2014, from Wikipedia: http://en.wikipedia.org/wiki/Houdini_(chess)

Wikipedia. (2014). Monte Carlo Simulation. Retrieved 2014, from Wikipedia: http://en.wikipedia.org/wiki/Monte_Carlo_method
