The Information Retrieval Series, Volume 40

Series Editors: ChengXiang Zhai, Maarten de Rijke
Editorial Board: Nicholas J. Belkin, Charles Clarke, Diane Kelly, Fabrizio Sebastiani

More information about this series at http://www.springer.com/series/6128

Tetsuya Sakai
Laboratory Experiments in Information Retrieval: Sample Sizes, Effect Sizes, and Statistical Power

Tetsuya Sakai
Waseda University
Tokyo, Japan

ISSN 1387-5264 (The Information Retrieval Series)
ISBN 978-981-13-1198-7    ISBN 978-981-13-1199-4 (eBook)
https://doi.org/10.1007/978-981-13-1199-4

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore.
Preface
their collective efforts will actually add up. I discuss IR as the primary target of
research, but the techniques discussed in this book are applicable to related fields
such as natural language processing and recommendation, as long as the data at hand
can be regarded as a random sample from some population.
This book occasionally cites work from the IR literature of the 1960s and 1970s.
Some of these papers are available for download from the SIGIR Museum.¹
My five Excel tools for topic set size design (described in Chap. 6) are based
on a Japanese book by Professor Yasushi Nagata of Waseda University. He kindly
answered my numerous questions about statistical significance testing and sample
size design. My five R scripts for power analysis (described in Chap. 7) are based on
the original scripts written by Professor Hideki Toyoda, also of Waseda University.
He kindly allowed me to modify his scripts and distribute them for research
purposes. I thank these two professors for publishing fantastic Japanese books on
these topics, from which I have learnt a lot.
I am also very grateful to Dr. Shunsuke Horii of Waseda University for
carefully checking my manuscript and giving me very useful comments, and to
my anonymous reviewer who gave me excellent suggestions from the perspective
of an IR evaluation expert. I would also like to thank Mio Sugino of Springer
for her patience and support and my PhD student Zhaohao Zeng for checking the
manuscript, including the code provided in this book.
Finally, I thank my family (Miho, Rio, and Sunny-Uni-Hermione) for understanding the life and responsibilities of an academic and supporting me all the time.
1 http://sigir.org/resources/museum/
Contents

1 Preliminaries
  1.1 Principles of Significance Testing
    1.1.1 Sample Means and Population Means
    1.1.2 Hypotheses, Test Statistics, and P-Values
    1.1.3 α, β, and Statistical Power
  1.2 Well-Known Probability Distributions
    1.2.1 Law of Large Numbers
    1.2.2 Normal Distribution and the Central Limit Theorem
    1.2.3 χ² Distribution
    1.2.4 t Distribution
    1.2.5 F Distribution
  1.3 Less Well-Known Probability Distributions
    1.3.1 Noncentral t Distribution
    1.3.2 Noncentral χ² Distribution
    1.3.3 Noncentral F Distributions
  References
2 t-Tests
  2.1 Introduction
  2.2 Paired t-Test
  2.3 Two-Sample t-Test
  2.4 Welch's Two-Sample t-Test
  2.5 Which Two-Sample t-Test?
  2.6 Conducting a t-Test with Excel
  2.7 Conducting a t-Test with R
  2.8 Confidence Intervals for the Difference in Population Means
  References
3 Analysis of Variance
  3.1 One-Way ANOVA
    3.1.1 One-Way ANOVA with Equal Group Sizes
    3.1.2 One-Way ANOVA with Unequal Group Sizes
    6.3.2 How the Two-Sample t-Test-Based Topic Set Size Design Works
  6.4 Topic Set Size Design with One-Way ANOVA
    6.4.1 How to Use the ANOVA-Based Tool
    6.4.2 How the ANOVA-Based Topic Set Size Design Works
  6.5 Topic Set Size Design with Confidence Intervals for Paired Data
    6.5.1 How to Use the Paired-Data CI-Based Tool
    6.5.2 How the Paired-Data CI-Based Tool Works
  6.6 Topic Set Size Design with Confidence Intervals for Unpaired Data
    6.6.1 How to Use the Two-Sample CI-Based Tool
    6.6.2 How the Two-Sample CI-Based Tool Works
  6.7 Estimating Population Variances
  6.8 Comparing the Different Topic Set Size Design Methods
    6.8.1 Paired and Two-Sample t-Tests vs. One-Way ANOVA
    6.8.2 CI-Based Topic Set Size Design: Paired vs. Unpaired Data
    6.8.3 One-Way ANOVA vs. Confidence Intervals
  References
7 Power Analysis Using R
  7.1 Introduction
  7.2 Overview of the R Scripts for Power Analysis
  7.3 Power Analysis with the Paired t-Test
  7.4 Power Analysis with the Two-Sample t-Test
  7.5 Power Analysis with One-Way ANOVA
  7.6 Power Analysis with Two-Way ANOVA Without Replication
  7.7 Power Analysis with Two-Way ANOVA with Replication
  7.8 Summary
  References
8 Conclusions
  8.1 A Quick Summary of the Book
  8.2 Statistical Reform in IR?
  References
Index
Chapter 1
Preliminaries
Abstract This chapter discusses the basic principles of classical statistical significance testing (Sect. 1.1) and defines some well-known probability distributions that are necessary for discussing parametric significance tests (Sect. 1.2). ("A problem is parametric if the form of the underlying distribution is known, and it is nonparametric if we have no knowledge concerning the distribution(s) from which the observations are drawn." Good (Permutation, parametric, and bootstrap tests of hypothesis, 3rd edn. Springer, New York, 2005, p. 14). For example, the paired t-test is a parametric test for paired data, as it relies on the assumption that the observed data independently obey normal distributions (see Chap. 2 Sect. 2.2), whereas the sign test is a nonparametric test; both may be applied to the same data, and the latter is appropriate if the normality assumption is not valid. This book only discusses parametric tests for comparing means, namely, t-tests and ANOVAs. See Chap. 2 for a discussion of the robustness of the t-test to violations of the normality assumption.) As this book is intended for IR researchers such as myself, not statisticians, well-known theorems are presented without proofs; only brief proofs for corollaries are given. In the next two chapters, we shall use these basic theorems and corollaries as black boxes, just as programmers utilise standard libraries when writing their own code. This chapter also defines less well-known distributions called noncentral distributions (Sect. 1.3), which we shall need for discussing sample size design and power analysis in Chaps. 6 and 7. Hence Sect. 1.3 may be skipped if the reader only wishes to learn about the principles and limitations of significance testing; however, such readers should read up to Chap. 5 before abandoning this book.
1 Henceforth, for simplicity, I will basically ignore the difference between "topics" (i.e., information need statements) and "queries" (i.e., the sets of keywords input to the search engine) and that between "queries" and the "scores" computed for their search results.
2 While this book primarily discusses sample means over n topics, some IR researchers have
explored the approach of regarding the document collection used in an experiment as a sample
from a large population of documents [4, 14, 15].
3 Is a TREC (Text REtrieval Conference) topic set a random sample? Probably not. However,
the reader should be aware that IR researchers who rely on significance tests such as t-tests and
ANOVA for comparing system means implicitly rely on the basic assumption that a topic set is a
random sample. The exact assumptions for t-tests and ANOVA are stated in Chaps. 2 and 3. Also, a
computer-based significance test that does not rely on random sampling or any distributional
assumptions will be described in Chap. 4 Sect. 4.5.
4 In this book, a random variable and its realisations are denoted by the same symbol, e.g. x.
1.1 Principles of Significance Testing

1.1.1 Sample Means and Population Means

[Fig. 1.1: Random samples of scores y (e.g. 0.123, 0.456, 0.789, ...) and x (e.g. 0.567, 0.234, 0.890, ...) are drawn from two populations with population means μU and μC; the sample means ȳ and x̄ serve as estimates of the population means, and their difference estimates μC − μU.]
1.1.2 Hypotheses, Test Statistics, and P-Values

As was mentioned above, suppose that you are interested in whether the population means μX and μY are really different, based on the observation that your sample means x̄ and ȳ differ.
5 In contrast to classical significance testing, Bayesian statistics [2, 10, 17–19] treat population parameters as random variables. See also Chap. 8.
6 Not to be confused with two-sample tests, which means you have a sample for System X and a different sample for System Y, possibly with different sample sizes (see Chap. 2 Sect. 2.3).
7 A one-sided test of the form H1: μX > μY would make sense if either μX < μY is simply impossible, or if you do not need to consider the case where μX < μY even if this is possible [12]. For example, if you are measuring the effect of introducing an aggressive stemming algorithm into your IR system in terms of recall (not precision) and you know that this can never hurt recall, a one-sided test may be appropriate. But in practice, when you propose a new IR algorithm and want to compare it with a competitive baseline, it is rarely the case that you know in advance that your proposal is better. Hence I recommend the two-sided test as default. Whichever you choose, hypotheses H0 and H1 must be set up before actually examining the data.
8 "Dichotomous thinking" is one of the major reasons why classical significance testing has been criticised.
[Fig. 1.2: Null distribution, test statistic, p-value, and the significance criterion α. A two-sided test secures an area of α/2 in each tail of the null distribution; values beyond the critical values form the rejection regions.]
We reject H0 if the p-value is smaller than α or, equivalently, if the test statistic is larger than the critical value. Assuming that the test statistic really
obeys the null distribution, there is less than 5% chance that we happen to obtain
such an extreme value. How come we observed such an unlikely event?
Whenever we observe a test statistic that is less than 100α% likely according to
the null distribution, we blame the null hypothesis H0 : we have assumed that H0 is
true and therefore that the test statistic obeys the null distribution, but since it does
not seem to obey the null distribution (with 100(1 − α)% confidence, since this is
not totally impossible), H0 is probably wrong. Hence we reject H0 . For example,
if H0 said that the two population means are equal, we conclude that the difference
between the two systems is statistically significant at (say) α = 0.05.
What if the p-value is larger than α or, equivalently, the test statistic is smaller
than the critical value for α in Fig. 1.2? In this case, we cannot reject H0 ; all we
know is that the test statistic lies near the middle of the curve and does not contradict
H0. In this case, we say that the difference is not statistically significant at
α = 0.05. Note that such a significance test result does not mean that the two
systems are actually equivalent; it just means “we cannot tell from the data.”
1.1.3 α, β, and Statistical Power

Table 1.1 shows the relationship between the truth (i.e. whether H0 is true or false)
and our decisions (i.e. whether to reject H0 or not). If H0 is true, then Fig. 1.2 is
correct, that is, the test statistic actually obeys the null distribution. From the upper
row of Table 1.1, it is clear that when H0 is true, we manage not to reject it 100(1 −
α)% of the time, while rejecting it 100α% of the time. Thus, by construction, we
make a Type I error with 100α% probability. Alternatively, if H0 is false, we might
make a Type II error, which means “failing to reject an incorrect null hypothesis,”
with some probability β. Ideally, we would like to correctly reject an incorrect H0 ;
that is, we would like to detect a real difference whenever there is one. Clearly, the
probability of reaching this type of correct conclusion is 100(1 − β)%, which is
known as the statistical power of an experiment [11].
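In tabular form (a minimal reconstruction of Table 1.1 from the description above; the original layout may differ):

Table 1.1 Significance test outcomes
                   Do not reject H0            Reject H0
  H0 is true       Correct (prob. 1 − α)       Type I error (prob. α)
  H0 is false      Type II error (prob. β)     Correct, i.e. power (prob. 1 − β)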
Recall from Sect. 1.1.2 that it is easy to set α: all we have to do is to declare
in advance what α is going to be and then compute a p-value from data and reject
H0 if the p-value is smaller than α. In contrast, ensuring that β (or, equivalently, the
power) takes a particular value is not as straightforward, because of the relationships
among α, β, the sample size, and the effect size [3, 5, 8], which will be discussed
extensively in Chaps. 5, 6, and 7. At any rate, a well-designed experiment should
have a reasonably high statistical power (1 − β): what is the point of conducting
an experiment if we are highly likely to miss differences that are actually present?
In practice, a high statistical power can be ensured by choosing a large enough
sample size. The desired Type I and Type II error probabilities α and β should be
set according to the actual needs of researchers, but a typical setting is Cohen's five-eighty convention [6], which represents α = 0.05 and β = 0.20 (i.e. 80% power).
This means that a Type I error is taken to be four times as serious as a Type II error.
While this does not mean that IR researchers should follow it blindly, henceforth we
shall primarily consider this particular setting by default.
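To make the relationship between α, power, and the sample size concrete, here is a small R illustration (a minimal sketch, not from this book; the numbers are hypothetical, and power.t.test is from base R's stats package, not one of the tools discussed in Chaps. 6 and 7):

    # Topics needed for a paired t-test to detect a mean score difference
    # of 0.05 (SD of the per-topic differences assumed to be 0.15) under
    # Cohen's five-eighty convention: alpha = 0.05, power = 1 - beta = 0.80.
    power.t.test(delta = 0.05, sd = 0.15, sig.level = 0.05,
                 power = 0.80, type = "paired", alternative = "two.sided")
    # Reports n of roughly 73 topics; for a fixed effect, power rises with n.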
1.2 Well-Known Probability Distributions

1.2.1 Law of Large Numbers

Thus, f_x(x) has eliminated the random variable y from consideration, and f_y(y) has eliminated x from consideration.
If the following holds for any realisations of (x, y), then x and y are said to be independent:

f(x, y) = f_x(x) f_y(y).

Informally, "x and y do not influence each other." Independence for more than two random variables can be defined in a similar manner.

Theorem 1 (Law of large numbers) Let x_1, x_2, ..., x_n be independent and identically distributed random variables such that E(x_j) = μ (and V(x_j) = σ²) for j = 1, 2, ..., n. Then, as we increase the sample size n, the sample mean given by

x̄ = (1/n) Σ_{j=1}^n x_j (1.5)

converges to the population mean μ.⁹
9 For a discussion on the difference between convergence in probability (which is used in the
weak law of large numbers) and almost sure convergence (which is used in the strong law of
large numbers), see, for example, https://stats.stackexchange.com/questions/2230/convergence-in-
probability-vs-almost-sure-convergence.
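Theorem 1 is easy to observe by simulation (a minimal R sketch, not from this book; the uniform population is an arbitrary choice):

    set.seed(42)
    # Scores drawn from a uniform population whose mean is mu = 0.5.
    x <- runif(100000)
    running_mean <- cumsum(x) / seq_along(x)  # sample mean after each draw
    running_mean[c(10, 100, 1000, 100000)]
    # The sample mean drifts towards 0.5 as n grows, as Theorem 1 predicts.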
1.2.2 Normal Distribution and the Central Limit Theorem
The normal distribution, one of the most fundamental distributions used in significance testing, is formally defined as follows. For a given pair of parameters μ and σ (> 0), the pdf of a normal distribution is given by:

f(x; μ, σ) = (1/(√(2π) σ)) exp(−(x − μ)²/(2σ²))  (−∞ < x < ∞). (1.6)

From the above formula, note that a normal distribution is symmetric about μ; it is known that the population mean and the population variance of the above normal distribution are exactly μ and σ², respectively:

E(x) = μ, (1.7)
V(x) = σ². (1.8)
For example, the upper 5% of the area under the pdf curve (Pr{u ≥ z_inv(0.05)}) is equal to the area to the left of an upper 95% z-value (Pr{u ≤ z_inv(0.95)}).
In particular, for the sample mean x̄ of n independent variables that each obey N(μ, σ²), setting a_j = 1/n and σ_j² = σ² for every j gives

a_1²σ_1² + a_2²σ_2² + ⋯ + a_n²σ_n² = σ²/n² + σ²/n² + ⋯ + σ²/n² = σ²/n, (1.11)

so that the standardised sample mean

u = (x̄ − μ) / √(σ²/n) (1.12)

obeys N(0, 1²) (Corollary 2).
The central limit theorem (Theorem 5) says that the sample mean with a "sufficiently large" n can be treated as normally
distributed even if the original population distribution is not normal. Consider
nDCG or any other measure that has the [0, 1] range: such scores are clearly
not normally distributed since the range of any normal distribution is (−∞, ∞).
However, the theorem says that, given a “sufficiently large” topic set, the mean of
the scores can be regarded as a random variable that obeys a normal distribution.
How large is “sufficiently large” then? Of course, the answer depends on what
the original distribution looks like, but even a sample size as small as n = 20 may
suffice for many practical applications. For example, if we draw many samples of
size n = 5 from a uniform distribution, the distribution of x̄ already looks like a
normal distribution; even for a binomial distribution which only gives us either 0
or 1 with probability P = 0.5, if we draw many samples of size n = 15 from it,
the distribution of x̄ already looks quite like a normal distribution. However, if the
probability for the binomial distribution is (say) P = 0.2, the distribution of the
samples of size n = 15 still looks skewed [12].11
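These observations can be reproduced with a few lines of R (a minimal sketch, not from this book):

    set.seed(123)
    # Sample means of n = 5 draws from a uniform distribution:
    # their histogram already looks roughly normal.
    means_unif <- replicate(10000, mean(runif(5)))
    hist(means_unif, breaks = 50)
    # Bernoulli (0/1) scores with P = 0.2: the mean of n = 15 draws
    # is still visibly skewed, matching the discussion above.
    means_bern <- replicate(10000, mean(rbinom(15, 1, 0.2)))
    hist(means_bern, breaks = 50)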
Corollary 3 (An alternative form of the central limit theorem) Let x_1, x_2, ..., x_n be independent and identically distributed random variables such that E(x_j) = μ and V(x_j) = σ² for j = 1, 2, ..., n. Moreover, let x̄ = (1/n) Σ_{j=1}^n x_j. Then, for a sufficiently large n, u = (x̄ − μ)/√(σ²/n) approximately obeys N(0, 1²).
Proof This is a corollary from Theorems 2 and 5.
If x̄ obeys N(μ, σ²/n), then, from Eq. 1.7, E(x̄) = μ holds: we say that the sample mean x̄ is an unbiased estimator of the population mean μ. Below, we shall discuss an unbiased estimator of the population variance. Unbiased estimators are useful, because we never actually know the population means and variances, and therefore we often need to estimate them as accurately as possible from the samples that we have.
Recall that for a random variable x, the population variance is given by V(x) = E((x − E(x))²). To compute something similar from a sample, let us consider a sum of squares:

S = Σ_{j=1}^n (x_j − x̄)² = Σ_{j=1}^n x_j² − (Σ_{j=1}^n x_j)²/n. (1.13)
11 It is often recommended that a binomial distribution be approximated with a normal distribution provided nP ≥ 5 and n(1 − P) ≥ 5 hold [12]. Note that when n = 15 and P = 0.5, we have nP = n(1 − P) = 7.5 > 5, whereas, when n = 15 and P = 0.2, we have nP = 3 < 5. In the latter situation, the above recommendation suggests that we should increase the sample size to n = 25: however, note that even this is not a large number.
The sample variance V is then defined by dividing S by n − 1:

V = S/(n − 1). (1.14)

Note¹² that the denominator is n − 1 rather than n. We have the following theorem for V.
Theorem 6 If x_1, x_2, ..., x_n are independent and all obey N(μ, σ²), then E(V) = σ² holds [12].
In other words, the sample variance V is an unbiased estimator of the population variance σ². This suggests a solution to the aforementioned problem regarding the lack of knowledge about σ² in Eq. 1.12: if we replace σ² (a constant) with V (a random variable) in Eq. 1.12, u may no longer strictly obey N(0, 1²) but will obey a distribution reasonably similar to it. This is exactly what a t distribution is, which we rely on when we conduct t-tests.
While the sample variance may also be defined by using n as the denominator rather than n − 1, that version is not an unbiased estimator of σ² and is not very useful in the context of classical significance testing.¹³
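The unbiasedness of V (and the bias of S/n) can be verified empirically (a minimal R sketch, not from this book; note that R's var already uses the n − 1 denominator):

    set.seed(1)
    n <- 10; sigma2 <- 4  # sample size and true population variance
    est <- replicate(100000, {
      x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
      S <- sum((x - mean(x))^2)              # sum of squares (Eq. 1.13)
      c(unbiased = S / (n - 1), biased = S / n)
    })
    rowMeans(est)  # about 4.0 for S/(n-1); about 3.6 = ((n-1)/n) * 4 for S/n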
1.2.3 χ² Distribution

Here we define the χ² distribution. In this book, we shall use it to define the t distribution (which we need to conduct t-tests) and the F distribution (which we need to conduct ANOVAs).
Let u_1, u_2, ..., u_k be independent random variables that each obey N(0, 1²). The probability distribution of the following random variable χ² is called a χ² distribution with φ = k degrees of freedom:

χ² = u_1² + u_2² + ⋯ + u_k² = Σ_{j=1}^k u_j², (1.15)
12 The sample mean uses n as the denominator because it is based on n independent pieces of information, namely, the x_i's. In contrast, while Eq. 1.13 shows that S is based on the (x_i − x̄)'s, these are not actually n independent pieces of information, since Σ_{i=1}^n (x_i − x̄) = Σ_{i=1}^n x_i − nx̄ = 0 holds. There are only (n − 1) independent pieces of information in the sense that, once we have decided on (n − 1) values out of n, the last one is automatically determined due to the above constraint. For this reason, dividing S by n − 1 makes sense [12]. See also Sect. 1.2.4 where the degrees of freedom of S is discussed.
13 To be more specific, E(S/n) = ((n − 1)/n) σ² < σ² and hence S/n underestimates σ² [12]. Also of note is that the sample standard deviation √V is not an unbiased estimator of the population standard deviation σ despite the fact that E(V) = σ².
and is denoted by χ²(φ). That is, if you construct a sum of squares from independent variables that each obey a standard normal distribution, the distribution of that sum of squares is called a χ² distribution. The degrees of freedom φ can be interpreted as a measure of how "accurate" the sum of squares is: we shall come back to this point in Sect. 1.2.4.
For any x > 0, the pdf of χ²(φ) is given by [13]¹⁴:

f(x; φ) = (1/(Γ(φ/2) 2^(φ/2))) x^((φ/2)−1) e^(−x/2), (1.16)
Let χ² be a random variable that obeys χ²(φ). Then, for any given probability P, the value χ²_inv(φ; P) that satisfies Pr{χ² ≥ χ²_inv(φ; P)} = P is called the upper 100P% χ²-value.¹⁵ That is, on the pdf of a χ² distribution, the area under the curve to the right of χ²_inv(φ; P) is exactly P.
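With R, the upper 100P% χ²-value can be obtained with qchisq, and the definition in Eq. 1.15 can be checked by simulation (a minimal sketch, not from this book):

    # Upper 5% chi-squared value for phi = 10 degrees of freedom:
    cv <- qchisq(0.05, df = 10, lower.tail = FALSE)
    cv                                   # about 18.31
    set.seed(7)
    # Sums of squares of k = 10 standard normal variables (Eq. 1.15):
    chi2 <- replicate(100000, sum(rnorm(10)^2))
    mean(chi2 >= cv)                     # about 0.05, by the definition above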
Corollary 4 If x_1, x_2, ..., x_n are independent and each obey N(μ, σ²), then the following random variable obeys χ²(n):

Σ_{j=1}^n (x_j − μ)² / σ². (1.18)

Proof From Theorem 2, u_j = (x_j − μ)/σ for j = 1, 2, ..., n are independent, and each obey N(0, 1²). Hence, from the definition of the χ² distribution, the above corollary holds.
The following theorem is used to derive a property of the t distribution concerning the two-sample t-test (Sect. 1.2.4) and to discuss how ANOVA works (Chap. 3 Sect. 3.1).
Theorem 7 (Reproductive property) Consider a random variable χ₁² that obeys χ²(φ₁) and another χ₂² that obeys χ²(φ₂). If the two are independent, then χ₁² + χ₂² obeys χ²(φ₁ + φ₂) [13].
Again, let x_1, x_2, ..., x_n be independent and each obey N(μ, σ²), and recall the sample mean x̄ and the sum of squares S:

x̄ = (1/n) Σ_{j=1}^n x_j,  S = Σ_{j=1}^n (x_j − x̄)². (1.19)

14 It is known that the population mean and the population variance of χ² are given by E(χ²) = φ and V(χ²) = 2φ, respectively.
We will also need the following two theorems concerning S to discuss an important
property of the t distribution in Sect. 1.2.4.
Theorem 8 x̄ and S are independent [13].
Theorem 9 S/σ² obeys χ²(n − 1) [13].
Corollary 4 is used in the proof of Theorem 9. Note that while the random
variable discussed in Corollary 4 involves the population mean μ, that discussed
in Theorem 9 involves the sample mean x̄, and that the degrees of freedom for the
latter is n − 1 rather than n.
1.2.4 t Distribution

Having gone through normal and χ² distributions, we can now define the t distribution. Let u be a random variable that obeys N(0, 1²) and χ² be a random variable that obeys χ²(φ). Furthermore, let u and χ² be independent. Then, the probability distribution of the following random variable t is called a t distribution with φ degrees of freedom:

t = u / √(χ²/φ), (1.20)

and is denoted by t(φ). The pdf of t(φ) is given by [13]¹⁶:

f(x; φ) = (1/(√φ B(1/2, φ/2))) (1 + x²/φ)^(−(φ+1)/2), (1.21)
Let t be a random variable that obeys t(φ). Then, for any given probability P, the value t_inv(φ; P) that satisfies Pr{|t| ≥ t_inv(φ; P)} = P is called the two-sided 100P% t-value.¹⁷ Note that, while z_inv(P) was defined as the upper 100P% value, t_inv(φ; P) is defined as a two-sided 100P% value, following Nagata [13]: as Fig. 1.2 (Sect. 1.1.2 of this chapter) suggests, a two-sided (say) 5% t-value secures 2.5% on either side of the pdf curve. Owing to the symmetry of the pdf across zero, the following holds:

Pr{t ≥ t_inv(φ; P)} = Pr{t ≤ −t_inv(φ; P)} = P/2. (1.23)

16 It is known that the population mean and the population variance of t are given by E(t) = 0 (for φ ≥ 2) and V(t) = φ/(φ − 2) (for φ ≥ 3), respectively.
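With R, a two-sided 100P% t-value can be obtained by securing P/2 in the upper tail (a minimal sketch, not from this book):

    # Two-sided 5% t-value for phi = 19 degrees of freedom:
    tinv <- qt(0.05 / 2, df = 19, lower.tail = FALSE)
    tinv                                        # about 2.093
    2 * pt(tinv, df = 19, lower.tail = FALSE)   # recovers P = 0.05 (Eq. 1.23)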
The following properties of the t distribution enable us to conduct t-tests. To be
more specific, Corollary 5 forms the basis of the paired t-test (Chap. 2 Sect. 2.2),
while Corollary 6 forms the basis of the two-sample t-test (Chap. 2 Sect. 2.3). The
paired t-test can be used, for example, when we want to compare two systems with
the same topic set; the two-sample t-test can be used, for example, when we want
to compare two systems based on a user study where User Group A was assigned to
one system and User Group B was assigned to the other.
Corollary 5 If x_1, x_2, ..., x_n are independent and each obey N(μ, σ²), then the following random variable obeys t(n − 1):

t = (x̄ − μ) / √(V/n), (1.24)

where x̄ is the sample mean and V = S/(n − 1) is the sample variance (see Eq. 1.19).
Proof First, from Corollary 2, u = (x̄ − μ)/√(σ²/n) obeys N(0, 1²). Second, from Theorem 9, χ² = S/σ² = (n − 1)V/σ² obeys χ²(n − 1). Therefore,

t = (x̄ − μ)/√(V/n) = u√(σ²/n) / √(χ²σ²/((n − 1)n)) = u / √(χ²/(n − 1)). (1.25)

Moreover, from Theorem 8, x̄ and S are independent, and therefore u and χ² are also independent. Hence, from the definition of the t distribution, the above corollary holds.
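Corollary 5 with μ = 0 applied to per-topic score differences is the basis of the paired t-test; the statistic can be computed by hand and checked against R's t.test (a minimal sketch with made-up scores, not from this book):

    # Hypothetical scores of Systems X and Y on the same ten topics.
    x <- c(0.42, 0.55, 0.61, 0.30, 0.48, 0.66, 0.52, 0.39, 0.58, 0.45)
    y <- c(0.38, 0.50, 0.62, 0.25, 0.40, 0.60, 0.50, 0.35, 0.52, 0.41)
    d <- x - y                               # per-topic differences
    n <- length(d)
    t0 <- mean(d) / sqrt(var(d) / n)         # Eq. 1.24 with mu = 0
    2 * pt(abs(t0), df = n - 1, lower.tail = FALSE)  # two-sided p-value
    t.test(x, y, paired = TRUE)              # same t statistic and p-value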
In the above discussion, the degrees of freedom for the t distribution was φ = n − 1. Note that:

V = S/φ. (1.26)

We have already seen when V was first introduced in Eq. 1.14 that V relies on φ = n − 1 independent pieces of information: φ represents the degrees of freedom associated with the sum of squares S. Now, compare u given in the above proof, which obeys N(0, 1²), and t as defined in Eq. 1.24. It can be observed that t is obtained by replacing the population variance σ² in the definition of u with a sample variance V; as a result t obeys a distribution very similar to N(0, 1²). How similar depends on the accuracy of V as an estimator of σ², which is exactly what the degrees of freedom φ represents. The accuracy can be improved arbitrarily by increasing the sample size n; when φ = ∞, the t distribution is exactly N(0, 1²).
Corollary 6 If x_{11}, x_{12}, ..., x_{1n_1} each obey N(μ₁, σ²), x_{21}, x_{22}, ..., x_{2n_2} each obey N(μ₂, σ²), and they are all independent, then the following random variable obeys t(n₁ + n₂ − 2):

t = (x̄_{1•} − x̄_{2•} − (μ₁ − μ₂)) / √(V_p (1/n₁ + 1/n₂)), (1.27)

where

x̄_{1•} = Σ_{j=1}^{n₁} x_{1j} / n₁,  x̄_{2•} = Σ_{j=1}^{n₂} x_{2j} / n₂, (1.28)

S₁ = Σ_{j=1}^{n₁} (x_{1j} − x̄_{1•})²,  S₂ = Σ_{j=1}^{n₂} (x_{2j} − x̄_{2•})²,  V_p = (S₁ + S₂)/(n₁ + n₂ − 2). (1.29)
Proof From Corollary 1, x̄_{1•} obeys N(μ₁, σ²/n₁), while x̄_{2•} obeys N(μ₂, σ²/n₂). Since x̄_{1•} and x̄_{2•} must also be independent, from Theorem 4, x̄_{1•} − x̄_{2•} obeys N(μ₁ − μ₂, σ²(1/n₁ + 1/n₂)). Hence, from Theorem 2,

u = (x̄_{1•} − x̄_{2•} − (μ₁ − μ₂)) / √(σ²(1/n₁ + 1/n₂)) (1.30)

obeys N(0, 1²). On the other hand, from Theorem 9, S₁/σ² obeys χ²(n₁ − 1), while S₂/σ² obeys χ²(n₂ − 1). Since these must also be independent, from Theorem 7, χ² = S₁/σ² + S₂/σ² obeys χ²(n₁ + n₂ − 2), which gives us S₁ + S₂ = χ²σ². Therefore, Eq. 1.27 can be rewritten as:

t = u√(σ²(1/n₁ + 1/n₂)) / √((χ²σ²/(n₁ + n₂ − 2))(1/n₁ + 1/n₂)) = u / √(χ²/(n₁ + n₂ − 2)). (1.31)

Moreover, from Theorem 8, x̄_{1•} and S₁ are independent, x̄_{2•} and S₂ are independent, and therefore u (Eq. 1.30) and χ² (= (S₁ + S₂)/σ²) are independent. Hence, from the definition of the t distribution, the above corollary holds.
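Similarly, the pooled-variance statistic of Corollary 6 (with μ₁ = μ₂ under H0) can be checked against t.test with var.equal = TRUE (a minimal sketch with made-up scores, not from this book):

    # Hypothetical scores from two different user groups.
    x1 <- c(0.42, 0.55, 0.61, 0.30, 0.48, 0.66, 0.52, 0.39)
    x2 <- c(0.38, 0.50, 0.62, 0.25, 0.40, 0.60)
    n1 <- length(x1); n2 <- length(x2)
    S1 <- sum((x1 - mean(x1))^2); S2 <- sum((x2 - mean(x2))^2)
    Vp <- (S1 + S2) / (n1 + n2 - 2)          # pooled variance (Eq. 1.29)
    t0 <- (mean(x1) - mean(x2)) / sqrt(Vp * (1/n1 + 1/n2))
    2 * pt(abs(t0), df = n1 + n2 - 2, lower.tail = FALSE)
    t.test(x1, x2, var.equal = TRUE)         # same t statistic and p-value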
1.2.5 F Distribution

Let us now define the F distribution. Consider a random variable χ₁² that obeys χ²(φ₁), and another χ₂² that obeys χ²(φ₂). Furthermore, let χ₁² and χ₂² be independent. Then the probability distribution of the following random variable F is called an F distribution with (φ₁, φ₂) degrees of freedom:

F = (χ₁²/φ₁) / (χ₂²/φ₂). (1.32)
Let F be a random variable that obeys F(φ₁, φ₂). Then, for any given probability P, the value F_inv(φ₁, φ₂; P) that satisfies Pr{F ≥ F_inv(φ₁, φ₂; P)} = P is called the upper 100P% F-value.¹⁹

Corollary 7 If a random variable t obeys t(φ), then t² obeys F(1, φ).
Proof From the definition of the t distribution (Eq. 1.20),

t² = u²/(χ²/φ) = (u²/1)/(χ²/φ), (1.34)

where u obeys N(0, 1²) and χ² obeys χ²(φ). Now, from the definition of the χ² distribution (Eq. 1.15), clearly u² obeys χ²(1). Hence, from the definition of the F distribution (Eq. 1.32), the above corollary holds.

18 It is known that the population mean and the population variance of F are given by E(F) = φ₂/(φ₂ − 2) (for φ₂ ≥ 3) and V(F) = 2φ₂²(φ₁ + φ₂ − 2)/(φ₁(φ₂ − 2)²(φ₂ − 4)) (for φ₂ ≥ 5), respectively.
19 With Microsoft Excel, F_inv(φ₁, φ₂; P) can be obtained as F.INV.RT(P, φ₁, φ₂); with R, it can be obtained as qf(P, φ₁, φ₂, lower.tail=FALSE).
The following property of the F distribution is what enables us to conduct
ANOVA, as we shall discuss in Chap. 3.
Theorem 10 Let x_{11}, x_{12}, ..., x_{1n_1} be random variables that each obey N(μ₁, σ₁²). Let x_{21}, x_{22}, ..., x_{2n_2} be random variables that each obey N(μ₂, σ₂²). Moreover, let all of the variables be independent. Furthermore, let:

V₁ = Σ_{j=1}^{n₁} (x_{1j} − x̄_{1•})² / (n₁ − 1),  V₂ = Σ_{j=1}^{n₂} (x_{2j} − x̄_{2•})² / (n₂ − 1), (1.35)

where x̄_{1•} and x̄_{2•} are as defined in Eq. 1.28. Then, the following random variable obeys F(n₁ − 1, n₂ − 1) [13]:

(V₁/σ₁²) / (V₂/σ₂²). (1.36)
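Theorem 10 can also be observed by simulation (a minimal R sketch, not from this book; the group sizes and variances are arbitrary):

    set.seed(99)
    n1 <- 8; n2 <- 12; v1 <- 2; v2 <- 5   # group sizes and true variances
    f <- replicate(100000,
      (var(rnorm(n1, sd = sqrt(v1))) / v1) /
      (var(rnorm(n2, sd = sqrt(v2))) / v2))
    # The simulated upper 5% point matches F(n1 - 1, n2 - 1):
    c(quantile(f, 0.95), qf(0.05, n1 - 1, n2 - 1, lower.tail = FALSE))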
1.3 Less Well-Known Probability Distributions

The noncentral distributions discussed below are required for understanding what goes on behind topic set size design (Chap. 6) and power analysis (Chap. 7). While the central t and F distributions described in Sect. 1.2 are what the test statistics obey when the null hypothesis H0 is true, the noncentral t and F distributions described below are the ones they obey when H0 is false. As was mentioned earlier, this section may be skipped if the reader only wants to learn how to conduct t-tests and ANOVAs and to report the results appropriately.

1.3.1 Noncentral t Distribution
Let y be a random variable that obeys N(λ, 1²) for some constant λ; let χ² be a random variable that obeys χ²(φ). If y and χ² are independent, the probability distribution of the following random variable t is called the noncentral t distribution with φ degrees of freedom and a noncentrality parameter λ:

t = y / √(χ²/φ), (1.37)
and is denoted by t(φ, λ). Compare Eq. 1.37 with Eq. 1.20: it is clear that when λ = 0, then t(φ, λ) reduces to t(φ). That is, the noncentral t distribution generalises the (central) t distribution.
For the sake of completeness, the pdf of t(φ, λ) is provided below [13]²⁰:

f(x; φ, λ) = (exp(−λ²/2) / (√(φπ) Γ(φ/2))) Σ_{j=0}^∞ Γ((φ + j + 1)/2) ((λx)^j / j!) (2/φ)^(j/2) (1 + x²/φ)^(−(φ+j+1)/2). (1.38)
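R handles noncentral t distributions directly through the ncp argument of dt and pt (a minimal sketch, not from this book; the parameter values are arbitrary):

    # Noncentral t with phi = 19 degrees of freedom and lambda = 2:
    dt(1.5, df = 19, ncp = 2)    # pdf at x = 1.5 (Eq. 1.38)
    pt(1.5, df = 19, ncp = 2)    # Pr{t <= 1.5}
    pt(1.5, df = 19)             # with ncp omitted, reduces to central t(19)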
Corollary 8 If x_1, x_2, ..., x_n are independent and each obey N(μ, σ²), then the following random variable obeys t(n − 1, λ):

t₀ = (x̄ − μ₀) / √(V/n), (1.39)

where

λ = √n (μ − μ₀) / σ. (1.40)

Proof The proof is similar to that of Corollary 5: from that proof, we have x̄ − μ = uσ/√n (where u obeys N(0, 1²)) and V = χ²σ²/(n − 1) (where χ² obeys χ²(n − 1)). Also, if we define λ as in Eq. 1.40, then μ − μ₀ = λσ/√n. Therefore,

t₀ = ((x̄ − μ) + (μ − μ₀)) / √(V/n) = (uσ/√n + λσ/√n) / (σ√(χ²/((n − 1)n))) = (u + λ) / √(χ²/(n − 1)). (1.41)

Now, since u obeys N(0, 1²), y = u + λ obeys N(λ, 1²) (Theorem 3). Hence, from the definition of the noncentral t distribution, the above corollary holds.
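Corollary 8 is precisely what makes power computation possible for the paired t-test: under H1, t₀ obeys a noncentral t distribution. A minimal R sketch (not from this book; the standardised difference 0.333 is hypothetical):

    n <- 73; alpha <- 0.05
    delta <- 0.333                     # assumed (mu - mu0) / sigma
    lambda <- sqrt(n) * delta          # noncentrality parameter (Eq. 1.40)
    tcrit <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)
    # Power = Pr{|t0| >= tcrit}, where t0 obeys t(n - 1, lambda):
    pt(tcrit, df = n - 1, ncp = lambda, lower.tail = FALSE) +
      pt(-tcrit, df = n - 1, ncp = lambda)   # about 0.80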
Corollary 9 If x_{11}, x_{12}, ..., x_{1n_1} each obey N(μ₁, σ²), x_{21}, x_{22}, ..., x_{2n_2} each obey N(μ₂, σ²), and they are all independent, then the following random variable obeys t(n₁ + n₂ − 2, λ):

t₀ = (x̄_{1•} − x̄_{2•}) / √(V_p (1/n₁ + 1/n₂)), (1.42)

where

λ = √(n₁n₂/(n₁ + n₂)) (μ₁ − μ₂) / σ. (1.43)
Proof The proof is similar to that of Corollary 6: from that proof, we have

x̄_{1•} − x̄_{2•} = uσ√((n₁ + n₂)/(n₁n₂)) + μ₁ − μ₂, (1.44)

where u obeys N(0, 1²), and V_p = χ²σ²/(n₁ + n₂ − 2) (where χ² obeys χ²(n₁ + n₂ − 2)). Also, if we define λ as in Eq. 1.43, then μ₁ − μ₂ = λσ√((n₁ + n₂)/(n₁n₂)). Therefore,

t₀ = (uσ√((n₁ + n₂)/(n₁n₂)) + λσ√((n₁ + n₂)/(n₁n₂))) / √((χ²σ²/(n₁ + n₂ − 2))((n₁ + n₂)/(n₁n₂))) = (u + λ) / √(χ²/(n₁ + n₂ − 2)). (1.46)

Now, since u obeys N(0, 1²), y = u + λ obeys N(λ, 1²) (Theorem 3). Hence, from the definition of the noncentral t distribution, the above corollary holds.
Theorem 11 (Approximating a noncentral t distribution with a normal distribution) For random variables t and u that obey t(φ, λ) and N(0, 1²), respectively, the following approximation holds [13]:

Pr{t ≤ w} ≈ Pr{u ≤ (wc* − λ) / √(1 + w²(1 − c*²))}, (1.47)

where

c* = E(√(χ²/φ)) = √(2/φ) Γ((φ + 1)/2) / Γ(φ/2). (1.48)
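The accuracy of this approximation can be inspected numerically (a minimal R sketch, not from this book; φ, λ, and w are arbitrary):

    phi <- 20; lambda <- 2; w <- 1.5
    cstar <- sqrt(2 / phi) * gamma((phi + 1) / 2) / gamma(phi / 2)  # Eq. 1.48
    exact  <- pt(w, df = phi, ncp = lambda)           # Pr{t <= w}
    approx <- pnorm((w * cstar - lambda) /
                    sqrt(1 + w^2 * (1 - cstar^2)))    # Eq. 1.47
    c(exact, approx)  # the two values agree closely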