Applied medical statistics 1st Edition

Jingmei Jiang
Applied Medical Statistics

ffirs.indd 1 30-03-2022 21:10:12


Applied Medical Statistics

Jingmei Jiang

Department of Epidemiology and Biostatistics,
Institute of Basic Medical Sciences,
Chinese Academy of Medical Sciences/School of Basic Medicine,
Peking Union Medical College,
Beijing, China

This edition first published 2022
© 2022 John Wiley and Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from
this title is available at http://www.wiley.com/go/permissions.

The right of Jingmei Jiang to be identified as the author of this work has been asserted in accordance
with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley
products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some
content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
The contents of this work are intended to further general scientific research, understanding, and
discussion only and are not intended and should not be relied upon as recommending or promoting
scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing
research, equipment modifications, changes in governmental regulations, and the constant flow of
information relating to the use of medicines, equipment, and devices, the reader is urged to review
and evaluate the information provided in the package insert or instructions for each medicine,
equipment, or device for, among other things, any changes in the instructions or indication of usage
and for added warnings and precautions. While the publisher and authors have used their best efforts
in preparing this work, they make no representations or warranties with respect to the accuracy
or completeness of the contents of this work and specifically disclaim all warranties, including
without limitation any implied warranties of merchantability or fitness for a particular purpose. No
warranty may be created or extended by sales representatives, written sales materials or promotional
statements for this work. The fact that an organization, website, or product is referred to in this work
as a citation and/or potential source of further information does not mean that the publisher and
authors endorse the information or services the organization, website, or product may provide or
recommendations it may make. This work is sold with the understanding that the publisher is not
engaged in rendering professional services. The advice and strategies contained herein may not be
suitable for your situation. You should consult with a specialist where appropriate. Further, readers
should be aware that websites listed in this work may have changed or disappeared between when
this work was written and when it is read. Neither the publisher nor authors shall be liable for any
loss of profit or any other commercial damages, including but not limited to special, incidental,
consequential, or other damages.

Library of Congress Cataloging-in-Publication Data
Names: Jiang, Jingmei, 1958- author.
Title: Applied medical statistics / Jingmei Jiang.
Description: Hoboken, NJ : John Wiley & Sons, Inc., 2022. | Includes
bibliographical references and index.
Identifiers: LCCN 2021021097 (print) | LCCN 2021021098 (ebook) | ISBN
9781119716709 (hardback) | ISBN 9781119716778 (pdf) | ISBN 9781119716792
(epub) | ISBN 9781119716822 (ebook)
Subjects: LCSH: Medicine--Research--Statistical methods--Textbooks. |
Medical statistics--Textbooks. | Biometry--Textbooks.
Classification: LCC R853.S7 J53 2022 (print) | LCC R853.S7 (ebook) | DDC
610.72/7--dc23
LC record available at https://lccn.loc.gov/2021021097
LC ebook record available at https://lccn.loc.gov/2021021098

Cover image: © Andriy Onufriyenko/Getty Images
Cover design by Wiley

Set in 9.5/12pt STIXTwoText by Integra Software Services Pvt. Ltd, Pondicherry, India

10 9 8 7 6 5 4 3 2 1

Contents

Preface xiii
Acknowledgments xv
About the Companion Website xvii

1 What is Biostatistics? 1
1.1 Overview 1
1.2 Some Statistical Terminology 2
1.2.1 Population and Sample 2
1.2.2 Homogeneity and Variation 3
1.2.3 Parameter and Statistic 4
1.2.4 Types of Data 4
1.2.5 Error 5
1.3 Workflow of Applied Statistics 6
1.4 Statistics and Its Related Disciplines 6
1.5 Statistical Thinking 7
1.6 Summary 7
1.7 Exercises 8

2 Descriptive Statistics 11
2.1 Frequency Tables and Graphs 12
2.1.1 Frequency Distribution of Numerical Data 12
2.1.2 Frequency Distribution of Categorical Data 16
2.2 Descriptive Statistics of Numerical Data 17
2.2.1 Measures of Central Tendency 17
2.2.2 Measures of Dispersion 26
2.3 Descriptive Statistics of Categorical Data 31
2.3.1 Relative Numbers 31
2.3.2 Standardization of Rates 34
2.4 Constructing Statistical Tables and Graphs 38
2.4.1 Statistical Tables 38
2.4.2 Statistical Graphs 40
2.5 Summary 47
2.6 Exercises 48

3 Fundamentals of Probability 53
3.1 Sample Space and Random Events 54
3.1.1 Definitions of Sample Space and Random Events 54
3.1.2 Operation of Events 55
3.2 Relative Frequency and Probability 58
3.2.1 Definition of Probability 59
3.2.2 Basic Properties of Probability 59
3.3 Conditional Probability and Independence of Events 60
3.3.1 Conditional Probability 60
3.3.2 Independence of Events 60
3.4 Multiplication Law of Probability 61
3.5 Addition Law of Probability 62
3.5.1 General Addition Law 62
3.5.2 Addition Law of Mutually Exclusive Events 62
3.6 Total Probability Formula and Bayes’ Rule 63
3.6.1 Total Probability Formula 63
3.6.2 Bayes’ Rule 64
3.7 Summary 65
3.8 Exercises 65

4 Discrete Random Variable 69
4.1 Concept of the Random Variable 69
4.2 Probability Distribution of the Discrete Random Variable 70
4.2.1 Probability Mass Function 70
4.2.2 Cumulative Distribution Function 71
4.2.3 Association Between the Probability Distribution and Relative Frequency Distribution 72
4.3 Numerical Characteristics 73
4.3.1 Expected Value 73
4.3.2 Variance and Standard Deviation 74
4.4 Commonly Used Discrete Probability Distributions 75
4.4.1 Binomial Distribution 75
4.4.2 Multinomial Distribution 80
4.4.3 Poisson Distribution 82
4.5 Summary 87
4.6 Exercises 87

5 Continuous Random Variable 91
5.1 Concept of Continuous Random Variable 92
5.2 Numerical Characteristics 93
5.3 Normal Distribution 94
5.3.1 Concept of the Normal Distribution 94
5.3.2 Standard Normal Distribution 96
5.3.3 Descriptive Methods for Assessing Normality 99
5.4 Application of the Normal Distribution 102
5.4.1 Normal Approximation to the Binomial Distribution 102
5.4.2 Normal Approximation to the Poisson Distribution 105
5.4.3 Determining the Medical Reference Interval 108
5.5 Summary 109
5.6 Exercises 110

6 Sampling Distribution and Parameter Estimation 113
6.1 Samples and Statistics 114
6.2 Sampling Distribution of a Statistic 114
6.2.1 Sampling Distribution of the Mean 115
6.2.2 Sampling Distribution of the Variance 120
6.2.3 Sampling Distribution of the Rate (Normal Approximation) 122
6.3 Estimation of One Population Parameter 124
6.3.1 Point Estimation and Its Quality Evaluation 124
6.3.2 Interval Estimation for the Mean 126
6.3.3 Interval Estimation for the Variance 130
6.3.4 Interval Estimation for the Rate (Normal Approximation Method) 131
6.4 Estimation of Two Population Parameters 132
6.4.1 Estimation of the Difference in Means 132
6.4.2 Estimation of the Ratio of Variances 136
6.4.3 Estimation of the Difference Between Rates (Normal Approximation Method) 139
6.5 Summary 141
6.6 Exercises 141

7 Hypothesis Testing for One Parameter 145
7.1 Overview 145
7.1.1 Concepts and Procedures 146
7.1.2 Type I and Type II Errors 150
7.1.3 One-sided and Two-sided Hypothesis 152
7.1.4 Association Between Hypothesis Testing and Interval Estimation 153
7.2 Hypothesis Testing for One Parameter 155
7.2.1 Hypothesis Tests for the Mean 155
7.2.1.1 Power of the Test 156
7.2.1.2 Sample Size Determination 160
7.2.2 Hypothesis Tests for the Rate (Normal Approximation Methods) 162
7.2.2.1 Power of the Test 163
7.2.2.2 Sample Size Determination 164
7.3 Further Considerations on Hypothesis Testing 164
7.3.1 About the Significance Level 164
7.3.2 Statistical Significance and Clinical Significance 165
7.4 Summary 165
7.5 Exercises 166

8 Hypothesis Testing for Two Population Parameters 169
8.1 Testing the Difference Between Two Population Means: Paired Samples 170
8.2 Testing the Difference Between Two Population Means: Independent Samples 173
8.2.1 t-Test for Means with Equal Variances 173
8.2.2 F-Test for the Equality of Two Variances 176
8.2.3 Approximation t-Test for Means with Unequal Variances 178
8.2.4 Z-Test for Means with Large-Sample Sizes 181
8.2.5 Power for Comparing Two Means 182
8.2.6 Sample Size Determination 183
8.3 Testing the Difference Between Two Population Rates (Normal Approximation Method) 185
8.3.1 Power for Comparing Two Rates 186
8.3.2 Sample Size Determination 187
8.4 Summary 188
8.5 Exercises 189

9 One-way Analysis of Variance 193
9.1 Overview 193
9.1.1 Concept of ANOVA 194
9.1.2 Data Layout and Modeling Assumption 195
9.2 Procedures of ANOVA 196
9.3 Multiple Comparisons of Means 204
9.3.1 Tukey’s Test 204
9.3.2 Dunnett’s Test 206
9.3.3 Least Significant Difference (LSD) Test 209
9.4 Checking ANOVA Assumptions 211
9.4.1 Check for Normality 211
9.4.2 Test for Homogeneity of Variances 213
9.4.2.1 Bartlett’s Test 213
9.4.2.2 Levene’s Test 215
9.5 Data Transformations 217
9.6 Summary 218
9.7 Exercises 218

10 Analysis of Variance in Different Experimental Designs 221
10.1 ANOVA for Randomized Block Design 221
10.1.1 Data Layout and Model Assumptions 223
10.1.2 Procedure of ANOVA 224
10.2 ANOVA for Two-factor Factorial Design 229
10.2.1 Concept of Factorial Design 230
10.2.2 Data Layout and Model Assumptions 233
10.2.3 Procedure of ANOVA 234
10.3 ANOVA for Repeated Measures Design 240
10.3.1 Characteristics of Repeated Measures Data 240
10.3.2 Data Layout and Model Assumptions 242
10.3.3 Procedure of ANOVA 243
10.3.4 Sphericity Test of Covariance Matrix 245
10.3.5 Multiple Comparisons of Means 248
10.4 ANOVA for 2 × 2 Crossover Design 251
10.4.1 Concept of a 2 × 2 Crossover Design 251
10.4.2 Data Layout and Model Assumptions 252
10.4.3 Procedure of ANOVA 254
10.5 Summary 256
10.6 Exercises 257

11 χ² Test 261
11.1 Contingency Table 262
11.1.1 General Form of Contingency Table 263
11.1.2 Independence of Two Categorical Variables 264
11.1.3 Significance Testing Using the Contingency Table 265
11.2 χ² Test for a 2 × 2 Contingency Table 266
11.2.1 Test of Independence 266
11.2.2 Yates’ Corrected χ² Test for a 2 × 2 Contingency Table 269
11.2.3 Paired Samples Design χ² Test 269
11.2.4 Fisher’s Exact Tests for Completely Randomized Design 272
11.2.5 Exact McNemar’s Test for Paired Samples Design 275
11.3 χ² Test for R × C Contingency Tables 276
11.3.1 Comparison of Multiple Independent Proportions 276
11.3.2 Multiple Comparisons of Proportions 278
11.4 χ² Goodness-of-Fit Test 280
11.4.1 Normal Distribution Goodness-of-Fit Test 281
11.4.2 Poisson Distribution Goodness-of-Fit Test 283
11.5 Summary 284
11.6 Exercises 285

12 Nonparametric Tests Based on Rank 289
12.1 Concept of Order Statistics 289
12.2 Wilcoxon’s Signed-Rank Test for Paired Samples 290
12.3 Wilcoxon’s Rank-Sum Test for Two Independent Samples 295
12.4 Kruskal-Wallis Test for Multiple Independent Samples 299
12.4.1 Kruskal-Wallis Test 299
12.4.2 Multiple Comparisons 301
12.5 Friedman’s Test for Randomized Block Design 303
12.6 Further Considerations About Nonparametric Tests 306
12.7 Summary 306
12.8 Exercises 306

13 Simple Linear Regression 311
13.1 Concept of Simple Linear Regression 311
13.2 Establishment of Regression Model 314
13.2.1 Least Squares Estimation of a Regression Coefficient 314
13.2.2 Basic Properties of the Regression Model 316
13.2.3 Hypothesis Testing of Regression Model 317
13.3 Application of Regression Model 321
13.3.1 Confidence Interval Estimation of a Regression Coefficient 321
13.3.2 Confidence Band Estimation of Regression Model 322
13.3.3 Prediction Band Estimation of Individual Response Values 323
13.4 Evaluation of Model Fitting 325
13.4.1 Coefficient of Determination 325
13.4.2 Residual Analysis 326
13.5 Summary 327
13.6 Exercises 328

14 Simple Linear Correlation 331
14.1 Concept of Simple Linear Correlation 331
14.1.1 Definition of Correlation Coefficient 331
14.1.2 Interpretation of Correlation Coefficient 334
14.2 Hypothesis Testing of Correlation Coefficient 336
14.3 Confidence Interval Estimation for Correlation Coefficient 338
14.4 Spearman’s Rank Correlation 340
14.4.1 Concept of Spearman’s Rank Correlation Coefficient 340
14.4.2 Hypothesis Testing of Spearman’s Rank Correlation Coefficient 342
14.5 Summary 342
14.6 Exercises 343

15 Multiple Linear Regression 345
15.1 Multiple Linear Regression Model 346
15.1.1 Concept of the Multiple Linear Regression 346
15.1.2 Least Squares Estimation of Regression Coefficient 349
15.1.3 Properties of the Least Squares Estimators 351
15.1.4 Standardized Partial-Regression Coefficient 351
15.2 Hypothesis Testing 352
15.2.1 F-Test for Overall Regression Model 352
15.2.2 t-Test for Partial-Regression Coefficients 354
15.3 Evaluation of Model Fitting 356
15.3.1 Coefficient of Determination and Adjusted Coefficient of Determination 356
15.3.2 Residual Analysis and Outliers 357
15.4 Other Aspects of Regression 359
15.4.1 Multicollinearity 359
15.4.2 Selection of Independent Variables 361
15.4.3 Sample Size 364
15.5 Summary 364
15.6 Exercises 364

16 Logistic Regression 369
16.1 Logistic Regression Model 370
16.1.1 Linear Probability Model 371
16.1.2 Probability, Odds, and Logit Transformation 371
16.1.3 Definition of Logistic Regression 373
16.1.4 Inference for Logistic Regression 375
16.1.4.1 Estimation of Model Coefficient 375
16.1.4.2 Interpretation of Model Coefficient 378
16.1.4.3 Hypothesis Testing of Model Coefficient 380
16.1.4.4 Interval Estimation of Model Coefficient 382
16.1.5 Evaluation of Model Fitting 385
16.2 Conditional Logistic Regression Model 388
16.2.1 Characteristics of Conditional Logistic Regression Model 390
16.2.2 Estimation of Regression Coefficient 390
16.2.3 Hypothesis Testing of Regression Coefficient 393
16.3 Additional Remarks 394
16.3.1 Sample Size 394
16.3.2 Types of Independent Variables 394
16.3.3 Selection of Independent Variables 395
16.3.4 Missing Data 395
16.4 Summary 395
16.5 Exercises 396

17 Survival Analysis 399
17.1 Overview 400
17.1.1 Concept of Survival Analysis 400
17.1.2 Basic Functions of Survival Time 402
17.2 Description of the Survival Process 405
17.2.1 Product Limit Method 405
17.2.2 Life Table Method 408
17.3 Comparison of Survival Processes 410
17.3.1 Log-Rank Test 410
17.3.2 Other Methods for Comparing Survival Processes 413
17.4 Cox’s Proportional Hazards Model 414
17.4.1 Concept and Model Assumptions 415
17.4.2 Estimation of Model Coefficient 417
17.4.3 Hypothesis Testing of Model Coefficient 419
17.4.4 Evaluation of Model Fitting 420
17.5 Other Aspects of Cox’s Proportional Hazards Model 421
17.5.1 Hazard Index 421
17.5.2 Sample Size 421
17.6 Summary 422
17.7 Exercises 423

18 Evaluation of Diagnostic Tests 431
18.1 Basic Characteristics of Diagnostic Tests 431
18.1.1 Sensitivity and Specificity 433
18.1.2 Composite Measures of Sensitivity and Specificity 435
18.1.3 Predictive Values 438
18.1.4 Sensitivity and Specificity Comparison of Two Diagnostic Tests 440
18.2 Agreement Between Diagnostic Tests 443
18.2.1 Agreement of Categorical Data 444
18.2.2 Agreement of Numerical Data 447
18.3 Receiver Operating Characteristic Curve Analysis 448
18.3.1 Concept of an ROC Curve 449
18.3.2 Area Under the ROC Curve 450
18.3.3 Comparison of Areas Under ROC Curves 453
18.4 Summary 456
18.5 Exercises 457

19 Observational Study Design 461
19.1 Cross-Sectional Studies 462
19.1.1 Types of Cross-Sectional Studies 462
19.1.2 Probability Sampling Methods 462
19.1.3 Sample Size for Surveys 466
19.1.4 Cross-Sectional Studies for Clues of Etiology 468
19.2 Cohort Studies 469
19.2.1 Measures of Association in Cohort Studies 469
19.2.2 Sample Size for Cohort Studies 470
19.3 Case-Control Studies 472
19.3.1 Measures of Association in Case-Control Studies 472
19.3.2 Sample Size for Case-Control Studies 473
19.4 Summary 474
19.5 Exercises 475

20 Experimental Study Design 477
20.1 Overview 478
20.1.1 Basic Components of an Experimental Study 478
20.1.2 Principles of Experimental Study Design 480
20.1.3 Blinding Procedures in Clinical Trials 482
20.2 Completely Randomized Design 483
20.2.1 Concept of Completely Randomized Design 483
20.2.2 Sample Size for Completely Randomized Design 485
20.3 Randomized Block Design 486
20.3.1 Concepts of Randomized Block Design 486
20.3.2 Sample Size for Randomized Block Design 488
20.4 Factorial Design 489
20.5 Crossover Design 491
20.5.1 Concepts of Crossover Design 491
20.5.2 Sample Size for 2 × 2 Crossover Design 492
20.6 Summary 493
20.7 Exercises 493

Appendix 495
References 549
Index 557

Preface

Over the past few decades, biomedical data have proliferated rapidly, creating opportunities
to use these data to improve human health. Burgeoning methods, such as machine learning
techniques, have emerged to respond to the rapid growth in data volume and to exploit data
effectively and efficiently. These methods are founded on statistical learning theory, which
is an extension of traditional statistics. Cultivating basic statistical thinking therefore
plays a fundamental role in mastering these state-of-the-art methods and embracing the era of
big data, which makes an introductory course in biostatistics an indispensable part of the
curriculum for medical students. However, as a branch of mathematics, statistics is
characterized by hierarchically organized concepts that are not always intuitive, so most
medical students regard biostatistics as an obstacle and a burden. During almost 30 years of
teaching statistics at the Chinese Academy of Medical Sciences & Peking Union Medical College,
China, I have seen on many occasions generations of students, both undergraduate and
postgraduate, struggle to grasp the essence of statistical concepts and the implications of
mathematical formulas, and to master complex analytical methods. Their motivation to learn
biostatistics has also been dampened by abstruse formulas and derivations. A reader-friendly
text that helps students develop statistical thinking and build propositional knowledge, as
well as understand and master analytical skills, is therefore much needed; this need was my
motivation for writing this book.
Applied Medical Statistics is an introductory-level textbook written for postgraduate
students in the human life-science field, with most topics also suitable for undergraduate
medical students. The ultimate objective of this book is to help medical students develop
“habits of mind” for statistical thinking and to strike a balance between mathematical
derivation and practical application. The most distinctive features of this book are as
follows. First, emphasis is placed on basic probability theory at the start of the book
because it is the theoretical pillar of almost all statistical methods, and strengthening these
fundamental concepts lays a solid foundation for understanding subsequent chapters. However,
so that students benefit from a practical and intuitive understanding of principles rather
than from abstract concepts, I have minimized the mathematical sophistication and introduced
the content in a user-friendly style to nurture interest and motivate learning. Second, I have
based most of the working examples on research projects that I have conducted or participated
in; such real-world settings are, in my view, more helpful for stimulating students’ interest
and for helping them learn how to use statistical procedures in practice. Finally, although
this is an elementary applied statistics textbook, it covers some commonly used advanced
statistical techniques, such as survival analysis and logistic regression. I also discuss
fundamental issues in research design; including this content greatly enhances the book’s
applicability and benefit for students who need to consult it while performing day-to-day
medical research.

Applied Medical Statistics, First Edition. Jingmei Jiang.
© 2022 John Wiley & Sons, Inc. Published 2022 by John Wiley & Sons, Inc.
Companion website: www.wiley.com/go/jiang/appliedmedicalstatistics

I have organized the content of this book in a cohesive manner that links all the relevant
foundational concepts as building blocks. Chapter 1 starts with an introduction to the basic
concepts of biostatistics, and a section called “Statistical Thinking” emphasizes the
importance of statistical thinking in solving real-world problems. Chapter 2 introduces the
basic concepts and application of some fundamental summary statistics; it also covers how to
organize data and display them using graphical methods. Chapters 3 to 5 are compact and provide
the background needed to understand the basic rationale of biostatistics, in addition to
laying a theoretical foundation for subsequent chapters. Chapter 3 develops the basic
principles of probability, with suitable examples. Chapter 4 covers the fundamental concepts
of random variables and discrete probability distributions, including the binomial,
multinomial, and Poisson distributions. Chapter 5 briefly introduces the most commonly used
continuous probability distributions, mainly the normal and standard normal distributions.
Chapter 6 focuses on sampling distributions and parameter estimation, and plays a unique role
in linking descriptive statistics to inferential statistics; it begins the formal discussion
of the theoretical background and application of inferential statistics. Chapters 7 to 10
present the basic principles of hypothesis testing and the elementary parametric hypothesis
testing methods for normally distributed data in one-sample, two-sample, and multiple-sample
scenarios, such as the t-test and analysis of variance. These methods share the requirement
that the underlying population be normally distributed. Chapter 11 introduces the fundamental
concepts of hypothesis testing for categorical data, including the chi-square test and
Fisher’s exact test, which are widely used in statistical analysis. Chapter 12 gives an
overview of some of the best-known nonparametric tests, which are suitable when the assumption
of normality can be relaxed. Chapters 13 and 15 introduce extensively used models and
techniques for exploring the association between risk or predictor factors and continuous
response variables: Chapter 13 focuses on the basic concepts and application of simple linear
regression, and Chapter 15 covers its extension, multiple linear regression. Chapter 14
introduces simple correlation and rank correlation, which measure the strength of the
relationship between two variables. Chapters 16 and 17 introduce essential techniques for
modeling binary and time-to-event response variables, such as unconditional and conditional
logistic regression, and the Cox proportional hazards model for time-to-event data. Chapter 18
covers the most commonly used statistical indices and methods for evaluating diagnostic tests.
Chapters 19 and 20, the concluding chapters of this textbook, discuss design and sample size
estimation for observational and experimental studies.

Acknowledgments

I am grateful for the support I received from many people and institutions during the
writing of this book. First and foremost, I express my deepest gratitude to a team of expert
consultants that included Professors Youshang Zhou, Songlin Yu, Konglai Zhang, and Hui Li, all
of whom are well-known Chinese statisticians and epidemiologists. Their unconditional support
and encouragement at every stage of writing made it possible for me to complete this work.
I am also grateful for the immense help that I received from my colleagues at the School of
Basic Medicine of Peking Union Medical College. Professor Tao Xu and Dr. Fang Xue deserve
special acknowledgement for conducting a professional review of my work; their constructive
comments greatly improved the manuscript. I also want to acknowledge Drs. Wei Han, Zixing Wang,
Yaoda Hu, and Haiyu Pang, who assisted in the production of this book by copyediting,
reviewing, and correcting many subtle errors. Fruitful discussions with them also improved how
the manuscript treated certain topics.
In particular, I appreciate the help of my postgraduate students in putting together this
book. Peng Wu helped me organize much of the material and analyze the data in the examples;
Ning Li and Cuihong Yang produced accurate figures and diagrams; Yubing Shen and Luwen Zhang
checked the accuracy of all the formulae; Yali Chen and Lei Wang constructed the index and
checked the accuracy of terminology and reference sources; Jin Du and Yujie Zhao checked the
answers to the exercises; and Wentao Gu improved the quality of the mathematical formulas.
Without their help, this work would have been far more difficult to complete.
Much of my motivation for writing came from teaching and supervising postgraduate students at
Peking Union Medical College. I am grateful for their inquisitive questions and useful feedback
on a draft of this manuscript, which allowed me to improve the final version.
I am grateful to the research projects from which I obtained the data and background for the
examples and exercises; these projects were funded by the National Natural Science Foundation
of China, the Ministry of Science and Technology Fund, the Chinese Academy of Medical Sciences
Innovation Fund for Medical Sciences, Cancer Research UK, the UK Medical Research Council, and
the US National Institutes of Health.

I would like to express my deep and sincere gratitude to managing editor Kimberly Monroe-Hill
for the professional guidance, coordination, and continued support during the entire drafting
and publication process. I also wish to thank commissioning editor James Watson for assisting
me in many ways with this book. I would also like to thank Arthi Kangeyan and Dilip Varma, the
content refinement specialists, for their professional help in getting the manuscript ready
for production.
I thank the School of Basic Medicine of Peking Union Medical College for making
available all the support that I needed in the writing process.
Thanks are owed to Dr. Maxine Garcia, Dr. Jennifer Barrett, and the team at Edanz
Group China for their dedicated and professional language editing support.
Finally, I thank my family for their understanding and encouragement while I was
writing this book.

About the Companion Website

This book is accompanied by a companion website:
www.wiley.com/go/jiang/appliedmedicalstatistics
The website includes the solutions manual and data sets.

1

What is Biostatistics?

CONTENTS
1.1 Overview 1
1.2 Some Statistical Terminology 2
1.2.1 Population and Sample 2
1.2.2 Homogeneity and Variation 3
1.2.3 Parameter and Statistic 4
1.2.4 Types of Data 4
1.2.5 Error 5
1.3 Workflow of Applied Statistics 6
1.4 Statistics and Its Related Disciplines 6
1.5 Statistical Thinking 7
1.6 Summary 7
1.7 Exercises 8

1.1 Overview

Data are present everywhere in our lives, and almost all types of scientific research
have to deal with the collection, description, or analysis of data. This makes statistics
one of the most powerful methodologies across all disciplines for exploring the
unknown world. Statistics is a discipline in its own right, with a wide spectrum of
theories, methods, and applications. A prerequisite for discussing the theory and
application of statistics is a definition of the discipline and a statement of its objectives. According to
Merriam–Webster’s Collegiate Dictionary, statistics is “a branch of mathematics dealing
with the collection, analysis, interpretation, and presentation of masses of numerical
data.” According to the Random House College Dictionary, it is “the science that deals
with the collection, classification, analysis, and interpretation of information or data.”
According to The New Oxford English–Chinese Dictionary, it is “the practice or science
of collecting and analyzing numerical data in large quantities, especially for the purpose
of inferring proportions in a whole from those in a representative sample.” Although
there are some differences among these definitions, each definition implies that
statistics is a science of data and uses the theory of mathematical statistics to make
inferences.

The application of statistical theories and methods to medical research is termed “medical
statistics”; when applied more broadly to the life sciences, it is termed “biostatistics.”
According to their functions, biostatistics has two branches: (i) statistical description is
concerned with the organization, summarization, and description of data; and (ii) statistical
inference is concerned with the use of sample data to make inferences about the
characteristics of the larger population from which the sample was drawn. This division into
descriptive and inferential statistics helps us establish a progressive learning framework for
statistics. In actual scientific work, however, the division is not always necessary, because
the two branches complement each other in deepening our knowledge of the real world.
We briefly review the development of biostatistics. The weekly publication of the Bills of
Mortality in London, beginning in 1603, is generally considered to mark the beginning of
biostatistics. Since then, related theories have continued to emerge, and the early
twentieth century ushered in the peak of development of biostatistics. Several pioneers
played a crucial role in developing the theoretical framework and applications of
biostatistics. G.J. Mendel (1822–1884), the father of modern genetics, used probability
rules to discover the basic laws of heredity in the 1860s. He is considered one of the
first to apply mathematical methods to biology. K. Pearson (1857–1936), the founding
father of modern statistics, established the world’s first department of statistics at
University College London in 1911 and developed several key statistical theories (e.g.,
measures of correlation and the χ² test). W.S. Gosset (1876–1937) proposed the
t distribution and the t-test in 1908, which laid the foundation for the sampling distribution
of the sample mean and marked the establishment of small-sample theory and
methodology. R.A. Fisher (1890–1962) developed tests of statistical significance and various
sampling distributions, and established the method of experimental design and the related
techniques of statistical analysis. These were collected in The Design of Experiments, first
published in 1935. Through the efforts of these pioneers and other statisticians over
hundreds of years, a complete theoretical system of biostatistics was formed.
At present, the development of biostatistics is being driven by an unprecedented and
still growing range of life science applications, by advances in computing power and
computer technology, and by new formats of data that continue to emerge. Despite this,
the basic ideas of statistics have not changed: to make an inference about a population
based on the information contained in a sample from that population, and to provide an
associated measure of goodness for the inference.

1.2 Some Statistical Terminology

In this text, we aim to explain basic statistical methods commonly applied in
biomedical research. First, we provide an overview of several statistical terms that are
prerequisites for further learning.

1.2.1 Population and Sample


A population (statistical population or target population) is the set of values of one or
more characteristics of the study subjects that are of interest. A population is usually
denoted by X (also called a random variable) and can be viewed as a dataset. The basic
unit that constitutes the population is called an individual.


The dataset that defines a population is typically large or conceptual. A large dataset
suggests a finite population, because it has a finite number of individuals regardless of
how large it is. For example, the heights of all college freshman boys in Beijing in 2020
form a finite population (though a very large one). When the dataset exists only
conceptually, we call it an infinite population; examples are the weights of infants
and the antihypertensive treatment effects of a certain drug. The sampling theory and
the principles of statistical inference introduced in this text assume an infinite population.
A sample, denoted by $X_1, X_2, \ldots, X_n$ (where $n$ is the sample size), is a subset of
data selected from a population. The purpose of obtaining a sample is to make inferences
about the characteristics of the underlying, unknown population.
The process of drawing a sample from a population is termed sampling. In practice,
depending on the research objectives and feasibility, samples can be obtained by
random or non-random sampling. A random sample is obtained through probability
sampling. In this text, we generally assume a simple random sample, in which each
individual in the population has an equal chance of being selected. Non-random sampling
relies on the subjective judgment of the researcher and is beyond the scope of this text.
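Simple random sampling can be sketched in Python as follows. This is an illustration, not a procedure from this text. It assumes the sampling frame is a list of hypothetical ID numbers. It uses the standard library's `random.Random.sample`, which selects items without replacement, so every individual has the same chance of selection. The helper name `simple_random_sample` and the frame of 1,000 IDs are illustrative assumptions.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement.

    Every individual has the same chance of being selected.
    (Hypothetical helper written for this illustration.)
    """
    if not 0 <= n <= len(population):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)        # local generator; a fixed seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical sampling frame: ID numbers of 1,000 students
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 10, seed=2020)
print(sample)
```

Fixing the seed makes the draw reproducible, which helps when a sampling procedure must be documented. Omit the seed to obtain a fresh draw.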
Note the following: (i) The concept of population differs between biomedical research
and statistical terminology. In biomedical research, the term “study population” (or
study subjects) typically refers to a group of humans or organisms of another species,
whereas in statistics the population of interest consists of the values of the
characteristics of the study subjects. For example, in a study of blood glucose
concentrations among 3-year-old children, all children of that age are regarded as the
study population. From a statistical point of view, however, the blood glucose
concentrations of all children of that age constitute the population of interest.
(ii) Although the dataset of a population is typically large, the essential difference
between a population and a sample is not the amount of data we have but the objective
of the research. If the objective is only to describe the data, then the data we have can
be regarded as a population, regardless of how small the dataset is. If the objective is
to draw an inference, then we need to clarify which population we are interested in and
consider how to obtain a representative sample, or how representative the sample at
hand is. The representativeness of the sample with respect to the population is a very
important basis for a reasonable inference.

1.2.2 Homogeneity and Variation


In statistics, homogeneity means similarity among the individuals within a population.
In fact, without homogeneity, we can rarely define a population. The individual
differences within a homogeneous population are termed variation.

Example 1.1 Survey of the height of college freshman boys in Beijing in 2020.
Homogeneity: College freshman boys in Beijing in 2020.
Variation: Individual differences in height.

Example 1.2 Study of the antihypertensive treatment effects of a drug.
Homogeneity: Hypertensive patients taking this drug.
Variation: Individual differences in the treatment effects.

From Examples 1.1 and 1.2, we can see that homogeneity refers to similarities in the
nature, condition, or background of individuals in a population. The mission of statistics
can be interpreted as describing the features of a homogeneous population and
identifying the heterogeneity among different populations. Variation is an inherent
attribute of the life sciences, and biomedical researchers should learn to use statistical
methods to reveal the laws of biological phenomena in the presence of variation.

1.2.3 Parameter and Statistic


A descriptive measure of a characteristic of a population is called a population
parameter, or simply a parameter, generally denoted by the Greek letter θ.
For example, in the survey of the height of freshman boys, the population mean (average
height, typically denoted by µ) is a parameter. However, data for the entire population
are rarely available, so a sample is used instead. Correspondingly, a descriptive
measure calculated from a sample is called a sample statistic, or simply a statistic.
For example, if we draw a sample (typically a random sample) from the population and
calculate the average height, the sample mean is a statistic, typically denoted by
$\bar{x}$. The mathematical definition and roles of statistics are elaborated on in Chapter 6.
Because most populations are theoretical, parameters are constants that are usually
unknown. Statistics, in contrast, are calculated from samples, which are subject to
chance, so the value of a statistic can differ from sample to sample.
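A small simulation sketch in Python makes the parameter/statistic distinction concrete. It assumes a hypothetical, simulated finite population of 10,000 heights, drawn from a normal distribution with mean 140 cm and SD 6 cm; these values are chosen only for illustration. Simulating the population lets us compute the parameter µ exactly, which is usually impossible in real studies. Each sample mean is a statistic and changes from sample to sample.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 10,000 simulated heights (cm). In practice the
# population is unknown; simulating it lets us compute the parameter exactly.
population = [rng.gauss(140.0, 6.0) for _ in range(10_000)]
mu = statistics.fmean(population)            # parameter: a fixed constant

# Three simple random samples of size 50 drawn from the same population
sample_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(3)]

print(f"parameter mu = {mu:.2f}")
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i}: statistic x-bar = {xbar:.2f}")   # varies between samples
```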

1.2.4 Types of Data


Data are the observed values of the population characteristics of interest. Depending on
their properties, data can be classified as numerical or categorical:
(1) Numerical data, also known as quantitative data, are expressed in numbers and
are obtained by measuring an index for each research subject, that is, the quantity or
number of something. Numerical data differ from other data recorded in numeric form
(such as category codes) in that arithmetic operations on them are meaningful. We can
subdivide numerical data into two types:
Continuous data occur when data can be measured on a continuum or scale, i.e.,
between any two values there is another possible value.
Most numerical data in biomedical research are continuous or can be viewed as
continuous. For instance, if we conduct a survey on the health and nutritional status of
7-year-old boys in a less developed region in 2020, their measured heights (cm),
weights (kg), and hemoglobin concentrations (g/L) can be viewed as continuous data
because, in theory, they can take any value within a certain range.
Discrete data occur when the data can take only certain values. The possible values
of discrete data are generally integers. For instance, if we also record the number of
colds (0, 1, 2, …) that each of the 7-year-old boys had in 2020, these are discrete
data.
(2) Categorical data, also known as qualitative data, include two subtypes:
Unordered categorical data are obtained by dividing research subjects into two or
more unordered groups. For instance, we can code male and female as 1 and 2 for
sex, and blood types A, B, O, and AB as 1, 2, 3, and 4. Unlike numerical data,
the numbers representing different categories have no mathematical meaning.
Individual values show no quantitative difference if they belong to the same
category and show qualitative differences if they belong to different categories.
Ordinal categorical data are obtained by dividing research subjects into ordered grades
of an attribute. They are not measured on a numerical scale, but they have a natural
ordering. For instance, the treatment effect for a disease can be ordered as cured,
effective, improved, ineffective, and deteriorated. The laboratory results of urine protein
determination can be ordered as −, ±, +, ++, and +++. We can also use numerical values
such as 1, 2, 3, … to represent the grades, although these numbers indicate only order,
not magnitude.
The distinction between numerical and categorical data is not fixed. Under certain
conditions, one type can be converted into the other according to the research objectives
and the statistical methods used. For example, in a large survey on hypertension, the
blood pressure values collected are numerical data. If we want to estimate the prevalence
of hypertension, we could classify survey participants according to whether they are
hypertensive (1 for hypertensive and 0 for not hypertensive), and the data become
unordered categorical data (binary data). If we want to know the grades of hypertension,
the blood pressure measurements can be reclassified into ordinal categorical data.
Conversely, categorical data can also be converted to numerical data. For example, to
compare the prevalence of hypertension across regions, we could use the binary data to
calculate the hypertension prevalence p, which ranges from 0 to 1 and is numerical. In
the study design, we should collect as much raw (original) data as possible in numerical
form to minimize the loss of information and allow for flexible transformation.
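The conversions described above can be sketched in Python. The readings are hypothetical. The classification uses systolic pressure only, with the commonly cited cut-offs of 140, 160, and 180 mmHg. This is a simplifying assumption, since hypertension guidelines also consider diastolic pressure and other criteria.

```python
# Hypothetical systolic blood pressure readings (mmHg) from a survey
sbp = [118, 135, 142, 128, 165, 150, 182, 139, 121, 147]

# Assumption: classify by systolic pressure only, using cut-offs of
# 140/160/180 mmHg; real guidelines also use diastolic pressure.
def is_hypertensive(x):
    return 1 if x >= 140 else 0          # binary (unordered categorical) data

def grade(x):
    if x < 140:
        return "not hypertensive"
    if x < 160:
        return "grade 1"
    if x < 180:
        return "grade 2"
    return "grade 3"                     # ordinal categorical data

binary = [is_hypertensive(x) for x in sbp]
grades = [grade(x) for x in sbp]

# Back to a numerical summary: prevalence p, a proportion between 0 and 1
p = sum(binary) / len(binary)
print(binary)
print(grades)
print(f"prevalence p = {p:.2f}")
```

Each step discards detail: once readings are reduced to 0/1, the original values cannot be recovered. This is why the text recommends collecting raw numerical data.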

1.2.5 Error
Error refers to the difference between an observed value and the true value (parameter).
Their relation is defined by the following formula:
$$x = \theta + \varepsilon, \qquad (1.1)$$
where $x$ denotes the observed value, $\theta$ denotes the true (theoretical) value, and
$\varepsilon$ denotes the error, which can be a random error or a systematic error.
(1) A random error, as the name suggests, is completely random; that is, the magnitude
and sign of $\varepsilon$ cannot be predetermined, and $\varepsilon \in (-\infty, +\infty)$.
A random error is caused by the combined influence of many uncertain factors in the
actual observation or measurement process.
Formula 1.1 allows a random error to be interpreted in several ways. For example, if
$x$ is the value measured in an experiment, then $\varepsilon = x - \theta$ is the
measurement error of that measurement. Sampling error is the most typical type of
random error: if $x$ is a sample statistic, then $\varepsilon = x - \theta$ is the difference
between the statistic $x$ and the parameter $\theta$ that results from the sampling
process. This difference is fundamental to the study of statistical inference introduced
in Chapter 6.
(2) A systematic error, known as bias in epidemiology, is a deviation from the true value
with a fixed magnitude and direction, that is, $\varepsilon = a$ ($a \neq 0$), where $a$ is a
constant. A systematic error is caused by the influence of specific factors, for example,
an uncalibrated instrument, a perceptual bias of the person taking the measurement,
or overly strict or lenient criteria in evaluating a treatment effect.
Random errors are unavoidable, but under certain conditions they show statistical
regularity. The study and application of the laws governing random errors is one of
the most important elements of statistics. In practice, random and systematic
errors often coexist, and both must be considered in the study design and data
analysis.
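Formula 1.1 can be illustrated with a small Python simulation sketch. It assumes a hypothetical true value θ = 120. It models random errors as normal with mean 0 and SD 3, which is one convenient model for ε rather than the only one. It also assumes a hypothetical systematic error a = 5. Averaging many measurements shows how the two kinds of error behave differently.

```python
import random
import statistics

theta = 120.0   # hypothetical true value (e.g., systolic pressure in mmHg)
a = 5.0         # hypothetical systematic error, e.g., an instrument reading 5 units high
rng = random.Random(42)

def measure(bias=0.0):
    """Return one observation x = theta + epsilon (Formula 1.1).

    epsilon = bias + random error; the random error is modeled as normal with
    mean 0 and SD 3, an assumption made only for illustration.
    """
    random_error = rng.gauss(0.0, 3.0)   # sign and magnitude unpredictable
    return theta + bias + random_error

unbiased = [measure() for _ in range(10_000)]
biased = [measure(bias=a) for _ in range(10_000)]

print(f"average error, random only:       {statistics.fmean(unbiased) - theta:+.2f}")
print(f"average error, random + bias a=5: {statistics.fmean(biased) - theta:+.2f}")
```

The first average error is close to 0 because random errors of opposite sign largely cancel. The second stays close to a = 5 however many measurements are averaged. For that reason a systematic error must be prevented or corrected rather than averaged away.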

1.3 Workflow of Applied Statistics

The following four steps of the applied statistical workflow are indispensable in practice:
Statistical design: This marks the beginning of scientific research and directly
determines the accuracy and reliability of the research results. Statistical design
should be based on specific research objectives and domain knowledge. Good research
design therefore depends on interaction between domain experts and statisticians.
There are two general categories of research design, observational design and
experimental design, which we discuss in Chapters 19 and 20, respectively.
Data collection: This step obtains the raw data required by the research through a
reasonable and reliable approach. The collection of representative data is important
for obtaining reliable conclusions. Regardless of which method is used, the accuracy
and integrity of the data should be given high priority.
Statistical analysis: The next step is the management and analysis of the raw data
according to the research objectives and types of data. This step typically includes
statistical description, statistical inference, and/or statistical modeling to extract the
information hidden in the data.
Statistical reporting: After the preceding steps are completed, the analysis results are
presented. Appropriate statistical tables and graphs can be used to enhance the
presentation of results. Final conclusions and suggestions are drawn, guided by domain
knowledge. A key feature of statistical reporting is that all conclusions are probabilistic.

1.4 Statistics and Its Related Disciplines

The discipline of statistics does not stand alone. Instead, it is closely related to the
development of other disciplines.
Statistics and medicine: Statistics not only helps to solve practical problems but also
advances its own development in the process. Its application to the biomedical sciences
is a typical demonstration of this. As our understanding of data deepens in the
twenty-first century, evidence-based medicine, precision medicine, and other quantitative
approaches will provide a broader space for applying statistics.
Statistics and mathematics: Statistics is a branch of mathematics. The mathematical
basis of statistics is probability theory and calculus. However, this does not mean that
learning statistics requires knowledge of advanced mathematics. In fact, the objective
of learning statistics is not to master complicated mathematical proofs but to apply
statistical thinking and methods to solve problems that arise in scientific research.
Statistics and computer science: Modern statistics cannot be separated from
developments in computer science. The field of statistics has benefited greatly from
advances in computing power. In the digital era, computer science and information
technology are as important to statistics as probability theory. Computer software
has become an important auxiliary tool for statistical analysis. Different statistical
software packages generally lead to the same conclusions, even if the numerical results
differ slightly. To avoid distraction by such technical issues while learning statistical
ideas and methods, in this text we present results mainly using SPSS, although other
packages could be used instead.

1.5 Statistical Thinking

Statistical thinking includes applying rational thinking and statistical science to critically
evaluate data and the inferences, correct or false, drawn from them. How does statistical
thinking play its role in scientific research practice? To answer this question, we must
note that inferences based on sample data are almost always subject to error, because a
sample does not provide an exact image of the population.
The population is typically a theoretical and conceptual truth of interest. The science of
statistics helps us establish a methodological framework or workflow for drawing
inferences about the unknown characteristics of the population from the limited sample
data at hand, based on one or a few assumptions. The statistical inference process is an
important part of the scientific method. Inference based on experimental or observational
data is first used to develop a theory about some phenomenon; the theory is then tested
against additional sample data.
Errors may occur in the inference process based on a sample. What matters is how we
quantify and evaluate the error. Statistics connects the quantification of errors with the
measurement of the reliability of inference using probability. This connection provides a
solid theoretical basis for reasonable statistical inference.
Statistics builds a bridge between abstract theoretical concepts and the solution of
specific problems. It enables researchers to make inferences (estimates and decisions
about the target population) with a known measurement of reliability. With this
ability, a researcher can make intelligent decisions and inferences from data; that is,
statistics helps researchers to think critically about their results.
We end this chapter with remarks from the famous statistician, C.R. Rao.
All knowledge is, in the final analysis, history.
All sciences are, in the abstract, mathematics.
All judgments are, in their rationale, statistics.

1.6 Summary

The learning objective of this chapter is to understand some basic concepts in
statistics and the role of statistics in biomedical research, which are the basis for
future learning.

Statistics is a science of data, and its basic characteristic is that it is quantitative.
Two branches, statistical description and statistical inference, constitute the main
content of statistics.
The application of statistics to biomedical research generally includes the following
four steps: statistical design, data collection, statistical analysis, and statistical
reporting.
Statistical thinking includes the application of rational thinking and statistical
science to critically evaluate data and make inferences from them.

1.7 Exercises

1. Suppose you were so interested in the waist circumference of your schoolmates
that you brought a tape measure to a statistics class and measured the waist
circumference of all your classmates who were present. Answer the following
questions:
(a) Are the data you obtained a sample or a population? For what research
objectives should they be considered a sample, and for what objectives a population?
(b) If they are considered a sample, what is the population about which you are
drawing an inference? How representative of that population is the sample?
(c) How do you determine the homogeneity of your population? Is there
heterogeneity? If yes, how can you improve the homogeneity? Is there variation? What
may lead to this variation?
(d) Are there errors in the obtained data? What are the random errors and
systematic errors? Can you tell the difference between them? Can you minimize the
errors, and if so, how?
(e) What steps do you need to follow to complete a report on your survey?
2. Choose a quantitative research article in clinical medicine, basic medicine, public
health, or any biomedical research topic you are interested in and answer the
following questions:
(a) What is the population, and how is it defined from the research and the statistical
perspectives, respectively? How do the concepts of population differ between these
perspectives?
(b) Is the sample presented in the research a random sample? What are the
advantages of a random sample and of a non-random sample?
(c) Using your selected paper, illustrate the relationship between the population and
the sample, and between homogeneity and variation.
(d) Is there any factor that may lead to random or systematic errors in the research?
How do you distinguish them? How have they been minimized? Can you think
of ways to further minimize the errors?
(e) What data are collected? What are the types of data? How do you determine the
type of data? Which type of data contains more information? Do these types of
data allow for further transformation?
(f) How many steps are involved in the statistical plan? What are the specific roles
of these steps, and how are they related?
(g) Are the conclusions obtained from the research correct? How does the
knowledge of statistics learned from this chapter help you with critical
thinking?
(h) Can you follow the conceptual path laid out by the research and use statistical
critical thinking to solve a problem that interests you in your daily life? Try to
create a statistical design as you deepen your knowledge and skills through
further learning.


2 Descriptive Statistics

CONTENTS
2.1 Frequency Tables and Graphs
2.1.1 Frequency Distribution of Numerical Data
2.1.2 Frequency Distribution of Categorical Data
2.2 Descriptive Statistics of Numerical Data
2.2.1 Measures of Central Tendency
2.2.2 Measures of Dispersion
2.3 Descriptive Statistics of Categorical Data
2.3.1 Relative Numbers
2.3.2 Standardization of Rates
2.4 Constructing Statistical Tables and Graphs
2.4.1 Statistical Tables
2.4.2 Statistical Graphs
2.5 Summary
2.6 Exercises

In the previous chapter, we learned that there are two branches of statistics:
description and inference. Statistical description, as the basis of statistical inference,
provides a way to organize and summarize data in a meaningful and intuitive manner.
In this chapter, we introduce several basic statistical tools for describing data. These
tools include tables and graphs that rapidly convey a concise presentation or visual
picture of the data, as well as numerical measures that describe certain characteristics
of the data. The appropriate tool depends on the type of data (numerical or categorical)
that we want to describe.

Example 2.1 In a survey on the physiological characteristics of school-age children
in a certain region in 2010, 153 girls aged 10 years were randomly selected, and several
physiological indicators were measured and recorded. The girls’ raw height data are
shown in Table 2.1.
The data in Table 2.1 are raw data presented in an unorganized manner.
Although it would be easy to find the highest and lowest values in this sample, it
would be very difficult to extract more useful information from these data
without organizing them using descriptive statistical techniques.


Table 2.1 Raw data on the height (cm) of 153 10-year-old girls.

139.9 141.6 150.5 146.0 143.0 148.0 142.5 152.0
131.3 147.9 147.6 145.0 150.0 149.0 138.0 137.8
128.8 143.8 138.7 142.0 138.0 142.0 151.0 134.9
128.6 141.8 143.3 131.8 139.0 140.0 149.0 135.1
130.5 140.1 139.8 132.0 142.6 144.0 147.0 147.0
142.2 135.9 149.1 136.7 140.0 137.0 138.2 137.0
144.0 141.3 148.5 132.0 146.0 148.0 148.0 149.6
129.4 143.8 153.5 143.0 145.2 134.6 146.7 146.8
138.5 146.6 143.3 137.4 145.0 143.0 149.0 142.0
140.8 138.0 144.2 126.7 130.0 154.0 138.0 158.0
136.5 141.3 154.0 146.0 154.0 140.0 156.0 129.0
135.1 135.0 146.0 148.5 141.2 145.0 139.0 151.0
140.2 139.5 140.0 157.0 149.0 137.0 152.0 143.0
138.2 141.2 135.0 131.9 134.0 142.9 140.5 142.0
139.0 138.7 138.0 148.0 134.7 153.0 146.0 137.6
143.3 143.6 125.3 132.0 139.5 135.0 154.0 131.0
137.6 142.7 131.0 146.0 132.4 151.0 152.0 147.0
140.5 143.8 138.0 152.0 141.0 157.7 149.0 143.0
134.2 142.0 140.0 134.0 139.0 154.4 134.1 141.3
145.0

2.1 Frequency Tables and Graphs

The frequency distribution table and the frequency distribution diagram are starting
points for summarizing data and provide a basis for observing the characteristics of the
distribution. They are made by grouping observations and obtaining the frequency
distribution by counting the number of observations in each group.

2.1.1 Frequency Distribution of Numerical Data


Most numerical data can be regarded as continuous, theoretically taking on an
infinite number of values. Thus, the frequency distribution is almost always tabulated
by group (i.e., the data are grouped into several non-overlapping intervals), and the
number of observations falling into each interval, which we call the frequency, is
counted. We use these counts to construct the frequency distribution table and the
frequency distribution diagram.
In the following steps, we use the data shown in Table 2.1 to illustrate the
procedure for constructing a frequency table.


Steps to follow in constructing a frequency distribution table:
(1) Calculate the range of the data. The range ($R$) is defined as the difference between
the largest and smallest observed values in the sample:
$$R = x_{\max} - x_{\min}. \qquad (2.1)$$
For example, using the data shown in Table 2.1, $x_{\max} = 158.0$ and $x_{\min} = 125.3$.
The range is then calculated as
$$R = 158.0 - 125.3 = 32.7.$$
Thus, the range of height for the girls is about 33 cm.
(2) Determine the group intervals. The range is usually divided into 5 to 15
intervals of equal width, with the number of intervals chosen according to the sample
size: a small number of groups suits a small dataset, and a larger number suits a
larger one. The lower boundary of the first interval should lie below the smallest
value, and the boundaries should be chosen so that no observation falls on that lowest
boundary. In practice, the interval width is usually set to about 1/10 of the range. For
the data in Table 2.1, this gives 32.7/10 = 3.27, which is rounded to a convenient width
of 3 cm. Except for the last interval, each interval is denoted by its lower boundary
followed by “–”. This indicates that the values within the interval are no smaller than
the lower boundary of that interval but smaller than the lower boundary of the next
interval. For instance, “125.0–” denotes the range [125.0, 128.0). The last interval is
closed (i.e., “155.0–158.0,” corresponding to the range [155.0, 158.0]).
(3) For each group interval, count the number of observations that fall in it.
This completes the frequency distribution table, which consists of the group
intervals and the count for each group (the first two columns of Table 2.2).
For convenience of comparison, we also include the relative frequency (proportion)
in Table 2.2:
$$\text{relative frequency} = \frac{\text{frequency}}{\text{total number of measurements}}. \qquad (2.2)$$

Because the denominator used to calculate the relative frequencies (the sample size n)
is the same for all groups, the frequency and relative frequency distributions have
the same shape.
Table 2.2 shows the frequencies and relative frequencies for the heights of the girls
in Example 2.1. The frequencies sum to n, and the relative frequencies sum to 100%.
The highest frequency (28) is in the “140.0–” interval, and the relative frequency for
this interval is also the highest (18.3%). Moreover, the relative frequency distribution is
roughly symmetric around the “140.0–” group. The advantage of relative frequencies is
that they allow us to compare frequency distributions across two or more groups of
individuals, even when the groups differ in size.


Table 2.2 Frequency distribution for height (cm) of 153 10-year-old girls in Example 2.1.

Group          Frequency    Relative frequency (%)
(1)            (2)          (3)
125.0–         2            1.3
128.0–         6            3.9
131.0–         9            5.9
134.0–         15           9.8
137.0–         26           17.0
140.0–         28           18.3
143.0–         20           13.1
146.0–         20           13.1
149.0–         12           7.8
152.0–         11           7.2
155.0–158.0    4            2.6
Total          153          100.0
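The three construction steps can be reproduced in Python as a sketch. The heights are transcribed from Table 2.1. The starting boundary (125.0 cm), the width (3 cm), and the number of intervals (11) follow the choices made in the text. `bisect_right` from the standard library assigns each value to its half-open interval, and values in [155.0, 158.0] fall into the closed last interval.

```python
from bisect import bisect_right

# Heights (cm) of the 153 girls, transcribed row by row from Table 2.1
heights = [
    139.9, 141.6, 150.5, 146.0, 143.0, 148.0, 142.5, 152.0,
    131.3, 147.9, 147.6, 145.0, 150.0, 149.0, 138.0, 137.8,
    128.8, 143.8, 138.7, 142.0, 138.0, 142.0, 151.0, 134.9,
    128.6, 141.8, 143.3, 131.8, 139.0, 140.0, 149.0, 135.1,
    130.5, 140.1, 139.8, 132.0, 142.6, 144.0, 147.0, 147.0,
    142.2, 135.9, 149.1, 136.7, 140.0, 137.0, 138.2, 137.0,
    144.0, 141.3, 148.5, 132.0, 146.0, 148.0, 148.0, 149.6,
    129.4, 143.8, 153.5, 143.0, 145.2, 134.6, 146.7, 146.8,
    138.5, 146.6, 143.3, 137.4, 145.0, 143.0, 149.0, 142.0,
    140.8, 138.0, 144.2, 126.7, 130.0, 154.0, 138.0, 158.0,
    136.5, 141.3, 154.0, 146.0, 154.0, 140.0, 156.0, 129.0,
    135.1, 135.0, 146.0, 148.5, 141.2, 145.0, 139.0, 151.0,
    140.2, 139.5, 140.0, 157.0, 149.0, 137.0, 152.0, 143.0,
    138.2, 141.2, 135.0, 131.9, 134.0, 142.9, 140.5, 142.0,
    139.0, 138.7, 138.0, 148.0, 134.7, 153.0, 146.0, 137.6,
    143.3, 143.6, 125.3, 132.0, 139.5, 135.0, 154.0, 131.0,
    137.6, 142.7, 131.0, 146.0, 132.4, 151.0, 152.0, 147.0,
    140.5, 143.8, 138.0, 152.0, 141.0, 157.7, 149.0, 143.0,
    134.2, 142.0, 140.0, 134.0, 139.0, 154.4, 134.1, 141.3,
    145.0,
]

n = len(heights)
x_min, x_max = min(heights), max(heights)
R = x_max - x_min                                  # step (1), Formula 2.1

# Step (2): interval choices taken from the text
start, width, k = 125.0, 3.0, 11
lowers = [start + i * width for i in range(k)]     # 125.0, 128.0, ..., 155.0
if not (lowers[0] <= x_min and x_max <= lowers[-1] + width):
    raise ValueError("some observations fall outside the chosen intervals")

# Step (3): count observations per interval. bisect_right returns the number of
# lower boundaries <= x, so values in [155.0, 158.0] land in the last (closed) interval.
counts = [0] * k
for x in heights:
    counts[bisect_right(lowers, x) - 1] += 1

print(f"n = {n}, range R = {R:.1f}")
for i, (lo, f) in enumerate(zip(lowers, counts)):
    label = f"{lo:.1f}–" if i < k - 1 else f"{lo:.1f}–{lo + width:.1f}"
    print(f"{label:<13}{f:>4}{100 * f / n:>8.1f}")   # relative frequency (%), Formula 2.2
```

Running it gives the counts 2, 6, 9, 15, 26, 28, 20, 20, 12, 11, and 4, which match column (2) of Table 2.2. The printed relative frequencies agree with column (3) after rounding.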

Figure 2.1 Histogram for the height of the 153 10-year-old girls in Example 2.1.

After the frequencies (or relative frequencies) have been obtained, a more intuitive
way to depict the frequency (or relative frequency) distribution is to plot a histogram.
Figure 2.1 presents a histogram of the data shown in Table 2.2.
The x-axis in Figure 2.1, which shows the height of the 10-year-old girls, is divided
into group intervals starting with the interval 125.0– and proceeding in intervals of
equal width (3.0 cm). The y-axis (frequency) gives the number of the 153 readings
that fall in each interval. The data appear to have a bell-shaped distribution. Of the
153 girls, 28 (18.3%) had a height of 140.0–142.9 cm. This group interval contains
the highest frequency, and the intervals tend to contain fewer measurements as
height decreases or increases from there.
Histograms can be used to display either the frequency or the relative frequency of
the measurements falling into the group intervals. The graph is generally constructed
by dividing the x-axis into intervals of equal width. A rectangle is then drawn over
each interval such that the height of the rectangle is proportional to the fraction of
the total number of measurements falling in that interval. Note that when
constructing a frequency table (or a histogram), using many intervals with a small
amount of data results in little summarization and presents a picture very similar to
the data in their original form. Statistical software can now produce histograms
automatically, generally following widely accepted conventions for scaling, the number
of intervals, the interval widths, and the like.
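Plotting software is the usual way to draw a histogram. As a dependency-free stand-in, the Python sketch below prints a horizontal text histogram from the Table 2.2 frequencies. The helper name `text_histogram` is hypothetical. Because the intervals have equal width, a bar whose length equals the count plays the role of the rectangle height.

```python
def text_histogram(labels, counts, symbol="#"):
    """Return one text line per interval; each bar has one symbol per observation."""
    pad = max(len(s) for s in labels)
    return [f"{s:<{pad}} | {symbol * c} {c}" for s, c in zip(labels, counts)]

# Group labels and frequencies from Table 2.2 (height, cm)
groups = ["125.0–", "128.0–", "131.0–", "134.0–", "137.0–", "140.0–",
          "143.0–", "146.0–", "149.0–", "152.0–", "155.0–158.0"]
freqs = [2, 6, 9, 15, 26, 28, 20, 20, 12, 11, 4]

for line in text_histogram(groups, freqs):
    print(line)
```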

Example 2.2 Referring to Example 2.1, the triglyceride levels of the 153 10-year-
old girls were also measured. The distribution of the triglyceride data is shown in
Table 2.3.
Table 2.3 can be constructed in the same way as Table 2.2, except for the newly
added cumulative relative frequency. This is obtained by adding the relative frequencies
of all previous groups to the relative frequency of the current group. For example, to find
the percentage of girls with a triglyceride level under 1.2 mmol/L, we would sum the
relative frequencies from the smallest group, 0.3–, through the third group, 0.9–,
arriving at 71.2% (the cumulative relative frequency of the 0.9– group).
Cumulative relative frequency is useful in determining medians, percentiles, and
other quantiles, which will be used frequently in subsequent analysis.
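The cumulative relative frequency column is a running sum of the relative frequencies. The following minimal Python sketch recomputes the relative and cumulative columns from the Table 2.3 counts. The variable names are illustrative.

```python
from itertools import accumulate

# Group labels and frequencies from Table 2.3 (triglycerides, mmol/L)
groups = ["0.3–", "0.6–", "0.9–", "1.2–", "1.5–", "1.8–", "2.1–", "2.4–2.7"]
freqs = [22, 42, 45, 26, 11, 4, 2, 1]

n = sum(freqs)                                # 153 girls
relative = [100 * f / n for f in freqs]       # relative frequency (%)
cumulative = list(accumulate(relative))       # running sum = cumulative relative frequency (%)

for g, f, r, c in zip(groups, freqs, relative, cumulative):
    print(f"{g:>8} {f:4d} {r:6.1f} {c:6.1f}")
```

Printed values can differ in the last digit from Table 2.3. For example, 42/153 prints as 27.5 rather than 27.4, presumably because the published column was adjusted so that its rounded percentages add up to exactly 100.0.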
Likewise, the frequency distribution table can be converted to the corresponding histogram, as shown in Figure 2.2.
Figure 2.2 shows the same pattern observed in the frequency table: a peak that is off center toward the left side and a long tail on the right side. We call distributions with such characteristics positively (or right) skewed distributions. Conversely, if the peak of the distribution is pulled to the right and the long tail extends to the left, the distribution is said to be negatively (or left) skewed.

Table 2.3 Frequency distribution of triglycerides (mmol/L) in 153 10-year-old girls.

Group      Frequency   Relative frequency (%)   Cumulative relative frequency (%)
(1)        (2)         (3)                      (4)
0.3–       22          14.4                     14.4
0.6–       42          27.4                     41.8
0.9–       45          29.4                     71.2
1.2–       26          17.0                     88.2
1.5–       11          7.2                      95.4
1.8–       4           2.6                      98.0
2.1–       2           1.3                      99.3
2.4–2.7    1           0.7                      100.0
Total      153         100.0                    —

Figure 2.2 Histogram of triglycerides in 153 10-year-old girls.

2.1.2 Frequency Distribution of Categorical Data


For categorical data with a finite number of distinct values, we first need to define classes according to the research objective and the characteristics of the data, such that each observed value belongs to exactly one class. The frequency table and frequency diagram can then be constructed by counting the number of values in each class.

Example 2.3 A total of 1668 adult (≥30 years old) Kazakh residents in the Altay
region of Xinjiang, China, were surveyed on their blood pressure in 2013. According
to the “Chinese Guidelines for the Prevention and Treatment of Hypertension,” the
subjects were classified into five distinct groups: normal, high normal, and grades 1,
2, and 3 hypertension. Using this classification, a frequency table of blood pressure
status was constructed, as shown in Table 2.4.
In Table 2.4, we see that more than one-third (34.8%) of the Kazakh residents participating in the survey had high normal blood pressure, and 44.0% of them had grade 1–3 hypertension. The corresponding bar chart is shown in Figure 2.3.
Figure 2.3 illustrates the frequency distribution of categorical data. In a bar chart, the frequencies (or relative frequencies) of the groups are represented by the heights of bars of equal width. The bars can be drawn vertically or horizontally; the orientation does not change the meaning.
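For categorical data, the frequency table comes from counting how often each class occurs. The Python sketch below assumes one class label per subject. Because the raw survey records are not given, it rebuilds hypothetical label lists from the Table 2.4 counts and then tallies them with `collections.Counter`.

```python
from collections import Counter
from itertools import accumulate

# Class order from Table 2.4
order = ["Normal", "High normal", "Grade 1 hypertension",
         "Grade 2 hypertension", "Grade 3 hypertension"]

# Stand-in for raw records: one label per subject, rebuilt from the published counts
records = (["Normal"] * 353 + ["High normal"] * 580 + ["Grade 1 hypertension"] * 389
           + ["Grade 2 hypertension"] * 212 + ["Grade 3 hypertension"] * 134)

counts = Counter(records)                 # frequency of each class
n = len(records)
freqs = [counts[c] for c in order]
cum = list(accumulate(freqs))             # cumulative frequency

for c, f, cf in zip(order, freqs, cum):
    print(f"{c:<22}{f:5d}{100 * f / n:7.1f}{cf:6d}{100 * cf / n:7.1f}")
```

The exact cumulative percentage for “High normal” is 933/1668 = 55.9%, whereas Table 2.4 shows 56.0%. That figure is the sum of the rounded values 21.2 and 34.8.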
Frequency distribution tables and diagrams for both numerical and categorical data have clear advantages, especially when the sample size is large:
(1) The distribution of the data can be presented intuitively. This is very important because the distribution plays a decisive role in choosing the subsequent statistical analysis methods.


Table 2.4 Frequency distribution of blood pressure group for 1668 adult Kazakh residents.

Blood pressure group    Frequency   Relative frequency (%)   Cumulative frequency   Cumulative relative frequency (%)
(1)                     (2)         (3)                      (4)                    (5)
Normal                  353         21.2                     353                    21.2
High normal             580         34.8                     933                    56.0
Grade 1 hypertension    389         23.3                     1322                   79.3
Grade 2 hypertension    212         12.7                     1534                   92.0
Grade 3 hypertension    134         8.0                      1668                   100.0
Total                   1668        100.0                    —                      —

Figure 2.3 Bar chart of blood pressure group for 1668 adult Kazakh residents.

(2) Central tendency and dispersion, which are two important characteristics of data
distribution, can be shown, and outliers can also be observed.
(3) When the sample size is large, the relative frequency of each group can be used to
estimate probability (the concept of probability will be introduced in Chapter 3).

2.2 Descriptive Statistics of Numerical Data

2.2.1 Measures of Central Tendency


Although frequency tables and diagrams can be used to depict the characteristics of a distribution visually, they are usually not adequate for making inferences. Before making inferences about a population on the basis of information contained in a sample, and before measuring how good those inferences are, we need to define rigorously the quantities used to summarize information about a sample. In this section, we introduce three commonly used measures of the location or center of a sample: the arithmetic mean, the median, and the geometric mean.
1. Arithmetic Mean
The most widely utilized measure of central tendency is the arithmetic mean (or simply
the mean).
(1) Calculating the mean with an original dataset (the direct method)

Definition 2.1 Let $x_1, x_2, \ldots, x_n$ be a set of $n$ measurements. The sample mean is calculated as follows:

$$\bar{x} = \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right) = \frac{1}{n}\sum_{i=1}^{n} x_i, \tag{2.3}$$

where $\bar{x}$ refers to the sample mean, $\Sigma$ means “the sum of,” and the subscript and superscript of $\Sigma$ indicate that we sum the values from $i = 1$ to $i = n$. This formula can also be written as $\bar{x} = \frac{1}{n}\sum_{i} x_i$.
Usually, we cannot measure the population mean $\mu$, which is the unknown constant that we want to estimate using the sample mean $\bar{x}$.

Example 2.4 A physician recorded the first hemoglobin measurement (g/L) taken after admission for each of 10 inpatients in a hospital’s department of cardiology. The hemoglobin measurements were
139, 158, 120, 112, 122, 132, 97, 104, 159, and 129.
Calculate the mean hemoglobin level.
Solution
Based on Definition 2.1 and the hemoglobin records shown above, we have:

$$\bar{x} = \frac{1}{n}\sum_i x_i = \frac{1}{10}\times(139 + 158 + \cdots + 129) = \frac{1272}{10} = 127.2$$

Therefore, the mean level of the first hemoglobin measurement after admission for the 10 inpatients was 127.2 g/L.
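The direct method (Formula 2.3) translates into a one-line computation. The Python sketch below uses the Example 2.4 values. The helper name `sample_mean` is illustrative; Python’s standard `statistics.mean` gives the same result.

```python
# Hemoglobin (g/L) of the 10 inpatients in Example 2.4
hemoglobin = [139, 158, 120, 112, 122, 132, 97, 104, 159, 129]

def sample_mean(values):
    """Direct method (Formula 2.3): sum of the observations divided by n."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)

print(sample_mean(hemoglobin))  # 127.2
```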
(2) Calculating the mean with a frequency-table dataset
When large datasets are organized into frequency tables or presented as grouped data,
there is a shortcut method to calculate the mean:

$$\bar{x} = \frac{f_1 x_{m1} + f_2 x_{m2} + \cdots + f_k x_{mk} + \cdots + f_g x_{mg}}{f_1 + f_2 + \cdots + f_k + \cdots + f_g} = \frac{\sum_{k=1}^{g} f_k x_{mk}}{\sum_{k=1}^{g} f_k}, \tag{2.4}$$

where $g$ is the number of groups, $f_k$ is the frequency of the $k$th group, $x_{mk}$ is the midpoint value of the $k$th group (i.e., (lower bound of the $k$th group + upper bound of the $k$th group) / 2), and $\sum_k f_k = n$.
The calculation in Formula 2.4 is also called the weighted method, because each midpoint $x_{mk}$ carries the weight $f_k / n$. Groups with larger frequencies therefore make a greater contribution to the mean. The direct method in Formula 2.3 can be considered a special case of the weighted method in which every observed value carries the same weight, $1/n$.

Example 2.5 Calculate the mean height of the 153 10-year-old girls using the weighted method with data from the frequency table in Example 2.1.
Solution
We first determine the midpoint value for each group (Column 3 of Table 2.5). Then, the mean can be obtained using Formula 2.4:

$$\bar{x} = \frac{\sum_k f_k x_{mk}}{\sum_k f_k} = \frac{2 \times 126.5 + 6 \times 129.5 + \cdots + 4 \times 156.5}{2 + 6 + \cdots + 4} = \frac{21778.5}{153} = 142.3$$

Thus, the mean height of the girls is 142.3 cm.


If we calculate the mean directly using the raw data (Formula 2.3), the result is 142.0 cm, a difference of 0.3 cm. This difference arises because the weighted method assumes that the midpoint value of each group in the frequency table is exactly the average of the observations in that group, which in most cases is only an approximation. Such a small difference is negligible in practice. Statistical analysis packages calculate the mean by the direct method even when the sample size is large, which avoids this approximation. Even so, knowing the frequency distribution of the data remains important, because the characteristics of the distribution captured by the frequency table are crucial for subsequent analyses.

Table 2.5 Calculation of the mean height (cm) of 153 10-year-old girls, using the weighted method.

Group k        Frequency $f_k$   Midpoint value $x_{mk}$   $f_k x_{mk}$
(1)            (2)               (3)                       (4)
125.0–         2                 126.5                     253.0
128.0–         6                 129.5                     777.0
131.0–         9                 132.5                     1192.5
134.0–         15                135.5                     2032.5
137.0–         26                138.5                     3601.0
140.0–         28                141.5                     3962.0
143.0–         20                144.5                     2890.0
146.0–         20                147.5                     2950.0
149.0–         12                150.5                     1806.0
152.0–         11                153.5                     1688.5
155.0–158.0    4                 156.5                     626.0
Total          153               —                         21,778.5
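The weighted method can be sketched as follows, assuming equal-width groups so that each midpoint is the lower bound plus half the width. The data are the frequencies from Table 2.5. The function name `weighted_mean` is illustrative.

```python
# Table 2.5: lower bounds 125.0, 128.0, ..., 155.0 with a common width of 3.0 cm
lower_bounds = [125.0 + 3.0 * k for k in range(11)]
width = 3.0
freqs = [2, 6, 9, 15, 26, 28, 20, 20, 12, 11, 4]

def weighted_mean(lower_bounds, width, freqs):
    """Weighted method (Formula 2.4): sum(f_k * x_mk) / sum(f_k)."""
    midpoints = [lb + width / 2 for lb in lower_bounds]    # x_mk
    total = sum(f * m for f, m in zip(freqs, midpoints))   # sum of f_k * x_mk = 21778.5
    return total / sum(freqs)

print(round(weighted_mean(lower_bounds, width, freqs), 1))  # 142.3
```

With the raw heights, the direct method would be used instead. As the text notes, the two results differ slightly because the midpoints only approximate each group’s true mean.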
The advantages of the mean are that it (i) uses all the data values in the calculation; (ii) is algebraically defined and thus mathematically manageable; and (iii) has a known sampling distribution (Chapter 6). Its disadvantage is also obvious: it may be distorted by outliers and by skewed data.
2. Median
The arithmetic mean is very sensitive to very large or very small values. When such values are present, the mean may not represent the location of the great majority of the sample points. The median, another measure of central tendency, is the value that divides the ordered observations into two equal halves, and it can therefore better reflect the central position of an ordered list of observations in such cases. The formula to calculate the median is presented below.
(1) Calculating the median with an original dataset

Definition 2.2 Let $x_1, x_2, \ldots, x_n$ be a set of $n$ measurements arranged in ascending order. The sample median is calculated as follows:

$$M = \begin{cases} x_{(n+1)/2}, & \text{if } n \text{ is odd},\\[4pt] \dfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right), & \text{if } n \text{ is even}. \end{cases} \tag{2.5}$$

If the number of observations is odd, the median is the middle value. If the number of observations is even, there is no single middle value. By convention, the median is then calculated as the arithmetic mean of the two middle observations, $x_{n/2}$ and $x_{n/2+1}$, in the ordered set. The rationale for these definitions is to ensure that equal numbers of observations lie on both sides of the sample median.

Example 2.6 The incubation period (days) for 11 patients with a certain infectious disease was recorded as
4, 2, 3, 6, 17, 8, 4, 15, 2, 10, and 8.
Find the median incubation period for these patients.
Solution
First, arrange the sample values in ascending order:
2, 2, 3, 4, 4, 6, 8, 8, 10, 15, and 17.
Based on Definition 2.2, because n is odd (n = 11), the median is calculated as

$$M = x_{(11+1)/2} = x_6 = 6$$

Thus, the median incubation period of the infectious disease in this group was 6 days. Five of the remaining observations are shorter than 6 days and five are longer.
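Definition 2.2 can be written directly in Python. Note that the book’s 1-based indices $x_{(n+1)/2}$, $x_{n/2}$, and $x_{n/2+1}$ become 0-based list positions in the code. This is a minimal sketch with an illustrative function name; `statistics.median` in the standard library follows the same convention.

```python
def median(values):
    """Median per Definition 2.2 (sorts a copy of the data first)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # x_{(n+1)/2} in 1-based notation
    return (xs[mid - 1] + xs[mid]) / 2    # mean of x_{n/2} and x_{n/2+1}

incubation_days = [4, 2, 3, 6, 17, 8, 4, 15, 2, 10, 8]     # Example 2.6 (n odd)
hemoglobin = [139, 158, 120, 112, 122, 132, 97, 104, 159, 129]  # Example 2.4 (n even)

print(median(incubation_days))  # 6
print(median(hemoglobin))       # 125.5
```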


(2) Calculating the median with a frequency-table dataset

$$M = L + \frac{i}{f_M}\left(\frac{n}{2} - \sum f_L\right). \tag{2.6}$$

In Formula 2.6, $i$ denotes the interval width of the group, $L$ represents the lower bound of the group interval that the median falls into, $f_M$ is the frequency of this group, and $\sum f_L$ is the cumulative frequency of all groups before this group.

Example 2.7 The incubation period (hours) for 84 patients with food poisoning is shown in Table 2.6. Find the median.
Solution
The incubation period values have been sorted in ascending order in Table 2.6. The median lies in the first group whose cumulative relative frequency exceeds 50% (i.e., whose cumulative frequency exceeds n/2 = 42), which is the group 10–. Therefore,

$$L = 10,\quad i = 2,\quad f_M = 7,\quad n = 84,\quad \sum f_L = 36$$

Using Formula 2.6, the median is

$$M = L + \frac{i}{f_M}\left(\frac{n}{2} - \sum f_L\right) = 10 + \frac{2}{7}\times\left(\frac{84}{2} - 36\right) = 11.7$$

Thus, the median incubation period of food poisoning among these patients was 11.7 hours.
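Formula 2.6 can be sketched as a loop that accumulates frequencies until the median group is reached. This assumes equal-width groups and uses the selection rule from Example 2.7: the first group whose cumulative frequency exceeds n/2. The data are from Table 2.6. The function name is illustrative.

```python
def grouped_median(lower_bounds, width, freqs):
    """Formula 2.6: M = L + (i / f_M) * (n/2 - sum f_L)."""
    n = sum(freqs)
    cum_before = 0                          # sum f_L: cumulative frequency before the group
    for L, f in zip(lower_bounds, freqs):
        if cum_before + f > n / 2:          # first group whose cumulative frequency exceeds n/2
            return L + width / f * (n / 2 - cum_before)
        cum_before += f
    raise ValueError("frequencies are empty")

# Table 2.6: groups 4–, 6–, ..., 46–48 (width 2 hours)
lower_bounds = list(range(4, 48, 2))
freqs = [5, 12, 19, 7, 7, 6, 7, 5, 4, 4, 3, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1]

print(round(grouped_median(lower_bounds, 2, freqs), 1))  # 11.7
```

Because the condition compares against the running total, a group with zero frequency can never be selected, so the division by `f` is safe.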
As Formulas 2.5 and 2.6 show, the median is determined by the middle of the ordered data alone, so it is not susceptible to the influence of extreme values. For data with a heavily skewed distribution, the median might be the only suitable choice for describing the central location of the data. The main weakness of the median is that it is determined mainly by the middle points in a sample and is insensitive to the actual numerical values of the remaining data points. Thus, it ignores most of the numerical information.
3. Geometric Mean
Many biomedical indicators, such as laboratory antibody titers, bacterial counts, concentrations of certain drugs, and the incubation periods of certain infectious diseases, are distributed asymmetrically. These data often follow a geometric progression or show obvious right skewness. In such cases, the arithmetic mean is not appropriate for measuring central tendency because it would poorly represent the average level of the data. One solution is to transform the original values before calculation. Many transformations are available; the most commonly used is the logarithmic transformation. Data following a geometric progression, and many right-skewed data, approximately follow the normal distribution after logarithmic transformation, so we call such skewed distributions log-normal distributions. As with the arithmetic mean, the geometric mean can be calculated by either the direct method or the weighted method.


Table 2.6 Incubation period (hours) for 84 patients with food poisoning.

Group k   Frequency $f_k$   Cumulative frequency $\sum f_k$   Cumulative relative frequency (%)
(1)       (2)               (3)                               (4)
4–        5                 5                                 6.0
6–        12                17                                20.2
8–        19                36                                42.9
10–       7                 43                                51.2
12–       7                 50                                59.5
14–       6                 56                                66.7
16–       7                 63                                75.0
18–       5                 68                                81.0
20–       4                 72                                85.7
22–       4                 76                                90.5
24–       3                 79                                94.0
26–       1                 80                                95.2
28–       0                 80                                95.2
30–       1                 81                                96.4
32–       1                 82                                97.6
34–       0                 82                                97.6
36–       0                 82                                97.6
38–       0                 82                                97.6
40–       1                 83                                98.8
42–       0                 83                                98.8
44–       0                 83                                98.8
46–48     1                 84                                100.0
Total     84                —                                 —

(1) Calculating the geometric mean with an original dataset

Definition 2.3 Let $x_1, x_2, \ldots, x_n$ be a set of $n$ measurements, with $x_i > 0$ $(i = 1, 2, \ldots, n)$. The sample geometric mean is calculated as follows:

$$G = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}. \tag{2.7}$$

After the logarithmic transformation, the calculation formula is

$$G = \log^{-1}\left(\frac{\log x_1 + \log x_2 + \cdots + \log x_n}{n}\right) = \log^{-1}\left(\frac{\sum_{i=1}^{n} \log x_i}{n}\right). \tag{2.8}$$

The natural constant $e$ or 10 is usually chosen as the base of the logarithm in Formula 2.8, but other bases may also be used. Any base gives the same value of $G$, provided that the antilog is taken in the same base.

Example 2.8 A physician measured the serum hepatitis B surface antibody titers of 8 hepatitis B patients, and the measured values were
1:8, 1:16, 1:16, 1:32, 1:64, 1:128, 1:256, and 1:512.
Find the average titer.
Solution
Because the reciprocals of the antibody titers increase as a geometric series (with a common ratio of 2), we calculate the geometric mean based on Definition 2.3. For convenience of calculation, the reciprocals of the titers are substituted into Formula 2.7:

$$G = \sqrt[8]{x_1 \times x_2 \times \cdots \times x_8} = \sqrt[8]{8 \times 16 \times 16 \times 32 \times 64 \times 128 \times 256 \times 512} = 53.8$$

Here, we could also use Formula 2.8 with base-10 logarithms:

$$G = \lg^{-1}\left(\frac{\sum_i \lg x_i}{n}\right) = \lg^{-1}\left(\frac{\lg 8 + \lg 16 + \cdots + \lg 512}{8}\right) = \lg^{-1}\left(\frac{13.8474}{8}\right) = \lg^{-1}(1.7309) = 53.8.$$

The results are the same. The average hepatitis B surface antibody titer in the 8 hepatitis B patients was about 1:54.
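The log-based route of Formula 2.8 is easy to check numerically. The Python sketch below applies it in base 10 and base e to the reciprocal titers of Example 2.8 and compares the results with the standard library’s `statistics.geometric_mean` (Python 3.8 or later). The helper name `gm_via_logs` is illustrative.

```python
import math
from statistics import geometric_mean

reciprocal_titers = [8, 16, 16, 32, 64, 128, 256, 512]  # Example 2.8

def gm_via_logs(values, base=10):
    """Formula 2.8: antilog of the mean logarithm, in the given base."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires positive values")
    mean_log = sum(math.log(v, base) for v in values) / len(values)
    return base ** mean_log

g10 = gm_via_logs(reciprocal_titers, 10)      # lg-based, as in the worked solution
ge = gm_via_logs(reciprocal_titers, math.e)   # ln-based
g_std = geometric_mean(reciprocal_titers)
print(round(g10, 1), round(ge, 1), round(g_std, 1))  # 53.8 53.8 53.8
```

All three agree, which illustrates that the choice of logarithm base does not affect $G$.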
(2) Calculating the geometric mean with a frequency-table dataset
Similar to the calculation of the arithmetic mean, the geometric mean can be calculated from grouped data in the following manner:

$$G = \log^{-1}\left(\frac{\sum_{k=1}^{g} f_k \log x_{mk}}{\sum_{k=1}^{g} f_k}\right). \tag{2.9}$$

Formula 2.9 follows because log-normally distributed data approximately follow the normal distribution after logarithmic transformation and can then be processed as normally distributed data. That is, the weighted mean of the logarithms is calculated first (Formula 2.4 applied to $\log x_{mk}$), and the geometric mean is then obtained by taking the antilog.

Example 2.9 Calculate the average triglyceride level of the 153 10-year-old girls using the data shown in the frequency table in Example 2.2.
Solution
As shown in the frequency distribution table, the distribution of the data is right-skewed. This suggests that the geometric mean is more appropriate than the arithmetic mean for representing the average triglyceride level. The specific solution process is shown in Table 2.7.

Table 2.7 Calculation of the geometric mean of triglyceride levels (mmol/L) for 153 10-year-old girls, using the weighted method.

Group k    Frequency $f_k$   Midpoint value $x_{mk}$   $\ln x_{mk}$   $f_k \ln x_{mk}$
(1)        (2)               (3)                       (4)            (5)
0.3–       22                0.45                      −0.799         −17.567
0.6–       42                0.75                      −0.288         −12.083
0.9–       45                1.05                      0.049          2.196
1.2–       26                1.35                      0.300          7.803
1.5–       11                1.65                      0.501          5.509
1.8–       4                 1.95                      0.668          2.671
2.1–       2                 2.25                      0.811          1.622
2.4–2.7    1                 2.55                      0.936          0.936
Total      153               —                         —              −8.913

Using the data shown in Table 2.7 and Formula 2.9 (with natural logarithms), we have

$$\ln G = \frac{1}{n}\sum_k f_k \ln x_{mk} = \frac{1}{153}\times(-17.567 - 12.083 + \cdots + 0.936) = \frac{-8.913}{153} = -0.0583.$$

After taking the antilog,

$$G = \ln^{-1}(-0.0583) = e^{-0.0583} = 0.9434$$

Thus, the average triglyceride level for these girls was 0.94 mmol/L.
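Formula 2.9 is the weighted method applied on the log scale. The following minimal Python sketch reproduces Table 2.7 using natural logarithms, with the midpoints and frequencies copied from the table. The function name is illustrative.

```python
import math

# Table 2.7: group midpoints (mmol/L) and frequencies
midpoints = [0.45, 0.75, 1.05, 1.35, 1.65, 1.95, 2.25, 2.55]
freqs = [22, 42, 45, 26, 11, 4, 2, 1]

def weighted_geometric_mean(midpoints, freqs):
    """Formula 2.9 with base e: exp(sum(f_k * ln x_mk) / sum(f_k))."""
    log_sum = sum(f * math.log(m) for f, m in zip(freqs, midpoints))  # about -8.913
    return math.exp(log_sum / sum(freqs))

print(round(weighted_geometric_mean(midpoints, freqs), 2))  # 0.94
```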
4. Comparison of Mean, Median, and Geometric Mean
In many cases, comparing the mean, median, and geometric mean gives an indication of the symmetry of a distribution. Figure 2.4 presents an example of this.
In Example 2.1, because the distribution of the girls’ heights appeared reasonably symmetrical, the three measures of the “average” all took similar values (Figure 2.4(a)); in particular, the arithmetic mean was approximately the same as the median. If a distribution is positively skewed, as was the case for the triglyceride data in Example 2.2, the arithmetic mean gives a higher “average” than either the median or the geometric mean (Figure 2.4(b)). For a right-skewed distribution in which the median and geometric mean are approximately equal, these two statistics usually give a better description of the central location. Figure 2.4(c) provides another example: the distribution of the Patient Health Questionnaire-9 (PHQ-9) scores of 1072 depressed patients, which is negatively skewed. In this case, only the median provides a good description of the central location.
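The ordering of the three “averages” in a right-skewed sample can be checked numerically. As a small illustration, the Python sketch below reuses the 11 incubation periods from Example 2.6, whose long right tail comes from the values 15 and 17. It relies on the standard `statistics` functions `fmean`, `median`, and `geometric_mean` (Python 3.8 or later).

```python
import statistics

incubation_days = [4, 2, 3, 6, 17, 8, 4, 15, 2, 10, 8]  # Example 2.6

mean = statistics.fmean(incubation_days)           # arithmetic mean, 79/11
median = statistics.median(incubation_days)        # middle value of the ordered data
gm = statistics.geometric_mean(incubation_days)    # antilog of the mean of the logs

print(f"mean={mean:.2f}  median={median}  geometric mean={gm:.2f}")
```

In this small sample, the mean is pulled toward the long right tail and exceeds both the median and the geometric mean, consistent with the pattern described for Figure 2.4(b).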
The best measure of central tendency for a dataset depends on the type of descriptive information you want. Most of the inferential statistical methods discussed in this text are theoretically based on bell-shaped distributions of data with little or no skewness. With such data, the mean and the median will be, for all practical purposes, the same. Because the mean has more convenient mathematical properties than the median, the mean is the preferred measure of central tendency for these inferential techniques.

Figure 2.4 Comparison of three “averages” using the height data from Example 2.1 (a), the triglyceride data from Example 2.2 (b), and data on the Patient Health Questionnaire-9 (PHQ-9) scores of 1072 depressed patients (c).


2.2.2 Measures of Dispersion


To describe data adequately, we must also define measures of dispersion. We will begin
with an illustrative example.

Example 2.10 The platelet counts ($\times 10^{9}$/L) of 12 patients, six in each of two wards, were measured. The resultant values were as follows:
Ward A: 186, 191, 199, 200, 209, and 215 ($\bar{x}_A = 200$)
Ward B: 160, 185, 190, 204, 217, and 244 ($\bar{x}_B = 200$)
Both groups had a mean platelet count of $200 \times 10^{9}$/L. However, the two groups differed greatly in the dispersion of the data: the platelet counts of patients in ward B were more widely spread out than those of patients in ward A.
In this section, we introduce several commonly used descriptive statistics that convey
information regarding the degree of dispersion present in a set of data: range, percentile and
interquartile range, variance and standard deviation, and the coefficient of variation. These
statistics, combined with those describing central tendency, have great value in describing
the distribution characteristics of a dataset.
1. Range
The definition of range was previously provided in Formula 2.1. Calculating the range
from Example 2.10, we have
Range for ward A: $R_A = x_{\max} - x_{\min} = 215 - 186 = 29$
Range for ward B: $R_B = x_{\max} - x_{\min} = 244 - 160 = 84$
The results indicate that the range of platelet count in ward B was larger than that in
ward A, although the two wards had the same mean.
A prime advantage of the range is that it is easy to calculate. Moreover, the range is measured in the same units as the original data, so it has a direct interpretation. However, the range also has several clear disadvantages: (i) It takes into account only two values (the smallest and the largest), ignoring the variation among the other observations. (ii) It is very sensitive to the sample size; as the sample size increases, the range tends to become larger. (iii) Compared with other measures of dispersion, the sampling error (see Chapter 6) of the range is relatively large, so the range is less stable.
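The ward comparison in Example 2.10 can be reproduced with a short Python sketch. It shows two samples with identical means but different ranges. The function name `sample_range` is illustrative.

```python
# Platelet counts (x 10^9/L) from Example 2.10
ward_a = [186, 191, 199, 200, 209, 215]
ward_b = [160, 185, 190, 204, 217, 244]

def sample_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)

for name, ward in (("A", ward_a), ("B", ward_b)):
    mean = sum(ward) / len(ward)
    print(f"Ward {name}: mean={mean:.0f}, range={sample_range(ward)}")
```

Because the range uses only `max` and `min`, changing any of the four middle values in either ward would leave it unchanged. This is disadvantage (i) above.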

2. Percentile and Interquartile Range


Formula 2.5 shows that the median divides all ordered values equally into two parts.
Similarly, the values can be equally divided into a larger number of parts if desired.

Definition 2.4 If the observations are sorted from smallest to largest and divided into 100 equal parts, the value corresponding to the proportion $p\%$ $(0 \le p \le 100)$ is called the percentile, which is denoted by the symbol $X_{p\%}$. For frequency-table data, $X_{p\%}$ is calculated as

$$X_{p\%} = L + \frac{i}{f_{p\%}}\left(n \times p\% - \sum f_L\right). \tag{2.10}$$
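Formula 2.10 generalizes Formula 2.6 by replacing $n/2$ with $n \times p\%$. The Python sketch below is a minimal illustration under stated assumptions. The symbols are read by analogy with Formula 2.6 ($L$, $f_{p\%}$, and $\sum f_L$ refer to the group containing $X_{p\%}$), and that group is taken to be the first whose cumulative frequency reaches $n \times p\%$. The data are from Table 2.6. The function name is hypothetical.

```python
def grouped_percentile(lower_bounds, width, freqs, p):
    """Formula 2.10: X_p% = L + (i / f_p%) * (n * p% - sum f_L), for 0 < p <= 100."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    n = sum(freqs)
    target = n * p / 100                    # n * p%
    cum_before = 0                          # sum f_L
    for L, f in zip(lower_bounds, freqs):
        if cum_before + f >= target:        # assumed rule: first group reaching n * p%
            return L + width / f * (target - cum_before)
        cum_before += f
    raise ValueError("frequencies do not reach n * p%")

# Table 2.6: incubation period (hours), groups 4–, 6–, ..., 46–48
lower_bounds = list(range(4, 48, 2))
freqs = [5, 12, 19, 7, 7, 6, 7, 5, 4, 4, 3, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1]

for p in (25, 50, 75):
    print(p, round(grouped_percentile(lower_bounds, 2, freqs, p), 2))
```

With p = 50, the sketch reproduces the median of Example 2.7 (11.7 hours), which is a useful consistency check on the group-selection rule.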

I think.”
Watching him, it gave her a faint, warm sense of satisfaction to see that
Anson was annoyed by her calmness, and yet she was a little ashamed, too,
for wanting the excitement of a small scene, just a tiny scene, to make life
seem a little more exciting. He said, “But you know how Aunt Cassie and
my father feel about O’Hara.”
Then, for the first time, Olivia began to see light in the darkness. “Your
father knows all about it, Anson. He has gone with them himself on the red
mare, once or twice.”
“Are you sure of that?”
“Why should I make up such a ridiculous lie? Besides, your father and I
get on very well. You know that.” It was a mild thrust which had its
success, for Anson turned away angrily. She had really said to him, “Your
father comes to me about everything, not to you. He is not the one who
objects or I should have known.” Aloud she said, “Besides, I have seen him
with my own eyes.”
“Then I will take it on my own responsibility. I don’t like it and I want it
stopped.”
At this speech Olivia’s brows arched ever so slightly with a look which
might have been interpreted either as one of surprise or one of mockery or
perhaps a little of both. For a moment she sat quite still, thinking, and at last
she said, “Am I right in supposing that Aunt Cassie is at the bottom of
this?” When he made no reply she continued, “Aunt Cassie must have
gotten up very early to see them off.” Again a silence, and the dark little
devil in Olivia urged her to say, “Or perhaps she got her information from
the servants. She often does, you know.”
Slowly, while she was speaking, her husband’s face had grown more and
more sour. The very color of the skin seemed to have changed so that it
appeared faintly green in the light from the Victorian luster just above his
narrow head.
“Olivia, you have no right to speak of my aunt in that way.”
“We needn’t go into that. I think you know that what I said was the
truth.” And a slow warmth began to steal over her. She was getting beneath
his skin. After all those long years, he was finding that she was not entirely
gentle.
He was exasperated now and astonished. In a more gentle voice he said,
“Olivia, I don’t understand what has come over you lately.”
She found herself thinking, wildly, “Perhaps he is going to soften.
Perhaps there is still a chance of warmth in him. Perhaps even now, after so
long, he is going to be pleasant and kind and perhaps ... perhaps ... more.”
“You’re very queer,” he was saying. “I’m not the only one who finds you
so.”
“No,” said Olivia, a little sadly. “Aunt Cassie does, too. She’s been
telling all the neighborhood that I seem to be unhappy. Perhaps it’s because
I’m a little tired. I’ve not had much rest for a long time now ... from Jack,
from Aunt Cassie, from your father ... and ... from her.” At the last word
she made a curious little half-gesture in the direction of the dark north wing
of the big house.
She watched him, conscious that he was shocked and startled by her
mentioning in a single breath so many things which they never discussed at
Pentlands, things which they buried in silence and tried to destroy by
pretending that they did not exist.
“We ought to speak of those things, sometimes,” she continued sadly.
“Sometimes when we are entirely alone with no one about to hear, when it
doesn’t make any difference. We can’t pretend forever that they don’t
exist.”
For a time he was silent, groping obviously, in a kind of desperation for
something to answer. At last he said feebly, “And yet you sit up all night
playing bridge with Sabine and old Mrs. Soames and Father.”
“That does me good. You must admit that it is a change at least.”
But he only answered, “I don’t understand you,” and began to pace up
and down in agitation while she sat there waiting, actually waiting, for the
thing to work itself up to a climax. She had a sudden feeling of victory, of
intoxication such as she had not known in years, not since she was a young
girl; and at the same time she wanted to laugh, wildly, hysterically, at the
sight of Anson, so tall and thin, prancing up and down.
Opposite her he halted abruptly and said, “And I can see no good in
inviting Mrs. Soames here so often.”
She saw now that the tension, the excitement between them, was greater
even than she had imagined, for Anson had spoken of Mrs. Soames and his
father, a thing which in the family no one ever mentioned. He had done it
quite openly, of his own free will.
“What harm can it do now? What difference can it make?” she asked. “It
is the only pleasure left to the poor battered old thing, and one of the few
left to your father.”
Anson began to mutter in disgust. “It is a silly affair ... two old ... old....”
He did not finish the sentence, for there was only one word that could have
finished it and that was a word which no gentleman and certainly no
Pentland ever used in referring to his own father.
“Perhaps,” said Olivia, “it is a silly affair now.... I’m not so sure that it
always was.”
“What do you mean by that? Do you mean....” Again he fumbled for
words, groping to avoid using the words that clearly came into his mind. It
was strange to see him brought face to face with realities, to see him grow
so helpless and muddled. “Do you mean,” he stammered, “that my father
has ever behaved ...” he choked and then added, “dishonorably.”
“Anson ... I feel strangely like being honest to-night ... just for once ...
just for once.”
“You are succeeding only in being perverse.”
“No ...” and she found herself smiling sadly, “unless you mean that in
this house ... in this room....” She made a gesture which swept within the
circle of her white arm all that collection of Victorian souvenirs, all the
mementoes of a once sturdy and powerful Puritan family, “...in this room to
be truthful and honest is to be perverse.”
He would have interrupted her here, angrily, but she raised her hand and
continued, “No, Anson; I shall tell you honestly what I think ... whether you
want to hear it or not. I don’t hope that it will do any good.... I do not know
whether, as you put it, your father has behaved dishonorably or not. I hope
he has.... I hope he was Mrs. Soames’ lover in the days when love could
have meant something to them.... Yes ... something fleshly is exactly what I
mean.... I think it would have been better. I think they might have been
happy ... really happy for a little time ... not just living in a state of
enchantment when one day is exactly like the next.... I think your father, of
all men, has deserved that happiness....” She sighed and added in a low
voice, “There, now you know!”
For a long time he simply stood staring at the floor with the round, silly
blue eyes which sometimes filled her with terror because they were so like
the eyes of that old woman who never left the dark north wing and was
known in the family simply as she, as if there was very little that was
human left in her. At last he muttered through the drooping mustache, as if
speaking to himself, “I can’t imagine what has happened to you.”
“Nothing,” said Olivia. “Nothing. I am the same as I have always been,
only to-night I have come to the end of saying ‘yes, yes’ to everything, of
always pretending, so that all of us here may go on living undisturbed in our
dream ... believing always that we are superior to every one else on the
earth, that because we are rich we are powerful and righteous, that because
... oh, there is no use in talking.... I am just the same as I have always been,
only to-night I have spoken out. We all live in a dream here ... a dream that
some day will turn sharply into a nightmare. And then what will we do?
What will you do ... and Aunt Cassie and all the rest?”
In her excitement her cheeks grew flushed and she stood up, very tall
and beautiful, leaning against the mantelpiece; but her husband did not
notice her. He appeared to be lost in deep thought, his face contorted with a
kind of grim concentration.
“I know what has happened,” he said presently. “It is Sabine. She should
never have come back here. She was like that always ... stirring up trouble
... even as a little girl. She used to break up our games by saying: ‘I won’t
play house. Who can be so foolish as to pretend muddy water is claret! It’s
a silly game.’ ”
“Do you mean that she is saying it again now ... that it’s a silly game to
pretend muddy water is claret?”
He turned away without answering and began again to pace up and down
over the enormous faded roses of the old Victorian carpet. “I don’t know
what you’re driving at. All I know is that Sabine ... Sabine ... is an evil
woman.”
“Do you hate Sabine because she is a friend of mine?”
She had watched him for so many years disliking the people who were
her friends, managing somehow to get rid of them, to keep her from seeing
them, to force her into those endless dinners at the houses of the safe men
he knew, the men who had gone to his college and belonged to his club, the
men who would never do anything that was unexpected. And in the end she
had always done as he wanted her to do. It was perhaps a manifestation of
his resentment toward all those whom he could not understand and even
(she thought) feared a little—the attitude of a man who will not allow others
to enjoy what he could not take for himself. It was the first time she had
ever spoken of this dog-in-the-manger game, but she found herself unable
to keep silent. It was as if some power outside her had taken possession of
her body. She had a strange sensation of shame at the very moment she
spoke, of shame at the sound of her own voice, a little strained and
hysterical.
There was something preposterous, too, in the sight of Anson prancing
up and down the old room filled with all the souvenirs of that decayed
respectability in which he wrapped himself ... prancing up and down with
all his prejudices and superstitions bristling. And now Olivia had dragged
the truth uncomfortably into the light.
“What an absurd thing to say!” he said bitterly.
Olivia sighed. “No, I don’t think so.... I think you know exactly what I
mean.” (She knew the family game of pretending never to understand a
truthful, unpleasant statement.)
But this, too, he refused to answer. Instead, he turned to her, more savage
and excited than she had ever seen him, so moved that he seemed for a
second to attain a pale flash of power and dignity. “And I don’t like that Fiji
Islander of a daughter of hers, who has been dragged all over the world and
had her head filled with barbaric ideas.”
At the sight of him and the sound of his voice Olivia experienced a
sudden blinding flash of intuition that illuminated the whole train of their
conversation, indeed, the whole procession of the years she had spent here
at Pentlands or in the huge brownstone house in Beacon Street. She knew
suddenly what it was that frightened Anson and Aunt Cassie and all that
intricate world of family. They were terrified lest the walls, the very
foundations, of their existence be swept away leaving them helpless with all
their little prides and vanities exposed, stripped of all the laws and
prejudices which they had made to protect them. It was why they hated
O’Hara, an Irishman and a Roman Catholic. He had menaced their security.
To be exposed thus would be a calamity, for in any other world save their
own, in a world where they stood unprotected by all that money laid away
in solid trust funds, they would have no existence whatever. They would
suddenly be what they really were.
She saw sharply, clearly, for the first time, and she said quietly, “I think
you dislike Thérèse for reasons that are not fair to the girl. You distrust her
because she is different from all the others ... from the sort of girls that you
were trained to believe perfect. Heaven knows there are enough of them
about here ... girls as like as peas in a pod.”
“And what about this boy who is coming to stay with Sabine and her
daughter ... this American boy with a French name who has never seen his
own country until now? I suppose he’ll be as queer as all the others. Who
knows anything about him?”
“Sabine,” began Olivia.
“Sabine!” he interrupted. “Sabine! What does she care who he is or
where he comes from? She’s given up decent people long ago, when she
went away from here and married that Levantine blackguard of a husband.
Sabine!... Sabine would only like to bring trouble to us ... the people to
whom she belongs. She hates us.... She can barely speak to me in a civil
fashion.”
Olivia smiled quietly and tossed her cigarette into the ashes beneath the
cold steel engraving of the Signing. “You are beginning to talk nonsense,
Anson. Let’s stick to facts, for once. I’ve met the boy in Paris.... Sybil knew
him there. He is intelligent and handsome and treats women as if they were
something more than stable-boys. There are still a few of us left who like to
be treated thus ... as women ... a few of us even here in Durham. No, I don’t
imagine you’ll care for him. He won’t belong to your club or to your
college, and he’ll see life in a different way. He won’t have had his opinions
all ready made, waiting for him.”
“It’s my children I’m thinking of.... I don’t want them picking up with
any one, with the first person who comes along.”
Olivia did not smile. She turned away now and said softly, “If it’s Jack
you’re worrying about, you needn’t fuss any longer. He won’t marry
Thérèse. I don’t think you know how ill he is.... I don’t think, sometimes,
that you really know anything about him at all.”
“I always talk with the doctors.”
“Then you ought to know that they’re silly ... the things you’re saying.”
“All the same, Sabine ought never to have come back here....”
She saw now that the talk was turning back into the inevitable channel of
futility where they would go round and round, like squirrels in a cage,
arriving nowhere. It had happened this way so many times. Turning with an
air of putting an end to the discussion, she walked over to the fireplace ...
pale once more, with faint, mauve circles under her dark eyes. There was a
fragility about her, as if this strange spirit which had flamed up so suddenly
were too violent for the body.
“Anson,” she said in a low voice, “please let’s be sensible. I shall look
into this affair of Sybil and O’Hara and try to discover whether there is
anything serious going on. If necessary, I shall speak directly to both of
them. I don’t approve, either, but not for the same reason. He is too old for
her. You won’t have any trouble. You will have to do nothing.... As to
Sabine, I shall continue to see as much of her as I like.”
In the midst of the speech she had grown suddenly, perilously, calm in
the way which sometimes alarmed her husband and Aunt Cassie. Sighing a
little, she continued, “I have been good and gentle, Anson, for years and
years, and now, to-night ... to-night I feel as if I were coming to the end of
it.... I only say this to let you know that it can’t go on forever.”
Picking up her scarf, she did not wait for him to answer her, but moved
away toward the door, still enveloped in the same perilous calm. In the
doorway she turned. “I suppose we can call the affair settled for the
moment?”
He had been standing there all the while watching her out of the round
cold blue eyes with a look of astonishment as if after all those years he had
seen his wife for the first time; and then slowly the look of astonishment
melted into one of slyness, almost of hatred, as if he thought, “So this is
what you really are! So you have been thinking these things all these years
and have never belonged to us at all. You have been hating us all the while.
You have always been an outsider—a common, vulgar outsider.”
His thin, discontented lips had turned faintly gray, and when he spoke it
was nervously, with a kind of desperation, like a small animal trapped in a
corner. The words came out from the thin lips in a sharp, quick torrent, like
the rush of white-hot steel released from a cauldron ... words spoken in a
voice that was cold and shaken with hatred.
“In any case,” he said, “in any case ... I will not have my daughter marry
a shanty Irishman.... There is enough of that in the family.”
For a moment Olivia leaned against the door-sill, her dark eyes wide
with astonishment, as if she found it impossible to believe what she had
heard. And then quietly, with a terrible sadness and serenity in her voice,
she murmured almost to herself, “What a rotten thing to say!” And after a
little pause, as if still speaking to herself, “So that is what you have been
thinking for twenty years!” And again, “There is a terrible answer to that....
It’s so terrible that I shan’t say it, but I think you ... you and Aunt Cassie
know well enough what it is.”
Closing the door quickly, she left him there, startled and exasperated,
among all the Pentland souvenirs, and slowly, in a kind of nightmare, she
made her way toward the stairs, past the long procession of Pentland
ancestors—the shopkeeping immigrant, the witch-burner, the professional
evangelist, the owner of clipper ships, and the tragic, beautiful Savina
Pentland—and up the darkened stairway to the room where her husband
had not followed her in more than fifteen years.

Once in her own room she closed the door softly and stood in the
darkness, listening, listening, listening.... There was at first no sound save
the blurred distant roar of the surf eating its way into the white dunes and
the far-off howling of a beagle somewhere in the direction of the kennels,
and then, presently, there came to her the faint sound of soft, easy breathing
from the adjoining room. It was regular, easy and quiet, almost as if her son
had been as strong as O’Hara or Higgins or that vigorous young de Cyon
whom she had met once for a little while at Sabine’s house in Paris.
The sound filled her with a wild happiness, so that she forgot even what
had happened in the drawing-room a little while before. As she undressed in
the darkness she stopped now and then to listen again in a kind of fierce
tension, as if by wishing it she could keep the sound from ever dying away.
For more than three years she had never once entered this room free from
the terror that there might only be silence to welcome her. And at last, after
she had gone to bed and was falling asleep, she was wakened sharply by
another sound, quite different, the sound of a wild, almost human cry ...
savage and wicked, and followed by the thud thud of hoofs beating
savagely against the walls of a stall, and then the voice of Higgins, the
groom, cursing wickedly. She had heard it before—the sound of old John
Pentland’s evil, beautiful red mare kicking the walls of her stall and
screaming wildly. There was an unearthly, implacable hatred between her
and the little apelike man ... and yet a sort of fascination, too. As she sat up
in her bed, listening, and still startled by the wild sound, she heard her son
saying:
“Mama, are you there?”
“Yes.”
She rose and went into the other room, where, in the dim light from the
night-lamp, the boy was sitting up in bed, his pale blond hair all rumpled,
his eyes wide open and staring a little.
“You’re all right, Jack?” she whispered. “There’s nothing the matter?”
“No—nothing. I had a bad dream and then I heard the red mare.”
He looked pale and ill, with the blue veins showing on his temples; yet
she knew that he was stronger than he had been for months. He was fifteen,
and he looked younger than his age, rather like a boy of thirteen or fourteen,
but he was old, too, in the timeless fashion of those who have always been
ill.
“Is the party over?... Have they all gone?” he asked.
“Yes, Jack.... It’s almost daylight. You’d better try to sleep again.”
He lay down without answering her, and as she bent to kiss him good-
night, she heard him say softly, “I wish I could have gone to the party.”
“You will, Jack, some day—before very long. You’re growing stronger
every day.”
Again a silence, while Olivia thought bitterly, “He knows that I’m lying.
He knows that what I’ve said is not the truth.”
Aloud she said, “You’ll go to sleep now—like a good boy.”
“I wish you’d tell me about the party.”
Olivia sighed. “Then I must close Nannie’s door, so we won’t waken
her.” And she closed the door leading to the room where the old nurse slept,
and seating herself on the foot of her son’s bed, she began a recital of who
had been at the ball, and what had happened there, bit by bit, carefully and
with all the skill she was able to summon. She wanted to give him, who had
so little chance of living, all the sense of life she was able to evoke.
She talked on and on, until presently she noticed that the boy had fallen
asleep and that the sky beyond the marshes had begun to turn gray and rose
and yellow with the rising day.
CHAPTER III

When Olivia first came to the old house as the wife of Anson Pentland,
the village of Durham, which lay inland from Pentlands and the sea, had
been invisible, lying concealed in a fold of the land which marked the faint
beginnings of the New Hampshire mountains. There had been in the view a
certain sleepy peacefulness: one knew that in the distant fold of land
surmounted by a single white spire there lay a quiet village of white
wooden houses built along a single street called High Street that was
dappled in summer with the shadows of old elm-trees. In those days it had
been a country village, half asleep, with empty shuttered houses here and
there falling into slow decay—a village with fewer people in it than there
had been a hundred years before. It had stayed thus sleeping for nearly
seventy-five years, since the day when a great migration of citizens had
robbed it of its sturdiest young people. In the thick grass that surrounded the
old meeting-house there lay a marble slab recording the event with an
inscription which read:
From this spot on the fourteenth day of August, eighteen
hundred and eighteen, the Reverend Josiah Milford, Pastor of this
Church, with one hundred and ninety members of his congregation
—men, women and children—set out, secure in their faith in
Almighty God, to establish His Will and Power in the Wilderness of
the Western Reserve.
Beneath the inscription were cut the names of those families who had
made the journey to found a new town which had since surpassed sleepy
Durham a hundred times in wealth and prosperity. There was no Pentland
name among them, for the Pentlands had been rich even in the year
eighteen hundred and eighteen, and lived in winter in Boston and in
summer at Durham, on the land claimed from the wilderness by the first of
the family.
From that day until the mills came to Durham the village sank slowly
into a kind of lethargy, and the church itself, robbed of its strength, died
presently and was changed into a dusty museum filled with homely early
American furniture and spinning-wheels—a place seldom visited by any
one and painted grudgingly every five years by the town council because it
was popularly considered an historical monument. The Pentland family
long ago had filtered away into the cold faith of the Unitarians or the more
compromising and easy creeds of the Episcopal church.
But now, nearly twenty years after Olivia had come to Pentlands, the
village was alive again, so alive that it had overflowed its little fold in the
land and was streaming down the hill on the side next to the sea in straight,
plain columns of ugly stucco bungalows, each filled with its little family of
Polish mill-workers. And in the town, across High Street from the white-
spired old meeting-house, there stood a new church, built of stucco and
green-painted wood and dedicated to the great Church of Rome. In the old
wooden houses along High Street there still lingered remnants of the old
families ... old Mrs. Featherstone, who did washing to support four sickly
grandchildren who ought never to have been born; Miss Haddon, a queer
old woman who wore a black cape and lived on a dole from old John
Pentland as a remote cousin of the family; Harry Peckhan, the village
carpenter; old Mrs. Malson, living alone in a damp, gaunt and beautiful old
house filled with bits of jade and ivory brought back from China by her
grandfather’s clippers; Miss Murgatroyd, who had long since turned her
bullfinch house into a shabby tea-room. They remained here and there, a
few worn and shabby-genteel descendants of those first settlers who had
come into the country with the Pentlands.
But the mills had changed everything, the mills which poured wealth
into the pockets of a dozen rich families who lived in summer within a few
miles of Durham.
Even the countryside itself had changed. There were no longer any of the
old New Englanders in possession of the land. Sometimes in riding along
the lanes one encountered a thin, silly-faced remnant of the race sitting on a
stone wall chewing a bit of grass; but that was all; the others had been
swallowed up long ago in the mills of Salem and Lynn or died away, from
too much inbreeding and too little nourishment. The few farms that
remained fell into the hands of Poles and Czechs, solid, square people who
were a little pagan in their closeness to the earth and the animals which
surrounded them, sturdy people, not too moral, who wrought wonders with
the barren, stony earth of New England and stood behind their walls staring
wide-eyed while the grand people like the Pentlands rode by in pink coats
surrounded by the waving nervous tails of foxhounds. And, one by one,
other old farms were being turned back into a wilderness once more so that
there would be plenty of room for the horses and hounds to run after foxes
and bags of aniseed.
It had all changed enormously. From the upper windows of the big
Georgian brick house where the Pentlands lived, one could see the record of
all the changes. The windows commanded a wide view of a landscape
composed of grubby meadows and stone walls, thickets of pine and white
birches, marshes, and a winding sluggish brown river. Sometimes in the late
autumn the deer wandered down from the mountains of New Hampshire to
spoil the fox-hunting by leading the hounds astray after game that was far
too fleet for them.
And nearer at hand, nestled within a turn of the river, lay the land where
Sabine Callender had been born and had lived until she was a grown
woman—the land which she had sold carelessly to O’Hara, an Irish
politician and a Roman Catholic, come up from nowhere to take possession
of it, to clip its hedges, repair its sagging walls, paint its old buildings and
put up gates and fences that were too shiny and new. Indeed, he had done it
so thoroughly and so well that the whole place had a little the air of a
suburban real estate development. And now Sabine had returned to spend
the summer in one of his houses and to be very friendly with him in the face
of Aunt Cassie and Anson Pentland, and a score of others like them.
Olivia knew this wide and somberly beautiful landscape, every stick and
stone of it, from the perilous gravel-pit, half-hidden by its fringe of elder-
bushes, to the black pine copse where Higgins had discovered only a day or
two before a new litter of foxes. She knew it on gray days when it was cold
and depressing, on those bright, terribly clear New England days when
every twig and leaf seemed outlined by light, and on those damp, cold days
when a gray fog swept in across the marshes from the sea to envelop all the
countryside in gray darkness. It was a hard, uncompromising, stony country
that was never too cheerful.
It was a country, too, which gave her an old feeling of loneliness ... a
feeling which, strangely enough, seemed to increase rather than diminish as
the years passed. She had never accustomed herself to its occasional
dreariness. In the beginning, a long while ago, it had seemed to her green
and peaceful and full of quiet, a place where she might find rest and peace
... but she had come long since to see it as it was, as Sabine had seen it
while she stood in the window of the writing-room, frightened by the
sudden queer apparition of the little groom—a country beautiful, hard and
cold, and a little barren.
2

There were times when the memories of Olivia’s youth seemed to
sharpen suddenly and sweep in upon her, overwhelming all sense of the
present, times when she wanted suddenly and fiercely to step back into that
far-off past which had seemed then an unhappy thing, and these were the
times when she felt most lonely, the times when she knew how completely,
with the passing of years, she had drawn into herself; it was a process of
protection like a tortoise drawing in its head. And all the while, in spite of
the smiles and the politeness and the too facile amiability, she felt that she
was really a stranger at Pentlands, that there were certain walls and barriers
which she could never break down, past which she could never penetrate,
certain faiths in which it was impossible for her to believe.
It was difficult now for her to remember very clearly what had happened
before she came to Durham; it all seemed lost, confused, buried beneath the
weight of her devotion to the vast family monument of the Pentlands. She
had forgotten the names of people and places and confused the days and the
years. At times it was difficult for her to remember the endless confusing
voyages back and forth across the Atlantic and the vast, impersonal,
vacuous hotels which had followed each other in the bleak and unreal
procession of her childhood.
She could remember with a certain pitiful clarity two happy years spent
at the school in Saint-Cloud, where for months at a time she had lived in a
single room which she might call her own, where she had rested, free from
the terror of hearing her mother say, “We must pack to-day. We are leaving
to-morrow for St. Petersburg or London or San Remo or Cairo....”
She could scarcely remember at all the immense house of chocolate-
colored stone fitted with fantastic turrets and balconies that overlooked
Lake Michigan. It had been sold and torn down long ago, destroyed like all
else that belonged to the far-off past. She could not remember the father
who had died when she was three; but of him there remained at least a
yellowing photograph of a great, handsome, brawny man with a humorous
Scotch-Irish face, who had died at the moment when his name was coming
to be known everywhere as a power in Washington. No, nothing remained
of him save the old photograph, and the tenuous, mocking little smile which
had come down to her, the way she had of saying, “Yes! Yes!” pleasantly
when she meant to act in quite the contrary fashion.
There were times when the memory of her own mother became vague
and fantastic, as if she had been no more than a figure out of some absurd
photograph of the early nineteen hundreds ... the figure of a pretty woman,
dressed fashionably in clothes that flowed away in both directions, from a
wasp waist. It was like a figure out of one of those old photographs which
one views with a kind of melancholy amusement. She remembered a vain,
rather selfish and pretty woman, fond of flattery, who had been shrewd
enough never to marry any one of those gallant dark gentlemen with high-
sounding titles who came to call at the eternal changeless hotel sitting-
room, to take her out to garden parties and fêtes and races. And always in
the background of the memory there was the figure of a dark little girl,
overflowing with spirits and a hunger for friends, who was left behind to
amuse herself by walking out with the Swiss governess, to make friends
among the children she encountered in the parks or on the beaches and the
boulevards of whatever European city her mother was visiting at the
moment ... friends whom she saw to-day and who were vanished to-morrow
never to be seen again. Her mother, she saw now, belonged to the America
of the nineties. She saw her now less as a real person than a character out of
a novel by Mrs. Wharton.
But she had never remarried; she had remained the rich, pretty Mrs.
McConnel of Chicago until that tragic day (the clearest of all Olivia’s
memories and the most terrible) when she had died of fever abruptly in a
remote and squalid Italian village, with only her daughter (a girl of
seventeen), a quack doctor and the Russian driver of her motor to care for
her.
The procession of confused and not-too-cheerful memories came to a
climax in a gloomy, red brick house off Washington Square, where she had
gone as an orphan to live with a rigid, bejetted, maternal aunt who had
believed that the whole world revolved about Lenox, the Hudson River
Valley and Washington Square—an aunt who had never spoken to Olivia’s
father because she, like Anson and Aunt Cassie, had a prejudice against
