ENVIRONMENTAL
AND ECOLOGICAL
STATISTICS WITH R
Second Edition
CHAPMAN & HALL/CRC
APPLIED ENVIRONMENTAL STATISTICS

Series Editors

Doug Nychka
Institute for Mathematics Applied to Geosciences
National Center for Atmospheric Research
Boulder, CO, USA

Richard L. Smith
Department of Statistics & Operations Research
University of North Carolina
Chapel Hill, USA

Lance Waller
Department of Biostatistics
Rollins School of Public Health
Emory University
Atlanta, GA, USA

Published Titles

Michael E. Ginevan and Douglas E. Splitstone, Statistical Tools for
Environmental Quality
Timothy G. Gregoire and Harry T. Valentine, Sampling Strategies for Natural
Resources and the Environment
Daniel Mandallaz, Sampling Techniques for Forest Inventory
Bryan F. J. Manly, Statistics for Environmental Science and Management,
Second Edition
Bryan F. J. Manly and Jorge A. Navarro Alberto, Introduction to Ecological
Sampling
Steven P. Millard and Nagaraj K. Neerchal, Environmental Statistics with
S-PLUS
Wayne L. Myers and Ganapati P. Patil, Statistical Geoinformatics for Human
Environment Interface
Nathaniel K. Newlands, Future Sustainable Ecosystems: Complexity, Risk
and Uncertainty
Éric Parent and Étienne Rivot, Introduction to Hierarchical Bayesian
Modeling for Ecological Data
Song S. Qian, Environmental and Ecological Statistics with R,
Second Edition
Thorsten Wiegand and Kirk A. Moloney, Handbook of Spatial Point-Pattern
Analysis in Ecology
ENVIRONMENTAL
AND ECOLOGICAL
STATISTICS WITH R
Second Edition

Song S. Qian
The University of Toledo
Ohio, USA

Boca Raton London New York

CRC Press is an imprint of the
Taylor & Francis Group, an informa business
A CHAPMAN & HALL BOOK
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2017 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20160825

International Standard Book Number-13: 978-1-4987-2872-0 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information storage
or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC),
222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
In memory of my grandmother 张一贯, mother 仲泽庆, and father 钱拙.
Contents

Preface xiii

List of Figures xvii

List of Tables xxiii

I Basic Concepts 1
1 Introduction 3

1.1 Tool for Inductive Reasoning . . . . . . . . . . . . . . . . . . 3


1.2 The Everglades Example . . . . . . . . . . . . . . . . . . . . 7
1.2.1 Statistical Issues . . . . . . . . . . . . . . . . . . . . . 10
1.3 Effects of Urbanization on Stream Ecosystems . . . . . . . . 14
1.3.1 Statistical Issues . . . . . . . . . . . . . . . . . . . . . 15
1.4 PCB in Fish from Lake Michigan . . . . . . . . . . . . . . . 16
1.4.1 Statistical Issues . . . . . . . . . . . . . . . . . . . . . 16
1.5 Measuring Harmful Algal Bloom Toxin . . . . . . . . . . . . 17
1.6 Bibliography Notes . . . . . . . . . . . . . . . . . . . . . . . 18
1.7 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2 A Crash Course on R 19

2.1 What is R? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Getting Started with R . . . . . . . . . . . . . . . . . . . . . 20
2.2.1 R Commands and Scripts . . . . . . . . . . . . . . . . 21
2.2.2 R Packages . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.3 R Working Directory . . . . . . . . . . . . . . . . . . . 22
2.2.4 Data Types . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.5 R Functions . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Getting Data into R . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.1 Functions for Creating Data . . . . . . . . . . . . . . . 29
2.3.2 A Simulation Example . . . . . . . . . . . . . . . . . . 31
2.4 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.1 Data Cleaning . . . . . . . . . . . . . . . . . . . . . . 35
2.4.1.1 Missing Values . . . . . . . . . . . . . . . . . 36


2.4.2 Subsetting and Combining Data . . . . . . . . . . . . 36


2.4.3 Data Transformation . . . . . . . . . . . . . . . . . . . 38
2.4.4 Data Aggregation and Reshaping . . . . . . . . . . . . 38
2.4.5 Dates . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3 Statistical Assumptions 47

3.1 The Normality Assumption . . . . . . . . . . . . . . . . . . . 48


3.2 The Independence Assumption . . . . . . . . . . . . . . . . . 54
3.3 The Constant Variance Assumption . . . . . . . . . . . . . . 55
3.4 Exploratory Data Analysis . . . . . . . . . . . . . . . . . . . 56
3.4.1 Graphs for Displaying Distributions . . . . . . . . . . 57
3.4.2 Graphs for Comparing Distributions . . . . . . . . . . 59
3.4.3 Graphs for Exploring Dependency among Variables . . 61
3.5 From Graphs to Statistical Thinking . . . . . . . . . . . . . . 69
3.6 Bibliography Notes . . . . . . . . . . . . . . . . . . . . . . . 72
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4 Statistical Inference 77

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.2 Estimation of Population Mean and Confidence Interval . . . 78
4.2.1 Bootstrap Method for Estimating Standard Error . . . 86
4.3 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . 90
4.3.1 t-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.2 Two-Sided Alternatives . . . . . . . . . . . . . . . . . 98
4.3.3 Hypothesis Testing Using the Confidence Interval . . . 99
4.4 A General Procedure . . . . . . . . . . . . . . . . . . . . . . 101
4.5 Nonparametric Methods for Hypothesis Testing . . . . . . . 102
4.5.1 Rank Transformation . . . . . . . . . . . . . . . . . . 102
4.5.2 Wilcoxon Signed Rank Test . . . . . . . . . . . . . . . 103
4.5.3 Wilcoxon Rank Sum Test . . . . . . . . . . . . . . . . 104
4.5.4 A Comment on Distribution-Free Methods . . . . . . 106
4.6 Significance Level α, Power 1 − β, and p-Value . . . . . . . . 109
4.7 One-Way Analysis of Variance . . . . . . . . . . . . . . . . . 116
4.7.1 Analysis of Variance . . . . . . . . . . . . . . . . . . . 117
4.7.2 Statistical Inference . . . . . . . . . . . . . . . . . . . 119
4.7.3 Multiple Comparisons . . . . . . . . . . . . . . . . . . 121
4.8 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.8.1 The Everglades Example . . . . . . . . . . . . . . . . 127
4.8.2 Kemp’s Ridley Turtles . . . . . . . . . . . . . . . . . . 128
4.8.3 Assessing Water Quality Standard Compliance . . . . 134
4.8.4 Interaction between Red Mangrove and Sponges . . . 137
4.9 Bibliography Notes . . . . . . . . . . . . . . . . . . . . . . . 142

4.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

II Statistical Modeling 147


5 Linear Models 149

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 149


5.2 From t-test to Linear Models . . . . . . . . . . . . . . . . . . 152
5.3 Simple and Multiple Linear Regression Models . . . . . . . . 154
5.3.1 The Least Squares . . . . . . . . . . . . . . . . . . . . 154
5.3.2 Regression with One Predictor . . . . . . . . . . . . . 156
5.3.3 Multiple Regression . . . . . . . . . . . . . . . . . . . 158
5.3.4 Interaction . . . . . . . . . . . . . . . . . . . . . . . . 160
5.3.5 Residuals and Model Assessment . . . . . . . . . . . . 162
5.3.6 Categorical Predictors . . . . . . . . . . . . . . . . . . 170
5.3.7 Collinearity and the Finnish Lakes Example . . . . . . 174
5.4 General Considerations in Building a Predictive Model . . . 185
5.5 Uncertainty in Model Predictions . . . . . . . . . . . . . . . 189
5.5.1 Example: Uncertainty in Water Quality Measurements 191
5.6 Two-Way ANOVA . . . . . . . . . . . . . . . . . . . . . . . . 193
5.6.1 ANOVA as a Linear Model . . . . . . . . . . . . . . . 193
5.6.2 More Than One Categorical Predictor . . . . . . . . . 195
5.6.3 Interaction . . . . . . . . . . . . . . . . . . . . . . . . 198
5.7 Bibliography Notes . . . . . . . . . . . . . . . . . . . . . . . 200
5.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

6 Nonlinear Models 209

6.1 Nonlinear Regression . . . . . . . . . . . . . . . . . . . . . . 209


6.1.1 Piecewise Linear Models . . . . . . . . . . . . . . . . . 220
6.1.2 Example: U.S. Lilac First Bloom Dates . . . . . . . . 226
6.1.3 Selecting Starting Values . . . . . . . . . . . . . . . . 229
6.2 Smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
6.2.1 Scatter Plot Smoothing . . . . . . . . . . . . . . . . . 240
6.2.2 Fitting a Local Regression Model . . . . . . . . . . . . 243
6.3 Smoothing and Additive Models . . . . . . . . . . . . . . . . 245
6.3.1 Additive Models . . . . . . . . . . . . . . . . . . . . . 245
6.3.2 Fitting an Additive Model . . . . . . . . . . . . . . . . 248
6.3.3 Example: The North American Wetlands Database . . 250
6.3.4 Discussion: The Role of Nonparametric Regression
Models in Science . . . . . . . . . . . . . . . . . . . . 254
6.3.5 Seasonal Decomposition of Time Series . . . . . . . . 259
6.3.5.1 The Neuse River Example . . . . . . . . . . 261
6.4 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . 267
6.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

7 Classification and Regression Tree 271

7.1 The Willamette River Example . . . . . . . . . . . . . . . . . 272


7.2 Statistical Methods . . . . . . . . . . . . . . . . . . . . . . . 275
7.2.1 Growing and Pruning a Regression Tree . . . . . . . . 277
7.2.2 Growing and Pruning a Classification Tree . . . . . . 285
7.2.3 Plotting Options . . . . . . . . . . . . . . . . . . . . . 289
7.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
7.3.1 CART as a Model Building Tool . . . . . . . . . . . . 293
7.3.2 Deviance and Probabilistic Assumptions . . . . . . . . 297
7.3.3 CART and Ecological Threshold . . . . . . . . . . . . 298
7.4 Bibliography Notes . . . . . . . . . . . . . . . . . . . . . . . 300
7.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

8 Generalized Linear Model 303

8.1 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . 305


8.1.1 Example: Evaluating the Effectiveness of UV as a
Drinking Water Disinfectant . . . . . . . . . . . . . . 306
8.1.2 Statistical Issues . . . . . . . . . . . . . . . . . . . . . 307
8.1.3 Fitting the Model in R . . . . . . . . . . . . . . . . . . 308
8.2 Model Interpretation . . . . . . . . . . . . . . . . . . . . . . 309
8.2.1 Logit Transformation . . . . . . . . . . . . . . . . . . 310
8.2.2 Intercept . . . . . . . . . . . . . . . . . . . . . . . . . 310
8.2.3 Slope . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
8.2.4 Additional Predictors . . . . . . . . . . . . . . . . . . 312
8.2.5 Interaction . . . . . . . . . . . . . . . . . . . . . . . . 314
8.2.6 Comments on the Crypto Example . . . . . . . . . . . 315
8.3 Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
8.3.1 Binned Residuals Plot . . . . . . . . . . . . . . . . . . 316
8.3.2 Overdispersion . . . . . . . . . . . . . . . . . . . . . . 316
8.3.3 Seed Predation by Rodents: A Second Example of
Logistic Regression . . . . . . . . . . . . . . . . . . . . 319
8.4 Poisson Regression Model . . . . . . . . . . . . . . . . . . . . 332
8.4.1 Arsenic Data from Southwestern Taiwan . . . . . . . . 332
8.4.2 Poisson Regression . . . . . . . . . . . . . . . . . . . . 333
8.4.3 Exposure and Offset . . . . . . . . . . . . . . . . . . . 340
8.4.4 Overdispersion . . . . . . . . . . . . . . . . . . . . . . 341
8.4.5 Interactions . . . . . . . . . . . . . . . . . . . . . . . . 344
8.4.6 Negative Binomial . . . . . . . . . . . . . . . . . . . . 351
8.5 Multinomial Regression . . . . . . . . . . . . . . . . . . . . . 353
8.5.1 Fitting a Multinomial Regression Model in R . . . . . 354
8.5.2 Model Evaluation . . . . . . . . . . . . . . . . . . . . . 358
8.6 The Poisson-Multinomial Connection . . . . . . . . . . . . . 361
8.7 Generalized Additive Models . . . . . . . . . . . . . . . . . . 367

8.7.1 Example: Whales in the Western Antarctic Peninsula 369


8.7.1.1 The Data . . . . . . . . . . . . . . . . . . . . 371
8.7.1.2 Variable Selection Using CART . . . . . . . 371
8.7.1.3 Fitting GAM . . . . . . . . . . . . . . . . . . 374
8.7.1.4 Summary . . . . . . . . . . . . . . . . . . . . 378
8.8 Bibliography Notes . . . . . . . . . . . . . . . . . . . . . . . 380
8.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381

III Advanced Statistical Modeling 385


9 Simulation for Model Checking and Statistical Inference 387

9.1 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388


9.2 Summarizing Regression Models Using Simulation . . . . . . 390
9.2.1 An Introductory Example . . . . . . . . . . . . . . . . 390
9.2.2 Summarizing a Linear Regression Model . . . . . . . . 392
9.2.2.1 Re-transformation Bias . . . . . . . . . . . . 396
9.2.3 Simulation for Model Evaluation . . . . . . . . . . . . 397
9.2.4 Predictive Uncertainty . . . . . . . . . . . . . . . . . . 405
9.3 Simulation Based on Re-sampling . . . . . . . . . . . . . . . 408
9.3.1 Bootstrap Aggregation . . . . . . . . . . . . . . . . . . 410
9.3.2 Example: Confidence Interval of the CART-Based
Threshold . . . . . . . . . . . . . . . . . . . . . . . . . 411
9.4 Bibliography Notes . . . . . . . . . . . . . . . . . . . . . . . 414
9.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414

10 Multilevel Regression 417

10.1 From Stein’s Paradox to Multilevel Models . . . . . . . . . . 417


10.2 Multilevel Structure and Exchangeability . . . . . . . . . . . 421
10.3 Multilevel ANOVA . . . . . . . . . . . . . . . . . . . . . . . 425
10.3.1 Intertidal Seaweed Grazers . . . . . . . . . . . . . . . 426
10.3.2 Background N2O Emission from Agriculture Fields . . 431
10.3.3 When to Use the Multilevel Model? . . . . . . . . . . 434
10.4 Multilevel Linear Regression . . . . . . . . . . . . . . . . . . 436
10.4.1 Nonnested Groups . . . . . . . . . . . . . . . . . . . . 447
10.4.2 Multiple Regression Problems . . . . . . . . . . . . . . 453
10.4.3 The ELISA Example—An Unintended Multilevel Modeling
Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 464
10.5 Nonlinear Multilevel Models . . . . . . . . . . . . . . . . . . 465
10.6 Generalized Multilevel Models . . . . . . . . . . . . . . . . . 469
10.6.1 Exploited Plant Monitoring—Galax . . . . . . . . . . 470
10.6.1.1 A Multilevel Poisson Model . . . . . . . . . . 471
10.6.1.2 A Multilevel Logistic Regression Model . . . 474

10.6.2 Cryptosporidium in U.S. Drinking Water—A Poisson


Regression Example . . . . . . . . . . . . . . . . . . . 478
10.6.3 Model Checking Using Simulation . . . . . . . . . . . 482
10.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . 486
10.8 Bibliography Notes . . . . . . . . . . . . . . . . . . . . . . . 489
10.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489

11 Evaluating Models Based on Statistical Significance Testing 493

11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 493


11.2 Evaluating TITAN . . . . . . . . . . . . . . . . . . . . . . . . 495
11.2.1 A Brief Description of TITAN . . . . . . . . . . . . . 496
11.2.2 Hypothesis Testing in TITAN . . . . . . . . . . . . . . 498
11.2.3 Type I Error Probability . . . . . . . . . . . . . . . . . 499
11.2.4 Statistical Power . . . . . . . . . . . . . . . . . . . . . 503
11.2.5 Bootstrapping . . . . . . . . . . . . . . . . . . . . . . 511
11.2.6 Community Threshold . . . . . . . . . . . . . . . . . . 512
11.2.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . 513
11.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514

Bibliography 515

Index 529
Preface

I learned statistics from Bayesian statisticians. As a result, I do not pay
attention to hypothesis testing and p-values in my work, and I do not
emphasize them in my teaching. However, most students from my classes
remember the term “statistically significant” (or p < 0.05) better than
anything else, and they check the R² value when evaluating a regression
model. I have talked to many of them about their experiences in learning
and using statistics to understand why they seem naturally drawn to these
numbers, which few can explain clearly in plain language. I arrived at a
satisfactory explanation around 2007, when I read the slides of a presentation
given by Dick De Veaux of Williams College entitled “Math is Music; Statistics
is Literature.” (This presentation is now available on YouTube.) According
to Dr. De Veaux, statistics is challenging for students and instructors alike,
because we want to teach not only the mechanical part of statistics but also
the process of making a judgment. Because a statistics course is always
counted as a quantitative methods class, students naturally view statistics as
a mathematics class. But statistics is not mathematics. In a typical statistics
class for environmental and ecological graduate students, we use very simple
(but often tedious) mathematics. Students expect to learn statistics the way
they learn mathematics. However, the mode of inference in mathematics is
deduction, whereas the mode of inference in statistics is induction. As a
result, statistics cannot be learned by memorizing rules and formulae. Making
a judgment requires putting the analysis in context, combining information
from multiple sources, and using logic and common sense. Learning statistics
is less about learning rules (as in mathematics) and more about interpretation
and synthesis, which require experience (as in literature).
When I decided to write this book, I wanted to put together examples that
illustrate the process of making a judgment and to integrate them to illustrate
the iterative process of statistical inference. This process inevitably involves
more than one statistical topic. As a result, many examples in this book are
used in multiple chapters. For example, I used the PCB in fish example to
illustrate a two-sample t-test in Chapter 4, simple and multiple regression in
Chapter 5, and nonlinear regression in Chapter 6. With these examples, I try
to illuminate the difference between how we learn statistics and how we use
statistics. In learning statistics, we learn by topic (e.g., from t-test to ANOVA
to linear regression). By the end of the class, students often see statistics as
a collection of unrelated methods. When using statistics, we must first
determine the nature of the problem before deciding which statistical tools to
use. This first step is not always taught in a statistics class.
Using the PCB in fish example, I want to illustrate the iterative nature
of a statistical inference problem. We may not be able to identify the most
appropriate model at first. Through repeated cycles of proposing a model,
identifying its flaws, and revising it, we hope to reach a sensible conclusion.
As a result, a statistical analysis must have a subject-matter context. It is a
process of sifting through data to find useful information to achieve a
specific objective. The basic problem of the PCB in fish example is the risk
of PCB exposure from consuming fish from Lake Michigan. The initial analysis
of the data showed a large difference in PCB concentrations between large and
small fish. However, Figure 5.1 suggests that this difference cannot be
adequately described by the simple two-sample t-test model. Throughout
Chapter 5, I used this example to discuss how a linear regression model should
be evaluated and updated. In Chapter 6, some alternative models are presented
to summarize the attempts made in the literature to correct the inadequacies
of the linear models. But I left Chapter 6 without a satisfactory model. In
Chapter 9, I used this example again to illustrate the use of simulation for
model evaluation. While writing Chapter 9, I discovered the imbalance in fish
length in the data. In a way, this example shows the typical outcome of a
statistical analysis: no matter how hard we try, the outcome is never
completely satisfactory. There are always more “what if”s. However, the
ability to ask “what if” is difficult to teach and to learn, because of the
“seven unnatural acts of statistical thinking” required by a statistical
analysis: think critically, be skeptical, think about variation (rather than
about center), focus on what we don’t know, perfect the process, and think
about conditional probabilities and rare events [De Veaux and Velleman,
2008]. By examining the same problem from different angles, I hope to bring
home the essential message: statistical analysis is more than reporting a
p-value.
Since the publication of the first edition, I have learned more about the
problems of using statistical hypothesis testing. Part of the problem lies in
the terminology we use in statistical hypothesis testing. The term
“statistically significant” is particularly corruptive. The term has a specific
meaning with respect to the null hypothesis. But by declaring our result
to be “significant” without further explanation, we often mislead not only
the consumer of the result but also ourselves. In this edition, I removed the
term “statistically significant” whenever possible. Instead, I try to use plain
language to describe the meaning of a “significant” result. As I explained in
a guest editorial for the journal Landscape Ecology, a statistical result should
be evaluated by the MAGIC criteria of Abelson [1995]: a statistical inference
should be a principled argument, and the strength of the inference should
be measured by Magnitude, Articulation, Generality, Interestingness, and
Credibility, not just by a p-value, R², or any other single statistic. Throughout
the book, I emphasize the interpretation of a fitted model and drawing
conclusions based on the context of the problem. I have followed these
rules in all examples:
• Verbal description of a model – a clear description of the model using
nonstatistical terms should be the first step. When describing the model in
clear scientific terms, we can better judge whether the model is sensible
and whether the real world can be reasonably represented by the model.
Even for a simple model such as a t-test or ANOVA, a verbal description
can be helpful.
• Verifying model assumptions – plots, plots, and more plots.
• Verbal description of estimated model coefficients – before finalizing the
model, we should describe the estimated model coefficients in words.
This should be done even in a simple two-sample t-test.
The American Statistical Association issued a statement on p-values
[Wasserstein and Lazar, 2016]. The statement emphasizes that the use of
statistics should include the context of the problem, the process of data
collection and model formulation, and the purpose of the analysis. I will use
the statement as required reading in my class during the first and last weeks
of the semester.
Major changes made in this edition include:
• New and revised chapters and sections
– Sections 1.2–1.5 describe the main examples used in more than one
chapter.
– Chapter 2 is rewritten with a brief introduction to R and the use
of R for data manipulation.
– Section 5.1 is rewritten to use the PCB in fish example as the lead
example for linear regression models.
– New section 5.5.1 introduces the ELISA data collected during the
Toledo water crisis in 2014.
– New section 6.1.3 presents the use of a self-starter function for
nonlinear regression.
– Sections 8.5–8.6 present multinomial regression and the
connection between multinomial and Poisson models.
– Section 9.2 is revised to include nonlinear regression simulation.
– Two-way ANOVA is removed from Section 10.3.
– Section 10.4.3 is added to introduce the ELISA example as a
multilevel modeling problem.
– Section 10.5 is added to introduce nonlinear multilevel models.
– Section 10.6.1 uses new examples for generalized multilevel models.
– Chapter 11 is added to discuss the use of simulation in evaluating
methods based on hypothesis testing. This chapter demonstrates the
importance of putting a statistical test in the context of a real-world
problem. We should ask: What is the scientific problem at hand? What
is the null hypothesis in the context of the problem? What alternatives
are supported when the null is rejected? Once these questions are
answered, we often have a better understanding of the problem and
are better prepared to make a sound judgment.
• Exercises are added to the end of each chapter.
• Online materials (data and R code) are available on GitHub
(https://github.com/songsqian/eesR).

Song S. Qian
Sylvania, Ohio, USA
July 2016
List of Figures

1.1 Map of WCA2A . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.1 RStudio screenshot when opened for the first time. . . . . . 20


2.2 RStudio screenshot with the R script file of this book open. 22
2.3 An example stream network . . . . . . . . . . . . . . . . . 40

3.1 The standard normal distribution . . . . . . . . . . . . . . . 49


3.2 Everglades background TP concentration distribution . . . . 50
3.3 Normal Q-Q plot of the Everglades background TP
concentration . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.4 TP in Lake Erie as a function of distance to Maumee . . . . 55
3.5 Comparing standard deviations using S-L plot . . . . . . . . 56
3.6 Histograms of Everglades TP concentrations . . . . . . . . . 57
3.7 An example quantile plot . . . . . . . . . . . . . . . . . . . . 58
3.8 Explaining the boxplot . . . . . . . . . . . . . . . . . . . . . 59
3.9 Additive versus multiplicative shift in Q-Q plot . . . . . . . 60
3.10 Bivariate scatter plot . . . . . . . . . . . . . . . . . . . . . . 62
3.11 Scatter plot matrix . . . . . . . . . . . . . . . . . . . . . . . 63
3.12 Scatter plot of North American Wetland Database . . . . . 64
3.13 Power transformation for normality . . . . . . . . . . . . . . 65
3.14 Daily PM2.5 concentrations in Baltimore . . . . . . . . . . . 67
3.15 Seasonal patterns of daily PM2.5 in Baltimore . . . . . . . . 68
3.16 Conditional plot of the air quality data . . . . . . . . . . . . 68
3.17 The iris data . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.1 Simulating the Central Limit Theorem . . . . . . . . . . . . 83


4.2 Distribution of sample standard deviation . . . . . . . . . . 84
4.3 Distribution of the 75th percentile of Everglades background
TP concentration . . . . . . . . . . . . . . . . . . . . . . . . 85
4.4 The t-distribution . . . . . . . . . . . . . . . . . . . . . . . . 93
4.5 Relationships between α, β, and p-value . . . . . . . . . . . 94
4.6 A two-sided test . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.7 Factors affecting statistical power . . . . . . . . . . . . . . . 110
4.8 Residuals from an ANOVA model . . . . . . . . . . . . . . . 120
4.9 S-L plot of residuals from an ANOVA model . . . . . . . . . 121
4.10 ANOVA residuals . . . . . . . . . . . . . . . . . . . . . . . . 122


4.11 Normal quantile plot of ANOVA residuals . . . . . . . . . . 123


4.12 Annual precipitation in the Everglades National Park . . . . 128
4.13 Yearly variation in Everglades TP concentrations . . . . . . 129
4.14 Statistical power is a function of sample size. . . . . . . . . 136
4.15 Boxplots of the mangrove-sponge interaction data . . . . . . 138
4.16 Normal Q-Q plots of the mangrove-sponge interaction data 139
4.17 Pairwise comparison of the mangrove-sponge data . . . . . . 140

5.1 Q-Q plot comparing PCB in large and small fish . . . . . . 153
5.2 PCB in fish versus fish length . . . . . . . . . . . . . . . . . 154
5.3 Temporal trend of fish tissue PCB concentrations . . . . . . 157
5.4 Simple linear regression of the PCB example . . . . . . . . . 159
5.5 Multiple linear regression of the PCB example . . . . . . . . 160
5.6 Normal Q-Q plot of PCB model residuals . . . . . . . . . . 166
5.7 PCB model residuals vs. fitted . . . . . . . . . . . . . . . . . 167
5.8 S-L plot of PCB model residuals . . . . . . . . . . . . . . . . 168
5.9 Cook’s distance of the PCB model . . . . . . . . . . . . . . 169
5.10 The rfs plot of the PCB model . . . . . . . . . . . . . . . . . 170
5.11 Modified PCB model residuals vs. fitted . . . . . . . . . . . 173
5.12 Finnish lakes example: bivariate scatter plots . . . . . . . . 175
5.13 Conditional plot: chlorophyll a against TP conditional on TN
(no interaction) . . . . . . . . . . . . . . . . . . . . . . . . . 178
5.14 Conditional plot: chlorophyll a against TN conditional on TP
(no interaction) . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.15 Finnish lakes example: interaction plots (no interaction) . . 180
5.16 Conditional plot: chlorophyll a against TP conditional on TN
(positive interaction) . . . . . . . . . . . . . . . . . . . . . . 182
5.17 Conditional plot: chlorophyll a against TN conditional on TP
(positive interaction) . . . . . . . . . . . . . . . . . . . . . . 183
5.18 Finnish lakes example: interaction plots (positive interaction) 184
5.19 Finnish lakes example: interaction plots (negative interaction) 184
5.20 Box–Cox likelihood plot for response variable transformation 188
5.21 ELISA standard curve and prediction uncertainty . . . . . . 193

6.1 Nonlinear PCB model . . . . . . . . . . . . . . . . . . . . . 211
6.2 Nonlinear PCB model residuals normal Q-Q plot . . . . . . 212
6.3 Nonlinear PCB model residuals vs. fitted PCB . . . . . . . . 213
6.4 Nonlinear PCB model residuals S-L plot . . . . . . . . . . . 214
6.5 Nonlinear PCB model residuals distribution . . . . . . . . . 214
6.6 Four nonlinear PCB models . . . . . . . . . . . . . . . . . . 219
6.7 Simulated % PCB reduction from 2000 to 2007 . . . . . . . 219
6.8 The hockey stick model . . . . . . . . . . . . . . . . . . . . . 222
6.9 The piecewise linear regression model . . . . . . . . . . . . . 223
6.10 The estimated piecewise linear regression model for selected
years . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
6.11 First bloom dates of lilacs in North America . . . . . . . . . 227
6.12 All first bloom dates of lilacs in North America . . . . . . . 230
6.13 Toledo ELISA standard curve data . . . . . . . . . . . . . . 231
6.14 Toledo ELISA model diagnostics 1 . . . . . . . . . . . . . . 238
6.15 Toledo ELISA model diagnostics 2 . . . . . . . . . . . . . . 239
6.16 A moving average smoother . . . . . . . . . . . . . . . . . . 242
6.17 A loess smoother . . . . . . . . . . . . . . . . . . . . . . . . 244
6.18 Graphical presentation of a multiple linear regression model 246
6.19 Graphical presentation of a multiple linear regression model
with log-transformation . . . . . . . . . . . . . . . . . . . . . 247
6.20 Graphical presentation of a multiple linear regression model
with log-transformation . . . . . . . . . . . . . . . . . . . . . 247
6.21 Additive model of PCB in the fish . . . . . . . . . . . . . . . 248
6.22 Effects of smoothing parameter . . . . . . . . . . . . . . . . 250
6.23 The North American Wetlands Database . . . . . . . . . . . 252
6.24 The effluent concentration–loading rate relationship . . . . . 253
6.25 Fitted additive model using mgcv default . . . . . . . . . . . 253
6.26 Contour plot of a two-variable smoother fitted using gam . . 256
6.27 Three-dimensional perspective plot of a two variable smoother
fitted using gam . . . . . . . . . . . . . . . . . . . . . . . . . 257
6.28 The one-gram rule model . . . . . . . . . . . . . . . . . . . . 258
6.29 Fitted additive model using user-selected smoothness parameter
value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
6.30 CO2 time series from Mauna Loa, Hawaii . . . . . . . . . . 259
6.31 Fecal coliform time series from the Neuse River . . . . . . . 264
6.32 STL model of fecal coliform time series from the Neuse River 265
6.33 STL model of total phosphorus time series from the Neuse
River . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
6.34 Long-term trend of TKN in the Neuse River . . . . . . . . . 268

7.1 A classification tree of the iris data . . . . . . . . . . . . . . 274
7.2 Classification rules for the iris data . . . . . . . . . . . . . . 275
7.3 Diuron concentrations in the Willamette River Basin . . . . 278
7.4 First diuron CART model . . . . . . . . . . . . . . . . . . . 280
7.5 Cp-plot of the diuron CART model . . . . . . . . . . . . . . 282
7.6 Pruned diuron CART model 1 . . . . . . . . . . . . . . . . . 283
7.7 Pruned diuron CART model 2 . . . . . . . . . . . . . . . . . 284
7.8 Quantile plot of diuron data . . . . . . . . . . . . . . . . . . 286
7.9 First diuron CART classification model . . . . . . . . . . . . 288
7.10 Cp-plot of the diuron classification model . . . . . . . . . . 289
7.11 Pruned diuron classification model . . . . . . . . . . . . . . 290
7.12 CART plot option 1 . . . . . . . . . . . . . . . . . . . . . . 291
7.13 CART plot option 2 . . . . . . . . . . . . . . . . . . . . . . 292
7.14 CART plot option 3 . . . . . . . . . . . . . . . . . . . . . . 294
7.15 Alternative diuron classification models . . . . . . . . . . . . 296

8.1 A dose-response curve . . . . . . . . . . . . . . . . . . . . . 310
8.2 Logit transformation . . . . . . . . . . . . . . . . . . . . . . 311
8.3 Mice infectivity data . . . . . . . . . . . . . . . . . . . . . . 313
8.4 Logistic regression residuals . . . . . . . . . . . . . . . . . . 317
8.5 The binned residual plot . . . . . . . . . . . . . . . . . . . . 317
8.6 Seed predation versus seed weight . . . . . . . . . . . . . . . 320
8.7 Seed predation over time . . . . . . . . . . . . . . . . . . . . 323
8.8 Time varying seed predation rate . . . . . . . . . . . . . . . 324
8.9 Probability of predation by time and seed weight . . . . . . 325
8.10 Probability of seed predation as a function of seed weight . 328
8.11 Seed weight and topographic class interaction . . . . . . . . 330
8.12 Binned residual plot of the seed predation model . . . . . . 331
8.13 Arsenic in drinking water data 1 . . . . . . . . . . . . . . . . 336
8.14 Arsenic in drinking water data 2 . . . . . . . . . . . . . . . . 337
8.15 Arsenic in drinking water data 3 . . . . . . . . . . . . . . . . 338
8.16 Arsenic in drinking water data 4 . . . . . . . . . . . . . . . . 339
8.17 Raw versus standardized residuals of an additive Poisson
model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
8.18 Fitted overdispersed Poisson model . . . . . . . . . . . . . . 347
8.19 Fitted overdispersed Poisson model with age as a covariate . 350
8.20 Residuals of a Poisson model . . . . . . . . . . . . . . . . . . 351
8.21 Tolerance group multinomial model 1 . . . . . . . . . . . . . 357
8.22 Tolerance group multinomial model 2 . . . . . . . . . . . . . 359
8.23 Multinomial residual plot . . . . . . . . . . . . . . . . . . . . 361
8.24 The Poisson-multinomial connection . . . . . . . . . . . . . 364
8.25 Independent Poisson models for tolerance groups . . . . . . 365
8.26 Independent Poisson models of mayfly taxa . . . . . . . . . 366
8.27 Comparing mayfly taxa models . . . . . . . . . . . . . . . . 367
8.28 Antarctic whale survey locations . . . . . . . . . . . . . . . 370
8.29 Antarctic whale survey data scatter plots . . . . . . . . . . . 372
8.30 Antarctic whale survey CART model Cp plot . . . . . . . . 373
8.31 Antarctic whale survey CART (regression) model . . . . . . 373
8.32 Antarctic whale survey CART (classification) model . . . . 374
8.33 Antarctic whale survey Poisson GAM . . . . . . . . . . . . . 376
8.34 Residuals from GAM show overdispersion . . . . . . . . . . 378
8.35 Antarctic whale survey logistic GAM . . . . . . . . . . . . . 379

9.1 Fish tissue PCB reduction from 2002 to 2007 . . . . . . . . 398
9.2 Fish size versus year . . . . . . . . . . . . . . . . . . . . . . 398
9.3 Residuals as a measure of goodness of fit . . . . . . . . . . . 400
9.4 Simulation for model evaluation . . . . . . . . . . . . . . . . 401
9.5 Tail areas of selected PCB statistics . . . . . . . . . . . . . . 402
9.6 Cape Sable seaside sparrow population temporal trend . . . 403
9.7 Cape Sable seaside sparrow model simulation . . . . . . . . 404
9.8 ELISA test uncertainty . . . . . . . . . . . . . . . . . . . . . 409
9.9 Bootstrapping for threshold confidence interval . . . . . . . 412

10.1 Seaweed grazer example comparing lm and lmer . . . . . . . 430
10.2 Comparisons of three data pooling methods in the N₂O
emission example . . . . . . . . . . . . . . . . . . . . . . . . 432
10.3 Logit transformation of soil carbon . . . . . . . . . . . . . . 434
10.4 N₂O emission as a function of soil carbon . . . . . . . . . . 435
10.5 The EUSE example data . . . . . . . . . . . . . . . . . . . . 437
10.6 EUSE example linear model coefficients . . . . . . . . . . . 440
10.7 Comparison of linear and multilevel regression . . . . . . . . 443
10.8 Multilevel model with a group level predictor . . . . . . . . 446
10.9 Antecedent agriculture land-use as a group level predictor . 448
10.10 Antecedent agriculture land-use and temperature as group-
level predictors . . . . . . . . . . . . . . . . . . . . . . . . . 450
10.11 Antecedent agriculture land-use and temperature interaction 452
10.12 Lake type-level multilevel model coefficients . . . . . . . . . 455
10.13 Conditional plots of oligotrophic lakes (TP) . . . . . . . . . 456
10.14 Conditional plots of oligotrophic lakes (TN) . . . . . . . . . 457
10.15 Conditional plots of eutrophic lakes (TP) . . . . . . . . . . . 458
10.16 Conditional plots of eutrophic lakes (TN) . . . . . . . . . . 459
10.17 Conditional plots of oligotrophic (P limited) lakes (TP) . . . 460
10.18 Conditional plots of oligotrophic (P limited) lakes (TN) . . 461
10.19 Conditional plots of oligotrophic/mesotrophic lakes (TP) . . 462
10.20 Conditional plots of oligotrophic/mesotrophic lakes (TN) . . 463
10.21 Random effects of ELISA model coefficients using SSfpl2 . 467
10.22 Random effects of ELISA model coefficients using SSfpl . . 469
10.23 Random effects (sites) of the Galax model . . . . . . . . . . 473
10.24 Large leaf density of the Galax model . . . . . . . . . . . . . 474
10.25 Large leaf proportion random effects . . . . . . . . . . . . . 476
10.26 Large leaf proportions . . . . . . . . . . . . . . . . . . . . . 477
10.27 System means of cryptosporidium in U.S. drinking water
systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
10.28 System mean distribution of cryptosporidium in the United
States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
10.29 Simulating cryptosporidium in U.S. drinking water systems 485

11.1 IV and z-score under the null model . . . . . . . . . . . . . 502
11.2 Permutation µ and σ under the null model . . . . . . . . . . 502
11.3 TITAN’s underlying models . . . . . . . . . . . . . . . . . . 505
11.4 IV and z-score under a linear model . . . . . . . . . . . . . 507
11.5 IV and z-score under a hockey stick model . . . . . . . . . . 508
11.6 IV and z-score under a step function model . . . . . . . . . 509
11.7 IV and z-score under a sigmoidal model . . . . . . . . . . . 509
11.8 IV and z-score under a sigmoidal model . . . . . . . . . . . 510
11.9 IV and z-score under a sigmoidal model . . . . . . . . . . . 510
List of Tables

2.1 An example data file . . . . . . . . . . . . . . . . . . . . . . 39
2.2 An example data frame . . . . . . . . . . . . . . . . . . . . . 39
2.3 Date formats in R date-time classes . . . . . . . . . . . . . . 43

3.1 Model-based percentiles versus data percentiles . . . . . . . 51

4.1 ANOVA table . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.2 Everglades data sample sizes . . . . . . . . . . . . . . . . . . 128

5.1 ANOVA table of a linear model . . . . . . . . . . . . . . . . 164
5.2 Linear model coefficients with two categorical predictors . . 197
5.3 Galton’s peas data . . . . . . . . . . . . . . . . . . . . . . . 203

6.1 Estimated piecewise linear model coefficients (and their
standard error) for the data used in Figure 6.11 . . . . . . . 228

8.1 Seed predation model intercepts . . . . . . . . . . . . . . . . 326
8.2 The arsenic in drinking water example data . . . . . . . . . 334
8.3 The arsenic standard effect in cancer death rates . . . . . . 341
8.4 Interactions between gender and cancer type . . . . . . . . . 345

10.1 Finnish lake type definition . . . . . . . . . . . . . . . . . . 455

Part I

Basic Concepts

Chapter 1
Introduction

1.1 Tool for Inductive Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 The Everglades Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1 Statistical Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Effects of Urbanization on Stream Ecosystems . . . . . . . . . . . . . . . . . . 14
1.3.1 Statistical Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 PCB in Fish from Lake Michigan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4.1 Statistical Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Measuring Harmful Algal Bloom Toxin . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Bibliography Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.7 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.1 Tool for Inductive Reasoning


We learn from data, both experimental and observational. Scientists
propose hypotheses about the underlying mechanism of the subject under
study. These hypotheses are then tested by comparing their logical consequences
with the observed data. A hypothesis is a model of the real world, and its
logical consequences are what the model predicts. Comparing model predictions
with observations lets us decide whether the proposed model is likely to have
produced the observed data. Agreement provides evidence supporting the proposed
model, while disagreement is evidence against it. This is the typical process of
scientific inference. The main difficulty in this process is often the proper
handling of uncertainty in both the data and the model. The role of statistics
in scientific research is to provide quantitative tools for bridging the gap
between observed data and proposed models.
The foundation of modern statistics was laid, in part, by R.A. Fisher
in his 1922 paper “On the Mathematical Foundations of Theoretical Statistics”
[Fisher, 1922]. In this paper, Fisher launched “the first large-scale attack
on the problem of estimation” [Bennett, 1971] and introduced a number of
influential new concepts, including the level of significance and the parametric
model. These concepts and terms have since become part of the scientific lexicon
routinely used in the environmental and ecological literature. The philosophical
contribution of the 1922 essay is Fisher’s conception of inference logic, the
“logic of inductive inference.” At the center of this inference logic is the role
of “models”: what is to be understood by a model, and how models are to
be embedded in the logic of inference. Fisher’s definition of the purpose of
statistics is perhaps the best description of the role of a model in statistical
inference:
In order to arrive at a distinct formulation of statistical problems,
it is necessary to define the task which the statistician sets himself:
briefly, and in its most concrete form, the object of statistical
methods is the reduction of data. A quantity of data, which usually
by its mere bulk is incapable of entering the mind, is to be
replaced by relatively few quantities which shall adequately represent
the whole, or which, in other words, shall contain as much as
possible, ideally the whole, of the relevant information contained
in the original data.
This object is accomplished by constructing a hypothetical infinite
population, of which the actual data are regarded as constituting a
random sample. The law of distribution of this hypothetical population
is specified by relatively few parameters, which are sufficient
to describe it exhaustively in respect of all qualities under discussion.
In other words, the objective of statistical methods is to find a parametric
model with a limited number of parameters that can represent the
information contained in the observed data. A model serves both as a
summary of the information in the data and as a mathematical generalization
of the real problem. Once a model is established, it takes the place of the data.
Also in his 1922 essay, Fisher divided statistical problems into three
types:
1. Problems of specification – how to specify a model
2. Problems of estimation – how to estimate model parameters
3. Problems of distribution – how to describe probability distributions of
statistics derived from data.
Applications of statistics can be summarized as a three-step process of
addressing these three problems.
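To make Fisher’s three problems concrete, here is a minimal sketch using Python’s standard library (the book itself works in R). It assumes hypothetical data drawn from a normal population with μ = 10 and σ = 2; these values, the sample size of 200, and the 2,000 simulation replicates are illustrative choices, not values from the text. For a normal model the sample mean and standard deviation are sufficient statistics, so the estimation step is exactly the kind of data reduction Fisher describes: 200 numbers replaced by two.

```python
import random
import statistics

random.seed(1)
# Hypothetical observations; the "true" mu = 10 and sigma = 2 are illustrative
# assumptions that the analysis below pretends not to know.
data = [random.gauss(10.0, 2.0) for _ in range(200)]

# 1. Specification: treat the data as a random sample from N(mu, sigma^2).
# 2. Estimation: reduce 200 numbers to two statistics (data reduction).
mu_hat = statistics.fmean(data)
sigma_hat = statistics.stdev(data)  # sample standard deviation, n - 1 divisor

# 3. Distribution: how would mu_hat vary over repeated samples of size n?
#    Approximate its sampling distribution by simulating from the fitted model.
n, reps = len(data), 2000
sim_means = [statistics.fmean([random.gauss(mu_hat, sigma_hat) for _ in range(n)])
             for _ in range(reps)]
se_simulated = statistics.stdev(sim_means)
se_formula = sigma_hat / n ** 0.5  # standard error of a sample mean

print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
print(f"standard error: simulated {se_simulated:.3f}, formula {se_formula:.3f}")
```

The simulated and formula-based standard errors should agree closely here; the simulation route is the one that carries over to statistics, such as percentiles, whose sampling distributions have no simple closed form.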
The first step is to propose a working model (or hypothesis). The model
inevitably has unknown parameters, which are estimated from the collected data.
Once these parameters are estimated, we have a quantified model that describes
the variation of the variable of interest. Because many alternative models are
possible, the quantified model must then be verified.
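One simple way to check a quantified model is to compare the percentiles it implies with the percentiles of the data. The sketch below assumes hypothetical right-skewed data (lognormal, with log-scale mean 0 and standard deviation 0.8, chosen only for illustration) and fits two alternative models: a normal model on the original scale and a normal model on the log scale. This percentile comparison is one possible check, written in Python’s standard library; it is not necessarily the procedure the book uses.

```python
import math
import random
import statistics

random.seed(2)
# Hypothetical right-skewed observations (lognormal, log-scale mean 0, sd 0.8).
obs = [random.lognormvariate(0.0, 0.8) for _ in range(300)]

# Candidate model A: normal distribution on the original scale.
normal_fit = statistics.NormalDist(statistics.fmean(obs), statistics.stdev(obs))

# Candidate model B: normal distribution on the log scale (a lognormal model).
logs = [math.log(x) for x in obs]
lognormal_fit = statistics.NormalDist(statistics.fmean(logs), statistics.stdev(logs))

# Empirical percentiles: data_cuts[k - 1] is the k-th percentile of the data.
data_cuts = statistics.quantiles(obs, n=100)

for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    data_q = data_cuts[round(p * 100) - 1]
    normal_q = normal_fit.inv_cdf(p)
    lognormal_q = math.exp(lognormal_fit.inv_cdf(p))  # back-transform to original scale
    print(f"p={p:.2f}  data={data_q:6.3f}  normal={normal_q:6.3f}  lognormal={lognormal_q:6.3f}")
```

With right-skewed data like these, the normal model places its median (its mean) above the data median and can even predict negative lower percentiles for a strictly positive quantity, while the log-scale model tracks the data percentiles more closely.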
Model specification requires scientific knowledge: applications of statistical
methods cannot be isolated from the real-world problems they address. Consequently,
an application of statistical methods must consider the characteristics of the
real-world problem and the data on the one hand, and the mathematical properties of