Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook
Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122.
Created from purdue on 2021-08-26 19:28:30.
Recipes for Science
KEY FEATURES
• Contemporary and historical examples of science from many fields of physical, life,
and social sciences.
• Visual aids to clarify and illustrate ideas.
• Text boxes to explore related topics.
• Plenty of exercises to ensure full student engagement and mastery of the information.
• Annotated ‘Further Reading’ sections at the end of each chapter.
• Final glossary with helpful definitions of key terms.
• A companion website with author-developed and crowdsourced materials, including
syllabi for courses using this textbook, bibliography of additional resources and
online materials, sharable PowerPoint presentations and lecture notes, and additional
exercises and extended projects.
Angela Potochnik is Associate Professor of Philosophy and Director of the Center for
Matteo Colombo is Assistant Professor in the Tilburg Center for Logic, Ethics, and
Philosophy of Science, and in the Department of Philosophy at Tilburg University, the
Netherlands.
Recipes for Science
An Introduction to Scientific
Methods and Reasoning
Angela Potochnik
Matteo Colombo
Cory Wright
First published 2019
by Routledge
711 Third Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2019 Taylor & Francis
The right of Angela Potochnik, Matteo Colombo, and Cory Wright to be identified
as authors of this work has been asserted by them in accordance with sections 77
and 78 of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised
in any form or by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying and recording, or in any information
storage or retrieval system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered
trademarks and are used only for identification and explanation without intent
to infringe.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book has been requested
ISBN: 978-1-138-92072-9 (hbk)
ISBN: 978-1-138-92073-6 (pbk)
ISBN: 978-1-315-68687-5 (ebk)
Typeset in Berling
by Apex CoVantage, LLC
For all the excellent teachers from whom we’ve learned our love of science
Contents
1 What Is Science? 7
Glossary 310
References 322
Index 327
Figures and Tables
FIGURES
1.1 Notable early scientists studying carbon dioxide (CO2) and climate 9
1.2 Keeling curve: ongoing increase in atmospheric concentrations of CO2 10
1.3 Ice core data from Antarctica 10
1.4 Unprecedented increases in atmospheric CO2 in the past century 11
1.5 Scientists in the Persian Golden Age 18
1.6 Appearance of retrograde motion 19
1.7 (a) Schematic flowchart of simple falsificationism; (b) Karl Popper 26
1.8 Clever Hans and Wilhelm von Osten 34
1.9 Reorientation from geocentrism to heliocentrism 43
2.1 Illustrations of two crosses between pea plants 47
2.2 Western Electric’s Hawthorne factory illumination study 51
2.3 Isaac Newton’s illustration of his two-prism experiment 52
2.4 William Herschel’s experimental setup to test the relationship between
the color and temperature of light 54
2.5 Three scientists who contributed to our knowledge of light 55
2.6 Headlines reporting on Arthur Eddington’s observations during the 1919
eclipse, which confirmed Albert Einstein’s theory of general relativity 65
2.7 Mars Curiosity rover selfie taken on Mount Sharp (Aeolis Mons) on
Mars in 2015 74
2.8 Cholera epidemic, close-up of Snow’s Broad Street map 79
2.9 Phineas Gage posing with the rod that passed through his skull 81
2.10 Isaac Newton’s cannon thought experiment 86
4.3 (a) Flint, Michigan water crisis; (b) Lee Anne Walters, the Flint
citizen-scientist who initially requested water-testing 151
4.4 The black swan of the family (Black Australian swan surrounded by
Bewick’s swans) 154
4.5 (a) The Earth’s landmasses fit together a bit like puzzle pieces;
(b) Marie Tharp and Bruce Heezen 157
4.6 The pan-African dawn of Homo sapiens 162
5.1 Visualization of the conditional probability of rolling a number less than
four given that you roll an odd number 179
5.2 (a) Pie chart of a coffeeshop’s sales; (b) Bar chart of per capita national
beer consumption 185
5.3 (a) Histogram of a unimodal grade distribution; (b) Histogram of a
bimodal grade distribution 186
5.4 Examples of (a) uniform, (b) ∪-symmetric, and (c) ∩-symmetric
distributions; (d) Examples of asymmetric distributions 188
5.5 (a) Histogram of the Quiz 1 grade distribution in Table 5.2;
(b) Histogram of the Quiz 2 grade distribution in Table 5.3 193
5.6 Standard deviation in a normal distribution 195
5.7 An imagined scatterplot of the relationship between alcohol
consumption and decibel level in bars 196
5.8 A regression analysis of Galton’s data on the diameter of sweet pea seeds 198
5.9 Scatterplots depicting correlational strength and direction 199
5.10 Francis Galton 200
5.11 Visualizations for Exercise 5.17: (a) Average expenditure per dollar of
Indiana property tax, 2013; (b) Composite score GRE and academic
major; (c) Iris petal length; (d) Number of digs performed and
amphorae found 203
6.1 (a) Probability distribution of heads for 100 coin tosses; (b) Example of
normal distribution for a continuous variable 211
6.2 Diagram of the 68%–95%–99.7% rule for standard deviations 214
6.3 Four histograms of roughly normal distributions 219
6.4 Fabiola Gianotti, project leader and spokesperson for the ATLAS
experiment at CERN involved in the discovery of the Higgs boson
in July 2012 222
6.5 R. A. Fisher 225
6.6 Probability distribution of the number of guesses your friend will get
correct if she is randomly guessing 227
6.7 Thomas Bayes 235
7.1 Annual seismic activity in Oklahoma 1978–2017 243
7.2 USGS map showing locations of wells related to seismic activity
2014–2015 243
7.3 Visualization of the correlation between per capita consumption of
cheese and number of people who died from getting tangled in their
bedsheets 248
7.4 Generic causal graph with nodes representing variables of interest and
arrows representing direct causal relationships 264
TABLES
Acknowledgments
Many people have contributed to this book in a variety of ways. Thanks to Gila Sher,
who made possible Cory’s initial conversations with Senior Editor Andrew Beck at
Routledge. Without Gila’s encouragement, there never would have been a book proposal.
Andy’s initial vision for the book was crucial for framing the project, and his later
editorial guidance and support were matched only by his enduring patience and flexibility.
Thanks also to Routledge Development Editor Alison Daltroy and Editorial Assistants
Vera Lochtefeld and Emma Starr, along with the dozens of anonymous reviewers of both
the original proposal and the later completed manuscript draft. Their feedback left an
indelible imprint on what went into the book, as well as the final product that resulted.
Several students provided helpful research assistance. Nathan Sollenberger, Alejandro
Garcia, and Karina Laigo from the undergraduate research program at Cal State Long
Beach helped kick off the book proposal, and Christopher Laplante provided very helpful
editorial assistance at the end stages of production. Micah Freeman and Sahar Heydari
Fard at the University of Cincinnati provided valuable comments on the whole
manuscript and assistance with glossary compilation. Several colleagues provided extremely
helpful feedback on parts of the manuscript, including Zvi Biener, Vanessa Carbonell, Jan
Sprenger, Naftali Weinberger, and Nellie Wieland.
Angela owes a further debt of gratitude to Zvi Biener for working with her to design
the University of Cincinnati course How Science Works, which inspired her contributions
to the book. More generally, she deeply appreciates her colleagues and friends at the
University of Cincinnati, inside and outside of philosophy. She also thanks her family for
their patience during the periods of time when she was a bit lost to this project.
Cory is grateful to his family for their patience, and is looking forward to making up
for lost time. He would also like to thank Henk, whose unfailing devotion to this project
and daily emotional support and encouragement was as great as any hound’s could be.
Matteo would like to thank his colleagues and friends at the Tilburg Center for Logic,
Ethics, and Philosophy of Science (TiLPS), his family, and Chiara, for their encouragement,
inspiration, and care. During this project, he was generously supported by the Deutsche
Forschungsgemeinschaft (DFG) as part of the priority program New Frameworks of
Rationality [SPP1516], and by the Alexander von Humboldt Foundation. He would also
like to acknowledge Zio P.’s apt reminders of the quote constanter et non trepide.
Introduction
Science and Your Everyday Life
What do the American president Franklin D. Roosevelt, the Mexican painter Frida Kahlo,
and the Jamaican reggae trio Israel Vibration have in common? Many people today can’t
guess the correct answer: they all suffered from polio (or poliomyelitis), which can cause
paralysis and even death. This can be hard to guess because scientists and doctors have
successfully turned polio from a global health problem to mostly just a part of history.
Many other people throughout human history have suffered from this crippling
infectious disease—most of them young children. In 1952 alone, the polio epidemic ravaged
nearly 60,000 Americans. In 1955, led by the virologist Jonas Salk, a team of scientists
discovered a vaccine for polio. Thanks to the introduction of mass vaccination programs
immediately thereafter, polio cases have decreased worldwide by over 99%. Today, there
are only three countries where polio still exists: Pakistan, Afghanistan, and Nigeria. As of
2016, there were only 37 known cases remaining.
The eradication of polio counts among the most important human—and scientific—
achievements. Vaccination provides you with immunity, which protects you for life. Going
unvaccinated, in contrast, is a serious risk since polio is highly infectious and human
migration is rapid. Outbreaks are still possible. It’s a no-brainer that people should demand
that they and their children be vaccinated.
And yet, many people today are not vaccinated for polio. In wealthier countries, like
the US, UK, Italy, Australia, France, and Russia, the biggest challenges to vaccination come
from skeptics opposed to vaccination for ideological reasons and from mere complacency.
In other countries, like Nigeria, Pakistan, and Afghanistan, political and religious challenges
intertwine with issues of marginalization and feasibility. It is harder to deliver
vaccinations to at-risk communities, which might suffer from extreme poverty and lack needed
infrastructure. In any nation, communication of the effectiveness, safety, and public health
value of vaccination benefits from a sound understanding of the science of vaccines.
Unlike polio, HPV (human papilloma virus) is extremely widespread, with roughly
40% of the world’s population infected; it’s the most common sexually transmitted
disease in the world. Among other effects, HPV substantially increases the risk of
various types of cancer. There’s also a vaccine for HPV. It was first available in 2006, after
thorough testing for safety and efficacy. The World Health Organization (WHO) recom-
mends HPV vaccines as part of routine vaccinations in all countries.
This discussion of vaccination is meant to illustrate that understanding what’s involved
in good science and scientific reasoning is of extreme importance. At some point in their
lives, many people need to make a decision about whether to be vaccinated for some
disease or whether to have their children vaccinated. And sometimes vaccine skeptics
have louder voices than doctors and other vaccination advocates, so it can seem difficult
to get a clear account of vaccines’ safety, effectiveness, and necessity for public health.
The polio vaccine has undergone thorough testing for safety and efficacy, initially in a
study involving 1.2 million children and in many other studies since. The same is true for
other vaccinations, including the HPV vaccine. And claims from vaccine skeptics about
substantive risks of vaccination have been thoroughly debunked.
But don’t take it from us. Learn about scientific experiments so you can assess the
quality of vaccination studies. Learn sound and problematic forms of inference in order
to assess the scientific inferences supporting the use of vaccines (and the problems with
vaccine skeptics’ attempts to sow fear). Study causal reasoning so that you can critically
assess the weight of the evidence against claims about vaccination causing autism (which
has in fact been thoroughly debunked). These topics and others important for the critical
assessment of scientific findings and their public reception are the focus of this book.
As the case of vaccination suggests, scientific findings, and the public’s reactions to them,
dramatically shape our world. More than this, science also regularly and dramatically
influences your life, whether or not you want it to. If this is not immediately apparent, that
may be because of the extensiveness of science’s reach. One way or another, everybody
is impacted by science.
The reach of science means that you have a lot to gain from being able to understand
and assess scientific reasoning. This enables you to make educated decisions about your own
and your family’s medical care. It also makes it possible for you to critically evaluate reports
of scientific findings and the credentials of experts in order to decide what to believe.
This ability is important, since so much of our daily life is impacted by scientific findings.
Here’s another example of unavoidable science related to health. Peanut allergies are
serious and develop early in life, and rates of this allergy are on the rise. In 2015, medical
recommendations regarding when to introduce peanut products to babies changed
radically in the US from waiting until at least one year of age to introducing as early as
possible. Both recommendations, first waiting and then introducing early, were said to
reduce the risk of allergic reaction. Should you follow this new advice for your baby, if and when you have one?
If the medical researchers were (apparently) wrong about the last recommendation, why
should you follow this new one?
A sophisticated user of science is also well positioned to make judgments about
science more globally. Is it good for the government to fund basic scientific research? Is the
level of funding for medical research adequate? Should we worry if private corporations
fund science, given that governmental and university funding is in too short supply?
Answers to these questions require a view about the status of the scientific enterprise as
a whole, how it should relate to society, and whether and how funding sources matter.
Scientists are, of course, the main practitioners of science. Other researchers have as
their primary focus understanding what science—and scientists—are up to. This latter
group is interested in understanding what science is and how it works, its pitfalls and
limitations, and its relationship with society. These topics are what this book is all about.
Several disciplines investigate science in this way; primary among these are history,
philosophy, and sociology. Historians have worked hard to make sense of the history of
science—how the events unfolded that contributed to making science what it is today.
Sociologists also study science, especially the social and cultural influences on how science
works and what it produces. This book draws from the history and sociology of science,
but its main approach is philosophical. There’s a simple reason for that: we, its authors,
are philosophers of science.
If you haven’t studied philosophy of science, it may sound obscure. But philosophy
of science is just a way of thinking hard about the scientific enterprise. It focuses
especially on questions of what science should be like in order to be a trustworthy route to
knowledge and to achieve the other ends we want it to serve, such as usefulness to society.
Although written from a philosophical perspective, this book does not dwell on
philosophers’ debates about science. Instead, we aim to use philosophical insights about science
without getting bogged down in controversies, technical terminology, or intricate details.
The title of this book, Recipes for Science, is meant to evoke two ideas about science. First,
recipes for baked and cooked items like bread, pies, and stir-fry come in lots of different
versions. Some differences are rather trivial, like whether measurements are in weight or
volume. Others are substantial, like whether a bread is leavened with yeast or with baking
soda and powder. Enough substantial differences can result in very different products,
even products that go by the same name but contain entirely different ingredients. Science
is also like this. It proceeds in many different ways, and there’s no magical ingredient or
essential list of ingredients that guarantees good science.
At the same time, a recipe is a formula intended to lead to a specific outcome, with
an intentional combination of ingredients and use of methods to achieve that outcome.
Different recipes for a given type of food have certain elements in common, even if many
of their other features vary. So, for example, breads generally incorporate grain of some
kind as a major ingredient, most have a leavening agent of some kind, and they are cooked,
usually but not always by baking in the oven. There are family resemblances among
different breads and the recipes used to make them, even if there’s no simple definition of
bread and no one recipe required to make bread.
Science is like this as well. Even as it proceeds in different ways, and even as there’s
no one overarching set of instructions or mechanical procedures that guarantees good
science, there are certain generalizations that can be made about how good science is
conducted. Many different activities count as science, and there are also differences in
how each of these activities is carried out. But there are also family resemblances among
instances of science, just as there are among breads.
This book aims to facilitate a clear understanding of the key elements of science and
why those elements are significant, even as it illustrates the tremendous variety of projects
that count as science.
The first three chapters address the nature of science and its key methods. Chapter 1
surveys what is distinctive and important about science while also showing how elusive
the very concept of science can be. We suggest a checklist approach to distinguishing
science from non-science and fake science and suggest—in lieu of a single, one-size-fits-all
method—that there are various recipes for science. Chapter 2 outlines the role of
experimentation in science and the features of a perfectly controlled experiment. Then
the chapter catalogues a range of methods for experimental and non-experimental studies
and discusses the advantages and disadvantages of each. Chapter 3 focuses on scientific
models: how they are constructed and used, and the main varieties in which they come.
The chapter ends by discussing the relationship between modeling and experimentation
and asking the question of what features of models contribute to their scientific value.
The next four chapters focus on scientific reasoning. Chapter 4 describes the primary
patterns of inference in science: deductive, inductive, and abductive reasoning. The
chapter starts with patterns of deductive inference and their use in scientific hypothesis-testing,
moves to the importance of and challenges with inductive inference, and then turns to
the scientific significance of abductive reasoning, also known as inference to the best
explanation. Chapter 5 surveys basic statistical methods, beginning with their basis in
probability theory and proceeding through descriptive statistics. Chapter 6 expands on
that discussion to outline inferential statistics, including sampling and hypothesis-testing.
The chapter ends by introducing the Bayesian approach to statistics and discussing some
of its differences from the classical approach. Chapter 7 engages with causal reasoning in
science. Topics include the nature of causation, the relationship between causal reasoning
and statistical reasoning, testing causal hypotheses, and causal modeling.
Finally, Chapter 8 examines the purpose of science and its relationship to society. We
address the nature of scientific explanation and scientific theories, how theory change and
progress in science occur, and how society and values influence science. The book closes
with a consideration of the current challenges facing science.
The intended audience for this book includes anyone who wants to have a more
sophisticated understanding of the nature of science and a stronger basis for assessing scientific
reasoning.
This book is not just for students of philosophy or science majors. Indeed, the primary
audience we had in mind as we developed this book is an undergraduate student in a
general education course, who may not take any additional science courses in college. We
asked ourselves, what would that student most benefit from knowing about how science
works? What episodes from historical and current science would that student be interested
to read about and contemplate?
That said, we expect this book will also be useful for some more specialized or more
advanced courses. These include science education courses, especially those that focus on
the nature of science and scientific reasoning. These also include introductory philosophy
of science courses, especially if supplemented with more canonical readings or readings
that address some of the major philosophical controversies about science. We also expect
this book to be of use in introductory science courses, especially methods courses, when
supplemented with appropriate material specific to the particular scientific field of study.
This textbook was designed to be usable in its entirety in a standard 15-week semester.
Students spend one-third of the semester learning about the nature of science, including
the key features of science, experimentation, and scientific modeling (Chapters 1–3).
Most of the remaining semester is then spent learning about scientific reasoning,
including deductive, inductive, and abductive reasoning patterns; probability and statistics; and
causal reasoning (Chapters 4–7). The final unit of the course addresses the scientific
successes of explanations and theories and science’s relationship with society (Chapter 8).
Given the range of course levels and disciplines for which this book is appropriate,
and given the reality that different instructors have different teaching goals, we have also
designed the textbook to be usable in a variety of ways. The textbook is modular; each
chapter can be used independently from the others. Instructors (or independent readers)
can thus choose to use only the chapters that suit their needs. Each section may rely on
information provided in earlier sections of the same chapter but does not presume facility
with information from other chapters. Instructors may choose not to assign later sections
in some chapters that seem overly specialized or too difficult given the focus of their
course. Finally, some material that is more difficult or philosophical is separated from the
main text in boxes. Here, too, instructors can choose whether to assign material in boxes.
Here are a few examples of how this might play out in different courses. A critical
reasoning course focused on science may limit its attention to Chapters 4–7—deductive,
inductive, and abductive inference patterns, probability and statistics, and causal reasoning.
A science education course on the nature of science may use Chapters 1–3 and 8,
addressing the key features of science, experimentation, modeling, and theories and explanations.
An introductory philosophy of science course might make use of Chapters 1–4 and 8,
supplemented with primary philosophical texts. Other introductory courses might use
the full book except for some of the more difficult sections, like 6.3 on Bayesian statistics
and 7.3 on causal modeling.
SUPPLEMENTARY MATERIALS
Each section in this book ends with a list of exercises. We have tried to provide exercises
that will solidify understanding or challenge students to apply what they have learned.
We encourage instructors to make use of these exercises for in-class group or individual
activities, homework, and exam questions. Individuals who are working through this book
independently might also benefit from completing some of the exercises.
There is a list of suggested further reading at the end of each chapter, which provides
inroads into a more in-depth investigation of the individual topics covered. The further
reading selections thus provide some options for instructors and individual readers who want
to focus on specific topics in more depth. At the end of the book, there is a glossary of
technical terms and other specialized vocabulary that students can consult as needed.
Terms defined in the glossary are indicated in the main text with bold and italics, as
‘philosophy of science’ was earlier in this introduction.
Finally, there is a website to accompany this textbook: www.routledge.com/cw/potochnik.
The website includes example syllabi for different kinds of courses
utilizing this text, additional exercises, and links to content available on the internet that
will enrich readers’ experience with the topics covered in this book. Because introductory
scientific reasoning courses are offered at only some universities, and in even fewer
philosophy departments, the website also provides information and links to information
about the value of such courses and why philosophy departments are good places to
house them.
EXERCISES
0.1 What do you expect to learn from this textbook and the course you’re reading it for?
0.2 What most concerns you about this textbook and the course you’re reading it for?
0.3 Do you think you will benefit from learning more about the nature of science and
scientific reasoning? Why or why not?
0.4 What do you think is most valuable about learning about science and scientific
reasoning?
0.5 Describe your relationship to science. To help you get started, you might consider the
following questions. Have you taken many courses in science or read about science
on your own? If so, on what topics? Do you know any scientists? Do you think there
are reasons to distrust or dislike science? If so, what are the reasons?
FURTHER READING
For more on HPV and vaccines, see World Health Organization (2017, May). Human
papillomavirus vaccines: WHO position paper. 92, 241–268. Retrieved from http://
apps.who.int/iris/bitstream/10665/255353/1/WER9219.pdf?ua=1
For a concise explanation of myths surrounding vaccines, see PublicHealth.org (2018, May).
Understanding vaccines: Vaccine myths debunked. Retrieved from https://www.public
health.org/public-awareness/understanding-vaccines/vaccine-myths-debunked/
For a concise overview of global health and vaccination, see Greenwood, B. (2014). The
contribution of vaccination to global health: past, present and future. Philosophical
Transactions of the Royal Society B, 369(1645), 20130433.
For a thorough treatment of immunization and vaccination, see World Health Organization,
Research and development. Retrieved from www.who.int/immunization/documents/research/en/
CHAPTER 1
What Is Science?
greenhouse gases trap some of the heat in the atmosphere; this blanket of radiant heat
warms the planet’s surface, making it hospitable to life. But increasing amounts of green-
house gases trap increasing amounts of heat. As a result, mountain glaciers are shrinking
and ice sheets are melting in the Arctic, Greenland, and Antarctica; sea levels are rising;
precipitation patterns across seasons are more unstable; more droughts and heat waves
are occurring; and the blooming times of flowers and plants are shifting. All these changes
are consequences of global warming.
Second, the changing climate has other downstream effects. The rise in global tem-
perature and resulting climate changes threaten to push some animal and plant species to
extinction, collapse ecosystems, and make extreme weather more frequent. It also threat-
ens to destabilize social conditions. Drinking water will become scarcer and droughts more
frequent and severe; crop yields may decrease. Coastal cities and island nations are at risk
of serious floods and devastating hurricanes. In this way, climate change is also affecting
global health, poverty, hunger, and various nations’ security. Ultimately, global warming
will make the Earth less hospitable for all creatures, including humans, and probably also
a more unjust place in virtue of who will suffer and how this suffering will be managed.
Earth’s climate has never been static; it has been fluctuating for billions of years. Besides
the concentration of greenhouse gases, factors that affect it include variations in the Earth’s
orbit, the motion of tectonic plates, the impact of meteorites, and volcanism on the Earth’s
surface. So, what’s special about the current climate changes? Why is this different?
What’s special about the current changes in Earth’s climate is the role of human activi-
ties in generating them. The basic reasoning behind this conclusion is simple and clear.
We have known since the 18th century that burning carbon-based fossil fuels releases
carbon dioxide (CO2) into the atmosphere. During the last three centuries, at least since
the beginning of the Industrial Revolution, human activities have been releasing CO2
into the atmosphere at an unprecedented rate. Large-scale releases of CO2—one of the
greenhouse gases—into the atmosphere increase its heat retention, thus increasing the
Earth’s average global temperature. And scientists have in fact measured such an increase
in average global temperature. So it’s clear that human activity during the last couple of
centuries has increased the Earth’s average global temperature.
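The CO2-to-warming link sketched above can be made quantitative. A standard simplified expression, not given in the text but widely used in climate science, says the extra energy flux trapped by added CO2 grows with the logarithm of the concentration ratio (the coefficient 5.35 W/m² is the commonly used empirical value; this is a rough sketch, not a climate model):

```python
import math

def co2_forcing(c_now_ppm, c_ref_ppm):
    """Simplified logarithmic approximation for CO2 radiative forcing.

    Returns the extra trapped energy flux in watts per square meter
    relative to the reference concentration. The 5.35 coefficient is
    the conventional empirical value; treat this as an illustration,
    not a full model of the climate system.
    """
    return 5.35 * math.log(c_now_ppm / c_ref_ppm)

# Pre-industrial (~285 ppm) to 2018 (~412 ppm), figures from the text:
forcing = co2_forcing(412, 285)
print(round(forcing, 2), "W/m^2")  # roughly 2 W/m^2 of extra trapped flux
```

The logarithmic form also reflects Arrhenius's insight, discussed below, that each doubling of CO2 adds roughly the same amount of trapped heat.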
Systematic research on the relationship between CO2 emissions and climate change
began in the 19th century, when the American engineer Marsden Manson noted that
‘the rate at which a planet acquires heat from exterior sources is dependent upon the
power of its atmosphere to trap heat; very slight variations in the atmospheric constitu-
ents [produce] great variations in heat trapping power’ (Manson, 1893, p. 44). A few
years later, the Swedish physicist and chemist Svante August Arrhenius (1859–1927)
completed an extensive set of calculations, showing that the changes in CO2 function
as a ‘throttle’ on other greenhouse gases like water vapor. He also calculated that there
would be an Arctic temperature increase of approximately 8° Celsius (14.4° Fahrenheit)
from atmospheric carbon levels two to three times their known value at the time.
Arrhenius later predicted that ‘the slight percentage of carbonic acid in the atmosphere
may, by the advances of industry, be changed to a noticeable degree in the course of a
few centuries’ (1908, p. 54).
Just before the outbreak of World War II, a British steam engineer, Guy Callendar, pre-
sented a breakthrough paper to the Royal Meteorological Society entitled ‘The Artificial
Production of Carbon Dioxide and Its Influence on Temperature’. Callendar pointed out
that the atmospheric concentration of CO2 had significantly increased between 1900 and
FIGURE 1.1 Notable early scientists studying carbon dioxide (CO2) and climate
As man is now changing the composition of the atmosphere at a rate which must be
very exceptional on the geological time scale, it is natural to seek for the probable
effects of such a change. From the best laboratory observations it appears that the
principal result of increasing atmospheric carbon dioxide . . . would be a gradual
increase in the mean temperature of the colder regions of the earth.
(1939, p. 38)
[FIGURE: Atmospheric CO2 concentration (ppm) recorded at Mauna Loa Observatory, 1960–2015, with an inset showing the annual cycle]
[FIGURE: Atmospheric CO2 concentration (ppm) from Antarctic ice core measurements (1700–1958), with a map of coring sites: Ronne Ice Shelf, Dome A, Siple Station, South Pole, Vostok]
These measurements demonstrate that there are multiple lines of evidence for increasing levels of CO2 in the
atmosphere (see Figure 1.4) and that the average temperature for the end of the 20th
century is higher than in the previous two millennia (Ahmed et al., 2013).
The unprecedented pace of current climate change and its connection to human activi-
ties like burning fossil fuels, cattle ranching, and clear-cutting rainforests are clear. In the
previous 800,000 years, the concentration of CO2 in the atmosphere had never been over
285 ppm. Since the Industrial Revolution—only 0.025% of the last 800,000 years—the
concentration has spiked to 412 ppm. The milestone of 400 ppm was reached in March
2015 (see www.co2.earth). The CO2 concentration measured 409.39 ppm on May 30, 2017, the
day before Donald Trump announced that he would withdraw the US from the Paris
Agreement. One year later, in May 2018, the concentration had risen further, to 412 ppm.
The last time CO2 levels were this high, humans did not yet exist. The average tempera-
ture of our planet has gone up by about 0.85° Celsius (1.5° Fahrenheit) since 1880, and
the last three decades are estimated to have been the hottest in the last 1,400 years.
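The figures quoted in this passage can be verified with a few lines of arithmetic (the numbers below are taken directly from the text):

```python
# Industrial era as a fraction of the 800,000-year CO2 record:
industrial_years = 200            # roughly, since the Industrial Revolution
record_years = 800_000
fraction = industrial_years / record_years
print(f"{fraction:.3%}")          # 0.025%, as the text states

# CO2 rise over that sliver of time:
rise_ppm = 412 - 285
print(rise_ppm, "ppm increase")   # 127 ppm above the pre-industrial ceiling

# Temperature increase since 1880, converted C -> F.
# A temperature *difference* converts with the factor 9/5 alone
# (no +32 offset, which applies only to absolute temperatures):
delta_c = 0.85
delta_f = delta_c * 9 / 5
print(round(delta_f, 2), "F")     # ~1.53 F, matching the quoted 1.5 F
```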
favorite restaurant with your dinner. The types of expertise required for these positions
take years, even decades, to develop, and the expertise doesn’t neatly transfer from one
domain to another. Don’t trust the average climate scientist to fix your car or make you
a delicious meal. Similarly, politicians and policy-makers know things about political and
legislative matters, but they should not be looked to as authorities on climate change. This
includes politicians who deny climate change, as well as those who grant its existence.
Reputable scientists and scientific societies, including the national science academies
of the world and the Intergovernmental Panel on Climate Change (IPCC), agree that
human-caused, or anthropogenic, climate change is occurring. This includes virtually all
climatologists. In 2004, for instance, the historian of science Naomi Oreskes analyzed
928 abstracts on climate change published in peer-reviewed scientific journals from 1993
to 2003; none expressed disagreement with the consensus position that anthropogenic
climate change is occurring (Oreskes, 2004). In 2010, a group of researchers studied the
views of the top 200 climate scientists (defined as the scientists with the most extensive
publication records) and confirmed that more than 97% actively affirm the existence
of anthropogenic climate change as described by the IPCC (Anderegg et al., 2010). So
there is striking agreement among climate scientists about the existence of anthropogenic
climate change.
Climatologists’ agreement on climate change is grounded in a rich body of independent
sources of evidence that support the same conclusion: human activities are causing Earth’s
atmosphere to heat up. Well-established theories in physics explain how heat radiation
works. Physical chemistry shows how CO2 in the atmosphere traps heat, contributing
to greenhouse effects. As we pointed out, at least since the 1890s, scientists have known
about the relationship between CO2 buildup and average global temperature. Satellites
and other technology have enabled scientists to collect many different types of informa-
tion about relevant changes on our planet—including variations of sea level and of oceans’
temperatures, and the decreasing mass of polar ice sheets. Since the 1950s, scientific
models and computer simulations have been helping scientists to make testable predic-
tions about what would happen to the global climate in response to different changes in
human activities. Evidence has confirmed these predictions.
And, yet, despite decisive scientific evidence, public awareness and concern for climate
change lag behind the research (Lee et al., 2015). As of 2016, four out of every 10 adults
worldwide hadn’t even heard of climate change. Whether or not people are sensitive to
the risks of climate change mainly depends on understanding its human causes and on
one’s level of education. In some countries, like the US, however, being better educated
doesn’t guarantee that one is more likely to believe that climate change is really happen-
ing and is caused by human activities. Instead, political views are a better predictor of
Americans’ belief in and concern about the reality of climate change.
People who don’t know much about some topic also tend to experience an illusion
of understanding: a lack of genuine understanding goes hand in hand with a lack of
appreciation for the depth of one’s ignorance about that topic. Applied to climate
change, this means that people who have no advanced education or training in science,
or who otherwise don’t understand how the climate works, tend to have unwarranted
confidence in their ability to assess scientific findings or make pronouncements about
climate change.
The illusion of understanding has become easier to sustain in today’s society. In part,
this is because finding information through internet searches (so-called Google knowing)
has diminished genuine understanding. We also have limited opportunities for productive
public discourse and disagreement; our conversations online and in person tend to happen
with people who have beliefs similar to our own.
Improving public climate literacy is thus important for informed public engagement
with global warming. And, more generally, understanding the processes that give rise to
trustworthy scientific knowledge is vitally important to deciding what to believe, whom
to believe about what, and how to learn more.
yields knowledge about the nature of light and color. Knowledge of these things may
have applications, but that is not why scientists study them. Scientific research that aims
at knowledge for its own sake is sometimes called basic research.
Not all knowledge is equally valuable. For example, it wouldn’t be valuable to know
how many rainbows have ever occurred on Earth; such truths are pointless truths. When
science aims for pure knowledge, the aim is explanatory knowledge, or generating knowl-
edge of how things work and why things are the way they are. We know so much about
our world, and we understand so many things because of scientific discoveries and
theories.
A different type of scientific research is applied research. Scientific research is applied
when it exploits knowledge in order to develop some product, like software, pharma-
ceutical drugs, or new materials. Often, a central motivation for applied research is to
generate products for profit. For example, the scientists who discovered the neurotrans-
mitter dopamine in the human brain in 1957, Kathleen Montagu and Arvid Carlsson,
were doing basic research; by contrast, scientists who are employed by pharmaceutical
companies to improve upon existing dopamine-related treatments for Parkinson’s disease
are doing applied research.
As this suggests, basic and applied scientific research can operate synergistically.
Scientists aiming at the production of knowledge for its own sake often rely on the new
materials and techniques created by scientists doing applied research, while scientists
doing applied research often exploit pure scientific knowledge in order to develop new
products.
Science’s Limitations
So science is our best route to knowledge about the world around us and to developing
innovations based on that knowledge. To appreciate science’s significance, it’s also impor-
tant to recognize what it doesn’t do.
Scientists try to gain knowledge about certain kinds of phenomena, or appearances
of things occurring in the world, and they do so in a certain kind of way. The list of the
phenomena investigated in science is long; in principle, it includes everything in our uni-
verse. But there are some important limitations to the scope of science. Science doesn’t
replace or limit non-scientific intellectual pursuits, like literature or philosophy—or poli-
tics for that matter. Basing our scientific knowledge about climate change on fluctuating
political agendas would be a mistake. But when it comes to addressing climate change
with policy interventions, debating which steps are politically feasible and desirable is
fair game for politicians.
Scientific knowledge differs from theological doctrine and religious practice too. Unlike
religious practitioners, scientists attempt to explain things without appeal to supernatural
entities or influences, such as deities or miracles, or to literary allegories or culturally
significant myths. Of course, one can be religious in any number of ways, and people can
be religious and believers in scientific knowledge, or even scientists themselves. People
disagree about the role religion should play in our society, but whatever role that might
be, science is not designed for fulfilling the role of religion.
Science’s limitations will become clearer in the next section, where we examine what
distinguishes science from other human projects.
EXERCISES
1.1 How do scientists know that human activities are radically altering Earth’s climate?
Why are these changes a serious concern?
1.2 Do all scientists, by virtue of being scientists, have the expertise to make pronounce-
ments about global warming? Give reasons to support your answer.
1.3 Some people know much more than the average layperson about some topics; these
people are experts on those topics. Think of at least three people you consider to
be experts and their areas of expertise. Why exactly do you consider them to be
experts? Is your answer the same or different for the three experts you listed? Why?
1.4 Laypersons are not always in a position to recognize who is a genuine expert on a
certain topic. Many people don’t know enough about the topic to assess expertise,
and genuine experts sometimes disagree with one another about the topic of their
shared expertise. Think again of the people you listed as experts in Exercise 1.3. How
can laypeople identify whether they should trust each of these experts? Considering
your answers, describe the kind of evidence, in general, that a layperson can use to
identify genuine expertise.
1.5 Based on the text or your other knowledge, list a few reasons why public concern about
anthropogenic climate change lags behind scientific research. Given that lag, how
should climate scientists affect environmental policy in the government? Should they
merely collect evidence and produce knowledge, leaving the construction of policy to
policy-makers? Do they have any obligations to more actively engage with the public?
1.6 Define knowledge, and say how science relates to knowledge. What are the limita-
tions to the kinds of knowledge science can produce?
1.7 What’s distinctive about science, in comparison to activities like literature, music, and
art, as a source of knowledge about the world? Do you think there are any important
differences between scientific and artistic ways of gaining knowledge? Support your
answers with justification.
1.8 Define basic research, and describe why you think scientists may choose to pursue it.
Is basic research important? If so, how? Should it be funded by the government? Why
or why not? How do you think it should be decided what kind of scientific research to
fund?
India. The Persian polymath Muḥammad ibn Mūsā al-Khwārizmī (c. 780–c. 850) further
developed this system and brought it to Arabic mathematics, and his work later intro-
duced this numeral system to Medieval Europe. Al-Khwārizmī also made significant con-
tributions to algebra, geometry, and astronomy. Around the same time, the Persian Abū
Bakr Muhammad ibn Zakariyyā al-Rāzī (854–925) was responsible for many innovations
in medicine, including advocating for experimental methods and developing classifica-
tions of contagious diseases. And the Arab scientist Ibn al-Haytham (c. 965–c. 1040) did
revolutionary work in optics and vision, including the insight that vision occurs by eyes
detecting light deflected by objects.
Other Persian and Arabic polymaths, including especially Ibn Sina (980–1037), known
also by the Latinized name Avicenna, as well as ibn Aḥmad Al-Bīrūnī (973–1048) and Ibn
Rushd (1126–1198), or Averroes, preserved and developed theories about the natural world
from the Greek philosopher Aristotle (384–322 BCE). This was in turn the basis of ideas
about the natural world in 15th-century Europe, with ideas added from Christian, Jewish,
and Islamic cosmogony and theology. Based on Aristotle’s views, the universe was thought
to be geocentric—the Earth at the center—and with two regions: terrestrial for Earth and
celestial for the planets and stars. The celestial region was thought to contain transparent
concentric spheres that rotate around the Earth. The Greco-Egyptian astronomer Ptolemy
(c.100–168) had supplemented this with an account of the apparent motions of the stars
and planetary paths, including detailed models and tables that could be used to calculate
the positions of the stars and planets. Geocentrism in 15th-century Europe blended observa-
tions and calculations with religious ideas and ideas about humanity’s place in the universe.
A longstanding problem with the geocentric view of the cosmos was the appearance
of so-called retrograde motion. The planets sometimes seem, in observations made over a
series of nights, to stop in their orbit, reverse course back across the sky, then stop again,
and reverse course yet again to continue on their original way. An example of this is shown
in Figure 1.6. Following Ptolemy, geocentrists explained retrograde motion by positing
epicycles: mini-orbits traced by each planet around a point that itself travels along the larger orbit. This successfully
accounted for retrograde motion, but it wasn’t as intuitive and seemingly obvious as the
other elements of geocentrism.
In 1543, in what is considered to be the beginning of the Scientific Revolution,
Copernicus presented a radical alternative conception of the cosmos as heliocentric, or
centered around our sun. This provided an alternative explanation for retrograde motion.
According to heliocentrism, retrograde motion of planets was due to Earth changing
position relative to other planets as they all revolved around the sun. Copernicus’s pro-
posed heliocentric conception of the cosmos was met with skepticism. It violated widely
accepted beliefs and called for a fundamentally new physics of the heavens. Besides, the
mathematics of Copernicus’s system was just as complex as Ptolemy’s epicycle solution
to retrograde motion, and it did not make predictions of planetary motion any more
accurate. So, few astronomers were convinced by Copernicus’s system.
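The heliocentric explanation of retrograde motion can be seen in a toy simulation. The sketch below is not from the book: it assumes circular, coplanar orbits and the modern values for Mars (1.524 AU, 1.881-year period), and it checks whether Mars's apparent position, as seen from a moving Earth, sometimes drifts backward:

```python
import math

def position(a, period_yr, t_yr):
    """Position of a planet on a circular heliocentric orbit (toy model)."""
    theta = 2 * math.pi * t_yr / period_yr
    return a * math.cos(theta), a * math.sin(theta)

def mars_longitude_from_earth(t_yr):
    """Apparent ecliptic longitude of Mars as seen from Earth (radians)."""
    ex, ey = position(1.0, 1.0, t_yr)       # Earth: 1 AU, 1-year period
    mx, my = position(1.524, 1.881, t_yr)   # Mars: 1.524 AU, 1.881 years
    return math.atan2(my - ey, mx - ex)

# Sample the apparent longitude over two years and look for stretches
# where it *decreases*: those are the episodes of retrograde motion.
dt = 0.01
longs = [mars_longitude_from_earth(i * dt) for i in range(200)]
# Unwrap jumps across the -pi/+pi boundary so the series is continuous.
unwrapped = [longs[0]]
for v in longs[1:]:
    d = v - unwrapped[-1]
    d -= 2 * math.pi * round(d / (2 * math.pi))
    unwrapped.append(unwrapped[-1] + d)
retrograde = [unwrapped[i + 1] < unwrapped[i] for i in range(len(unwrapped) - 1)]
print("retrograde intervals present:", any(retrograde))
```

With both planets started at opposition (Earth between the Sun and Mars), the faster-moving Earth overtakes Mars, and the apparent longitude briefly runs backward, just as heliocentrism predicts, with no epicycles needed.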
The situation changed with the research of Michael Möstlin (1550–1631), Johannes
Kepler (1571–1630), and Galileo Galilei (1564–1642), each of whom championed and
improved the Copernican heliocentric system. Möstlin and Kepler were German math-
ematicians and astronomers with interest also in astrology. Kepler devised a set of laws
that described the motions of planets around the Sun. Based on calculations of the orbit
of Mars, he inferred that planets do not have the circular, uniform orbits proposed by
Copernicus. Their orbits are ellipses. This simplified the Copernican theory and signifi-
cantly improved the predictive accuracy of heliocentric models.
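Among the laws Kepler devised is one that ties a planet's distance to its period. His third law, not stated explicitly in the text, says that in solar units the square of the orbital period equals the cube of the orbit's semi-major axis, which makes for a one-line check:

```python
# Kepler's third law in solar units: T^2 = a^3, so T = a ** 1.5,
# with a in astronomical units and T in Earth years.
mars_a_au = 1.524                 # semi-major axis of Mars's orbit (AU)
mars_period = mars_a_au ** 1.5    # orbital period in Earth years
print(round(mars_period, 2))      # ~1.88 years, matching Mars's observed period
```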
Born in Italy, Galileo is one of the most important figures of the Scientific Revolution.
He was instrumental in establishing Copernicus’s heliocentric system and, more generally,
in replacing Aristotelian mechanics of the separate terrestrial and celestial realms with a
new single physics. Galileo built greatly improved versions of the newly invented telescope, which he used to observe the phases of
Venus and to discover that Jupiter had moons orbiting it. This was a significant discovery
for heliocentrism: if our Earth were the center of the universe around which all things
orbited, then those moons should be orbiting Earth instead.
Recalling the main purpose of this discussion, might this early period of science give
us a way to approach defining it? In the Scientific Revolution, the rapid development
of new ideas, methods, and tools resulted in the swift accumulation of knowledge. A
similar process played out in the later development of the fields of chemistry, biology,
and psychology. Perhaps, then, science can be defined simply as those pursuits that have
descended from the Scientific Revolution. Something ‘clicked’ that facilitated the devel-
opment of knowledge about our world, and today’s scientists are still engaged in that
process of accumulating knowledge.
One problem with this suggestion is that many of the pursuits that furthered scientific
knowledge also included religious, theological, and philosophical ideas that we would not
consider scientific nowadays. In the Persian Golden Age and the Scientific Revolution,
philosophy, theology, and science were not divided as they are now, and often, the same
ideas had significance for religious belief and for beliefs about the natural world. Another
problem with defining science straightforwardly by its history is that it’s unclear whether
and how some of today’s scientific disciplines, like economics and neurolinguistics, relate
to the Scientific Revolution.
Perhaps we can instead look to the methods developed as science was established as
the defining features of science. Methods established in the Persian Golden Age and the
Scientific Revolution that may be characteristic of science include looking to sense experi-
ence and performing experiments to decide what’s true, the systematic use of mathematics
to study phenomena, and the institutionalization of investigation in formal organizations.
These will all find their way into our eventual attempt to identify the main ingredients
of science. But scientific methods have also significantly developed and changed since the
Scientific Revolution. For example, statistical and computational methods emerged in the
late 1800s. These methods are staples of present-day science, and they are essential for
understanding complex phenomena like Earth’s climate. The institutional and social struc-
tures governing scientific practice have also undergone massive changes in the last centu-
ries. One profound transformation in the social organization of scientific activity was the
professionalization of science in Europe and North America beginning in the mid-19th
century. So, although science’s methods are key to defining it, we’ll have to look beyond
the Persian Golden Age and the Scientific Revolution to fully characterize those methods.
Here’s one more idea for defining science inspired by this quick look into science’s
history: perhaps we can define science by focusing on what it is that scientists investi-
gate. The Scientific Revolution was a decisive step toward the separation of scientific
from non-scientific questions. Recall that geocentrism had implications not just for the
natural world but also for religious belief and views of humanity’s role in the universe.
Heliocentrism was more explicitly a view just about the universe around us. So maybe the
definition of science relates to its subject matter—the world we see around us—as distinct
from philosophical, religious, and theological investigations of, for example, meaning and
purpose. We’ll explore this idea next.
different times and places can observe the same natural phenomena. Observability across
people, times, and places is essential to scientific study.
Natural explanations invoke observable features of the world to account for natural
phenomena. If there’s an epidemic in Florida or increased employment in Colombia, you
might wonder how that came to be. A natural explanation of the epidemic might specify
a contagion and a mechanism of transmission, or other such factors. A natural explanation
for the increase in employment might specify private investments in industry and legisla-
tive choices made by labor unions and political parties. These are natural explanations
of natural phenomena.
realm of what science can investigate because scientific investigation is limited to natu-
ralistic inquiry. This suggests that science need not interfere with most forms of religious
belief. The exception is when religious belief is used to provide competing explanations
for natural phenomena.
However, pursuit of natural explanations for natural phenomena doesn’t by itself
adequately demarcate science from non-science. Some naturalistic approaches to natural
phenomena aren’t things we consider to be scientific. Take football coaching, for example.
Its subject matter ranges from physical training and development of individual technical
skills to psychological motivation and knowledge of tactics and strategy, and coaching
employs what we know of the world to engage with this subject matter. But football
coaching is not a science. Naturalism might be an ingredient of science, but it isn’t defini-
tive of science all by itself.
a scientific claim only indirectly. Even the physicists who study quarks haven’t directly
observed quarks. Instead, they have made predictions based on the idea that quarks exist,
and those predictions have been supported by empirical evidence collected in carefully
controlled experimental conditions.
wrong. This is required for scientific claims to be subject to empirical evidence. Notice
that true claims can still be falsifiable—you can describe what kind of evidence would
prove them wrong; it’s just that, because they are true, you will never actually find such
evidence. Even for false claims, scientists may never be in the right circumstances to obtain
falsifying evidence. But for any scientific claims—any bold and risky conjectures—it should
at least be possible to say what falsifying evidence would look like, even if we aren’t in
the position to get such evidence or even if the evidence does not exist (because the
claim is true). Falsifiable claims enable science to be based on empirical evidence and to
reject ideas when the evidence warrants doing so.
Second, science requires honesty when evidence seems to go against a claim or theory.
When scientists discover apparently falsifying evidence, they should begin to doubt the
idea under investigation. In general, we humans try really hard to hold on to our existing
beliefs, even when they are challenged. Scientists are no different. But the norms of good
science obligate them to doubt any scientific claims—even ones they really like or thought
were really promising—in the face of evidence that challenges those claims. It is part of
the very idea of science that any claim or theory should be abandoned when the pre-
ponderance of evidence suggests it’s wrong. We might call this openness to falsification.
To summarize, falsificationism implies that scientists are always earnestly trying to falsify
their scientific theories, even and especially the ones they are the most certain about. This
is up for debate. But it does seem like all scientific claims should be falsifiable, at least in
principle, and that scientists should be open to the possibility, at least in principle, that any
claim or theory will need to be given up if sufficient evidence is found that goes against it.
This is depicted in Figure 1.7 as a process of conjecture and attempted refutation.
FIGURE 1.7 (a) Schematic flowchart of simple falsificationism: conjecture → attempted refutation → successful? (yes/no); (b) Karl Popper (1902–1994)
Let us briefly mention two other candidates for hallmark methods of science. Much
of science makes use of mathematical techniques ranging from statistics to linear algebra
and geometry. This is another distinctive characteristic of science. Quantitative analysis,
or the use of mathematical techniques to measure or investigate phenomena, is found
in most science. Not all science employs numbers, however. So to say that quantitative
analysis is a hallmark of science is not to say that qualitative analysis, or the investiga-
tion of phenomena without using mathematical techniques, is not. For example, social
scientists often rely on in-depth interviews, focus groups, and other probative techniques
that don’t involve any mathematics.
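As a concrete illustration of quantitative analysis, here is a minimal sketch of one of its most common tools, the Pearson correlation coefficient, applied to a small made-up series (the numbers are hypothetical, purely for illustration, and are not data from the book):

```python
import statistics

# Toy quantitative analysis: Pearson correlation between year and a
# hypothetical series of CO2 readings (illustrative numbers only).
years = [0, 1, 2, 3, 4, 5]
co2 = [320, 331, 339, 354, 370, 390]

mean_y, mean_c = statistics.mean(years), statistics.mean(co2)
cov = sum((y - mean_y) * (c - mean_c) for y, c in zip(years, co2))
var_y = sum((y - mean_y) ** 2 for y in years)
var_c = sum((c - mean_c) ** 2 for c in co2)
r = cov / (var_y * var_c) ** 0.5  # Pearson correlation coefficient

print(round(r, 3))  # close to 1: a strong upward trend
```

A coefficient near +1 quantifies what a qualitative glance at the numbers only suggests: the two series rise together.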
Finally, another method distinctive of science is found in its social and institutional
structure. Science relies on communities of many people working together on related
projects but also with different ideas, techniques, aims, and values. Scientists are in some
ways always collaborating; teams of scientists work together on research projects, and all
research is based on the findings of other scientists’ work. In other ways, scientists are
always competing with one another, for example, to make a discovery first, to get their research projects funded, and to show that their ideas are better supported by the evidence than opposing ideas. This social and institutional structure is one of science's most salient characteristics, and it also relates to science's role in society, which we'll explore in Chapter 8.
TABLE 1.1 Checklist for evaluating whether an idea or project qualifies as scientific
✓ Puts forward ideas that can be tested with empirical evidence (empirical investigation, falsifiability)
✓ Would abandon any idea that was thoroughly refuted (openness to falsification)
✓ Employs mathematical tools appropriately when they are useful (mathematical techniques)
✓ Engages in the collaborative and competitive exchange of ideas (social and institutional structure)
Here’s an obvious contrast with science as we have defined it: researchers in literature
do not collect measurements or other similar forms of evidence to test hypotheses about
the literary value of a piece of written work. Disagreements about the literary value
of, say, Dante’s most famous work, La Divina Commedia, cannot be settled by running
experiments. By reading this work, you can learn about 13th- and 14th-century social
life in Italy and about moral and theological views in Europe. But the literary work itself
is a work of fiction, not intended to directly provide natural explanations of features of
the natural world.
Now consider astrology, a canonical example of pseudoscience introduced earlier. The
primary claims made in astrology, such as horoscope predictions, are not designed to be
falsifiable, and many are even designed to be unfalsifiable. They are vague in ways that
allow many different interpretations; so for any interpretation that is wrong, another
can be offered in its place. Further, the systems of horoscopes used by astrologists are
inconsistent with well-understood basic theories of biology, physics, and psychology. This
violates the expectation of the collaborative exchange of ideas among scientists.
Astrology may be a harmless fad, with negative consequences largely confined to mis-
spent leisure time and money. Other pseudoscientific projects are much more dangerous.
Denials of anthropogenic climate change, for example, can be no less pseudoscientific than
astrology, and they have contributed to a lack of political will to address climate change—a
failure that may well have catastrophic consequences. Generally, the prominent climate-
change deniers have no genuine interest in engaging with the science. Their project is
not the earnest and disinterested search for truth, wherever it leads, but instead one of
shielding their cultural or political values by introducing doubt, distraction, and bluster
and lobbing personal attacks (Oreskes & Conway, 2010). Their denial of climate change
is not designed to be falsifiable; no amount or kind of evidence will change their minds.
Indeed, some climate-change deniers have even rejected the idea that science is a trust-
worthy source of knowledge in order to hold fast to their commitment against the idea
of climate change. Climate-change deniers also violate the expectation of collaborative
and competitive exchange among scientists, insofar as they neither produce hypotheses
and evidence for other scientists to evaluate nor acknowledge the vast empirical evidence
that supports the theory of anthropogenic climate change.
Anti-vaccination advocacy is another example of pseudoscience with pernicious effects.
One popular anti-vaccination argument is that vaccines increase the risk of autism. But,
as we will discuss in Chapter 7, all vaccines have been subject to incredibly extensive
testing for safety, and those tests have demonstrated conclusively that there is no causal
connection between vaccination regimes and the incidence of disorders like autism. This
conclusion of safety is scientific; it is based on evidence, is open to falsification, and
would be rejected if sufficient evidence against it were found. But existing research is
so extensive and compelling that the possibility of newfound disconfirming evidence is
virtually nonexistent. Nonetheless, propaganda outlets and anti-vaccination groups peddle
misinformation, trying to induce doubt with hearsay and uncritical stories of children
who were diagnosed with autism after vaccination. (This does regularly happen, for the
simple reason that vaccination regimes and many symptoms of autism tend to emerge in
the same stage of early childhood.)
Yet another example of pseudoscience with pernicious effects is creationism and intel-
ligent design. In the United States, for more than 50 years, creationism has posed under
[S]ince the 1950s, many of the observed changes are unprecedented over decades
to millennia. The atmosphere and ocean have warmed, the amounts of snow and
ice have diminished, and sea level has risen … Human influence on the climate
system is clear, and recent anthropogenic emissions of greenhouse gases are the
highest in history. Recent climate changes have had widespread impacts on human
and natural systems.
(IPCC, 2014)
EXERCISES
1.9 Choose one scientific development from the Persian Golden Age or the Scientific
Revolution. Describe how that development constituted progress in the subject mat-
ter of science and in the methods of science.
1.10 Order the following disciplines from most scientific to least scientific, consulting
the discussion of defining science and the checklist for science in Table 1.1: astrol-
ogy, economics, cinematic theory, cultural anthropology, social work, paleontology,
criminology. (You might need to first investigate what some of these disciplines are.)
For each, briefly explain why you ranked it as you did, making reference to the
hallmark features of science.
1.11 Describe how the history, subject matter, and methods of science are each relevant
to the nature of science.
1.12 Outline the specific elements of science’s history, subject matter, and methods that
relate to hallmark features of science. Rate each of these on a scale of 1 to 5, where 1
is the least important to the nature of science and 5 is the most important. Choose one
feature you rated ‘1’ and one you rated ‘5’, and say why you gave each this rating.
1.13 Define pseudoscience in your own words. Then, choose one of the examples of pseudo-
science from this section and evaluate it using the checklist of science. Describe how it is
similar to science and how it is different. Can you identify any features of the example
you’ve chosen that seem to be intended to appear more like science than they are?
1.14 Based on the information we have provided in this section, evaluate intelligent
design against the checklist for science. Assign it a letter grade, where A+ is fully
scientific and F bears no resemblance to science. Defend the grade you’ve assigned
with reference to the checklist.
1.15 Enter the phrase intelligent design into an internet search engine. Find and consult at
least one site that endorses intelligent design and at least one site that is critical of the
idea that intelligent design is scientific. (a) Evaluate the case presented by each side,
taking into account the checklist for science when it’s relevant. Describe your findings
in writing. (b) Say what, if any, differences you identify between the sources—that is,
between the two websites—and whether and how those differences matter to the author-
ity of these sources on the question of whether intelligent design is a scientific theory.
1.16 Why must science be limited to the study of natural phenomena? Why must it give
only natural explanations? Can you think of any scientific projects that don’t seem
to satisfy these requirements? If so, describe one or more such projects and say why
they might not be naturalistic. If not, describe a non-scientific idea that seems like it
is not naturalistic and say why.
1.17 Mythology and science are generally understood to be very different from one an-
other. And yet early science had its origins in, and then grew out of, mythology, and
both myths and scientific theories provide explanations of the natural and social
phenomena observed in the world around us.
a. Look up three creation myths from different cultures and historical periods—that is,
look up myths of how the world began and how people first came to inhabit it.
b. Identify similarities and differences across the three myths.
c. Describe similarities between the creation myths and scientific theories of
human origin. In particular, identify potential similarities between the kind
• Explain why there is not a single thing we can call ‘the Scientific Method’
• Name two general flaws in human reasoning that science is designed to counteract,
and give examples of their influence
• Describe five features of scientists and of the scientific community that are important
to the trustworthiness of science
• Describe each of the three steps found in most recipes for science and why each is
a challenge
In 1891, von Osten travelled around Germany to exhibit his amazing horse. There was such fanfare that the famous psychologist Carl Stumpf appointed a special commission to provide critical scrutiny. In 1904, the commission concluded that Hans's abilities were legitimate. The horse was able to answer a great variety of questions on topics from simple arithmetic to square roots, fractions, and decimals; units of time; musical scales; and the value of coins. Hans could respond accurately even when von Osten wasn't present.
The commission was wrong. Stumpf’s pupil Oskar Pfungst demonstrated that Clever
Hans was not actually performing the sophisticated mental calculations attributed to him
(Pfungst, 1911). Pfungst used blinders to vary whether Hans could see the questioner, and
he varied who played the role. Hans produced the correct answer even when von Osten
himself did not ask the questions, but Hans’s performance fell apart when either the
questioner did not know the answer or could not be seen by Clever Hans. In particular,
when Hans could not see the spectators and questioners, his ability to produce correct
answers fell dramatically from 89% to 6%. Further observations confirmed that Hans was
being unwittingly cued by his human audience. Questioners’ body language and facial
expressions became tauter as his tapping approached the correct answer, and then more
relaxed upon the final tap; this change prompted Hans to stop tapping.
As with von Osten and all the other people who posed questions to Clever Hans, our expectations can affect how things play out, even when we don't intend for this to happen. More generally, because of confirmation bias, our expectations can influence what
sources of evidence we seek, how we view the evidence we encounter, and how well
we remember evidence later. These problems increase with emotionally and politically
sensitive topics. All of this makes it hard for people—including scientists—to reason their
way to the right answers.
Norms of Investigators
One element of science's great success in generating knowledge about our world is the set of features that protect against or counteract basic flaws in human reasoning. We'll discuss
those features in two categories: first, norms of science that apply to individual scientists
and second, norms of science that apply to the scientific community. You can think of
these norms as rules or guidelines against which scientists’ actions can be deemed good
or bad, desirable or undesirable.
Because of science’s aim of producing knowledge, scientists are obligated and trained
to have a certain kind of integrity. Scientific integrity requires scientists to be sincere and
honest, and to avoid improper influence by others. Violations of these norms can under-
mine science’s ability to produce trustworthy knowledge.
Plagiarism is an obvious example of dishonesty. Plagiarism consists of presenting
somebody else’s ideas, scientific results, or words as one’s own work, intentionally or
unintentionally, by not giving proper credit. When plagiarism is discovered in science, it
is severely penalized, perhaps including a ban from publishing in peer-reviewed journals
or suspension or expulsion from one’s institution.
Faking data to support a desired conclusion is another egregious violation of scientific
integrity. In 2011, Diederik Stapel, a Dutch social psychologist, published a widely read
study in Science—one of the most prestigious scientific journals. The study presented evi-
dence supporting the dramatic conclusion that trash-filled environments lead people to
be more racist. But rather than earnestly collecting actual data, Stapel just made up the
data. When this was discovered, Stapel’s reputation shifted from that of a respected aca-
demic to a prominent example of fraud. All of his other publications were scrutinized, and
approximately 60 other papers were retracted for data fabrication. Other scientists have
also been forced out of science after their ethics violations were discovered, such as the
stem-cell researcher Hwang Woo-suk from Seoul National University and the Harvard evo-
lutionary biologist Marc Hauser. Some science journalists have helped increase awareness
of retraction due to issues like data fabrication by running blogs such as Retraction Watch.
But scientific integrity requires more than just not misattributing or misrepresenting
ideas and data. Scientists also ought to avoid conflicts of interest —that is, financial or
personal gains that may inappropriately influence scientific research, results, or publica-
tion. Scientists are obligated to disclose any potential conflicts of interests. The existence
of a potential conflict of interest does not necessarily lead to researcher bias or misinter-
pretation of data, but transparency about any potential conflicts of interest allows others
to better evaluate the possibility of improper influence. Conflicts of interest, especially
when research is funded by organizations with a financial stake in the findings, can result
in researchers intentionally or unintentionally altering the research they conduct, their
findings, or what they report in publications.
Clair Patterson, a geochemist at Caltech in California, became famous for definitively calculating the age of the Earth (≈4.54 billion years) in the 1950s. He also led the campaign to remove lead from gasoline in the 1960s and 1970s. Leaded gasoline contained tetraethyl lead, which is extremely toxic (a single drop of pure tetraethyl lead can be fatal) and had to be handled with utmost caution by its manufacturers. Because the
campaign against leaded gasoline threatened their profits, the fossil fuel industry—led
by the Ethyl Corporation—fought bitterly against Patterson. Among their tactics was to
procure a shill, Robert A. Kehoe, who was paid handsomely by the fossil fuel industry to
attest to the safety of leaded gasoline. Led by Patterson, honest science eventually carried
the day. But serious damage had already been done: generations of Americans suffered
from elevated lead levels in their blood through the end of the 20th century.
Kehoe went on to use his scientific credibility to endorse pollutants like Freon, undermining scientific evidence showing their damage to the Earth's ozone layer. He was later commissioned by DuPont, General Motors, and other companies to produce studies asserting that dangerous carcinogens were safe. Ultimately, Kehoe's efforts have been a model
for executives in a range of industries (tobacco, asbestos, pesticides, fracking, and so on)
for how to obstruct scientific efforts with misinformation. The ‘Kehoe approach’ is still
being deployed by the fossil fuel industry to evade evidence and undermine the scientific
research about anthropogenic climate change.
One way to promote scientific integrity is to hold scientists accountable for their work.
Scientists should be prepared to engage with others about the ideas, methods, and data
they use to support their scientific findings. In climate science, for example, scientists
should be able to answer questions about what kinds of uncertainties their models involve
and what kinds of evidence they have for the reliability of their findings. This is related
to a second norm: scientists should be open to criticisms of their work and to new ideas.
Remember that science is always in revision. We have said that scientific ideas should
be in principle falsifiable and that scientists should be open to the possibility that any
idea will turn out to be false. Similarly, scientists should at least sometimes be willing to
entertain ideas that might initially seem unlikely, and they should be open to criticism
of their ideas and methods.
A third norm governing scientists as individuals is ingenuity. This is a natural partner
to the norm of openness to new ideas. Science benefits from the development of many interesting ideas, including ones that violate our preconceptions. It's often impossible to tell at the outset
which ideas will prove to be promising. Many, even most, new ideas in science will turn out
to be false. But time and time again, science has gone in unexpected directions, and that
can’t happen without new, creative ideas about the world and about how to pursue science.
The Blackawton Bees project is one striking example of how ingenuity can guide scien-
tific research. In this project, 28 children between 8 and 10 years of age from Blackawton
in Devon, UK, conducted a collaborative scientific study on bumblebees’ visuospatial
abilities, supervised by an ophthalmologist and an educator. The children wondered how
bumblebees decide which flower to go to for food and whether bumblebees could learn
to recognize different flower shapes and colors. The children ingeniously brainstormed
about possible answers and creatively designed experiments to test their ideas. Their co-
authored findings were published as an original article in the scientific journal Biology Letters. The article summarizes its discoveries as follows:
We discovered that bumble-bees can use a combination of color and spatial rela-
tionships in deciding which color of flower to forage from. We also discovered that
science is cool and fun because you get to do stuff that no one has ever done before.
(Blackawton et al., 2011, p. 168)
As the children involved in the bumblebee project can attest, scientific reasoning can and
should be ingenious, challenging, and creative.
Social Norms
No matter how many requirements are placed on scientists as individuals and no matter how good scientists get at satisfying those norms, this alone cannot fully protect science against the flaws inherent in human reasoning. Individuals, even very bright scientists, often aren't aware of the flaws in their own thinking, and often aren't in a good position to fix them by themselves. Thus, another important form of protection against flaws in reasoning involves requirements placed on the scientific community as a whole.
We’ve already mentioned the importance of the social organization of science. This
organization is especially salient when research involves collaborations among lots of
researchers, sometimes with different disciplinary backgrounds, working at different times
and in different physical locations, often using different kinds of complex instruments.
Climate science regularly involves radically collaborative endeavors of this sort. After all,
climates are extremely complex, interconnected systems; no single person and no single
field of expertise can alone produce knowledge of how the climate works or how it will
change. Instead, many different people perform specific tasks based on their expertise
and their available instruments. When these different tasks are added together, they lead
to knowledge that no one could have produced alone.
But even scientific research that is not so visibly collaborative relies on the communities
of science. Research is informed by previous results found by other scientists, papers are
reviewed for publication by other scientists, and other scientists decide whether and how
to react to any given published finding. In massively collaborative research and solitary
research alike, it is really scientific communities, instead of scientists as individuals, that
produce knowledge.
The good functioning of scientific communities depends on various social norms and
incentive structures. One primary social norm in science is trust. Scientists’ trust in one
another is the glue of scientific communities. For example, collaborative projects on
climate change involve scientists with a range of different expertise, including climatolo-
gists, ecologists, physicists, statisticians, and economists. None of these scientists alone
possesses comprehensive expertise to understand the full range of evidence that bears
on our understanding of anthropogenic climate change. So these scientists must rely on
each other and must trust one another’s scientific work. Individual members of the public
often don’t have the expertise to evaluate most scientific findings, and so they too must
trust the scientists who are experts on those topics.
But it’s also true that the work of scientists must be critically evaluated, by other
scientists and the public alike. For the public, the most straightforward way to evaluate
scientific work is to assess the quality of the arguments presented. This is not always
possible, however, as scientific information can be technical and difficult to understand
by non-experts. For this reason, it is also important to pay attention to the reputation of
the alleged expert. Scientists’ reputations are based on their track record of accomplish-
ments in their field, as judged by other scientists with similar research expertise. Scientists
critically evaluate one another’s work by deciding whether particular results warrant
publication, evaluating the strengths and weaknesses of scientific studies, and choosing
whether and how to respond to published findings.
One form that scientists’ healthy skepticism toward the work of others and their
openness to criticism takes on is attempts to replicate others’ findings. In replication,
TABLE 1.2 Individual and social norms that protect against bias and flaws in reasoning
Integrity Trust
Openness Skeptical evaluation
Ingenuity Diversity
an experiment or study is carried out again in order to see whether the same results
are achieved. If successful, the replicated results further confirm the idea under inves-
tigation. If the results are not replicated, this raises doubts about the results of the
original work.
This balance of trust and skepticism among scientific communities also helps protect
against individual biases. Imagine that a scientist with legitimate expertise is paid by an
oil company as a consultant. The same scientist pursues research aiming to show that the
evidence for anthropogenic climate change is inconclusive. Such a scientist would have an
obvious conflict of interest; this should lead other scientists and members of the public to
be very cautious about trusting those findings. It doesn’t guarantee the scientist is wrong,
but it does raise questions about whether her judgment is clouded.
Communities of experts also protect against the undue influence of individual eco-
nomic, religious, and political values, and thus confirmation bias and expectancy bias, in
another way. Scientists are each susceptible to biases that make them see the world more
like what they expect it to be and how they hope it will be. But their expectations and
hopes are different from one another’s. So, in a scientific community, these biases tend to
balance each other out. The conclusions that different scientists all agree to are thus less
likely to have resulted from bias. Hence, if a large and diverse group of scientists agree
about some result, we should be more confident that the result is accurate.
This brings us to one last point about the social norms of science, on which we will
elaborate in Chapter 8. To adequately protect against individual bias and flaws in reason-
ing, scientific communities need to be diverse in any way potentially relevant to scien-
tists’ values and, thus, to flaws in their reasoning. The best science is done by female and
male scientists with diverse cultural knowledge, various ethnic and racial backgrounds,
and from a variety of nations. This kind of diversity benefits science by guarding against
any individual biases. The individual and social norms of science we have identified are
summarized in Table 1.2.
Methods in Science
The third and final topic about how science protects against flaws in reasoning is also the
most important. The definition of science developed here stressed the methods used in
science. And it’s these scientific methods that bear most of the responsibility for helping
science overcome the flaws and limits of individual scientists’ reasoning.
We have suggested that, despite what you might have heard in high school science
classes, there’s no set scientific method: science proceeds in myriad ways. But this doesn’t
mean there are no methods to science. In fact, as the title of this book suggests, the oppo-
site is true: science may have no set method, but it does proceed according to familiar
recipes. It is the purpose of this book to outline some of the main recipes for science.
At the heart of many of those recipes, there is a pattern that includes something like
these three steps: the formulation of hypotheses, the development of expectations based
on hypotheses, and testing expectations against observations. One common way this plays
out is for scientists to formulate hypotheses about the world—what was described earlier
as bold and risky conjectures—and then use those hypotheses to generate specific expec-
tations regarding their experiences. If their observations conform to those expectations,
their hypotheses are confirmed. If not, they return to the drawing board.
These steps can occur in different orders, and they happen again and again in vari-
ous combinations. Different methods may also be involved in each of the three steps.
For example, sometimes, scientists have a specific hypothesis to investigate; other times,
research is more exploratory and open ended. Sometimes, hypotheses have obvious empir-
ical implications; other times, scientists need to use statistics to develop their expectations.
Sometimes, scientists design experiments to test their expectations; other times, they
develop models. Other scientific work simply isn’t described well by this trio of hypoth-
esis, expectations, and observations, such as the theoretical work behind string theory.
But these three steps are integral to the production of scientific knowledge. They are
the basic ingredients that, with tremendous variation, occur in many of the recipes for
science. With the help of these steps, scientific theories, laws, models, and other advances
are developed and refined. And these advances, when successful, are the vehicles of our
scientific knowledge.
We’ll conclude this chapter by talking about each of these three steps in greater detail.
Throughout the rest of the book, we’ll regularly refer back to these three ingredients and
the different ways they factor into the recipes for science.
Hypotheses
As we have seen, empirical investigation is how we learn about our world. Scientists
make observations to try to figure out what’s out there, why things are the way they
are, how things change, and so forth. But simple observations can’t accomplish that task
by themselves. A second ingredient is needed: theoretical claims. Theoretical claims are
claims made about entities, properties, or occurrences that are not directly observable.
These things might be excessively large or small, like the whole universe or quarks; they
might be too distant in time or space to observe, like the first forms of life on Earth
or black holes; or they might not be directly observable at all, like what physicists call
‘dark matter’.
As an example of a theoretical claim, consider a claim about all things of a certain kind, like the claim that all salt dissolves in water. You might have seen plenty of salt
dissolving in water, but you will never be able to witness all of the salt that exists dis-
solving in water. So direct observation can’t guarantee that all salt dissolves in water. We
have plenty of evidence that this is so, but the claim is theoretical because it goes beyond
what we can directly observe.
Science is centrally concerned with a special kind of claim called a hypothesis. A
hypothesis is a conjectural statement based on limited data—a guess about what the
world is like, which is not (yet) backed by sufficient, or perhaps any, evidence. Scientists
do not yet know whether any given hypothesis is true or false; when there is sufficient
evidence in favor of some hypothesis, it graduates from that category. Formulating a
hypothesis often requires some imagination; if you could observe something we can’t—if
you could witness the beginning of life on Earth, fly into a black hole, or see all the salt
in the world—what would you find?
Scientists might formulate a hypothesis before any observations have been made, just
with the use of their imagination. But often initial observations, other hypotheses, or
background knowledge about related phenomena help inspire new hypotheses. Before
scientists knew about the properties of potassium chloride, they'd seen table salt (sodium chloride) dissolve in water. This informed their expectations for potassium chloride, a
similar compound. Scientists' hypotheses about the first life forms were shaped by what they knew about organisms, existing and extinct, and how the Earth has changed over
geologic time.
Scientists can have different levels of confidence in different hypotheses. If a hypothesis
is informed by lots of experience with similar objects or significant background knowl-
edge of related phenomena, scientists might be much more confident in it than if it is a
random guess. But, by their very nature, hypotheses are guesses. This is why hypotheses
must be tested.
Expectations
Learning whether a hypothesis is true is often more circuitous than just making direct
observations. A second ingredient is usually needed to test hypotheses; this is developing
expectations based on hypotheses. Expectations are conjectural claims about observable
phenomena based on some hypothesis. These claims are conjectures since they go beyond
what scientists have observed so far, but, unlike hypotheses, their truth or falsity can be
discerned directly from the right observations. Indeed, expectations are claims about what
scientists expect to observe if a given hypothesis is true.
Expectations do not concern just any potential observations, but only observations that scientists anticipate being able to make. We could say what we would expect to experience
if we were in the middle of a black hole (given some hypothesis about black holes), but
since we don’t expect to ever be making observations from inside a black hole, those
expectations are useless. Instead, expectations based on a hypothesis regarding black holes
should be about what scientists would expect to see through a telescope, patterns of x-rays
and gamma rays detected from Earth or from a satellite, and so on.
Depending on the nature of a hypothesis, developing expectations in light of the
hypothesis can be relatively straightforward or incredibly complicated. On one extreme,
the hypothesis that all salt dissolves in water leads directly to an expectation: any given
sample of salt will dissolve when placed in water. But even then, there are some other
conditions to stipulate. Should salt dissolve when placed on a chunk of frozen water (that
is, ice)? What if some salt is already dissolved in the water? Should we still expect the
sample of salt to dissolve?
Expectations regarding black holes are much more complicated to develop. Black
holes are objects so massive, so dense, that even light gets pulled inside. No one will ever
take measurements on the edge of a black hole. Even if someone could get there and sur-
vive to take measurements, the measurements couldn’t be recorded in a way that escaped
the black hole. Nor does anyone expect to see a black hole through a telescope, since that
would require light to leave the black hole and travel to our telescope. Hypotheses about
black holes must thus be investigated entirely by formulating expectations regarding their
effects on other objects, objects that can emit electromagnetic radiation (and thus be seen
through telescopes) and give off measurable x-rays or gamma rays.
No matter whether deriving expectations is relatively straightforward or incredibly
complicated, this is an important and nontrivial step of scientific work. Expectations set
scientists up to make observations that can provide evidence for or against the truth of
a hypothesis. Deriving expectations thus serves as a bridge between conjectural claims
(hypotheses) and immediate observational claims (data).
Observations
Nearly all science fundamentally depends on observations. This is because, as we have
discussed, scientific inquiry is ultimately an empirical inquiry. It’s not enough to think up
interesting ideas about how the world might work; those interesting ideas must also be
evaluated by how well they fit with our observations of the world. Observations include
any information gained from your senses—not only what you see, but also what you hear,
smell, touch, and sense in any other way you can experience the world.
Your observations belong only to you. If we are on a hike together, we might both
hear a rattling sound coming from behind a boulder. But each of us only has access to
our own experience of the sound. You can’t compare my observation to yours. Data are
different. Data are public records produced by observation or by some measuring device.
(The singular form in Latin is datum.) Observations are important because they are your
only way to directly access the world. Data based on observations are important because
they allow us to record and compare our observations.
Our powers of observation are ultimately capacities to detect physical information
and then—literally—to incorporate it into our bodies. For example, when one hears a
serpentine rattle from behind a boulder, an acoustic waveform with a number of distinct
physical features enters into the ear canal and causes the tympanic membrane to vibrate.
The vibrations are ‘forwarded’ to the cochlea via the bones of the middle ear, where
the shearing force of the tectorial membrane mechanically moves the hair cells of the basilar
membrane. Hair cell movements are then transduced into electrical signals, which leave
the organ of Corti and travel via the main auditory nerve to the brain. The embodied
brain then has to interpret that signal (as a serpentine rattle rather than a baby rattle,
for instance). This is what it takes for humans to hear a serpentine rattle from behind a
boulder. A similar transformation occurs when you see that a test strip of litmus paper
has turned blue, feel the heat produced by a chemical reaction, and so forth.
Observation isn’t passive. We can move our heads to see different things and relo-
cate our bodies to different places where we can hear different things. We can also use
observations from multiple senses together. If you’re wondering about that rattling sound
from behind the boulder, you can walk around to the other side to see whether there’s a
rattlesnake there. Besides changing our position and using multiple senses to enhance our
observations, we can also change the world around us to create opportunities for different
observations. Crushing a leaf lets you better smell whether it’s sage or mint. When we do
such things, we have begun to do simple experiments on the world around us. We will
talk about experiments in Chapter 2.
Humans have also found many ways to use tools to enhance our powers of observation.
Light can be refracted with mirrors, prisms, and lenses to extend the reach of vision. We
now can see not just through our eyes alone but also through our eyes aided by telescopes,
microscopes, and other devices. To help us hear beyond our ears’ capabilities, we have
developed microphones, stethoscopes, and so on. These technological enhancements range
from observational correctives like eyeglasses and simple sensory aids like microscopes to
much more complex technology with highly specific purposes, like an fMRI machine to
show brain activity and the Large Hadron Collider, which uses superconducting magnets
to cause streams of high-energy particles to collide in a detectable way. Such enhance-
ments have allowed humans to generate what we might call super-observational access
to what would otherwise be undetectable to us, given our sensory modalities.
Making observations, collecting data, is at the heart of science’s ability to generate
knowledge of our world. But observations aren’t always independent from the ideas about
the world we already have. Changes in what we believe to be true can have a significant
impact on what we observe. For instance, when we observe the Sun at the horizon, what
we seem to see is the Sun at one point on its path across the sky. Geocentrism organizes
this and similar observations into an easily understood pattern, and those observations con-
firm the geocentric conception. But from the perspective of the heliocentric conception
of the solar system, with your head slightly turned to the side, the Sun at the horizon and
the other planetary bodies that appear compose a different observation. See Figure 1.9.
Heliocentrism is a different perspective, and it may also create a different perceptual
experience, or observation—the Sun setting not because it moves below the horizon but
because your position on Earth rotates away from it. New ideas can sometimes have a
strong effect on what we think we see, on our very observations. Observations are crucial
to science, but they aren’t always the starting point, and they aren’t always decisive.
EXERCISES
1.19 Why do the authors suggest there’s no unitary scientific method? Evaluate that idea,
raising considerations in favor of it as well as considerations opposed to it.
1.20 Describe three types of influence confirmation bias has, and define the observer-
expectancy effect. Think of a novel example for each of the four (three types of influ-
ence of confirmation bias and observer-expectancy effect). Make sure it’s clear how
each example illustrates each idea.
1.21 Describe three kinds of scientific fraud or scientific misconduct, giving an example
of each. Explain how each example undermined science’s ability to produce trust-
worthy knowledge.
1.22 How should trust and skepticism be balanced in scientific communities, and why is this
important to science? How should trust and skepticism of the public toward scientific
findings be balanced, and why is this important for the public’s relationship to science?
1.23 Choose one of the following, and invent a pseudoscientific theory about it. Feel free
to be creative!
a. The origin of the universe
b. The healing power of music
c. People’s handwriting
d. The change of organisms over time
Then, describe how the norms for scientists, the scientific community, and the meth-
ods of science could help guard against your made-up theory. Try to make your
answer comprehensive, involving all the main topics from this section.
1.24 Search the internet (news websites, magazines, and so forth) for a story or advertise-
ment about a scientific finding or a medical or health treatment that purports to be based
on science, and answer the following questions about it. Make sure you include the story
or advertisement when you submit your answers, as well as a link to it on the internet.
a. What is the source? Is the person or entity making the claims someone with
genuine expertise in what he or she is claiming?
b. Does it seem like there’s any conflict of interest? Why or why not?
c. Does the claim involve vague or ambiguous language?
d. Do the claims fit with other well-confirmed scientific theories?
e. What is the evidence cited in support of the claim?
f. Does this describe good science? Why or why not?
1.25 What is the difference between observations and data? What is important about obser-
vations in particular and why? What is important about data in particular and why?
1.26 Hypotheses, expectations, and observations are all important ingredients for most
science. Describe the importance of each, a typical way that the three ingredients
work together, and what they accomplish together.
1.27 Hypotheses, expectations, and observations are all important ingredients for most
science. Describe a difficulty with each, or circumstances in which it can be difficult.
1.28 Imagine you are a doctor in a large medical practice. The other doctors are consider-
ing introducing a homeopathic service for their patients. They ask you to prepare a
report summarizing the pros and cons of doing so. One of the other doctors, Dr. A,
is entirely dismissive of homeopathy on the grounds of the weakness of its scientific
basis; another doctor, Dr. B, has read a report of a study that she says shows that
homeopathy can outperform placebo and is inclined to be sympathetic. Yet another
doctor, Dr. C, has said that he doesn’t care about the evidence, so long as homeopa-
thy works and is not toxic. Write a 500- to 800-word report describing homeopathy
(you’ll probably have to do a bit of research), addressing each of the other doctors’
points of view, and recommending whether to introduce homeopathic service. You
should employ any of the concepts from this chapter that you find useful.
FURTHER READING
For more on the science of climate change, see the Intergovernmental Panel on Climate
Change. (2014). Fifth Assessment Report (AR5). Geneva: IPCC. Retrieved from www.
ipcc.ch/report/ar5/
For the latest data and information for stabilizing Earth’s atmosphere, climate, and living
environments, see CO2.earth: Retrieved from www.co2.earth/
For more on political influence used to cast doubt on climate change research and other
scientific findings, see Oreskes, N., & Conway, E. (2010). Merchants of doubt. New
York: Bloomsbury.
For more on the demarcation between science and pseudoscience, see Pigliucci, M., &
Boudry, M. (eds.) (2013). Philosophy of pseudoscience: Reconsidering the demarcation
problem. Chicago: University of Chicago Press.
For more on the Scientific Revolution, see Kuhn, T. S. (1957). The Copernican revolution:
Planetary astronomy in the development of Western thought. Cambridge: Harvard Uni-
versity Press. See also Shapin, S. (1996). The scientific revolution. Chicago: University
of Chicago Press.
For more on science in the Persian Golden Age and other periods around the world, see
the History of Science Society: Introduction to the history of science in non-Western tradi-
tions. Retrieved from https://hssonline.org/resources/teaching/teaching_nonwestern/
For a concise treatment of the illusion of explanatory depth, see Keil, F. C. (2003). Folk-
science: Coarse interpretations of a complex reality. Trends in Cognitive Sciences, 7(8),
368–373.
For the psychology of confirmation bias and bias more generally, see Nickerson, R. S.
(1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of Gen-
eral Psychology, 2(2), 175–220. See also Hahn, U. & Harris, A. J. (2014). What does
it mean to be biased: Motivated reasoning and rationality. The Psychology of Learning
and Motivation, 61, 41–102.
For more on how social norms and social structures influence scientific inquiry, see Mer-
ton, R. K. (1942). Science and technology in a democratic order. Journal of Legal and
Political Sociology, 1, 115–126. See also Boyer-Kassem, T., Mayo-Wilson, C., & Weis-
berg, M. (eds.) (2018). Scientific collaboration and collective knowledge. Oxford: Oxford
University Press.
Experiments and Studies 47
the heights of your parents or how the shape of a pea plant seed depends on the shapes
of the seeds of the parents of that plant. How could you investigate this? The scientist and
friar Gregor Mendel (1822–1884)—born in the Austrian Empire in what is now the Czech
Republic—investigated such questions by breeding pea plants. He fertilized some pea plants
with pollen from their own flowers and others with pollen from the flowers of plants with
different physical characteristics. In this way, Mendel controlled the physical characteris-
tics of the parent plants, such as their seed shape (smooth or wrinkled) and flower color
(purple or white). He could then observe what characteristics resulted for their offspring.
For example, if a pea plant with purple flowers (whose parents all had purple flowers)
is crossed with a pea plant with white flowers, the offspring will all have purple flowers.
Mendel’s selective fertilization of pea plants illustrates a key feature of experiments.
In an experiment, a researcher introduces specific changes to a system and observes the
effects of these changes. The patterns in characteristics resulting from his selective breed-
ing of pea plants led Mendel to posit units of heredity (now known to be genes) that
determine variation in inherited characteristics according to set patterns across biological
organisms from pea plants to humans. In part, Mendel conjectured that some heredity
units are dominant and others recessive; this accounts for why purple-flowered plants and
white-flowered plants have purple-flowered offspring (Mendel, 1865/1996).
Figure 2.1 illustrates two crosses between pea plants, showing flower color (purple or
white) and dominant or recessive heredity units, or genes (A and a, respectively; each plant
has two). Flower color was observed from experiments; from these, Mendel postulated the
genes shown here. The grid on the left shows that a purple-flowered pea plant with two
dominant genes and a white-flowered pea plant with two recessive genes have offspring
with entirely purple flowers. But, as the grid on the right shows, two purple-flowered
pea plants bred in this way have one dominant and one recessive gene. Despite having
entirely purple flowers, 25% of these plants’ offspring have white flowers.
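The genotype frequencies depicted in Figure 2.1 can be reproduced by enumerating allele combinations. This is our own illustrative sketch, not part of Mendel's method:

```python
from collections import Counter

def punnett(parent1: str, parent2: str) -> Counter:
    """Enumerate equally likely offspring genotypes from two parents,
    each contributing one of its two alleles."""
    offspring = Counter()
    for a in parent1:
        for b in parent2:
            # Write genotypes with the dominant allele first ('Aa', not 'aA').
            offspring["".join(sorted(a + b))] += 1
    return offspring

# F1 cross: AA x aa -> all offspring Aa (all purple-flowered)
print(punnett("AA", "aa"))  # Counter({'Aa': 4})

# F2 cross: Aa x Aa -> 1 AA : 2 Aa : 1 aa, so 25% white-flowered (aa)
print(punnett("Aa", "Aa"))  # Counter({'Aa': 2, 'AA': 1, 'aa': 1})
```

Since only aa plants have white flowers, the F2 enumeration recovers the 25% white-flowered offspring described in the text.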
In experiments, as Mendel’s work illustrates, scientists introduce specific changes to
a system in order to make observations about how the system responds. In Chapter 1,
we learned that data are public records produced by observation. Experiments are used
to produce data of one kind or another. Experimental data can include various kinds of
                a     a                        A     a
          A    Aa    Aa                  A    AA    Aa
          A    Aa    Aa                  a    Aa    aa
             F1 generation                 F2 generation
FIGURE 2.1 Illustrations of two crosses between pea plants, representing dominant and reces-
sive genes for flower color
measurements, artifacts, signs, the location of some object, or even an object’s absence.
Mendel’s data consisted of systematic records of the fertilization history and physical
properties of each pea plant. For physicians, the results of blood tests and testimony
about one’s medical history can both count as data. Fossils, tracks, and recordings of the
geochemical features of rocks all can count as data for a paleontologist; and for anthro-
pologists, data may include monuments, pottery, and written documents.
Another concept from Chapter 1 that we need to structure this discussion is scientists’
use of empirical evidence to justify their scientific beliefs. When a hypothesis is used to
develop clear expectations for the outcome of some experiment, and data are gathered
from the experiment that match or conflict with those expectations, then the experi-
ment has produced empirical evidence for or against the hypothesis. The data collected
by Mendel, for instance, turned out to be empirical evidence supporting the belief that
inherited characteristics are caused by discrete units of heredity that come in pairs, one
from each parent.
Often, it’s not obvious what a hypothesis should lead one to expect to observe.
Explicit expectations must be developed from a hypothesis before that hypothesis
can be tested with empirical evidence. Before Mendel’s experiments, most people
believed that physical characteristics resulted from a blending of each parent’s char-
acteristics. This hypothesis would lead us to expect that offspring tend to have
traits that are intermediate between the traits of their parents. So, for example, the
offspring of purple-flowered pea plants and white-flowered pea plants should have
light-purple flowers. Mendel’s observations did not support this expectation; both
crosses in Figure 2.1 illustrate this. But is flower color just an exception to a general
pattern of blended inheritance? Notice that this is a question about what the hypoth-
esis of blended inheritance should lead us to expect. Many different experiments, on
different traits, helped confirm that Mendel’s hypothesis of hereditary units holds
up more generally.
Scientific experiments are designed to be a particularly powerful way to test expecta-
tions against observations. Challenges stem from humans’ tendency toward biased rea-
soning, including a tendency to observe what we want to be true, and the difficulty of
discerning what hypotheses should lead us to expect. Experiments offer two different
approaches to overcoming such difficulties. In some experiments, typically performed in
a laboratory, data are produced under tightly controlled conditions that are designed to
make expectations and observations both as clear as possible. In other experiments, often
performed outside a laboratory or ‘in the field’, scientists compare specific features of
systems as conditions vary naturally. Both kinds of experiment involve variables: features of a system that can
change, or occur in different values. For example, the number of books read during the
past year, height, the flower color of a pea plant, and the temperature in your hometown
are all variables. The value of a variable is just its state or quantity in some instance. For
example, the flower color of a pea plant may have values like white, purple, or pink; and
your hometown temperature might have the value 62° Fahrenheit one summer evening.
In experiments, there are three categories of variables: independent, dependent, and
extraneous. An independent variable is a variable that stands alone, that is, whose values
vary independently from the values of other variables in an experiment. When scientists
introduce specific changes to a system in an experiment, they do so by changing the value
of one or more independent variables. This is often called an intervention. A dependent
variable is a variable whose change depends on another variable. When scientists change
the value of an independent variable in some experiment, they do so in order to investigate
how that change affects one or more dependent variables. For example, one might vary
the amount of visible light (independent variable) in a factory or workspace and then
look for changes in workers’ productivity (dependent variable).
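The intervention-and-observation pattern can be pictured in a toy simulation. This is our own hedged sketch (the `productivity` function and its coefficients are invented for illustration, not a real model of workers):

```python
import random

def productivity(light_level: float, rng: random.Random) -> float:
    """Toy model: productivity responds to light plus random noise.
    The coefficients are invented purely for illustration."""
    return 50 + 2.0 * light_level + rng.gauss(0, 1)

rng = random.Random(0)
for light in [10, 20, 30]:             # intervene: set the independent variable
    results = [productivity(light, rng) for _ in range(100)]
    avg = sum(results) / len(results)  # observe: measure the dependent variable
    print(f"light={light}: mean productivity ~ {avg:.1f}")
```

The intervention is the act of setting `light` to chosen values; the dependent variable is simply read off afterward.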
Experimental methods are designed to enable scientists to isolate the relationship
between independent and dependent variables. This requires controlling background con-
ditions, or extraneous variables, as much as circumstances allow. Extraneous variables
are other variables besides the independent variable that can influence the value of the
dependent variable. If you’re exploring the relationship between the amount of visible
light in a factory (independent variable) and workers’ productivity (dependent variable),
then extraneous variables include noise levels in the factory, the heights of the workers,
the amount of coffee workers drink daily, the country in which the factory is located,
the weather, and so on.
If extraneous variables are not taken into account, they, and not the independent vari-
able, may be responsible for any changes in the dependent variable. Alternatively, extrane-
ous variables may counteract the influence of the independent variable on the dependent
variable. In these ways, extraneous variables can ‘confound’ the relationship between
the independent and dependent variables. When they do so, they are known as confounding
variables: extraneous variables whose variation influences the value of the dependent
variable in unanticipated ways. Confounding variables can undermine the accuracy of
the conclusions drawn from an experiment.
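To see how a confound can mislead, consider a toy simulation (all numbers invented): here season drives both daylight and productivity, so in observational data light and productivity correlate even though light itself does nothing in this invented world.

```python
import random

rng = random.Random(1)

# Toy model: in summer there is more light AND (say) workers are more rested.
# Light itself has NO effect on productivity in this invented world.
def observe(season: str):
    light = 30 if season == "summer" else 10
    output = (80 if season == "summer" else 60) + rng.gauss(0, 2)
    return light, output

summer = [observe("summer")[1] for _ in range(100)]
winter = [observe("winter")[1] for _ in range(100)]

# Naive comparison: high-light days look far more productive...
print(sum(summer) / 100 - sum(winter) / 100)  # roughly 20, but season, not light, did it

# A controlled experiment would fix the season and vary only the light,
# revealing no difference, because light does nothing in this model.
```

The naive comparison mistakes the confound's effect (season) for an effect of the independent variable (light).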
Imagine now that you want to investigate the relationship between the amount of
visible light in a factory and workers’ productivity. In particular, your hypothesis is that
better lighting increases workers’ productivity. To test this hypothesis, you could run an
experiment by varying the amount of light (independent variable) and subsequently look-
ing for changes to workers’ productivity (dependent variable). What are some ways you
can think of to change the value of the independent variable? One option is to change
the number of light fixtures in some workspace; another option is to wait for the time of
year to change (there’s more sunlight for longer hours in summer than winter). A third
choice is to compare two factories, one with better lighting than the other.
One thing to consider when weighing these options is the possibility of confounding
variables. Of these options, which introduces the fewest extraneous variables?
Think about all the changes between summer and winter beyond the amount of light;
perhaps wearing scratchy wool clothing in the winter or the shorter winter days decrease
work productivity, or perhaps summer heat decreases productivity. These are extraneous
variables that could easily become confounding variables. Likewise, different factories
can have many other differences between them beyond just the quality of lighting.
Perhaps one pays a better hourly wage than the other, offers more vacation, or has free
coffee in the break room. The best option seems to be to choose one workspace and
then vary the number of light fixtures, while keeping all other conditions as uniform
as possible.
You will want to measure and record the values of the independent variable (amount
of light), so as to compare those values with the values observed for the dependent vari-
able (workers’ productivity). How could you measure worker productivity? Perhaps by the
number of widgets produced in one hour? But what if the number of people who come
to work on a given day varies? It’s probably better to measure the number of widgets
produced in one hour divided by, or averaged over, the number of workers. That takes
into account, or controls for, the extraneous variable of number of workers.
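The per-worker measure described above is a one-line computation. A quick sketch with invented figures:

```python
# Controlling for headcount: widgets per hour divided by workers present.
# The figures below are invented for illustration.
day1 = {"widgets_per_hour": 120, "workers_present": 30}
day2 = {"widgets_per_hour": 110, "workers_present": 22}

for name, day in [("day 1", day1), ("day 2", day2)]:
    rate = day["widgets_per_hour"] / day["workers_present"]
    print(f"{name}: {rate:.1f} widgets per worker-hour")

# Raw output fell from 120 to 110, yet per-worker productivity rose
# (4.0 -> 5.0), because fewer workers came in on day 2.
```

Dividing by headcount keeps a swing in attendance from masquerading as a change in productivity.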
The general point here is that every aspect of the experiment’s setup, including its location
and timing, how the independent variable is intervened upon, and the types of measurement
taken, is shaped by the need to minimize the possibility of con-
founding variables. This is the key to effective experimental design: an independent
variable is varied in a controlled way, and the value of a dependent variable is mea-
sured, while keeping all extraneous variables fixed or taking them into account in
some other way. (This is why we divided widgets produced by number of workers
present to take into account any variation in the number of workers who showed
up to work that day.)
In experiments involving human participants, one confounding variable can be the
Hawthorne effect or observer bias, where experimental participants change their behav-
ior, perhaps unconsciously, in response to being observed. The name of this effect
originates from a series of experiments from the late 1920s and early 1930s that were
performed in a Chicago suburb at Western Electric’s Hawthorne factory (Parsons, 1974).
Some of these experiments investigated the effects of lighting conditions on workers’
productivity. Two groups of workers participated in the study. One group worked in an
area where there was a dramatic increase in the quality of the lighting. For the other
group, the lighting conditions remained just as before. Experimenters discovered that
the worker productivity in the well-illuminated area increased significantly compared to
the other group. This finding seemed to support their hypothesis that improved light-
ing increases productivity. However, the experimenters were surprised to discover that
workers’ productivity also improved with changes to rest breaks, to working hours, and
to the types of meals offered in the factory’s canteen. As they experimented further,
they found that even dimming the lights to original levels increased productivity! This
result undermined the initial findings about the relationship between the amount of
light and productivity. The experimenters eventually concluded that the changes to the
quality of illumination had no real impact on job productivity. As it turned out, workers
became more productive simply when they knew they were being studied. This is, of
course, the Hawthorne effect in action.
The Hawthorne effect can be found in almost any experiment with human participants
and can be a serious confounding variable. This is related to the observer-expectancy
effect discussed in Chapter 1, in which researchers’ expectations are themselves a
confounding variable in an experiment. Fortunately, there are experimental methods that
control for the extraneous variables of researchers’ and participants’ expectations; we’ll
get to that topic later in the chapter.
A long series of experiments over the centuries has contributed to our knowledge of light. The light we ordinarily see is visible or ‘white’
light; what is not illuminated appears to us as shadow, darkness. What is the nature of
light? Is it made of more basic matter, and if so, what? And is the light we see the only
kind of that stuff there is? Intuitively, it’s hard to imagine that light could be anything
other than something visible to us.
The nature of light and its relation to the color spectrum visible in rainbows have
been studied for millennia. In Chapter 1, we mentioned Ibn al-Haytham (Latinized as
Alhazen), who, during the Persian Golden Age, made important contributions to the
scientific understanding of vision, optics, and light. In his book Kitāb al-Manāẓir (Book
of Optics), he evaluated existing theories of light and vision, emphasizing that carefully
designed experiments are a basis of our knowledge of the world. Through experiments
using lenses and mirrors, Ibn al-Haytham showed that light travels in straight lines. From
dissections, he began to explain how the eye works and synthesize the medical knowledge
of previous scholars. In particular, Ibn al-Haytham demonstrated that light is not produced
by the eye, as some theories had claimed, but instead that it enters the human eye from
the outside. Once it was clear that light given off by objects enters the eyes, this raised
new questions about the nature of light (Al-Khalili, 2015).
In the centuries following Ibn al-Haytham’s breakthrough work, many other philoso-
phers and scientists engaged with those questions. In the 17th century, influential natural
philosophers thought that colored light was produced by the modification of white light
by interactions with objects and the materials through which it travels. So, passing light
through a glass prism was thought to produce a spectrum of colors because white light is
modified by the impurities of the glass. Similarly, it was thought that we perceive colorful
rainbows because sunlight is modified by going through drops of moisture.
Isaac Newton (1643–1727), one of the most influential scientists of all time, was not
convinced by this view. Instead, he hypothesized that colors are always contained within
the light itself and that passing light through materials just separates out the colors of
which light is made. To test these competing hypotheses, Newton darkened his room and
bored a small hole in the window shutters, so that only a thin beam of light could enter
the room. When Newton placed a glass prism in the beam, the spectral colors—a rainbow
of light—appeared on his wall. This observation was consistent with both hypotheses,
however. Both the modification hypothesis and Newton’s hypothesis that white light is
a mixture of colors could explain the observation that a beam of light travelling through
a glass prism produces a spectrum of colors.
In another experiment, Newton passed a beam of light through two prisms instead
of one. What would you expect to observe if the modification hypothesis were true?
Presumably, the impurities contained in the two glass prisms would continue to modify
white sunlight and just spread out the color spectrum further. When Newton let the beam
of light pass through the first prism, it split into a spectrum of colors as expected, just
like in the previous experiment. But when the spectrum of colored light passed through
the second prism, it recomposed back into white light! This observation was unexpected
under the modification hypothesis, but it was consistent with Newton’s thought that
white light is composed of colors. So, this experiment provided Newton with evidence
against the modification hypothesis and in support of his own hypothesis that passing
light through a prism merely separates out what is already there.
While experiments can generate scientific knowledge, they also often prompt new
questions. This was so for Ibn al-Haytham’s finding that light does not originate in the
eye and also for Newton’s later prism experiments. Light isn’t just something we see but
also something we feel; surely you’ve noticed that ordinary sunlight is warm. Newton’s
finding that visible white light is actually a spectrum of colors prompted further questions.
If light is a spectrum of colors, is it also a spectrum of temperatures? Or are different
colors of light the same temperature as one another?
In 1800, the British astronomer William Herschel (1738–1822) used a telescope to
observe sunspots, which are regions on the Sun that appear temporarily dark (Herschel,
1801). Observing sunspots is hazardous for the eyes, so he used colored glass filters to
reduce the intensity of the rays. Herschel noticed that he could feel the Sun’s heat com-
ing through the filters. Different filters seemed to differ in temperature; but since the
filters didn’t differ in material, Herschel wondered whether the different colors of the filters
might actually be responsible for the differences in temperature. Notice that this wasn’t
what Herschel had set out to investigate; sometimes experiments, or observations more
generally, take us in unanticipated directions.
Herschel tested his hypothesis about a relationship between light’s color and
temperature by directing sunlight through a prism to spread the spectral colors, as
Newton had. Then he measured each color—red, orange, yellow, green, blue, indigo,
violet—with a mercury thermometer. He also measured the ambient temperature in
the room in order to have a baseline temperature to compare with the temperature
measurements of the light. This setup yielded data in the form of measured values of
color (independent variable) and measured values of temperature (dependent variable),
which could be used as evidence to evaluate the hypothesis that different colors of
light differ also in temperature. The evidence confirmed this hypothesis: Herschel
found that the temperatures increased incrementally from the ‘cool’ colors like blue
to the ‘warm’ colors like orange.
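Herschel's setup can be sketched as a tiny dataset: color is the independent variable, temperature the dependent variable, and the ambient reading supplies the baseline. The numbers below are invented for illustration; they are not Herschel's actual measurements.

```python
# Illustrative Herschel-style data: one temperature reading per color band,
# compared against an ambient baseline. (Made-up numbers, not Herschel's.)
ambient = 18.0  # baseline room temperature, in degrees Celsius
readings = {    # color (independent variable) -> temperature (dependent variable)
    "violet": 19.0, "indigo": 19.3, "blue": 19.6, "green": 20.1,
    "yellow": 20.7, "orange": 21.4, "red": 22.2,
}

# Evidence for the hypothesis: every color reads warmer than the baseline,
# and warmth increases from the 'cool' end of the spectrum to the 'warm' end.
excess = {color: temp - ambient for color, temp in readings.items()}
warmest = max(excess, key=excess.get)
print(warmest)  # the red band shows the greatest excess over ambient
```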
Another of Herschel’s observations introduced a new question about light. Herschel
also measured the temperature of the air just beyond the beam of red light, outside
the edge of the spectrum created by sunlight through the prism, where no light was
visible. His hypothesis was that this temperature would be the same as the ambient
temperature in the room, since it was beyond the edge of the light spectrum. To his
surprise, the temperature at that location was much warmer than the ambient room
temperature, even higher than any of the temperature measurements for the light
spectrum. How could that be?
Herschel’s observation immediately led to a new hypothesis: some kind of invisible, hot
light exists just beyond the red part of the visible spectrum. This hypothesis would explain
the observation—anticipated by the French physicist Émilie du Châtelet (1706–1749)
almost 65 years earlier—that the temperature continued to increase beyond the edge of
red light. Later observations confirmed this hypothesis, and we now accept the existence
of this hot, invisible light. It’s called infrared light.
FIGURE 2.4 William Herschel’s experimental setup to test the relationship between the color
and temperature of light
Experimental Setup
Experiments have different aspects—physical, technological, and social—that need to fit
together in the right way for scientists to harvest useful evidence; how these aspects are
arranged is the experimental setup.
First, there are concrete, physical aspects. Experiments involve one or more subjects:
humans, non-human animals, or inanimate objects. They also often include instruments:
technological tools or other kinds of apparatus that help enable the experimental process.
Newton and Herschel used telescopes, lenses, prisms, light filters, pencils, and notebooks
to collect and analyze their data. Present-day experiments in high-energy physics at the
European Organization for Nuclear Research, CERN, take place in the Large Hadron
Collider. This is located in a tunnel on the border between France and Switzerland, and
it is used to accelerate and collide subatomic particles. The Large Hadron Collider took
10 years to construct (1998–2008) and involved the collaboration of over 10,000 sci-
entists and technicians from more than 100 countries and hundreds of universities and
laboratories. With a circumference of 27 kilometers, it is currently the largest scientific
instrument in the world. CERN experiments also require the use of powerful computers
for data collection, analysis, and visualization of the myriad particles produced by colli-
sion in the accelerator.
Experiments also occur in some place, over some period of time. Experiments can take
place in laboratories located in universities and hospitals or in the field, that is, in natural
settings like classrooms, subway stations, glaciers, coral reefs, nesting areas, and so on.
Some experiments have a short duration; others can last many years. Herschel observed
different temperatures related to different colored sun filters in one day, on February 11,
1800. Mendel’s experiments with pea plants stretched over a seven-year period. Present-day
experiments at CERN can span decades, as do the experiments carried out in
space by the US National Aeronautics and Space Administration (NASA).
Experiments are also normally carried out by one or more individual scientists.
Collaborative experiments are common in contemporary science; this is one element of the
social structure of science discussed in Chapter 1. Most collaborative experiments involve
scientists with different backgrounds who rely on one another’s expertise. Experiments at
CERN, for example, are highly collaborative, run by hundreds of scientists and engineers
from all over the world, each of whom brings some specific expertise to bear. This is
more extensive collaboration than is common across science though, and some scientific
experiments are still run by a single lab or even an individual. But even in those cases,
communities of scientists, represented by scientific institutions and societies, determine
protocols to be followed in experimental design and data analysis.
Another aspect of experimental setup is harder to discern but just as important to pro-
ducing evidence. These are the background conditions or extraneous variables. Consider
Newton’s prism experiments. The room at Trinity College, Cambridge, where he per-
formed these experiments, had a certain ambient lighting, temperature, and humidity. The
angle at which sunlight hit the room’s windows varied by time of day and season. Prisms,
the instruments Newton used, were not commonly thought of as scientific instruments
in the 1660s and so were sold simply for their entertainment value. As a result, they
were irregular in both size and composition. These factors were all in the background of
Newton’s experiment.
So, Newton needed to show that none of these background factors undermined his
conclusion that apparently white sunlight contains distinct colors within it (Newton,
1671/1672). As it happened, the Royal Society—the learned society for science of which
Newton was a member—criticized his results on the basis of the condition of the prisms.
The Royal Society suggested that, consistent with the earlier modification hypothesis,
the prisms’ bubbles, veins, and other impurities caused the light to become colored as it
passed through. In general, managing background conditions is one of the most challeng-
ing issues of running experiments.
ensure that data are collected thoroughly and accurately—enough to provide evidence
of the desired form and to enable replication.
Quite often, data collection involves one or more specialized instruments. This
may sound odd, but the acceptance of instruments for data collection in science
was not achieved without struggle. During the Scientific Revolution, a main chal-
lenge was to legitimize the data collected using glassware like prisms, telescopes,
and microscopes, as well as scales, chronometrical devices, and other instruments.
We saw earlier how this challenge factored into the reception of Newton’s findings
(Schaffer, 1989).
While there’s no longer any question that instruments in general can play an essential
role in data collection, questions about the reliability of specific instruments still arise. No
scientific instrument is free from error. For example, in 2017, scientists at the National
Institute of Standards and Technology (NIST) used a Kibble balance—an instrument for
making extremely accurate measurements of the weight of an object—to determine the
most precise value yet of the Planck constant, which is an important quantity in quantum
physics named after the German physicist and Nobel Prize winner Max Planck (1858–
1947). But even after more than 10,000 measurements, those scientists were still left
with uncertainty about the exact value of the Planck constant, partly because of the error
involved in any measurement (Haddad et al., 2017). (The value of the Planck constant
is about 6.626069934 × 10⁻³⁴ joule·seconds, in case you were wondering.) Such
measurement error is an inherent part of data collection. Ultimately, the best that scientists
can do is to avoid systematic measurement error by continually calibrating instruments,
where calibration involves the comparison of the measurements of one instrument (for
example, an electronic ear thermometer), with those of another (for example, a mercury
thermometer), to check the instrument’s accuracy so it can be adjusted if needed.
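As a rough sketch, calibration can be thought of as fitting a correction that maps one instrument's readings onto a trusted reference scale. The paired readings below are invented for illustration, not real measurements:

```python
# Hypothetical paired readings: a trusted reference thermometer vs. the
# instrument being calibrated (all in degrees Celsius; illustrative data only).
reference = [0.0, 25.0, 50.0, 75.0, 100.0]
readings  = [1.2, 26.0, 50.9, 75.7, 100.6]

# Ordinary least-squares fit of: reference = slope * reading + intercept.
n = len(readings)
mean_x = sum(readings) / n
mean_y = sum(reference) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(readings, reference))
         / sum((x - mean_x) ** 2 for x in readings))
intercept = mean_y - slope * mean_x

def calibrate(raw):
    """Correct a raw instrument reading onto the reference scale."""
    return slope * raw + intercept
```

After fitting, a raw reading of 26.0 comes out near 25.0 on the reference scale; a persistent offset like this is exactly the kind of systematic error calibration is meant to remove.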
Different types of data can be analyzed in different ways. One basic distinction is that
data can be either quantitative or qualitative. Quantitative data are in a form—often
numerical—that makes them easily comparable. Climate science data, for example, are
often quantitative. They are recorded as arrays of numbers, numerical indices, and symbols
that correspond to measurable physical quantities. Such quantitative data can be used
for statistical analysis (see Chapters 5 and 6) and computer simulation (see Chapter 3).
Qualitative data consist of information in non-numerical form. This information can
be obtained, for example, from diary accounts, unstructured interviews, and observa-
tions of animal behavior. Analysis of qualitative data is often less straightforward than
quantitative analysis. It requires accurate description of subjects’ responses and behavior,
trustworthy informants, and significant background knowledge. We will say more about
qualitative research in Section 2.3.
In experiments with human subjects in the social, cognitive, and behavioral sciences,
data collection often involves questionnaires that create quantitative, numerical data from
qualitative information. These questionnaires may include multiple-choice questions and
scales of various kinds. For example, standardized tests like the SAT, used for admission
decisions to colleges and universities in North America, are considered predictors of stu-
dent performance. Student performance varies along multiple dimensions, but the SAT
and similar tests boil this down to a single score for each test taker that is relative to other
test takers’ performance. Other questionnaires provide quantitative data about personal-
ity traits, political opinions, attitudes toward some topic or group of people, and so on.
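One simple way to make scores "relative to other test takers" is a percentile rank: the share of a comparison cohort scoring strictly below a given score. This is an illustrative sketch, not the SAT's actual scaling procedure, and the cohort scores are made up.

```python
# Percentile rank relative to a comparison cohort.
# (Illustrative only; standardized tests use more elaborate scaling.)
def percentile_rank(score, cohort):
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)

cohort = [1050, 1200, 1310, 1390, 1460, 1500, 1550]  # hypothetical raw scores
print(percentile_rank(1390, cohort))  # 1390 beats 3 of the 7 cohort scores
```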
Questionnaires can be a very useful form of data collection, but good questionnaire
design is vital for collecting reliable data. This is like the need for instrument calibration
described earlier. And, for questionnaires, effective design and calibration can be surpris-
ingly difficult. A poorly designed question can prime subjects to answer in a certain
way—often, because of the observer-expectancy effect, the way the experimenter expects
or desires them to answer. Questions can also be vague or ambiguous, eliciting different
kinds of responses from different people, or they can unintentionally ask about more than one thing
at once. Frankly, there are many ways to go wrong, and so there are many more poorly
designed surveys out there than well-designed surveys. Poorly designed questionnaires can
result in data that are too weak to count as evidence or to support inferences, or that are
otherwise useless because they cannot be analyzed in the intended way.
you might assign people to play different amounts of violent video games (intervening
on the independent variable of violent-video-game playing) and then record their level
of violence. If you observe increased violent behavior then the intervention—the violent-
video-game playing—is responsible for it. But is it the violent video games, or would
playing any video games at all result in more violent behavior? A new experiment is
called for. If it’s the violent video games, is it a particular form of violence or any violent
video games? These kinds of questions are always possible. Other untested hypotheses
often lurk right around the corner. For that reason, few if any experiments are crucial
experiments that decisively favor a given hypothesis.
There’s also a problem with the idea that an experiment can definitively prove some
hypothesis is wrong. An experiment to test some hypothesis involves a number of aux-
iliary assumptions—assumptions that need to be true in order for the data to have the
intended relationship to the hypothesis under investigation. When data do not match
expectations, this might be because the hypothesis is wrong, or it might be because one
of the auxiliary assumptions is wrong. Perhaps your data collection instrument is miscali-
brated, or your group of subjects is atypical, or there’s some confounding variable you
haven’t predicted. So, whether the data from an experiment match your expectations
or not, this is not truly decisive. One experiment can weigh in favor of or against some
hypothesis, but it generally can’t settle the matter once and for all.
We’ve discussed three sources of uncertainty about what an experiment shows: extra-
neous variables, unanticipated hypotheses, and auxiliary assumptions. One of the primary
ways to minimize uncertainty from these three sources is for experiments to be repli-
cated. Replication involves performing the original experiment again—often with some
modification to its design—in order to check whether the result remains the same. If, for
example, the spectrum of light recombining into white light observed by Newton is also
observed by different people, using different prisms, in different places and at different
times, then this additionally supports Newton’s hypothesis that white light contains a
spectrum of colors. If some experimental result cannot be replicated—if different scien-
tists follow similar experimental procedures but do not get the same result—then the
original experimental result may be a fluke, or it may be due to some confounding variable
in the experimental setup that the scientists haven’t yet identified.
The replicability of experiments is an indispensable ingredient of science, so much so
that a persistent failure to replicate findings may undermine a scientific field’s credibility.
For example, we saw in Chapter 1 that astrology’s failure to replicate findings is part
of its pseudoscientific status. Recently, it has also been suggested that the field of social
psychology faces a crisis in replicability, where different research groups have tried but
failed to replicate some classic experimental results. This suggests we should perhaps not
put too much stock in those findings, unless this failure in replicability is resolved (Pashler
& Wagenmakers, 2012).
The difficulties in designing genuinely crucial experiments and the importance of rep-
lication fit with the idea that science is essentially a collaborative, social venture. Because
of this, gaining scientific knowledge via experimentation is generally more complicated
and slower than a single dramatic experiment. Also, scientific knowledge can go in unex-
pected directions: a surprising finding that upends something we thought we understood
might be right around the corner.
the function of and to calibrate an instrument for data collection. Functional magnetic
resonance imaging (fMRI) machines track blood flow in the brain. They do not directly
measure neural activity, but that is what the scientists employing these machines want to
assess. Neuroscientists use data about blood flow to reason about neural activity because they
know that greater neural activity requires more energy, which requires increased metabolism,
which uses more oxygen, and oxygen is delivered by blood flow. The expectation that blood
flow provides a good proxy for neural activity is also confirmed by findings concerning brain
metabolism and the relationship between different brain areas and functions.
Besides evaluating and calibrating instruments, experiments can be used to deter-
mine the value of physical constants, or quantities that are believed to be universal and
unchanging over time. We mentioned Planck’s constant earlier. Another physical constant
is the speed of light in a vacuum. In the Opticks, Newton reported the calculations of the
Danish astronomer Ole Rømer (1644–1710) regarding the speed of light. Rømer observed
that there could be a difference of up to 1,000 seconds between the predicted and
observed times of the eclipses of Jupiter’s moons. Based on the estimated distance between
Jupiter and the Earth, Rømer concluded that light travels at about 200,000 kilometers per
second. In 1849, the French physicist Hippolyte Fizeau ran the first major experiment to
precisely determine the speed of light. Fizeau built an experimental apparatus in which
an intense light source and a mirror were placed eight kilometers (about five miles) apart.
He placed a rotating cogwheel between the light source and the mirror and increased
the speed of the wheel until the reflection back from the mirror was obscured by the
spinning cogs. Based on the rotational speed of the wheel and the distance between the
wheel and the mirror, Fizeau calculated that the speed of light is 313,000 kilometers per
second. Rømer’s estimate and Fizeau’s later calculation were on the right track; today,
we take the speed of light to be 299,792 kilometers per second.
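Fizeau's calculation can be reconstructed from the geometry of his apparatus. With N teeth on the wheel, the reflected beam is first blocked when the wheel advances half a tooth-plus-gap period during the light's round trip, which gives c = 4·d·N·f. The specific values below (an 8,633-meter baseline, a 720-tooth wheel, first extinction near 12.6 rotations per second) are commonly reported figures for the experiment, not values stated in the text:

```python
# Reconstructing Fizeau's estimate: light makes a round trip of 2*d while the
# wheel turns half of one tooth-plus-gap period, i.e. 1/(2*N) of a revolution.
# Setting 2*d/c = 1/(2*N*f) and solving gives c = 4*d*N*f.
d = 8_633   # one-way distance to the mirror, meters (commonly reported value)
N = 720     # number of teeth on the cogwheel (commonly reported value)
f = 12.6    # rotation rate at first extinction, rev/s (commonly reported value)

c = 4 * d * N * f  # speed of light, meters per second
print(round(c / 1000))  # in km/s, close to Fizeau's figure of 313,000
```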
A third role of experiments is exploratory. In this use, experimentation does not rely
on existing theory and may not be aimed to test a specific hypothesis. An exploratory
experiment is used to gather data to suggest novel hypotheses or to assess whether a poorly
understood phenomenon actually exists. Herschel’s work on the relationship between
heat and light, for example, did not rely on a particular theory or a hypothesis about the
relationship. When, in the course of investigating sunspots, he discovered that red light
has a greater heating effect, Herschel surmised that the light spectrum is made of both
heat and colors. This idea was on the right track, but it was not until James Clerk Maxwell’s
(1831–1879) theory of electromagnetic radiation that Herschel’s observations could be
adequately explained and his work vindicated.
EXERCISES
2.1 Review the discussion of Newton’s prism experiment. Identify the hypothesis under
investigation, the independent variable, and the dependent variable, and describe
the intervention.
2.2–2.5 The Anglo-Irish scientist Robert Boyle (1627–1691) used equipment like vac-
uum chambers, air pumps, and glass tubes in his experiments. With the assistance
of Robert Hooke, Boyle conducted a series of experiments in the 1660s to ascertain
how the pressure and volume of the air vary when the air is either ‘compressed or
dilated’. He used a J-shaped glass tube. The tube was closed off at the short end,
and the long end was left open. By adding mercury in the longer end, Boyle could
trap air in the curved end of the tube; by changing the amount of mercury, he was
also able to change the air pressure at the short end. Boyle repeated this experi-
ment, measuring the volume of the air in the short end of the tube at a range of
pressures. What he discovered was that, as he increased the pressure on the air,
the volume of the air would decrease. Boyle’s formulation of this relationship would
become the first gas law, now known as Boyle’s law.
2.2 What was the hypothesis under investigation? Use that hypothesis to identify the
independent variable and the dependent variable. What evidence was gained from
this experiment?
2.3 Make a list of 10 extraneous variables in Boyle’s experiment. Put a star next to any
variables that you think might have been confounding variables, and say why. Try
to do this for at least two variables on your list.
2.4 Think of an alternative hypothesis that could account for the results of Boyle’s experi-
ment. State that hypothesis, and describe how it could account for the data.
2.5 Define calibration, and describe how it was involved in Boyle’s experiments.
2.6 Describe three features of experiments that are particularly valuable to testing
hypotheses, and describe the value of each of those features.
2.7 What is the relationship between extraneous variables and confounding variables?
Why are experiments designed to limit confounding variables?
2.8 List the three kinds of sources of uncertainty regarding what a given experiment
shows. Describe each one, and give an example of each.
2.9 Describe the problem of underdetermination, and discuss how scientists deal with it.
2.10 Briefly describe three roles for experiments other than testing hypotheses, and give
an example of each. Then discuss how each of these might relate indirectly to testing
hypotheses.
2.11 Before Ibn al-Haytham’s work, some thought that vision involved light shining out of
the eye, coming into contact with objects, and thereby making them visible. This was
known as the emission theory of vision.
Describe an experiment that would test the emission theory of vision. What would
you expect to observe in that experiment if the emission theory were true? Finally, list
the auxiliary assumptions you would need to make in order for the emission theory
to generate those expectations.
2.12 Ibn al-Haytham set up the following experiment to test the emission theory of vision.
He stood in a dark room with a small hole in one wall. Outside of the room, he hung
two lanterns at different heights. He found that the light from each lantern illuminated
a different spot in the room. For each, there was a straight line between the lighted
spot, the hole in the wall, and one of the lanterns. Covering a lantern caused the
spot it illuminated to darken, and exposing the lantern caused the spot to reappear.
a. What data were produced by this experiment?
b. How do the data provide evidence against the emission theory?
c. Describe one way in which the emission theory might be adapted to account for
the data (but still remain an emission theory of vision).
d. Describe one new hypothesis you can formulate based on the results of Ibn al-
Haytham’s experiment.
Defining Expectations
To test a hypothesis with an experiment, an important first step is to articulate what the
hypothesis would lead you to expect for the outcome of the experiment. Those expecta-
tions are predictions of the results of some intervention if the hypothesis in question is
true. The expectations might also be informed by background knowledge or some general
accepted theory. Ideally, expectations are clearly and precisely defined in advance in a
way that makes them easily comparable to the data the experiment will produce. This is
important for controlling the extraneous variable of experimenters’ beliefs, which other-
wise may influence their perceptions of the experimental results (recall from Chapter 1
Contrast this with Albert Einstein’s theory of general relativity. This theory revolu-
tionized our understanding of space and time. While Newton believed that space is a
sort of absolute stage on which events unfold, Einstein conceived of space and time as
a single interwoven manifold, a fabric of sorts. For Newton, gravity was a force; Einstein
instead explained gravity as the curvature of the space-time manifold. Just as marbles
placed on a fabric sheet held in the air bend the sheet around them, massive objects like
the Sun warp space-time in their vicinity. This is why other objects accelerate toward
those massive objects.
Unlike Freud’s psychoanalytic theory, Einstein’s theory of general relativity generates
clear expectations. One of these expectations is that light, just like any other form of
matter, is affected by gravity. If a beam of starlight passes near the Sun, then it should be
deflected, or bend, toward the Sun. The beam’s deflection can be measured as the angle
between where we actually see the star and where we would expect to see the star if
the beam of light had travelled in a straight path. Einstein’s theory also provides us with
a precise prediction of this angle.
This prediction could first be tested a few years after Einstein completed his theory in
1915. On May 29, 1919, when a total solar eclipse blocked out the dazzling light of the
Sun, a group of scientists led by English astronomer Arthur Eddington took photographs
of stars visible near the dimmed Sun. They compared these to other photographs taken
at night, when the light of those same stars did not pass close to the Sun before reaching
Earth. From this comparison, Eddington was able to test, and confirm, Einstein’s predic-
tion of the light’s deflection. The Sun changed the path of nearby starlight as the theory of
general relativity predicted, providing confirmation of that theory (Dyson, Eddington, &
Davidson, 1920). When the press reported that a key prediction of Einstein’s theory had
been borne out by observation, Einstein became a famous public figure.
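The precise angle the theory predicts follows from the deflection formula for light grazing the Sun, θ = 4GM/(c²R). A quick check with standard solar values (constants assumed here, not given in the text) recovers the famous figure of about 1.75 arcseconds:

```python
# Deflection of starlight grazing the Sun, per general relativity:
# theta = 4*G*M / (c^2 * R). Constants are standard values (assumptions here).
G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
M = 1.989e30    # mass of the Sun, kg
c = 2.998e8     # speed of light, m/s
R = 6.957e8     # radius of the Sun, m

theta_rad = 4 * G * M / (c**2 * R)            # deflection in radians
theta_arcsec = theta_rad * (180 / 3.141592653589793) * 3600
print(round(theta_arcsec, 2))  # about 1.75 arcseconds
```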
Here’s another example of a clear and precise expectation based on a hypothesis.
This example comes from game theory, which is a broad framework for thinking about
conflict and co-operation among strategic decision-makers. Imagine you are given $10.
You’re asked to share this sum with a partner, and you and your partner must agree about
how to divide it. You can propose a division of the $10, and your partner can accept or
reject that offer. If your partner rejects your proposed division, neither of you will get
any money; if your partner accepts your offer, you’ll each get your agreed-upon share of
the money. What would you do?
Based on standard game theory, if everyone acts in their own self-interest, one would
expect that proposers in this situation will offer close to nothing to their partners and
that responders will accept anything more than $0. For responders, it’s rational to accept
anything, since otherwise they’ll get nothing. And proposers know this, so it’s rational for
them to offer only a small amount.
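This expectation can be sketched by backward induction over a discretized version of the game, assuming purely self-interested players (the whole-dollar increments and function name are illustrative choices):

```python
# Backward induction for a $10 ultimatum game with whole-dollar offers,
# assuming both players care only about their own monetary payoff.
def self_interested_prediction(total=10):
    # Responder's reasoning: rejecting yields $0, so any strictly
    # positive offer is better than nothing and gets accepted.
    acceptable = [offer for offer in range(total + 1) if offer > 0]
    # Proposer's reasoning: make the smallest offer that is still accepted.
    offer = min(acceptable)
    return offer, total - offer

offer, kept = self_interested_prediction()
print(offer, kept)  # the proposer offers $1 and keeps $9
```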
This expectation has been experimentally tested time and again, and it turns out to
be wrong (Güth, Schmittberger, & Schwarze, 1982). The average offers are around 40 to
50% of the total sum, that is, about $4 or $5 when $10 is being divided. And when
proposers offer less than 30%, responders consistently reject the offer, deeming it unfair,
even though this results in them getting no money at all. The proposers and responders
were on the same page, apparently willing to sacrifice self-interest for fairness. This was
not at all what standard game theory predicted.
FIGURE 2.6 Headlines reporting on Arthur Eddington’s observations during the 1919 eclipse,
which confirmed Albert Einstein’s theory of general relativity
As both this example and the one before illustrate, scientists’ hypotheses and theories
often involve concepts and variables that we don’t have an obvious way to test. This is
a stumbling block in formulating clear expectations for an experiment. How can you
measure the values of variables like wealth, violence, mood, and fairness? To manage this
difficulty, scientists often use operational definitions and clusters of indicators to charac-
terize fuzzy concepts in a way that allows for measurement.
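As a toy illustration, a fuzzy concept like 'wealth' might be operationally defined as a weighted cluster of indicators. The indicator names and weights below are invented for illustration, not a standard index:

```python
# A hypothetical operationalization of "wealth" as a weighted cluster of
# indicators, each normalized to the range [0, 1]. Weights are illustrative.
WEIGHTS = {"income": 0.5, "savings": 0.3, "home_equity": 0.2}

def wealth_score(indicators):
    """Combine normalized indicator values into one measurable score."""
    return sum(w * indicators[name] for name, w in WEIGHTS.items())

person = {"income": 0.8, "savings": 0.6, "home_equity": 0.4}
print(round(wealth_score(person), 2))  # 0.5*0.8 + 0.3*0.6 + 0.2*0.4 = 0.66
```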
For some concepts, there simply is no single best definition or measure. There is, accordingly, a choice in how to operationally define a concept or which cluster of indicators to use. Still, some definitions are better than others. Some definitions or sets of indicators come closer than others to capturing what we have in mind by, for instance, a fair deal or being wealthy. Our theories of the phenomena under investigation regularly inform how we
define concepts. For example, some definitions may be shown to specify the nature of
poverty more accurately than others because they accord better with our best economic
and sociological theories or because these definitions have been shown to better predict
future events consistently across studies.
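As an illustration, here is a hypothetical operational definition of 'wealthy' built from a cluster of indicators. Every indicator and cutoff is invented for the sketch; a different choice of indicators or cutoffs would classify people differently, which is exactly the choice in operationalization just described.

```python
# Hypothetical operational definition of 'wealthy' as a cluster of
# measurable indicators. All indicators and cutoffs are illustrative.

def is_wealthy(income, net_assets, owns_home, min_indicators=2):
    """Classify a person as wealthy if enough indicators are met."""
    indicators = [
        income >= 150_000,      # annual income cutoff (illustrative)
        net_assets >= 500_000,  # net assets cutoff (illustrative)
        owns_home,              # home ownership as a binary indicator
    ]
    return sum(indicators) >= min_indicators

print(is_wealthy(income=160_000, net_assets=100_000, owns_home=True))  # True
print(is_wealthy(income=60_000, net_assets=100_000, owns_home=True))   # False
```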
Intervention
An experimental intervention is the centerpiece of a perfectly controlled experiment.
Recall that an intervention is a direct manipulation of the value of a variable. Because
of this intervention, that variable is called the independent variable. Interventions could
include the administration of a drug to a group of patients, fertilizer to a plot of land,
or deliberate changes in the lighting conditions in a workplace. During an experiment,
scientists deliberately intervene on the independent variable and then measure the impact
of their intervention on the dependent variable. In an agricultural experiment, for example,
scientists may assess the hypothesis that a particular fertilizer is better for crop yield.
Their intervention would consist in changing the value of the variable of interest: the
type of fertilizer. In particular, they would change the fertilizer to the particular fertil-
izer the hypothesis predicts is better. They then would watch for changes in crop yield,
the dependent variable. The expectation based on the hypothesis is that crop yield will
increase; their measure of the value of the dependent variable, crop yield, is a way to
assess this hypothesis.
There are many different ways to perform experimental interventions. But ideally,
scientists want interventions to be ‘surgical’. This metaphor suggests interventions should
be made with the precision that surgeons bring to the operating table; the incision should
be carefully made at the exact location that will bring about the desired effect. If an
intervention is surgical in this sense, it affects only the independent variable. Any change
in the value of the dependent variable can then be traced back to the independent vari-
able’s influence. A surgical intervention on the type of fertilizer will simply switch out
the old fertilizer for a new kind. Everything else should remain the same: when and how
frequently the fertilizer is applied, the method used to apply it, the location of the field,
Controlling Variables
In ‘surgical’ interventions, conditions are created in which no variables, other than the
independent variable and the dependent variable, change when an intervention is per-
formed. So, another key feature of a perfect experiment is the full control of all extra-
neous variables. Full variable control is exceedingly difficult to accomplish. There are
always countless extraneous variables in an experiment, many of which scientists don’t
fully understand or aren’t even aware of. All of those extraneous variables need to be
controlled in order to avoid confounding variables, but it’s hard to control what you have
not identified!
Control over variables can be approached in a number of ways. These can be divided
into two broad categories: direct and indirect. Direct variable control is when all extrane-
ous variables are held at constant values during an intervention. Because the extraneous
variables are unchanging, they cannot be responsible for any changes to the dependent
variable. So, if direct variable control is successful, only the intervention can be responsible
for a change in the dependent variable.
Recall Newton’s prism experiments. Newton could directly control some extraneous
variables, like the time of day at which he ran his experiments and the lighting conditions
in his chambers. Keeping those variables constant ensured that, for example, any difference
in the composition of morning and afternoon light didn’t affect his findings. Newton also
attempted to control for the confounding influences of air bubbles and other impurities
in the prisms by using higher-quality prisms.
The carefully arranged conditions in today’s laboratories help scientists to directly con-
trol many variables. Temperature, cleanliness, lighting, noise, instructions to human sub-
jects—all of these factors and more are extraneous variables, and all should be held fixed
during an experiment. Consider again experiments conducted with the Large Hadron
Collider at CERN, the world’s largest laboratory. One important independent variable is
the proton-proton collision. Dependent variables, which are measured and analyzed by
scientists at CERN, are features of the by-products of these collisions. During experiments,
scientists use sophisticated technologies to keep many variables under direct control, such
as the magnetic fields and temperature in the collider.
In many experiments, however, direct control of all extraneous variables is simply not
possible. As we have seen, scientists often don’t even know all the extraneous variables
that may be relevant. The second category of variable control, indirect variable control,
helps with this. The basic idea is to allow extraneous variables to vary in a way that is
independent from the intervention. Then, although extraneous variables will vary, they
should vary in a way that is the same for the different values of the independent vari-
able. Any systematic differences in the dependent variable between different values of
the independent variable can then be reasonably attributed to the independent variable.
The first step to indirect variable control is to set up two groups of experimental entities
(whether cells, plots of land, people, mice, or other subjects) to compare. The intervention
should be the only thing that distinguishes these groups from one another. One group,
the experimental group, receives the intervention to the independent variable. The other
group, the control group, experiences the default other value(s) of the independent vari-
able. And then, some approach is used to try to ensure that all extraneous variables affect
the two groups equally.
One approach to indirect variable control is randomization: the indiscriminate assign-
ment of experimental entities to either the experimental group or the control group.
Some method of group assignment is adopted so that no features of the experimental
entities can be taken into account, even unconsciously, in determining group member-
ship. This is meant to ensure that any differences among the experimental entities vary
randomly across groups and thus bear no relation to the systematic difference between
groups, the intervention. Many scientists believe randomization is the gold standard of
indirect variable control.
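Random assignment can be sketched as a per-subject coin flip, so that no feature of a subject can influence group membership. This is a minimal illustration; the subject labels are arbitrary, and the seed is fixed only to make the run repeatable.

```python
import random

def randomize(subjects, seed=None):
    """Assign each subject to a group by coin flip, taking no feature
    of the subject (age, smoking, and so on) into account."""
    rng = random.Random(seed)
    groups = {"experimental": [], "control": []}
    for subject in subjects:
        flip = "experimental" if rng.random() < 0.5 else "control"
        groups[flip].append(subject)
    return groups

groups = randomize(["subject%d" % i for i in range(20)], seed=1)
print({name: len(members) for name, members in groups.items()})
```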
Randomization is one of the best approaches to indirect variable control, but it’s not
a surefire guarantee. It could happen that all patients with some characteristic—say, all
smokers—are randomly assigned to the experimental group, while all nonsmokers are
randomly assigned to the control group. In an experiment designed to test, say, the effect
of exercise on health, whether people smoke is surely a significant confounding variable.
This example is extreme, but there is a much more general point behind it. Random
group assignment guarantees extraneous variables are not related to group assignment,
but it does not guarantee that extraneous variables do in fact vary equally across the two
groups. Even with random assignment, the experimental and control groups may still
differ from one another in ways other than the intervention.
For this reason, there’s another condition that must be met for randomization to be an
effective approach to indirect variable control: the sample size must be sufficiently large.
Sample size refers to the number of individual sources of data in a study; often, this is
simply the number of experimental entities or subjects. If the sample size is very small,
chance variations between randomly assigned experimental and control groups are likely.
If the sample size is very large, such chance variation is exceedingly unlikely, so unlikely
that these variables can be considered effectively controlled.
Imagine an experiment that involves only four people, two of whom are smokers. It is
reasonably likely that both smokers will be randomly assigned to one group. Indeed, this
would happen one out of every three times they are randomly assigned to groups. Now
think about all of the variables among those four people: age, gender, medical history,
education level, and so on. It’s all but guaranteed that at least some of these extraneous
variables will be randomly distributed unevenly between the experimental and control groups.
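The one-in-three figure can be checked by enumerating all even splits, and the same enumeration shows how quickly the risk of an unlucky split shrinks as the sample grows. The group sizes below are illustrative.

```python
from itertools import combinations

def prob_all_together(n_subjects, n_smokers):
    """Probability that a random even split puts all smokers in the
    same group (the first n_smokers subjects are labeled smokers)."""
    half = n_subjects // 2
    smokers = set(range(n_smokers))
    splits = list(combinations(range(n_subjects), half))
    together = sum(1 for group in splits
                   if smokers <= set(group) or smokers.isdisjoint(group))
    return together / len(splits)

print(prob_all_together(4, 2))    # 2 of the 6 splits: exactly 1/3
print(prob_all_together(10, 4))   # already far less likely
```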
Beyond features of the experimental subjects, scientists themselves often harbor background beliefs or even specific expectations about the outcome of an experiment. These are also extraneous variables that can readily become confounding variables; recall from Chapter 1 the power of confirmation bias.
The strategies of direct and indirect variable control that we have talked about so far
don’t help with these kinds of extraneous variables. Recall the example of investigating
the effects of some exercise regime on health and, in particular, how randomization and a
sufficiently large sample size control for the extraneous variable of cigarette smoking (and
many others as well). If the researchers administering the tests used to evaluate health (the
dependent variable) know whether the subjects they are testing exercised or not, then this
knowledge and their expectations regarding the effects of exercise might subtly influence
their evaluation of subjects’ health. Randomization and large sample size are no help here.
To control for potential researcher bias, scientists sometimes design their experi-
ments so that not even they know which subjects are in the control group and which
are in the experimental group. This protocol is called a blind experiment. In the exercise/
health experiment, assignment to groups should be not only random but also blind;
researchers shouldn’t know which subjects are in which group. Then, when they test
a subject’s health, their expectations regarding the effects of exercise can’t influence
their judgments of that individual’s health, since they won’t know whether the subject
has exercised or not.
With a blind experimental setup, the researchers’ expectations cannot influence the
findings but the expectations of the experimental subjects might. Imagine you’re assigned
to the experimental group, and you dutifully exercise as assigned. You might be motivated
to work extra hard on the assessment of your health, or your expectation of good health
might decrease your blood pressure, or there may be some other unintended influence on
your health because of your expectation of the exercise’s effects. You might also simply
want to please the researchers by helping show they are right about the value of exer-
cise. This possibility is eliminated if both researchers and subjects are unaware of which
subjects are in which group. This is called a double-blind experiment.
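One common way to implement blinding, sketched here with invented details, is to hide group membership behind opaque codes: testers record results under the codes, and a key mapping codes back to groups is withheld from them until all assessments are complete.

```python
import random

def blind_assign(subjects, seed=None):
    """Randomly assign subjects to groups. Returns (public, key):
    'public' maps each subject to an opaque code the testers see;
    'key' maps codes back to groups and is withheld from testers."""
    rng = random.Random(seed)
    public, key = {}, {}
    for subject in subjects:
        code = "%06d" % rng.randrange(10**6)
        group = rng.choice(["experimental", "control"])
        public[subject] = code
        key[code] = group
    return public, key

public, key = blind_assign(["ann", "bo", "cy", "di"], seed=7)
print(public)  # testers see only subject -> code, never the group
# The key is consulted only after all assessments are recorded.
```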
Double-blind experiments are especially important for drug trials that test out new
medicines. If participants or experimenters expect a particular medicine to be effective,
then that expectation can directly lead to improved health. This is called the placebo
effect. For this reason, it’s important that neither experimenters nor experimental
participants know which participants receive the medicine being tested. The control
group receives a placebo, an impotent substance or therapy. This way, no participants
can discern whether they are receiving the real medicine, and they will be equally
subject to the placebo effect. (This is, then, indirect control of the extraneous variable
of placebo effect.)
Another way to control for participants’ expectations is with deception. Whereas
blinding involves omitting some piece of information, deception involves actively
misinforming participants to interfere with how their expectations influence their
behavior. The American social psychologist Stanley Milgram (1933–1984) often used
deception in his experiments. For instance, Milgram wanted to understand people’s
willingness to obey an authority figure who instructed them to inflict serious harm on others. It probably wouldn't have worked to tell the experimental participants this
was what was being tested. Few of us want to be viewed as inflicting harm on others
just because someone in power told us to! So, Milgram falsely told participants that
they were helping another person learn some material by quizzing the other person and
delivering electric shocks to them to punish any incorrect answers. In reality, there was
no other person learning, and no electric shock. The experimenters were simply study-
ing how far participants would go in harming others simply because they were told to.
(Ethics guidelines are more stringent now than they were then, and this study would
probably not pass muster now.)
In Milgram's setup, the subject read lists of word pairs that another person, the 'learner', was supposed to learn. The experimenter instructed the subject to flip a switch for each wrong answer, starting from 15-volt shocks and increasing for each error until the learner had learned all the pairs correctly.
The dependent variable of the real experiment was the maximum shock
subjects were willing to administer before refusing to continue. What results
do you think Milgram obtained? Out of, say, 100 subjects, how many do you
think would have administered shocks up to the highest level when instructed
to do so? In Milgram’s first study, he found that, although many displayed
deep discomfort at doing so, a full 65% of subjects administered the highest
level of shock, marked ‘XXX’.
EXERCISES
2.13 List all the features of a perfectly controlled experiment. For each, say what is impor-
tant about that feature and what is challenging about accomplishing it.
2.14 Imagine you want to establish what effect, if any, taking notes on a laptop during
class instead of on paper has on retention of information.
a. Specify your hypothesis regarding the note-taking medium and memory. What
are your expectations for your experiment, given this hypothesis?
b. Describe your ideal experiment to test this hypothesis. Don’t worry about how
easy it would be to actually conduct the experiment or if it’s even possible.
Make sure to specify all the main features of the experiment.
c. Identify three major challenges to conducting the ideal experiment you have
described. Say why each is a problem.
2.15 Philosophy majors tend to perform very well on all of the main entrance exams required
by graduate programs and professional schools. They are the only major to score above
average on all four of the following: the General Management Admissions Test (GMAT),
the Law School Admissions Test (LSAT), the verbal portion of the Graduate Record Exami-
nation (GRE), and the quantitative portion of the GRE. Philosophy majors are vying with
physics majors each year for the best comprehensive GRE scores, and they also have
had the highest average on the verbal portion of the GRE, second highest on the GMAT
(after mathematics), and third highest on the LSAT (after physics and economics).
Formulate three different hypotheses that are each compatible with these data.
Choose one of the three hypotheses, and design an experiment that could test it.
Make sure you specify the independent and dependent variables, the intervention,
your expectations for the findings if the hypothesis is true, and how you will control
for extraneous variables, including experimenter and subject bias.
2.16 We have discussed how Einstein’s theory of general relativity generates the expecta-
tion that light, just like any other form of matter, is affected by gravity. This was surpris-
ing in the sense that it predicted certain events that had not been observed before.
a. Why are surprising expectations, or novel predictions, important for testing
hypotheses?
b. How can surprising expectations, or novel predictions, be generated in sci-
ences like archaeology and paleontology that study the past?
c. How can surprising expectations be generated about events that have already
occurred or about data that scientists already have?
2.17 Suppose you want to test the hypothesis that baseball players who eat pizza every
day hit more home runs. Let’s suppose that to test this hypothesis, you want to divide
the baseball players of some team into two groups that are balanced in all important
background variables that can affect players’ performance. The only difference you
want between the two groups is that the members of one group eat pizza every day
and the members of the other group do not.
Rank the following four strategies from best to worst for accomplishing this goal:
1. Sit in the clubhouse after a game. The first players who enter the clubhouse are
assigned to the group of pizza eaters (the experimental group), while the fol-
lowing players are assigned to the control group.
2. Allocate players born in the first six months of the year to the experimental group
and players born in the second six months of the year to the control group.
3. For each player in the team you toss a coin. If the coin lands on heads, then the player
is in the experimental group; otherwise, the player is assigned to the control group.
4. Assign all players over 230 pounds to the experimental group and the rest of
the players to the control group.
Justify each of your rankings by describing how well or poorly you expect that strat-
egy will control the extraneous variables.
2.18 What is the purpose of having an experimental group and a control group in an
experiment? How does division into two groups achieve this purpose?
2.19 Describe what randomization involves, why it can help to control for confounding
variables, and what its limitations are.
2.20 Define direct variable control and indirect variable control. Then, describe (a) how
each is accomplished and (b) the advantages and disadvantages of each approach.
2.21 The American Psychological Association (APA) code of ethics maintains that experi-
mentation may not involve use of deceptive techniques unless doing so has significant
prospective scientific, educational, or applied value; that effective non-deceptive
alternative procedures are not feasible; that participants are not deceived about
research that is reasonably expected to cause physical pain or severe emotional
distress; and that psychologists explain any experimental deception to participants
as early as is feasible. Now, given these guidelines, think about Milgram’s (1963)
experiment, and answer these questions:
Some experiments are impossible to conduct because of the laws of nature. Astrophysicists and cosmologists have long
pondered the nature of black holes, which have such strong gravitational fields that
they bend the surrounding space-time, so that all light and matter spiral inescapably
into them. No one can possibly be in the right position to directly observe this, let
alone to intervene on it.
FIGURE 2.7 Mars Curiosity rover selfie taken on Mount Sharp (Aeolis Mons) on Mars in 2015
Laboratory experiments give researchers control over many aspects of the experiment,
specifically over any interventions performed and the direct and indirect control of many
extraneous variables. Depending on the nature of the experiments, a lab’s design features
may include constant temperature, sterile environment, special equipment to produce
unusual conditions, or, for experiments with human subjects, carefully selected lighting
and furniture, soundproofing, and experimenters’ confederates who behave in a specified
way. Those design features, and the control they provide, constitute one of the greatest
advantages of the laboratory. Laboratory conditions are designed to control extraneous
variables, to aid in detection and measurement of focal variables, and to create unique
situations that don’t often or ever occur outside the lab. These features can enable scien-
tists to discover regularities that are not easy to discern in the outside world.
The high degree of control enabled by laboratory conditions brings with it a high
degree of internal experimental validity. An experiment has high internal validity when
scientists can correctly infer conclusions about the relationship between the independent
and dependent variables with great certainty. This amounts to the absence of confound-
ing variables, achieved by direct or indirect control of all relevant extraneous variables.
A second advantage of laboratory experiments is that the experimental setup and data
analysis can follow predetermined, standard procedures, which make it easier to assess
and replicate an experimental finding.
However, there are also some disadvantages to lab research. To start with, some phe-
nomena are not easily investigated in a lab. Suppose you are investigating the effects of
climate change on large marine mammals. Specifically, you want to determine the effects
of elevated Arctic Ocean temperatures on the deep-diving behavior of narwhal whales.
Narwhals—the so-called unicorns of the sea because of their tusks—can dive as deep as
1.8 kilometers (6,000 feet) in Arctic waters. To directly investigate this phenomenon in a
lab, you will need—for starters—a huge tank of freezing salt water nearly two kilometers
deep. Good luck with that, right?
Furthermore, the same conditions that make it easy to directly and indirectly control
variables make the lab conditions different from the outside world, and that has some
disadvantages too. The artificiality of the experimental setting might mean that the results
obtained in the lab do not generalize well to real-life settings outside the lab. This is
problematic, since it’s ultimately the features of real-world phenomena that we want to
know about. Laboratories thus facilitate high internal validity, but potentially at the cost
of external validity. External experimental validity is the extent to which experimental
results generalize from the experimental conditions to other conditions—especially to the
phenomena the experiment is supposed to yield knowledge about.
External validity has two components: population validity and ecological validity.
Population validity is the degree to which experimental entities are representative of
the broader class of entities of interest. For experiments with human subjects, this is the
broader population they represent. The more representative a sample is of the broad class
or population, the more confident scientists can be of the experiment’s external validity.
Here’s an illustration of the importance of population validity. Many clinical trials test-
ing the efficacy and side effects of drugs are performed only on men, but the results are
expected to generalize to women as well. This decreases the population validity of the
results, since women and men differ in a number of medically relevant ways. There is thus
relatively limited experimental knowledge about the effects of some drugs on women, and
this may have serious consequences for health and medicine. Indeed, many prescription
drugs have been withdrawn from the market after they were belatedly revealed to pose
greater health risks for women than for men (Simon, 2005).
The second component of external validity, ecological validity, is the degree to which
experiment circumstances are representative of real-world circumstances. Experimental
settings or what subjects are asked to do can be artificial, unlike real-world circumstances,
in ways that impact the phenomenon under investigation. Consider again Milgram’s
experiment on compliance. How do you think the ecological validity of this experiment
rates? To answer this question, we need to consider how similar the situation encountered
in this experiment, administering electrical shocks to other people following instruction
from an authoritarian leader, is to scenarios in which people are usually asked to comply.
Limited ecological validity is a reason to question an experiment’s external validity, that
is, its significance for the broader conclusions we want to draw from it.
The price of these advantages is decreased internal validity. Less influence over the circumstances and
the selection of experimental subjects is also linked to decreased control over extraneous
variables and sometimes a decreased ability to intervene in the desired way. Because ran-
domization may not be feasible in field experiments, the researchers should decide how
best to divide the subjects into control and experimental groups, which may introduce
confounds. Besides decreasing internal validity, this decreased influence on experimental
design also makes it more difficult for other researchers to replicate the experiment.
Researchers conducting field experiments may also be constrained in what they can be
in a good position to observe or measure, the number of subjects they can involve, and how
long they can run the experiment. Many field experiments, for example, require special
permissions from individual subjects or from authorities that control access to areas like
nature preserves. Gaining these permissions can be difficult, and authorities can impose
limitations on researchers. Uncontrollable events like inclement weather or warfare can
disrupt observation or limit the length of study that’s feasible.
Let’s see how these features play out in a real field experiment. In their study entitled
‘Women as policy makers’, Raghabendra Chattopadhyay and Esther Duflo (2004) inves-
tigated how women village council leaders, or pradhan, might affect the social services
provided by councils in India. This experiment was possible because of an Indian consti-
tutional amendment in 1993, calling for one-third of pradhan positions to go to women.
Thus, the experimenters had no say in the assignment of pradhan positions to women,
as this was established by the Indian government. This also means the intervention was
not implemented by the researchers, but the law was structured so that the change in
leadership was randomly implemented across villages, mimicking a surgical intervention.
Data were collected on 265 village councils in West Bengal and Rajasthan. In each
village council, the two researchers collected the minutes of village meetings and inter-
viewed the pradhan. They also collected data from each village about social services,
infrastructure, and complaints or requests that had been submitted to the village council.
The pradhans’ policy decisions and villagers’ requests were not affected by their interac-
tions with the experimenters, since those requests and decisions were already made at the
time of data collection. It was found that women policy makers (independent variable)
had important effects on social service policy decisions (dependent variable). Women
pradhan invested more in the social goods that were more closely connected to women’s
concerns in a village: drinking water and roads in West Bengal and drinking water in
Rajasthan. They invested less in public goods connected to men’s concerns: education in
West Bengal and roads in Rajasthan.
Randomization is the standard way to assign subjects to experimental and control groups. But sometimes randomization isn't possible for practical or ethical reasons.
If you’re studying the effects of gestational diabetes on fetuses, for example, you can’t
simply assign subjects to mothers with or without gestational diabetes (the independent
variable). And it’s not ethical to randomly assign pregnant women to experimental condi-
tions aimed to increase the chance of developing gestational diabetes.
Other methods can be used to control variables when randomization isn’t feasible.
One method is to restrict participation in an experiment to experimental subjects with
the same levels of some extraneous variable. For example, suppose that age and smoking
are the two extraneous variables of greatest concern in an experiment aimed to test the
relationship between cholesterol level and heart disease. Randomization is not possible
here, or at least not ethical, but the extraneous variables of age and smoking can be
controlled by restricting admission into the experiment to subjects who are non-smokers
age 30–50. This method is simple. However, it decreases the achievable sample size and lim-
its the external validity of the experimental findings (due to decreased population validity).
Another approach is to use data about extraneous variables and their effects in order
to account for their influence on the dependent variable. For example, in a landmark
study known as the Harvard Six Cities Study, researchers investigated the effects of air
pollution on health (Dockery et al., 1993). During the 1980s and 1990s, different areas
in the US had very different levels of air pollution. The researchers studied 8,000 experi-
mental participants living in six cities in different areas, including Boston, an industrial
area in Ohio, and rural Wisconsin. Participants’ health was monitored for 20 years and
compared with air pollution measurements in the six cities. The researchers used statistics
regarding the health effects of socioeconomic factors, demographics, and smoking to
estimate the likely effects of those extraneous variables on participants’ health. This was
a way to indirectly control for those variables, even if there were systematic differences
in how they affected participants in the different groups of the study (the six cities).
The researchers found that, taking all these other variables into account, decreased air
pollution is linked to increased life expectancy.
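The kind of statistical adjustment used in the Six Cities study can be illustrated with a simple regression sketch. The data below are synthetic and the effect sizes invented; the study's actual data and statistical model were far more elaborate. The idea is that including an extraneous variable (here, smoking) as a regressor adjusts the estimated effect of the variable of interest:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data (illustrative only): pollution affects a health outcome,
# and smoking is an extraneous variable that also affects it.
pollution = rng.uniform(0, 1, n)
smoking = rng.binomial(1, 0.3, n)
outcome = 2.0 + 1.5 * pollution + 1.0 * smoking + rng.normal(0, 0.5, n)

# Ordinary least squares with an intercept: adding smoking as a regressor
# "adjusts" the pollution estimate for that extraneous variable.
X = np.column_stack([np.ones(n), pollution, smoking])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print("adjusted pollution effect:", round(float(coef[1]), 2))  # should recover roughly the true 1.5
```

Omitting the smoking column from `X` would bias the pollution estimate whenever smoking and pollution exposure are correlated, which is exactly the confounding worry the adjustment addresses.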
Yet another approach to indirect variable control is to match the members of the
experimental and control groups so the groups don’t differ in the values of known extra-
neous variables. This involves matching every subject in the experimental group with a
subject in the control group, based on knowledge of how certain extraneous variables,
such as age and smoking history, affect individual subjects. For example, researchers might
include pairs of smokers of the same age and pairs of non-smokers of the same age in
their study. One member of each pair should experience the experimental condition (say,
complete an exercise regime) and the other should experience the control condition (say,
exercise as they ordinarily would). In this way, groups of subjects can be made similar
with respect to the primary extraneous variables, thereby indirectly controlling them.
This method is often effective, but it has some limitations. It only works for extraneous
variables researchers are already aware of. It can also be time-consuming and expensive
to find matched subjects, and this may limit the sample size.
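A minimal sketch of matching, assuming age and smoking status are the extraneous variables of concern (the subjects below are hypothetical):

```python
import random
from collections import defaultdict

# Hypothetical subjects, each with the extraneous variables to match on.
subjects = [
    {"id": 1, "age": 35, "smoker": True},
    {"id": 2, "age": 35, "smoker": True},
    {"id": 3, "age": 42, "smoker": False},
    {"id": 4, "age": 42, "smoker": False},
    {"id": 5, "age": 50, "smoker": True},
    {"id": 6, "age": 50, "smoker": True},
]

# Group subjects by the matching variables, then split each matched pair
# between the experimental and control conditions.
strata = defaultdict(list)
for s in subjects:
    strata[(s["age"], s["smoker"])].append(s)

random.seed(1)
experimental, control = [], []
for pair in strata.values():
    chosen = random.choice(pair)                    # one member gets the intervention
    other = [s for s in pair if s is not chosen][0] # the other is its matched control
    experimental.append(chosen["id"])
    control.append(other["id"])

print("experimental:", sorted(experimental))
print("control:", sorted(control))
```

By construction, the two groups have identical distributions of age and smoking status, so those variables cannot explain any difference in outcomes between them.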
A fourth choice in experimental design concerns how many groups to include in an
experiment. So far, we have focused on experiments with two groups: an experimental
group and a control group. More complicated experimental designs include multiple
experimental groups, each of which experiences a different but related intervention. We
saw an example of this in the Harvard Six Cities study. There were six different groups,
each corresponding to a city with some measured value of air pollution. Participants
were assigned to groups simply according to which city they lived in. Including multiple
experimental groups can be enlightening but also complicates experiments, making them
more difficult to perform. They also make it more difficult to get adequately large sample
sizes for each group, which leads to the drawbacks we’ve already discussed. And finally,
multiple groups can make analysis of the results more difficult.
intervention and variable control are significantly compromised or impossible. Most vari-
eties of non-experimental scientific study are observational studies, which involve col-
lecting and analyzing data without performing interventions or controlling extraneous
variables.
One example of an observational study is John Snow’s investigation into the source of
a cholera outbreak in London, England. Cholera epidemics ravaged London in the mid-
19th century, with notable outbreaks in 1831–1832 and again in 1849. Snow studied
these outbreaks, recording the details of dozens of cases. Because his research seemed to
indicate that cholera was transmitted from person to person, Snow wanted to find out
how it was transmitted. Previous reports suggested that cholera began with ‘an affec-
tion of the alimentary [digestive] canal’. From this, Snow hypothesized that cholera was
transmitted through the inadvertent ingesting of ‘morbid material’ from the vomit and
‘evacuations’ of cholera patients.
Then, on Thursday, August 31, 1854, cholera hit London’s Soho district. The out-
break appeared to be concentrated in certain areas. One such area was the corner of
Broad and Cambridge Streets, where more than 100 neighbors died in three days.
Three-quarters of the neighborhood residents fled within a week, but hundreds more
died nonetheless. Snow reported that ‘within 250 yards of the spot where Cambridge
Street joins Broad Street, there were upwards of 500 fatal attacks of cholera in 10 days’
(Snow, 1855).
At this intersection, there was a water pump from which locals could draw water.
Snow’s own observations of the pumped water led him to note that it looked abnormal.
Given his prior reasoning about cholera transmission, Snow began to suspect that the
pumped water contained ‘morbid material’. He learned that of the 89 cases of deceased
cholera victims, 61 were known to have consumed water from the Broad Street pump.
This was suggestive evidence. However, there was an apparent anomaly—that is, a
phenomenon that deviates from the expectations of a theory or hypothesis. One detail
didn’t fit the pattern suggested by Snow’s hypothesis: very near the Broad Street pump
was a brewery, but none of the more than 70 brewers had died from cholera. This was
puzzling.
Snow had the Broad Street pump handle disabled seven days after the outbreak began.
Even though the epidemic had already begun to fade, he was convinced of having reasoned
correctly from his detailed observations:
Whilst the presumed contamination of the water of the Broad Street pump with
the evacuations of cholera patients affords an exact explanation of the fearful out-
break of cholera in St. James’s parish, there is no other circumstance which offers
any explanation at all, whatever hypothesis of the nature and cause of the malady
be adopted.
(1855, p. 54)
In other words, Snow could think of nothing else that could account for the outbreak’s
features, other than the hypothesis of contaminated water from the Broad Street pump.
Snow was right. It was later discovered that the well serving the Broad Street pump
had been dug only a few feet away from an old cesspit, which had begun to leak fecal
bacteria. The lack of cholera deaths among brewers turned out to be further evidence in
favor of Snow’s inference; the brewers only drank their own beer, which used water from
their own well, water that was sterilized in the beer-brewing process.
In this study, Snow did not perform an intervention, control variables, and study the
results. What he did was assemble a system of detailed observations and reason his way
to the one hypothesis that best explained those observations.
FIGURE 2.9 Phineas Gage posing with the rod that passed through his skull
front of his head, ultimately landing about 25 meters away. The rod destroyed much of
his brain’s left frontal lobe, but Gage survived (Harlow, 1848).
In 1868, Dr. John Harlow, one of the physicians attending Gage, reported on the
patient’s mental condition after this accident. He described Gage as ‘fitful, irreverent,
indulging at times in the grossest profanity’, ‘manifesting but little deference for his fel-
lows’, and ‘at times pertinaciously obstinate’. He claimed that this was a radical change
for Gage after the accident, ‘so decidedly that his friends and acquaintances said he was
no longer Gage’ (1868, p. 277). Overall, the damage seems to have resulted in a major
degradation of, among other things, Gage’s social skills.
Since the 19th century, neurologists, neuropsychologists, and cognitive neuroscientists
have studied the case of Phineas Gage to understand the role of the frontal cortex in
social behavior. But it has been difficult to make precise inferences from this case, since
the immediate damage to Gage’s frontal cortex was so extensive, with surgical repairs
and subsequent infections complicating matters further. Another complicating factor is,
of course, that there is just one instance of Gage’s injury; a single case study creates no
opportunity for variable control or the observation of how different instances play out.
For these reasons, although case studies can provide a rich body of qualitative infor-
mation, they have limited internal and external validity. A case study’s internal validity
is limited by the lack of control over extraneous and confounding variables. Case studies
are also particularly vulnerable to bias, owing to the evaluation of qualitative data and the
absence of blinding. And because the research focuses on only one individual, event, or group, results
can be difficult to replicate and to generalize.
Every now and again, nature yields a case that can play the role of an experiment. These
so-called natural experiments occur when an intervention on an independent variable
occurs naturally in real life without any experimenters doing anything. This very thing
happened in the case of Phineas Gage and also in other famous cases from the history of
neuropsychology, like the case of Louis Leborgne.
When he was about 30 years old, Louis Leborgne lost the ability to speak. He could
utter only a single syllable, tan, which he usually repeated twice in succession, giving
rise to his nickname ‘Tan Tan’. Apart from his inability to speak, Leborgne exhibited no
symptoms of physical or psychological trauma. He could understand other people, and
his other mental functions were apparently intact. After Leborgne died at the age of 51
in a hospital in Paris in 1861, the French physician Paul Broca performed an autopsy,
and found that Leborgne had a lesion in the frontal lobe of the left cerebral hemisphere
(which later came to be known as ‘Broca’s area’). This case is a kind of natural interven-
tion. The variable of interest, brain region x, was not deliberately manipulated, but there
was no evidence of any confounding variables associated with that manipulation. Broca
used this case to identify a brain region important for the articulation of speech; injure
Broca’s area, and an inability to produce speech—that is, Broca’s aphasia—would ensue.
Leborgne just happened to suffer the very kind of brain damage that could make clear
the function of that area of the brain.
Sometimes, even groups of individuals just happen to get sorted—naturally and
without any scientific intervention—into something approximating experimental and
control groups. Some natural or historical process separates them out, such that one
group but not the other can be construed as receiving an experimental treatment or
condition. The Indian councils and Harvard Six Cities studies discussed earlier are
examples of natural experiments. Their conditions approximated experiments well
enough that we described them as such, but really the experimenters were not in a
position to intervene.
Another example of a natural experiment on experimental and control groups occurred
with the separation of the Korean territory and population into two sovereign nations.
When the Korean War ended in 1953, the peninsula was partitioned in half. Many aspects
of the resulting two nations—South and North Korea—have remained similar. For exam-
ple, both nations have a shared history, and they have similar geographies, climates, lan-
guages, and cuisines. But they differed in one main respect: political regime. North Korea
adopted single-party state socialism, headed by a totalitarian military dictatorship, whereas
South Korea eventually became a multi-party liberal democracy.
The separation of the Korean population into two groups is often described as a large-
scale natural experiment, in so far as the political regime (independent variable) seems
to be related to many observable differences between the two nations. These differences
include changes in economy, infrastructure, religion, education, and health. By 2010, the
difference in infant mortality, an indicator of population health, was striking: 3.8 deaths
per 1,000 births in South Korea but 27.4 deaths per 1,000 births in North Korea. By
2011, life expectancy in South Korea was 77.5 years for men and 84.4 for women but
only 65.1 years for men and 71.9 for women in North Korea (Khang, 2013). The differ-
ences are even visible from space: the per capita power consumption in the two countries
differs greatly (South Korea at more than 10,000 kilowatt hours, North Korea at less than
750 kilowatt hours).
Longitudinal research is another approach that tracks subjects over time. In a longi-
tudinal study, the same subjects are measured repeatedly over a period of time, some-
times many years, allowing the researchers to track subjects’ change. A benefit of such
diachronic studies is that they can reveal changes over time in the characteristics of a
group of subjects. The Early Childhood Longitudinal Study started in the late 1990s and
followed 20,000 American children, examining their development, performance at school,
and early school experience. Researchers also conducted extensive interviews with their
families. This study provides a lot of information about American children’s development
and family life. Analyzing this longitudinal data, the economists Steven Levitt and Stephen
Dubner (2005) showed that many things that parents do to make their kids ‘smarter’
do not seem to actually help children do well on tests. Reading to kids every day, for
example, does not relate to higher test scores. Higher test scores are strongly related to
being born to a mother over 30, but not to a mother taking time off to raise the child.
In a cross-sectional study, different subjects are measured at a single time in order to
get a sense for the prevalence of some trait(s) in the population at large. For example, a
cross-sectional approach to studying children’s development and family life would involve
assessing the kinds of variables just discussed—family characteristics, reading exposure,
test scores, and so on—at once. One advantage of cross-sectional studies is that they enable
researchers to measure and compare several variables. They are also easier to accomplish,
as there is no need to track individuals over time. But the information they provide is
correspondingly more limited and perhaps less accurate. For example, instead of assess-
ing whether kids are read to every day based on subjects’ actual experiences, researchers
must rely on their memories of earlier years.
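The contrast between the two designs is easy to see in the shape of the data they produce (the names, years, and scores below are invented):

```python
# Longitudinal data: the same subjects measured at repeated waves,
# so change within a subject can be tracked directly.
longitudinal = {
    "child_A": {"1998": 82, "2000": 91, "2002": 97},  # test scores over time
    "child_B": {"1998": 75, "2000": 80, "2002": 88},
}

# Cross-sectional data: different subjects, each measured once,
# giving a snapshot of the population at a single time.
cross_sectional = [
    {"child": "C", "age": 6, "score": 78},
    {"child": "D", "age": 8, "score": 85},
    {"child": "E", "age": 10, "score": 90},
]

# Within-subject change over time is only computable from the
# longitudinal design; the cross-sectional data have no second wave.
change_A = longitudinal["child_A"]["2002"] - longitudinal["child_A"]["1998"]
print("child_A change 1998->2002:", change_A)  # 15
```

The cross-sectional snapshot can still compare age groups at one time, but it cannot distinguish genuine developmental change from pre-existing differences between the children measured.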
relevant to food preferences is extraordinarily valuable, as is visualizing the data about the
popularity of various foods.
The patterns and trends uncovered by analyzing big data can give insight into relation-
ships among variables of interest and can be used to make predictions. One well-publicized
example of ambitious research based on online data is the long-term analysis of user data
from the online dating website OKCupid (Rudder, 2014). But it can be difficult to assess
big data research, and some are concerned that it’s taken more seriously than it should be.
In 2008, researchers from Google claimed that they could immediately predict what regions
experienced flu outbreaks based simply on people’s online searches. The idea was that
when people are sick with the flu, they often search for flu-related information on Google.
Unfortunately, this idea wasn’t borne out. Google Flu Trends made very inaccurate predic-
tions, significantly overestimating flu outbreaks, and was shut down (Lazer et al., 2014).
Perhaps the biggest challenge facing big data techniques is their opacity. The algorithms
used to sample, filter, and order data are often unknown to outside researchers, and the
people who create the data in the first place are generally unknown to even the research-
ers performing the investigation. This makes it difficult to assess study procedures, the
significance of the data, and the possibility of confounding variables. Another challenge
with big data techniques regards population validity (see Section 2.2). Many people in
the world don’t use any social media, so those who do may not be representative of the
broader population, and more nuanced versions of this problem exist for any particular
form of online data. There are issues with privacy too. Online data are often in the public
domain, but big data research publicizes data and reveals trends that the people respon-
sible for the data may not be comfortable with. The publication of OKCupid user data
was an instance of this issue widely discussed in the popular press.
These challenges do not erase the scientific value of big data though. And the analysis
of data can even help us better understand how science works. For example, in the field
of library and information science, bibliometrics is used to understand the dissemination
and production of literary work by analyzing big data sets of written publications. This
approach is also directed to scientific publications. Bibliometric methods, including the
analysis of networks of citations in published work, can be used to investigate the level
of productivity of a certain field of research, trends in the topics of scientific research,
and even the social dynamics underlying scientific practice. The number of citations of
a published article is an index of recognition, which is one of the primary rewards for
scientists. So, citation rates and patterns can be used to quantify scientific impact and to
predict what factors might affect the future course of science.
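Counting citations amounts to computing in-degrees in a citation network. A toy sketch with hypothetical papers:

```python
from collections import Counter

# A toy citation network (hypothetical papers): each edge points from
# the citing paper to the cited paper.
citations = [
    ("paper_B", "paper_A"),
    ("paper_C", "paper_A"),
    ("paper_C", "paper_B"),
    ("paper_D", "paper_A"),
    ("paper_D", "paper_C"),
]

# A paper's citation count is its in-degree in this network, a crude
# index of the recognition described in the text.
counts = Counter(cited for _, cited in citations)
for paper, n in counts.most_common():
    print(paper, n)
```

Real bibliometric analyses work on networks of millions of publications and use far richer measures than raw counts, but the underlying data structure is the same.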
computer simulations can play a role analogous to experiments. Computer programs are
developed that use algorithms to mimic the behavior of a real-world system. For example,
computer simulations of the Earth’s climate represent the dynamic interactions of solar
energy, chemicals in the atmosphere, oceans, landmasses, ice, and other factors. Such simu-
lations can then be studied to yield insight into real phenomena such as anthropogenic
climate change. Interventions can be performed in a simulation of the climate system
that would be undesirable or impossible to actually perform in Earth's climate system.
For example, climate scientists might investigate what a specific increase of the amount
of carbon in the atmosphere would do to the rate of glacier melt.
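A drastically simplified sketch of such a simulated intervention follows. This toy model and all of its parameters are invented; real climate models couple many interacting physical processes. The point is only the logic: run the simulation twice, intervening on one variable, and compare outcomes:

```python
def melted_fraction(carbon_ppm, years=50, sensitivity=0.002):
    """Toy model: glacier mass shrinks each year by an amount that
    grows with atmospheric carbon. Purely illustrative."""
    glacier = 1.0  # initial glacier mass (arbitrary units)
    for _ in range(years):
        glacier *= 1 - sensitivity * carbon_ppm / 400
    return 1 - glacier  # fraction melted after `years`

# Simulated "intervention": compare a baseline run with one in which
# atmospheric carbon is increased -- an experiment we could never
# perform on the real climate system.
baseline = melted_fraction(carbon_ppm=400)
intervened = melted_fraction(carbon_ppm=600)
print(f"melted fraction at 400 ppm: {baseline:.2f}")
print(f"melted fraction at 600 ppm: {intervened:.2f}")
```

Because everything else in the two runs is identical, the difference between the outputs is attributable entirely to the intervened-on variable, mirroring the logic of a controlled experiment.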
Another extension of the concept of intervention is to our rich imaginations. Thought
experiments are devices of the imagination that scientists sometimes use to learn about
reality. Thought experiments involve an imagined intervention on a system. In the right
conditions, these can be used to test a hypothesis, to show that nature does not conform
to one’s previously held expectations, and to suggest ways in which expectations can
be revised. Just like experiments in a lab or in the field, thought experiments may be
criticized because their setup is faulty or because scientists draw unjustified conclusions
from them.
Galileo used many thought experiments in his investigations of physics and astronomy.
In one instance, he wished to investigate an idea of Aristotelian physics that objects with
different weights fall at different speeds. Galileo asked his readers to assume, as Aristotle
did, that heavier objects fall faster than lighter objects. He then imagined two objects, one
light and one heavy, connected to each other by a string and dropped from the top of a
tower. If Aristotle’s assumption was correct, then the string would pull taut as the heavier
object falls faster than the light object. But, Galileo reasoned, both objects together are
heavier than the heavy object. So, for Aristotle, the two objects together should actually
fall faster than either object alone. These objects cannot simultaneously fall both faster and
slower, so the Aristotelian idea that was the starting point for this reasoning process could
not be right. Galileo’s thought experiment provided a refutation of the Aristotelian theory
of motion, suggesting that the speed of a falling body is not dependent on its weight.
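Galileo's reductio can be put schematically; this formalization is ours, not Galileo's (here $w$ is weight, $v$ is fall speed, and $H$ and $L$ are the heavy and light objects):

```latex
\begin{align*}
&\textbf{Assumption (Aristotle):}\quad w_1 > w_2 \;\Rightarrow\; v_1 > v_2.\\[4pt]
&\text{Tie $H$ and $L$ together, where } w_H > w_L:\\
&\quad \text{(i) the slower $L$ retards $H$, so } v_{H+L} < v_H;\\
&\quad \text{(ii) } w_{H+L} > w_H \;\Rightarrow\; v_{H+L} > v_H \quad\text{(by the assumption)}.\\[4pt]
&\text{(i) and (ii) contradict each other, so the assumption must be false:}\\
&\quad \text{fall speed does not depend on weight.}
\end{align*}
```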
Newton also used thought experiments to help show how his theory of gravitation
worked. He had readers imagine a cannon at the top of an extremely tall mountain, and
then asked what would happen if somebody loaded the cannon with gunpowder and fired.
Plausibly, Newton reasoned, the cannonball would follow a curve, falling faster and faster
because of gravity’s force, and would hit the Earth at some distance from the mountain.
But what if one used more gunpowder? The velocity of the cannonball would be greater,
and it would travel farther before falling back to Earth following a curved trajectory.
But if one used vastly more gunpowder, then, Newton suggested, the cannonball would
travel so fast that it would fall all the way around the Earth, never landing. The cannonball
would be in orbit, going around again and again just like the Moon! This is pictured in
Figure 2.10. If the cannonball went even faster, then it would escape Earth’s gravity,
heading out into space. Newton's theory of gravitation provided the resources to arrive at
these same conclusions through mathematical calculations. Imagining this situation gives
a satisfying, intuitive sense for how an object like the Moon can stay in orbit by remain-
ing in constant free fall.
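Those calculations are easy to reproduce for the limiting cases. For a circular orbit skimming the Earth's surface (treating the mountain's height as negligible), gravity must supply exactly the centripetal acceleration, so v²/R = g:

```python
import math

g = 9.81     # surface gravitational acceleration, m/s^2
R = 6.371e6  # Earth's mean radius, m

# Circular orbit at the surface: gravity supplies the centripetal
# acceleration, v^2 / R = g, so v = sqrt(g * R).
v_orbit = math.sqrt(g * R)

# Escaping Earth's gravity entirely requires sqrt(2) times that speed.
v_escape = math.sqrt(2 * g * R)

print(f"orbital speed: {v_orbit / 1000:.1f} km/s")   # about 7.9 km/s
print(f"escape speed:  {v_escape / 1000:.1f} km/s")  # about 11.2 km/s
```

Any launch speed between these two values yields the elliptical paths Newton's diagram depicts; below the orbital speed, the cannonball curves back to Earth, just as in the thought experiment.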
EXERCISES
2.22 Recall the ideal experiment you described in Exercise 2.14 and the three challenges
to that experiment you identified. Describe an alternative experiment that is more
practical but that still can successfully test your hypothesis.
2.23 Describe a different approach to the experiment you described in 2.22. Then list
the advantages and disadvantages of each approach, with an eye to the trade-offs
among features of experiments described in this section.
2.24 Recall, from Section 2.1, the experiment in which participants divide $10, with one
person offering some division and the other only being able to accept or reject
the offer. (Rejecting the offer results in neither participant getting any money.) The
finding was that people offered fairer divisions and also rejected divisions deemed
unfair even though this resulted in no money won. The researchers concluded that, in
general, people seem to be willing to sacrifice self-interest to promote fairness. In this
experiment, participants haven’t previously interacted with one another, and they
don’t interact with the same participant more than once. Let’s assume participants
are randomly selected and randomly assigned to roles.
a. Define internal validity, and assess this experiment’s internal validity, justifying
your assessment.
b. Define external validity and name and define each of its two components.
Assess this experiment’s external validity, justifying your assessment.
c. What was the researchers' conclusion from this study? Does the experiment's inter-
nal validity or external validity cast doubt on this conclusion? Why or why not?
2.25 What are the main advantages and disadvantages of a laboratory experiment?
How about a field experiment?
2.26 Decide whether each of the following statements is true or false. For any false state-
ment, write a new sentence, changing the original sentence so it is true.
a. A completely randomized design offers no control for confounding variables.
b. Randomization controls for the placebo effect.
c. A cohort is a group of subjects with some defining characteristic in common.
d. Longitudinal studies involve repeated observations of the same variables over
long periods of time.
e. Natural experiments occur when experimenters intervene on an independent
variable in the real life setting of their subjects.
f. In observational studies, the independent variable is under the control of the
researcher.
2.27 What are three reasons experiments sometimes cannot be performed? For each
reason, say whether it absolutely prohibits experimentation or whether an experiment
might be possible at another time or in another way.
2.28 Briefly describe case studies, cohort studies, prospective studies, and longitudinal
studies. What features do these have in common? How do they differ?
FURTHER READING
For an introduction to the philosophy of experiments with a focus on the natural sciences,
see Hacking, I. (1983). Representing and intervening: Introductory topics in the philoso-
phy of natural science. Cambridge: Cambridge University Press.
For a historical perspective on experiment with a focus on the debate between Rob-
ert Boyle and Thomas Hobbes over Boyle’s air-pump experiments in the 1660s, see
Shapin, S., & Schaffer, S. (1985). Leviathan and the air-pump: Hobbes, Boyle, and the
experimental life. Princeton: Princeton University Press.
For more on the experimental approach in the social sciences with a focus on economics,
see Guala, F. (2005). The methodology of experimental economics. Cambridge: Cam-
bridge University Press.
For a case study on the role of instruments and measurements in experiments and stud-
ies, see Chang, H. (2004). Inventing temperature: Measurement and scientific progress.
Oxford: Oxford University Press.
For an account of the scientific method in physics and an early statement of the problem
of underdetermination, see Duhem, P. (1954/1991). The aim and structure of physical
theory. Princeton: Princeton University Press.
CHAPTER 3
Models and Modeling
19 kilometers, away in real life). This is possible because, as large as it is, the Bay Model
is 1,000 times smaller than the actual San Francisco Bay, a large body of salty ocean
water surrounded by a large urban population living in a variety of geological terrains
and climates.
The Bay Model is a hydraulic model; it can be filled with water, just as the real San
Francisco Bay is. Pumping systems move the hundreds of thousands of gallons (1 gallon =
3.785 liters) of water in the model and do so in a way that mimics the tides and currents
of the real bay. This works in part because the model is three-dimensional and propor-
tional, so the different parts of the bay and river delta in the model are the right amount
lower than sea level, and the surrounding land is the right amount above sea level. The
Bay Model also includes many other features that affect water flow, such as rivers, canals
in the delta, wharfs, bridges, and breakwaters.
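The stated 1,000-to-1 scale and the gallon-to-liter conversion given in the text make translating between model and real-world quantities a matter of simple arithmetic (the particular quantities below are illustrative):

```python
SCALE = 1000               # the Bay Model is 1,000 times smaller than the real bay
LITERS_PER_GALLON = 3.785  # conversion stated in the text

def model_length_m(real_length_m):
    """Real-world length -> corresponding length in the model."""
    return real_length_m / SCALE

def gallons_to_liters(gallons):
    return gallons * LITERS_PER_GALLON

# A 19 km real-world distance shrinks to 19 m in the model.
print(model_length_m(19_000))  # 19.0
# Hundreds of thousands of gallons is hundreds of thousands of liters, scaled.
print(round(gallons_to_liters(100_000), 1))
```

Proportional three-dimensional scaling of this kind is what lets water depths and land elevations in the model stand in for their real counterparts.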
The Bay Model is not just a toy model, however. It’s a scientific model, and this has
some important implications. Scientific models are constructed and investigated in order
to learn, not just about the model itself, but also about phenomena in the real world.
This particular model is a terrific tool for learning about the San Francisco Bay and how
human activities can affect it. Teachers, students, and scientists use it to study geography,
ecology, human and natural history, and hydrodynamics. It has been used to help answer
questions about how dredging new shipping channels would affect the San Joaquin River
Delta, about how mining during the California Gold Rush changed the rivers, and about
what would happen if the system of dikes and levees in the delta failed.
Why Models?
Chapter 2 discussed the role of experiments and non-experimental studies in science,
considering especially how these are used to generate data to compare with expectations,
providing evidence for or against hypotheses. In this chapter, we will survey another
important feature of science that relates to experimentation in interesting ways: the
use of models. To uncover the roles that models play in science and to see how the Bay
Model in particular works, let’s look back at why that model was originally constructed.
(See Weisberg, 2013, on this case study and an overview of the use of models in science.)
John Reber moved from Ohio to California in 1907 and set up as an amateur play-
wright, dramatist, and theatrical producer in the 1920s and 1930s. Because of his work,
he enjoyed social connections with numerous businessmen and politicians. In the 1940s,
Reber became dismayed that the transcontinental railroad terminated in Oakland rather
than San Francisco, and came to believe that the bay that isolated San Francisco from
the rest of California and the United States interfered with industry. He saw that large
body of water as a ‘geographic mistake’ to be corrected.
Reber’s career was in entertainment, and he had no expertise in science or engineer-
ing. Nonetheless, Reber intrepidly proposed a grand plan to re-engineer, and then exploit,
natural features of the bay that he thought would enable more efficient use of it. He
suggested filling some parts of the bay to create additional land for things like airports and
factories and to establish two lakes to store freshwater supplied by the rivers that empty
into the bay. As freshwater has always been a limited resource in the San Francisco Bay
area, it could be valuable to repurpose the bay for drinking water and irrigation.
Reber’s plan was taken seriously, and the US Army Corps of Engineers decided to
test it out. An immediate problem, though, was that the corps couldn’t effectively test
out Reber’s plan in the actual bay without implementing the plan. What to do? How
could they consider the effects of the plan without going ahead and carrying it out? Such
circumstances highlight one way in which scientific models are particularly useful. When
performing an intervention on a system of interest isn’t possible, practical, or otherwise
desirable, a model of the system can be used instead.
Consider another example of a circumstance when modeling is useful. Suppose you
are playing chess against a computer and are considering moving, say, your rook. How will
that move affect the next three moves in the game? The easiest way to find out would
just be to move the rook and see what happens. But the easy thing to do isn’t always the
best thing to do. Without thinking through the consequences first, such a move might
result in a quick defeat. It would be helpful to have a second chessboard set up to be
just like the game that you’re actually playing but ‘offline’—in other words, it isn’t in
the midst of an actual game. That way, you could try out various moves and consider
moves that might be made in response. Doing so would help you anticipate how the
actual game might proceed without suffering any bad consequences in the process. The
offline chessboard might be a chessboard you've set up beside you, or it could just be a
chessboard you imagine, or it could be another game on a computer but not in active
play. Regardless, if the second chessboard is used in this way, it is a model of the actual
chess game. You’ve set it up to have the pieces in the same places, and you can then try
to figure out what your opponent might do were you to move your rook.
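The 'offline chessboard' idea can be sketched in a few lines of Python (a toy illustration, not from the text): copying the state of a system gives you a model you can manipulate without touching the original.

```python
import copy

# A minimal stand-in for a game state: piece name -> board square.
# The piece names and squares here are invented for illustration; this is
# not a full chess implementation.
live_game = {"white_rook": "a1", "white_king": "e1", "black_king": "e8"}

# The 'offline chessboard': a copy set up to match the live game.
model = copy.deepcopy(live_game)

# Try out a candidate move on the model only.
model["white_rook"] = "a8"

# The live game is unaffected by experimenting on the model.
print(live_game["white_rook"])  # -> a1
print(model["white_rook"])      # -> a8
```

The key design point is the same as with the Bay Model: interventions are performed on the stand-in, and only their consequences inform decisions about the real system.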
This is just like the decision of how to study Reber’s plan for the San Francisco Bay. The
Army Corps of Engineers wasn’t prepared to radically alter the bay and the surrounding
river delta before knowing what the results would be. They recognized that such changes
might have unintended negative consequences for the local water supply, wildlife, vegeta-
tion, agriculture, and human population. So, like a second chessboard used to explore
possible consequences of moves in a real game of chess, the Corps of Engineers built a
hydraulic model designed to be like the San Francisco Bay in some important respects.
This enabled them to investigate the consequences of the changes Reber had proposed,
by this time known as ‘the Reber Plan’.
Once they were confident that their model was sufficiently similar to the real San
Francisco Bay in the important respects, scientists could make predictions about the
real bay based on what they saw happening in the Bay Model. The model could then
be manipulated—an intervention could be performed on it—to determine what would
happen in the real bay were the Reber Plan implemented. The scientists did exactly that.
They built scale models of the dams that would create the proposed lakes and landmasses,
and then they sat back to see what would happen.
It turned out that, when the Reber Plan was implemented in the Bay Model, its
unintended consequences were disastrous. The dams didn’t create lakes at all but instead
stagnant pools with poor water quality that wouldn’t support ecosystems and couldn’t be
used for drinking or irrigation. Altering the dam configuration in the model in an attempt
to solve that problem just created another problem: fast currents that again destroyed
ecosystems and made travel in the bay significantly more dangerous. When the Corps of
Engineers reported these findings, the Reber Plan was abandoned.
Everything is similar to everything else in at least some regards, so any old similarity
won’t necessarily result in a good model. Rockets from the US Apollo space program were
white, cylindrical, rigid bodies, which were shaped much like a parsnip, but no one uses
an Apollo rocket as a scientific model of parsnips. Scientific models need to be similar to
their targets in relevant ways and dissimilar in irrelevant ways, at least for the most part.
This is why the Bay Model replicated tides and currents and other important features of
the San Francisco Bay, but not the number of sailboats in the bay.
So, the features of a model that scientists construct should be relevantly similar to the
features of the target system they think are important. This is what makes it possible to
get accurate information about a target from studying a model. Things are a bit more
complicated, though, since relevant similarity can be achieved in different ways. In the
example of the second offline chessboard in which you try out chess moves, it wouldn’t
matter too much if you replaced the chess pieces with colored paperclips or berries of
various sizes. You could even just draw your own chessboard on a napkin. The dissimilari-
ties between these approaches and the target—the actual chess game—don’t matter, so
long as they don’t interfere with the model’s ability to represent the intended features
of the chess game. Here’s a difference that would matter: using different-sized piles of
sand on a chessboard to represent chess pieces isn’t a good idea, since these piles can’t
be easily moved like chess pieces.
Intuitively, one way to achieve relevant similarity is to construct a model as similar
as possible to the target system. But as it turns out, this is usually a bad idea. Too much
similarity between a target and a model can actually be counterproductive. Had the
Corps of Engineers tried to build a model exactly like the San Francisco Bay in all rel-
evant respects, it would have been too large for them to have anywhere to put it, and it
would have changed so slowly they would have had to wait years to find out about the
consequences of the Reber Plan. Consider constructing a map of your hometown that
is exactly like it in every respect; it is three-dimensional, the same size as the real town,
contains a full representation of every building, shrub, alley, fire hydrant, stray cat, and
so on. Even if this could be done, why even bother with the model? You might as well
just investigate the town itself!
So, scientific models need not—indeed, should not—be similar to their targets in
every respect or even in most respects. Like maps, models are incomplete and usually
simpler than their targets. They’re designed to represent selected features of the target,
the features about which scientists want to learn. Their lack of completeness is part of
what makes them useful.
But what’s the right amount of similarity then? This is an important question that
doesn’t have a general answer. All scientists who work with models regularly consider the
extent to which some particular model should be like its target and the extent to which it
should be different. The Bay Model’s different spatial and temporal scales are two features
that made it useful for learning about the real San Francisco Bay and Delta. The model is
much smaller than the real bay, with much faster tidal cycles, which allowed the scientists
to observe what would happen with a spatially distributed, long-lasting sequence of events
in a short time and without having to leave the warehouse of the Bay Model.
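Why scaling down space also speeds up time follows from a standard similarity argument in hydraulic modeling. The scale factors below are assumptions based on published descriptions of the Bay Model (roughly 1:1000 horizontally and 1:100 vertically), and the Froude-scaling sketch is illustrative rather than the Corps' actual calibration:

```python
import math

# Assumed scale factors for a 'distorted' hydraulic model like the Bay
# Model: horizontal distances at 1:1000, depths at 1:100.
horizontal_scale = 1 / 1000
vertical_scale = 1 / 100

# Froude similarity: gravity-driven flow speeds scale with the square
# root of the depth scale.
velocity_scale = math.sqrt(vertical_scale)       # -> 0.1

# Time to cross a scaled horizontal distance is distance / speed, so the
# model's clock runs faster by horizontal_scale / velocity_scale.
time_scale = horizontal_scale / velocity_scale   # -> 0.01

# A lunar tidal day of about 24.8 hours compresses to roughly 15 minutes.
model_tidal_day_minutes = 24.8 * 60 * time_scale
print(round(model_tidal_day_minutes, 1))         # -> 14.9
```

Under these assumptions a full tidal day plays out in about fifteen minutes, which is why long-lasting sequences of events could be observed quickly inside the warehouse.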
Some other features of the real bay that were changed or ignored either didn’t mat-
ter or would have been too difficult to accurately incorporate. For instance, the model
doesn’t have any trees or buildings, as those were unimportant for its purpose. And being
inside a big warehouse is a difference with a practical benefit: the model isn’t exposed
to changing weather like the real bay is. The model also doesn’t incorporate the oceanic
wind currents that affect the bay; it’s tricky to see how those could be replicated and
whether the outcome of doing so would be worth the effort.
The scientists thus decided which features of the Bay Model should be similar to
the real bay and which could, or should, be different. They also had to decide how to
represent changing features of the San Francisco Bay. For example, they had to decide
whether the model should be like the actual bay is during dry seasons or wet seasons or
some combination of these. They had to get all of these features right, or right enough,
for the model to give them trustworthy information about how the bay would change
if the Reber Plan were carried out. As it turned out, the model they developed was suf-
ficiently similar to the real bay not only to serve this purpose but for it to eventually
be put to other uses as well. For example, the Bay Model was also used to study how a
later plan of deepening water channels would affect water quality.
One special type of similarity is called exemplification. For a model to exemplify some
group of target systems, it must itself be one of the target systems. Such a model is called
an exemplar. Researchers can use an exemplar to represent the broader class of targets that
includes the exemplar and can thus draw conclusions about the whole class of targets by
investigating the exemplar. For example, the fruit fly (which goes by the scientific name
Drosophila melanogaster) is a common model organism in genetics and developmental
biology. Just like Mendel used pea plants to understand how certain characteristics are
passed from one generation to the next, biologists have used the fruit fly to learn how
genes influence the development of embryos from single cells to mature organisms.
Fruit flies are small and reproduce quickly, and large populations are easily maintained
in labs. In addition to fruit flies being easy to keep and work with, scientists know about
their entire genome and so can intervene on their genes in precise ways. These interven-
tions allow scientists to identify specific sections of DNA within the genome that carry
information needed to produce specific molecules like proteins, which in turn influence
characteristics like fruit-fly size and color. As a model organism, the fruit fly is used to
reason about other organisms, such as the biological mechanisms of hereditary disease
and the regularities in the inheritance of physical characteristics observed by Gregor
Mendel. Scientists might study one population of fruit flies to learn about all fruit
flies or to learn about all insects or even about all forms of life, including human life.
FIGURE 3.3 (a) Drosophila melanogaster; (b) the four chromosomes of Drosophila. Image from droso4schools.wordpress.com.
The last, broadest range of target systems is surprisingly common in genetics research.
Like all models, exemplars are both similar to and different from the target systems
they represent. For example, fruit flies have genes organized into chromosomes, as do all
other living organisms. This is an important similarity for their use as a genetic model.
But fruit flies have only four chromosomes, so they are much simpler genetically than
many other organisms. Further, because they breed very quickly, they have much shorter
generations than many organisms. These features make them very good models to use in
labs, but they also make them somewhat unrepresentative of all other organisms out there.
To sum up, target systems are real-world phenomena selected for study, models are
constructed to represent target systems for particular purposes, and models are similar to
but also different from their targets in various ways. Most similarities and differences are
carefully chosen, not only so the model can be developed and studied, but also—impor-
tantly—so it can provide accurate information about the target system. Studying a model
can lead to knowledge about a target system insofar as the model can stand in for that
system.
There is a general pattern of how models tend to be constructed and used in science. That pattern
has three basic steps: (1) specification of the target system(s), (2) construction of the model,
and (3) analysis of the model. Let’s consider each of these steps, starting with the first.
At first glance, it might seem easy to specify the target system; this basically just
requires scientists to decide what it is they want to find out about using a model. Do they
want to learn about the effects of proposed changes to the San Francisco Bay? Examine
the genetic influences on some trait? Or, say, learn more about how the number of preda-
tors influences other animal populations?
But like everything else in science, things aren’t as simple as they at first seem. An
archer cannot accurately hit a target with her arrow if she doesn’t know where the tar-
get is or what it looks like. Similarly, scientists need to know quite a bit about a target
system before they can construct a model of it. This is a version of an age-old problem
called the paradox of inquiry: if you don’t already know what you’re looking for, how can
you inquire about it? The central reason to develop a model in the first place is to gain
knowledge about the target, but in order to learn about a target using a model, scientists
must already know about that target.
Scientists may initially know little to nothing about the target systems they want to
investigate—especially when those systems are very distant in space or time, or exces-
sively large or small. Yet, without some knowledge about a target, scientists can’t evaluate
whether the model is similar enough to the target, and in the right ways, to accurately
represent it. So, at the beginning of the modeling process, scientists need to be able to
conceive of what a model should be a model of and what they want to learn from the
model. This can be preliminary and partial, just enough to get the process going. For
the Bay Model, for example, the task was to evaluate the feasibility and any unforeseen
consequences of the Reber Plan for damming up the bay. Scientists didn’t know what
in particular they’d be evaluating—for example, whether strong currents would result or
excessive evaporation would occur.
In order to later construct a model that relates to the target in the right ways, scientists
must also possess more specific information about at least some aspects of the target
system. This point actually suggests two requirements: scientists need to know which
features of the target system are important, and they need to have more specific informa-
tion about those features. For example, when planning the Bay Model, scientists had to
guess that the tides and currents might be important features. And then, in order to be
able to calibrate the model to have the same tides and currents as the real San Francisco
Bay, the engineers needed access to a lot of information about these features of the real bay.
To get the needed data, 80 people took measurements at different locations throughout
the 1,424 square kilometer (550 square mile) bay every 30 minutes throughout a full
tidal cycle of 48 hours. They recorded tide velocity and direction, changes in the water’s
salinity (salt content), and the concentration of sediment. All of these data were needed
in order to even decide what features a model of the bay should have.
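As a quick back-of-the-envelope check (assuming each of the 80 observers logged one record per 30-minute interval, as described above), the calibration campaign produced thousands of records:

```python
# Rough size of the calibration data set described above. Assumes one
# record per observer per 30-minute interval; the breakdown is
# illustrative arithmetic, not from the source.
observers = 80
hours = 48
readings_per_hour = 2        # one reading every 30 minutes

records = observers * hours * readings_per_hour
print(records)               # -> 7680
```

Each of those records in turn contained several measured quantities (tide velocity and direction, salinity, sediment concentration), so the total number of individual measurements was larger still.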
Consider, as an example of a mathematical model, the Lotka-Volterra predator-prey model:

dx/dt = αx − βxy
dy/dt = δxy − γy
One variable, x, stands for the number of prey animals (for example, seals), and another
variable, y, stands for the number of predator animals (in this case, polar bears). In this
model, both x and y represent independent variables in the target system. (Independent
and dependent variables were discussed in Chapter 2.) These equations can be used to
calculate how predator and prey population numbers change over time (represented in
the model as the derivatives dx/dt and dy/dt) from the combination of those population
numbers and a few other parameters. A parameter is a quantity whose value can change
in different applications of a mathematical equation but that only has a single value in
any one application of the equation. In this equation, α, β, δ, and γ are parameters. These
help the model take into account the prey population's rate of growth without predation (α),
the rate at which prey encounter predators (β), the predator population's rate of growth
from those encounters (δ), and the loss of predators by either death or emigration (γ).
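These equations can be explored numerically. The sketch below steps them forward with the simple Euler method; the parameter values are illustrative only, not fitted to any real seal or polar bear data:

```python
# A numerical sketch of the two Lotka-Volterra equations above, stepped
# forward with the Euler method. All parameter values are invented for
# illustration.
alpha, beta = 1.1, 0.4    # prey growth rate; prey-predator encounter rate
delta, gamma = 0.1, 0.4   # predator growth per encounter; predator loss rate

x, y = 10.0, 10.0         # initial prey (x) and predator (y) populations
dt = 0.001                # time step; smaller steps give better accuracy

history = [(x, y)]
for _ in range(20000):    # simulate 20 time units
    dx = (alpha * x - beta * x * y) * dt
    dy = (delta * x * y - gamma * y) * dt
    x, y = x + dx, y + dy
    history.append((x, y))

# The populations cycle: prey crash when predators are abundant,
# predators crash when prey are scarce, and both recover.
assert all(px > 0 and py > 0 for px, py in history)
print(round(x, 2), round(y, 2))
```

Running the model like this is an 'intervention on the model': changing a parameter such as γ and re-running shows how the system would respond, without touching any real population.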
Because models can be similar to target systems in different ways, a single target
is sometimes represented by multiple models. This can be useful when the real-world
phenomenon is so complex that no single model can provide scientists with all of the
desired information. The weather is a good example of this. Any meteorological model
can only capture a few of the factors needed to generate reliable predictions about the
weather. Some meteorological models may invoke humidity, temperature, and dew point
to describe and predict certain basic weather patterns like precipitation. Other models
may invoke more specialized parameters, such as central pressure deficit, along with more
basic ones, such as wind speed and direction, to describe and predict a particular phenom-
enon like hurricanes. Sometimes meteorologists aim to make more reliable predictions
by carefully cobbling together the results of different models of a given weather system.
It’s also possible for a single model to have more than one target system. A model
might be designed to represent a repetitive activity or a type of event that occurs in many
different places. The Lotka-Volterra model is like that; it is designed to capture something
important about seal and polar bear populations, wildebeest and lion populations, and
many more. And the same meteorological models can be used to represent a number of
different hurricanes, as well as typhoons and cyclones.
One technique for working with multiple models of the same target is robustness analysis.
This is one way of determining which models are trustworthy for
prediction and explanation—especially when their targets are highly complex systems
like the climate or predator-prey interactions. Robustness analysis begins by generating
multiple models of a target. For example, climatologists develop several distinct models for
predicting changes in the temperature in a specific region. If multiple meteorological mod-
els with different variables, parameters, and assumptions all predict an upcoming increase
of temperature in the region, this prediction is robust (and should be taken particularly
seriously). On the basis of similar predictions from different models, scientists may be able
to find the common features of the models that give rise to the robust prediction. They
can then examine how this core structure might relate to stable relationships involved
in the complex phenomenon of interest. In this way, climatologists and other scientists
studying complex systems can learn whether and to what degree the predictions of a
model should be taken seriously.
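A toy Python illustration of robustness analysis follows. The three 'models' and all their numbers are invented for illustration; none is a real climate model:

```python
# Three invented models of regional temperature change over the next
# decade, each built on different assumptions.

def linear_trend(warming_per_year=0.03, years=10):
    # Model 1: assumes a constant warming rate.
    return warming_per_year * years

def trend_with_feedback(base=0.025, feedback=1.3, years=10):
    # Model 2: assumes a weaker base trend amplified by a feedback factor.
    return base * feedback * years

def trend_with_variability(base=0.04, cyclic_offset=-0.05, years=10):
    # Model 3: assumes a stronger trend partly masked by a natural cycle.
    return base * years + cyclic_offset

projections = [linear_trend(), trend_with_feedback(), trend_with_variability()]

# The models disagree on magnitude but agree on the sign: the prediction
# 'temperature will rise' is robust across different assumptions.
robustly_warming = all(p > 0 for p in projections)
print(robustly_warming)                    # -> True
print([round(p, 3) for p in projections])  # -> [0.3, 0.325, 0.35]
```

The agreement on sign, despite disagreement on details, is what licenses taking the shared prediction particularly seriously.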
EXERCISES
3.1 Define model and target system in your own words, and say how the two relate. For
a modeling example from this section, say what the model is, what the target system
is, how they are related, and what the model is useful for.
3.2 One very familiar kind of scientific model is a mechanical model of the solar system,
called an orrery. These models are used to represent the relative positions and move-
ments of the Sun, planets, and moons. (If you have never heard of an orrery model, then
do some research on the internet or elsewhere to get a better idea of what they are.)
a. List as many similarities and as many differences between this model and tar-
get system as you can. You should have at least six similarities and at least six
differences.
b. Order the similarities from the most important to least, and then do the same
with the differences.
c. Describe the significance of each of the two similarities and two differences that
seem to be the most important. For each, say why you think the model-builders
chose to make the model similar to or different from the target system in that way.
3.3 State in your own words the main goal of each of the three steps of modeling, as
described in this section. Then, describe how each step may be involved for some
use of an orrery (a mechanical model of the solar system).
3.4 Suppose that you want to model the interactions between predators and prey, for
example, hawks (the predator) and mice (the prey). Make a list of at least five fea-
tures of that target system you think your model should take into account. Then, for
each feature, say how it is similar or different in other predator-prey systems. For any
features that are different, can you think of a related feature that would be similar
between the systems?
3.5 What features of modeling make it a useful approach when an experiment is not
possible and why? What features of modeling make it a useful approach when a
phenomenon of interest is highly complex and why?
3.6 Chapter 2 outlined the perfectly controlled experiment, which some refer to as the
‘gold standard’ for science. However, the National Weather Service usually opts
for modeling when studying the weather and making weather forecasts. Does this
suggest the Weather Service’s results are less scientific, in so far as they don’t aim
for this ideal? Why or why not?
3.7 The National Weather Service uses lots of climate models. Each of the models
(1) represents the climate system in a different way and (2) is inaccurate in some
way. Explain each of these features with reference to information from this section.
Why do you think the National Weather Service does not rely on just one single
climate model in making its predictions?
3.8 Can you think of another complex target system that, like the weather, may require
multiple models to investigate? Name two such systems. Then, explain what makes
those systems so complex. Why do you think scientists may benefit from constructing
multiple models of these systems?
3.9 Sketch how experiments involve the three main steps of generating expectations, per-
forming an intervention, and then analyzing the resulting data. State the three main
steps in modeling, and describe the similarities between those and the three main steps
in experimenting. Then, describe how modeling and experimenting are different.
3.10 Find two different maps of your city or town, on the internet or on paper.
a. For each map, assess its (i) completeness (does it represent all/most/many or
just a few features of the city/town? Which features?), (ii) accuracy (does it pro-
vide an accurate representation of the city/town? How accurate? What does
it get wrong?), and (iii) purpose (what does it seem like people use the map
for? How is that purpose served by the attributes you identified with respect to
completeness and accuracy?).
b. In light of your analysis, say whether one of these maps is better than the other.
If so, in what way(s) is it better? If not, why not?
Types of Models
As we have seen, scientific models aren’t always like toy models of airplanes or bays
filled with water. Indeed, the range of things that count as scientific models is extremely
broad. Scientific models can be concrete physical objects, such as the Bay Model or
Watson and Crick’s double helix model of DNA, which is made of metal plates. They
can also be abstract mathematical objects, like the Lotka-Volterra model of predator/prey interactions.
Models of Data
A model of data, or data model, is a regimented representation of some data set, often
with the aim of highlighting whether or not the data count as evidence for a given
hypothesis. The concept of data was encountered in Chapter 2, in the discussion of
experimental and observational studies. Recall that data are any public records produced
by observation, measurement, or experiment. Video recordings of capuchin monkey
behavior, observations of the positions of planets in the night sky, readings of a ther-
mometer, participants’ answers on a questionnaire in a psychological experiment, and GPS
location logs from phones are all examples of data. Such recordings are raw data,
which must be processed before they are useful to scientists. For instance, observations
of the positions of planets in the night sky need to be corrected for measurement errors,
organized by time and day, arranged into some scale, and put into a visual format such
as a graph or table. Only then can astronomers use those data to gain knowledge about
the behavior of the planets. This process of data correction, organization, and visualiza-
tion results in a model of the data.
Data models are a rather different kind of model from the models discussed so far.
They do fall under our general definition of a model, since they are representations that
are investigated in place of what they represent. But what is represented are not phe-
nomena—what we’ve called target systems—but data. Data models thus play a wholly
different role in scientific reasoning than models of phenomena.
The first step in constructing a data model is to eliminate presumed errors from the
data set. Consider measurements of the positions of a certain planet in the sky—say,
Mercury, over a period of days. Those measurements will be influenced by more than
Mercury’s position. They will also be affected by some combination of human mistakes,
flaws and limitations of instruments, like the telescope, and inaccuracies due to changing
atmospheric conditions. Scientists can try to identify and correct these errors in various
ways. They might calibrate the telescope or record the atmospheric conditions along
with their measurement of Mercury’s position. This additional information can guide the
decision of which data are questionable and should be eliminated. This process is called
data cleansing.
Once erroneous data are removed from the data set, the next step is to represent the
clean data in a meaningful way. Data of Mercury’s position in the sky over a period of
days may initially be visualized as points on a chart. These points will probably be used as
the basis for a curve that represents Mercury’s progression in the sky. The points represent
the scientists’ measurements. The curve, in turn, represents the scientists’ best guess for
Mercury’s continuous path through the sky. This final representation is the data model.
We can generalize from this example to other data models. Of course, it’s not always
spatial position that’s being measured. There is, though, a common progression of (1) elimi-
nating errors, (2) displaying measurements in a meaningful way, and then (3) extrapolating
from those measurements to the expected data for measurements that weren’t actually
taken. This is what happens when scientists use points on a chart to draw a curve rep-
resenting Mercury’s position, even for times and days when data weren’t collected. As
we’ve suggested, this involves some amount of guesswork.
Indeed, how to extrapolate from measurements to create a data model is a compli-
cated enough task that it has its own name: the problem of curve fitting. To get an idea
of the problem, suppose that you have data for two variables—say, air pollution and life
expectancy—and you want to figure out the general mathematical relationship between
the two. That is, you want to learn how people’s life expectancy changes as a function
of the level of air pollution where they live. The mathematical equation capturing this
relationship will describe a curve that will ‘fit’ your observations. The basic problem of
curve fitting is that data, no matter how much you collect, are always consistent with
different curves.
Put in terms of underdetermination, which was introduced in Chapter 2, the data
underdetermine which equation captures the relationship between these two variables,
air pollution and life expectancy. See Figure 3.5. So, how should scientists decide which of
the equations defining a curve passing through their data captures the real relationship?
There is no easy answer.
Finding the curve that best fits all available data, no matter what, is seldom the best
approach. Sometimes, data models can fit the data too well; this is called overfitting a
model to the data. The problem with sticking too closely to the actual data is that those
data are never perfect. There might be outliers, or values that deviate from the norm for
one reason or another. There is also the possibility of noise, or influences on the data that
are incidental to the focus, such as confounding variables.
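The point can be made vivid with a small sketch. The numbers below are hypothetical, but they show how the very same observations are fit perfectly by two different curves that then disagree about unmeasured cases.

```python
# Hypothetical observations: pollution index vs. life expectancy (years).
xs = [1.0, 2.0, 3.0]
ys = [80.0, 78.0, 76.0]

def line(x):
    """A straight line that passes exactly through the data."""
    return 82.0 - 2.0 * x

def wiggly(x):
    """A cubic through the very same three points: it adds a
    term that vanishes at x = 1, 2, 3."""
    return 82.0 - 2.0 * x + (x - 1) * (x - 2) * (x - 3)

# Both curves fit every observation perfectly...
assert all(abs(line(x) - y) < 1e-9 for x, y in zip(xs, ys))
assert all(abs(wiggly(x) - y) < 1e-9 for x, y in zip(xs, ys))

# ...yet they disagree about cases that were never measured:
print(line(4.0), wiggly(4.0))  # prints: 74.0 80.0
```

Collecting more data does not remove this underdetermination entirely; it only narrows the family of curves still consistent with the observations.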
Scientists want their data models to be better than the actual data they’ve collected.
In the end, which model of data is the right one depends on several factors, including
the goals of the scientists, their background knowledge, and considerations of how easy
the data model is to use to make predictions.
Big data approaches, discussed at the end of Chapter 2, present significant data
modeling challenges. Big data sets provide science, public policy, and business with
an impressive resource for answering important questions. Data collected from social
media, for example, can be used to understand how often the public talks about politics,
sports, and sex; to make predictions about complex political and social events; and to
explain consumer behavior. But using big data to make predictions requires finding the
right models of the data. The difficulties we have briefly surveyed here are compounded
when modeling big data sets, as the conditions for and features of the data tend to be
less well understood. Chapters 5 and 6 elaborate on the statistical techniques scientists
employ to represent data and draw inferences from them.
Models of Phenomena
As we’ve already seen in this chapter, models of phenomena provide ways to learn about
a phenomenon indirectly by studying the model. This use of models is very different
from data models, both in model development and in the role the models play. Models
of phenomena have been the main focus in this chapter, so our focus here will be on the
contrast between data models and models of phenomena. Data models are used in experi-
ments and non-experimental studies, where the phenomena are investigated directly. In
contrast, models of phenomena are often used to indirectly investigate phenomena. In
order to do this, scientists have to first learn about the model itself. Then they have to
find a way to convert their knowledge about the model into knowledge about the phe-
nomenon being modeled.
Building a model of a phenomenon is kind of like taking apart a toaster and putting
it back together again. A great way to learn about something is to try to build it, or
something that’s like it. Physical models might literally be built; other kinds of mod-
els, like equations and computer programs, are also built, only in a more metaphorical
way. Regardless, model construction should result in a model that represents the target
system(s). Scientists then manipulate and analyze the model to learn about the target
system(s). Just as the model represents the target system, manipulations of the model
represent manipulations of the system. Depending on the type of model, though, the
manipulations might be very different from what would happen in the actual target. And
then, so long as the model is similar in the right respects to its target system, scientists
can transform the knowledge they gain about the model’s behavior into knowledge of
the target system.
Recall how data models are better, more informative, than the data themselves.
Similarly, good models can be better for study than their targets. Consider a few ways in
which this is so. A physical model can provide a more quickly changing and simplified
version of a system. A mathematical model can enable precise predictions about a system
when its equations are solved. A computer model can be run again and again with differ-
ent conditions, simulating a range of possibilities. Differences between a model and the
phenomenon that is modeled are key to the value of model-based science, or learning
about the world indirectly through models.
Recall the discussion of how overfitting—that is, corresponding too closely to the
actual data—can hamper the value of a data model. Something similar is true for models
of phenomena. Scientists can go wrong by constructing a model that builds in too many
elements of the target system or is too similar to the target system. This could make
it so that the resulting model is only applicable in very narrow circumstances or too
difficult to study, either of which limits its usefulness. If instead a model is constructed
to incorporate only the most important, or most interesting, features of a phenomenon,
then it will be useful in lots of different ranges of circumstances. We see examples of
this in what follows.
Scale Models
To illustrate this use of models and the range of forms it can take, consider some
categories of models of phenomena. To begin, scale models are concrete physical
objects that serve as down-sized or enlarged representations of their target systems.
Architectural models of urban landscapes are a familiar example; these are widely
used in civil engineering. The Bay Model also belongs to this class, since it is a three-
dimensional physical object made of concrete slabs, copper tags, and water. The spatial
scale of the Bay Model is 1:1000 (that is, 1 foot in the model represents 1,000
feet in the real world, where 1 foot = 0.3 meters) on the horizontal axis and 1:100
(that is, 1 foot in the model represents 100 feet in the real world) on the vertical
axis. Temporally, the Bay Model is also scaled; each 24-hour day is represented as
a 14.9-minute sequence, divided into 40 equal intervals of 22.36 seconds (that is,
one minute in the model represents 1 hour and 40 minutes in the real-world target
system, the San Francisco Bay).
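The scaling arithmetic can be expressed as a few conversion helpers. The function names are ours, but the ratios come from the figures just quoted.

```python
H_SCALE = 1000          # 1 model foot = 1,000 real feet (horizontal)
V_SCALE = 100           # 1 model foot = 100 real feet (vertical)
REAL_DAY_MIN = 24 * 60  # real minutes in a day
MODEL_DAY_MIN = 14.9    # model minutes representing that day

def real_horizontal_feet(model_feet):
    return model_feet * H_SCALE

def real_vertical_feet(model_feet):
    return model_feet * V_SCALE

def real_minutes(model_minutes):
    """Real-world time represented by a stretch of model time."""
    return model_minutes * REAL_DAY_MIN / MODEL_DAY_MIN

# One model minute works out to roughly 97 real minutes -- the
# "1 hour and 40 minutes" in the text is the rounded figure.
```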
FIGURE 3.6 James Watson and Francis Crick’s double helix model of DNA
While the Bay Model is a scaled-down representation, other scale models are enlarged
representations of their targets, such as Watson and Crick's physical model of the DNA
molecule shown in Figure 3.6.
Analogical Models
Analogical models can be physical or abstract objects, depending on whether they rely on
physical or abstract analogies to represent their target systems. Scale models like the Bay
Model can be characterized as concrete analog models, as they share several physical prop-
erties with their targets. An example of an abstract analog model is the computer model
of the mind, which is based on formal similarities between computers and minds. Like
computers, the human mind is an information-processing system that can be described
in functional terms, without talking about its actual physical composition, or ‘hardware’.
Like computers, minds can be understood in terms of the operations they carry out in
order to solve certain tasks, or in terms of their ‘software’.
Here is another example of an analogical model, located somewhere between the Bay
Model and the computer model of the mind on the concrete-abstract spectrum. Another
hydraulic model, like the Bay Model, was built by William Phillips in 1949. But whereas
the Bay Model used water flow to represent a real body of water, Phillips’s model used
water flow to represent the British economy! This model is called the Phillips machine
or Monetary National Income Analogue Computer (MONIAC). The Phillips machine
was a set of plastic tanks, each representing some aspect of the economy, which were
connected by pipes and sluices and different valves. Dyed water, representing money, was
hydraulically pumped around the machine by an old airplane motor to simulate the ‘flow’
of money in an economy. An overhead tank, representing a treasury, could be drained so
that the water inside could flow to other economic sectors, like education, health care,
infrastructure and investment, savings, and so on. Water could be pumped back to the
‘treasury’ tank to represent taxation and state revenue, with pumping speeds adjusted to
simulate changes in tax rates. Exports and imports could also be simulated by adding or
draining water from the model.
The Phillips machine was a physical model, but not a scale model. (The British
economy isn't itself operated hydraulically, of course.) Unlike the Bay Model, the
Phillips machine used water flow as an analog of money flow. Changes in water level
and flow were analogous to changes in highly complex, abstract parameters of the
British economy. In its day, this actually was an amazingly accurate tool for learning
about how changes in different economic sectors affect others (Morgan & Boumans,
2004).
Relying on analogies is a particularly useful strategy in early stages of modeling, when
scientists may have little or no knowledge of the phenomenon they are interested in. This
enables scientists to focus on the salient features of a model and to let the discovery of
analogous features guide modeling approaches. For example, the similarity of the physical
arrangement of a spiral staircase to a DNA molecule was striking to Watson and Crick,
guiding their modeling of DNA toward a double-helix structure. Watson, in his
memoir, says, ‘[E]very helical staircase I saw that weekend in Oxford made me more
confident that other biological structures would also have helical symmetry’ (1968, p.
77). Spiral staircases were useful analogous models for DNA, stepping stones toward the
scale model Watson and Crick ended up developing.
As knowledge about the target develops, analogical models may give way to models
less obviously related to the target systems they represent. As we have mentioned, the
Lotka-Volterra model is a set of mathematical equations, which is hardly analogous to
populations of predators and prey. But knowledge about those target systems was used
to develop mathematical equations that effectively—if indirectly—represent key relation-
ships among the populations in question.
Mechanistic Models
Mechanistic models are representations of mechanisms: systems of component parts
and component operations, organized spatially and temporally so as to causally
produce a phenomenon. Certain features of cells (like
FIGURE 3.8 Mechanistic model of the sodium-potassium pump (labeled elements: outside cell, cell membrane, inside cell, K+ potassium ions, ATP)
neurons), organs (like brains), and whole organisms can be seen as mechanisms. Examples
of phenomena produced by mechanisms include blood circulation, protein synthesis, and
cellular respiration. Mechanistic models represent the causal activities of organized com-
ponent parts that produce some such phenomenon. By doing so, they can help illuminate
how the target phenomenon works and, in particular, how it depends on the orchestrated
functioning of the mechanism that produces it.
Mechanistic models can be physical structures representing concrete target systems,
such as an orrery. Other mechanistic models are physical structures representing more
abstract phenomena, such as the MONIAC Phillips machine model of the British econ-
omy. But most mechanistic models are schematic representations of abstract structures
and functions and the relationships among them. For example, consider the mechanistic
model of the sodium-potassium pump in cells depicted in Figure 3.8. This is not a model
of a particular instance of a particular cell exchanging sodium ions for potassium ions.
Instead, it is a generic representation of what all such exchanges, in any living cell, have
in common.
Mathematical Models
As we have seen with the Lotka-Volterra model of predator-prey populations, mathemati-
cal models are equations that relate variables, parameters, and constants to one another.
These models attempt to quantify one or more dependences among variables in the target.
For example, the Lotka-Volterra model uses a pair of first-order differential equations to
represent changes in predator and prey populations over time. The first equation,
dx/dt = αx − βxy
describes the rate of change, dx/dt, of the prey population x over time t, where αx
represents the prey population's exponential growth and βxy represents losses from
predator-prey interactions. The number of mice at a given time, for example, is determined by their
population growth, minus the rate at which they’re preyed upon by hawks. By contrast,
the number of hawks is fixed by their population growth given the supply of prey, minus
their mortality rate. Hence, the second equation,
dy/dt = δxy − γy
describes the rate of change, dy/dt, of the predator population y over the same time,
where δxy represents predator population growth from consuming prey and γy represents
the loss of predators due to death, disease, resettling, and so on.
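As a rough illustration of how such equations are put to work, the pair can be stepped forward numerically with Euler's method. The parameter values and starting populations below are hypothetical, chosen only to exhibit the characteristic predator-prey cycling.

```python
def lotka_volterra(x, y, alpha, beta, delta, gamma, dt, steps):
    """Euler integration of the predator-prey equations:
    dx/dt = alpha*x - beta*x*y,  dy/dt = delta*x*y - gamma*y."""
    history = [(x, y)]
    for _ in range(steps):
        dx = (alpha * x - beta * x * y) * dt   # prey: growth minus predation
        dy = (delta * x * y - gamma * y) * dt  # predators: growth minus deaths
        x, y = x + dx, y + dy
        history.append((x, y))
    return history

# Hypothetical mice (x) and hawks (y):
traj = lotka_volterra(x=10.0, y=5.0, alpha=1.1, beta=0.4,
                      delta=0.1, gamma=0.4, dt=0.01, steps=5000)
```

Plotting the trajectory would show the two populations rising and falling out of phase, with prey peaks followed by predator peaks.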
Another example of a mathematical model is a game theory model called the prisoner’s
dilemma. Suppose that you and your friend Dominik have been arrested for robbing a
bank, and you’ve been placed in different cells. A prosecutor makes this offer to each
one of you separately:
You may choose to confess or to remain silent. If you confess and your accomplice
keeps silent, all charges against you will be dropped, and your testimony will be
used to convict your accomplice. Likewise, if your accomplice confesses and you
remain silent, your accomplice will go free while you will be convicted. If you both
confess, you will both be convicted as co-conspirators, for somewhat less time in
prison than if only one of you is convicted. If you both remain silent, I shall settle
for a minor charge instead.
Because you are in a different cell from your friend, you cannot communicate or make
agreements before making your decision. What should you do?
Assuming that neither you nor Dominik want to spend time in prison, you face a
dilemma. Each of you will be better off confessing than remaining silent, regardless of
what the other does. Either Dominik doesn’t confess, or he does. If Dominik doesn’t
confess and you do, you go free, whereas if you didn’t confess, you’d both be charged
with a lesser crime—and going free is better than being charged with a crime. If Dominik
does confess and you do also, you get charged as co-conspirators, whereas if you didn’t
confess, you’d be charged as solely responsible for the crime—and this carries a longer
prison sentence. So, regardless of Dominik’s decision, you are better off confessing.
However, the outcome of both you and Dominik confessing is worse for both of you
than the outcome of both you and Dominik remaining silent. In the first scenario, you
are both charged as co-conspirators, while in the second scenario, you are both charged
merely with a lesser crime. Thus, the prisoner’s dilemma seems to raise a puzzle for
rationality. You are better off confessing, regardless of Dominik’s choices, but if you both
are inspired by that fact to confess, things are worse for you than if you had both kept
your mouths shut. Reasoning independently, you should confess. But, even so, both of
you employing that reasoning leads to a worse outcome than if you’d both acted in the
best interest of your conspirator.
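That dominance reasoning can be checked mechanically. The prison terms below are hypothetical stand-ins consistent with the prosecutor's offer, negated so that larger numbers are better (0 means going free).

```python
# payoff[(you, dominik)] = (your payoff, Dominik's payoff)
payoff = {
    ("silent", "silent"):   (-1, -1),   # both charged with a lesser crime
    ("silent", "confess"):  (-3,  0),   # you convicted; Dominik goes free
    ("confess", "silent"):  ( 0, -3),   # you go free; Dominik convicted
    ("confess", "confess"): (-2, -2),   # convicted as co-conspirators
}

def best_response(dominik_choice):
    """Your best choice, holding Dominik's choice fixed."""
    return max(["silent", "confess"],
               key=lambda yours: payoff[(yours, dominik_choice)][0])

# Confessing dominates: it is best no matter what Dominik does...
assert best_response("silent") == "confess"
assert best_response("confess") == "confess"

# ...yet mutual confession is worse for both than mutual silence.
assert payoff[("confess", "confess")] < payoff[("silent", "silent")]
```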
This situation is customarily represented using the mathematical formalism of game
theory. In its simplest form, the prisoner’s dilemma is a game described by the payoff
matrix shown in Table 3.1.
Although this situation may seem contrived, many real-life scenarios can be modeled
with a generic version of the payoff matrix, like the one shown in Table 3.2. Here the
numbers are generic payoffs, or consequences for each decision. The higher the number,
the more desirable the payoff. The first number in each set of parentheses represents
Player 1’s payoff, the second number Player 2’s payoff. The players are also generic; they
might be suspects in a crime, or they might be any other people, businesses, nations,
animals, or even bacteria. Any entities that vary their behavior in response to others’
behavior are fair game.
The most basic relationship that characterizes the prisoner’s dilemma also dictates the
situations to which it can be applied. This basic relationship is that, no matter what one’s
partner chooses to do, one always does better by choosing to defect (in the original story,
to rat out your friend) rather than to cooperate (in the original story, to remain silent).
But—and this is key—players always do better if they are partnered with cooperators
than if they are partnered with defectors. (You’re always better off if your buddy doesn’t
rat you out, regardless of what you choose.) This mathematical model boils that scenario
down to simple numbers that represent the desirability of different outcomes.
The dilemma of the prisoner’s dilemma thus amounts to how to encourage cooperative
behavior, which is better for everyone, in the face of the temptation to defect into selfish
TABLE 3.1 Payoff matrix for the prisoner's dilemma with Dominik

                            Dominik
                    Remains Silent            Betrays
You
  Remains Silent    Each pays a small fine    You get 3 years of prison;
                                              Dominik goes free
  Betrays           You go free;              Both are convicted as
                    Dominik gets 3 years      co-conspirators

TABLE 3.2 Generic payoff matrix for the prisoner's dilemma (rows: Player 1
chooses Cooperate or Defect; columns: Player 2 chooses Cooperate or Defect;
each cell lists the numeric payoffs for Player 1 and Player 2)
behavior. The prisoner’s dilemma model has been applied in a variety of circumstances
to help account for scenarios involving cooperative behavior, ranging from symbiotic
relationships among organisms to the practice of not killing opponent soldiers that devel-
oped spontaneously in the trenches of World War I (Axelrod, 1984).
For example, consider the cleaning symbiosis. Individuals of one species, the
cleaner, remove parasites and dead skin from individuals of the other species, the
client. This happens in many pairs of species, but let’s focus on cleaner fish and cli-
ent fish. Cleaner fish have the choice of cooperating by cleaning the client fish or
defecting by eating extra skin from the client fish. Client fish have the choice of
cooperating by allowing the cleaner fish to clean safely or defecting by threatening
or eating the cleaner fish. The fish are better off if both cooperate: the client fish
gets an important cleaning, and the cleaner fish gets dinner. But there’s a benefit to
defecting for each: the cleaner fish would get a bigger dinner by eating more from
the client fish, and the client fish would get to eat the cleaner fish. The prisoner’s
dilemma has been used to represent these options and the circumstances that can
enable cooperative symbiosis to evolve.
Computer Models
Many real-world situations can be modeled as cases of the prisoner’s dilemma. But
what we’ve seen so far isn’t enough to demonstrate why business firms, gangsters,
animals, bacteria, and nations so often cooperate in real life. One important reason
is that, in most real-life scenarios, decisions about whether to cooperate aren’t made
in an isolated room, cut off from your partner, and in expectation that you’ll never
see that partner again. Real firms, gangsters, animals, bacteria, and nations interact-
ing with one another do not make their decisions once and for all, and without
communicating with one another. Instead, they might guess at what each other
might do, signal their own intentions, or interact repeatedly over time, allowing for
reputations to form.
The model of the prisoner’s dilemma introduced here does not capture these kinds of
interactions, but it can be extended so that it does. One common extension is to the iter-
ated prisoner’s dilemma, where we suppose that two agents play the prisoner’s dilemma
with each other repeatedly. This is one way in which cooperative behavior has a chance
of winning out over the selfish choice to defect.
Insight into how this can happen was provided in the 1980s by a computer game.
The political scientist Robert Axelrod invited various social scientists to submit computer
programs for a tournament of the iterated prisoner’s dilemma. Each computer program
had its own strategy governing the circumstances in which it would cooperate or defect,
and these programs were pitted against one another to see which would do the best in
the long run.
This was a computer model. Computer models or simulations are programs run on
a computer using algorithms, or step-by-step procedures, to explore aspects or changes
of a target system. Like other models encountered thus far, computer models can range
from incredibly simple to quite complex. The goal is to create insight into some target
system(s) by examining a similar set of dynamics encoded in a computer program.
It’s unusual for a computer model to invite participation from other scientists, as
Axelrod’s tournament did, but by soliciting submissions, Axelrod ensured that the available
strategies weren’t limited by what he could imagine or what he thought would be successful. And,
indeed, the result surprised him. The winning strategy—that is, the strategy that accu-
mulated the most points in the iterated prisoner’s dilemma tournament—belonged to a
program named Tit-for-Tat, submitted by the psychologist Anatol Rapoport. The program
was so simple that it had only a few lines of programming code. Tit-for-Tat cooperated
in the first round of any game it played in the tournament, and then it simply mirrored the
other player’s previous action in every round thereafter. So, when Tit-for-Tat played
against generally cooperative players (other programs), it also cooperated and so reaped
the rewards of that mutual benefit. But when Tit-for-Tat played against uncooperative,
selfish players, which defected a lot, it too played selfishly after that initial cooperative
move. This protected it from exploitation by selfish programs. Axelrod’s computer simula-
tion thus demonstrated the success of a strategy of reciprocal cooperation, which is often
called reciprocal altruism (see also Rapoport, Seale, & Colman, 2015 for a more recent
assessment of Tit-for-Tat).
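The programs from Axelrod's actual tournament aren't reproduced here, but a minimal sketch of the iterated game shows why Tit-for-Tat fares so well. The payoff numbers and round count below are hypothetical; strategies are functions from the opponent's history of moves to "C" (cooperate) or "D" (defect).

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    # Cooperate first, then mirror the opponent's previous move.
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def always_cooperate(opponent_history):
    return "C"

def play(strategy_a, strategy_b, rounds=100):
    """Total scores for strategies a and b over repeated rounds."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)   # each sees the other's past moves
        move_b = strategy_b(history_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

# Tit-for-Tat reaps mutual cooperation with cooperators...
print(play(tit_for_tat, always_cooperate))   # prints: (300, 300)
# ...and loses only the opening round against relentless defectors.
print(play(tit_for_tat, always_defect))      # prints: (99, 104)
```

Against a cooperator, Tit-for-Tat locks into mutual cooperation; against a defector, it sacrifices only the first round before defending itself.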
EXERCISES
3.11 In your own words, characterize models of data and models of phenomena, and
give an example of each. How are these types of models similar? How are they dif-
ferent from each other?
3.12 We have characterized the steps of data modeling as (1) eliminating errors, (2) display-
ing measurements in a meaningful way, and then (3) extrapolating from those mea-
surements to the expected data for measurements that weren’t actually taken. Describe
each of these steps for any example of a dataset from this section or Chapter 2.
3.13 Describe the curve-fitting problem, and indicate how it relates to the three steps of
data modeling.
3.14 List the five types of models of phenomena described in this section, and give an
example of each. For each example, indicate why it counts as a model of that type
and what target system(s) it is supposed to represent. Then, rank your examples from
1 to 5, where 1 is the most concrete relationship to the target system(s) and 5 is
the most abstract.
3.15 Define mechanism in your own words. Then, refresh your memory of photosynthesis.
(You probably encountered this in high school science if not since.) Consider this as
an example of a mechanism by outlining (a) the main component parts and (b) how
their operations are organized so as to constitute the mechanism’s activity. Then,
consulting the description you’ve developed, say whether you think photosynthesis
is a mechanism and why or why not.
3.16 Thomas Schelling, an American economist and Nobel Prize winner, famously devel-
oped a model of segregation in 1971 (see also Schelling, 1969). The model utilizes
a checkerboard, pennies, and dimes. Initially, squares on the board are filled ran-
domly by either a penny or a dime or left empty. Over time, pennies and dimes are
moved around the board according to a rule representing whether they were satis-
fied to stay in their current location. Schelling discovered that a movement rule rep-
resenting a preference for at least a small percentage of like neighbors would, over
time, lead to segregated patches of pennies and dimes on the board. An example
of a movement rule representing such a weak preference is the following: an occu-
pant moves if fewer than three of the eight adjoining squares have occupants the
same as the occupant (pennies if the occupant is a penny, dimes if the occupant is
a dime); otherwise, it stays.
A main application of this model is to housing segregation, where the model shows
that even a weak preference for at least a minority of neighbors to be the same as
oneself can lead to segregated patches of like inhabitants. (Importantly, this does not
show such a weak preference was in fact what led to housing segregation in any
given instance.)
a. In this application, what does the checkerboard represent, and what do the
pennies and dimes represent?
b. What does the movement rule represent? (This one is tricky.)
c. List some of the idealizations needed to use the model to represent housing
segregation. (Idealizations were discussed in 3.1.)
d. We said that Schelling’s model doesn’t show that weak individual preference
in fact led to housing segregation. What are the implications of this model for
segregated housing?
3.17 Mathematical models are among the most abstract representations of target systems.
Describe how it is that mathematical models represent target systems. You might look
back at our discussion of the Lotka-Volterra model and/or the prisoner’s dilemma
model for help.
Like experimentation, modeling can provide evidence for or against hypotheses about
real-world systems.
For example, animal models like Drosophila melanogaster are used to indirectly test
expectations about the genetic and molecular mechanisms of human disorders, like
Parkinson’s disease and diabetes. And interventions were made on certain features of
the Bay Model to test expectations about the consequences the Reber Plan would
have for the real San Francisco Bay. The iterated prisoner’s dilemma has been studied
to test expectations about the conditions that enable cooperative behavior to emerge
among self-interested individuals. Each of these uses of models is a way to indirectly
test scientists’ hypotheses about real-world systems. And in some cases, the results were
quite surprising.
So, models can play a role similar to experiments. One big difference is that, with
experiments, interventions are performed directly on the experimental system, whereas
with models, interventions on the model are used to draw conclusions about the target
system. This is why models must aptly represent their targets. As we have seen, the work of
modeling also includes gaining a better understanding of the phenomena under investigation,
and then refining the model so that it reflects that understanding more accurately.
Indeed, sometimes getting a model to more accurately reflect its target is the primary
task of modeling. In such cases, a model of some phenomenon can play a role similar to
a theory; a model can be a way to capture a set of ideas about what that phenomenon
is really like. When a model is proposed as a theory about what some phenomenon is
like, data gathered about the phenomenon, and perhaps about the model, can be used
as evidence to confirm or disconfirm that theory. An example of such a theoretical
use of modeling is the Lotka-Volterra model of predator-prey interactions. Given an
initial setting of parameters in the equations, one can make predictions about changes
in the sizes of a given predator population and the population of its prey, say, polar
bears and seals. Those predictions can then be tested against observations of the actual
predator-prey system—polar bears and seals living in the same broad area. When a model behaves similarly to its target system(s) in more and more instances and across different circumstances, it may come to be accepted as an account of how the target behaves.
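To make this concrete, here is a minimal numerical sketch of the Lotka-Volterra equations using simple Euler integration. The parameter values and starting population sizes are illustrative placeholders, not estimates fitted to any actual polar bear or seal population.

```python
# A minimal Euler-integration sketch of the Lotka-Volterra equations:
#   dx/dt = a*x - b*x*y        (prey, e.g. seals)
#   dy/dt = c*b*x*y - d*y      (predators, e.g. polar bears)
# All parameter values and starting sizes are illustrative placeholders,
# not estimates for any real predator-prey system.

def lotka_volterra(x, y, a=1.0, b=0.1, c=0.5, d=0.75, dt=0.01, steps=3000):
    """Return the prey and predator population trajectories."""
    prey, pred = [x], [y]
    for _ in range(steps):
        dx = (a * x - b * x * y) * dt
        dy = (c * b * x * y - d * y) * dt
        x, y = x + dx, y + dy
        prey.append(x)
        pred.append(y)
    return prey, pred

prey, pred = lotka_volterra(x=10.0, y=5.0)
# The model predicts coupled oscillations: prey increase, predators
# follow, prey then crash, and the cycle repeats.
print(f"prey range: {min(prey):.1f} to {max(prey):.1f}")
print(f"predator range: {min(pred):.1f} to {max(pred):.1f}")
```

Comparing such simulated trajectories against field counts of the actual populations is one way the model's predictions can be tested against observation.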
So, models can play an experimental role by providing a way to empirically investigate
a phenomenon. Or they can play a theoretical role, by positing an account of some phe-
nomenon. Sometimes the same model can even play both a theoretical and experimental
tive way and to know which map to use for which purpose. The same goes for scientific
models. Social conventions in model construction and use help scientists understand how
a model is supposed to relate to one or more target systems; the similarities between
model and target aren’t enough by themselves.
We should also note that not all models have targets that actually exist. The Bay Model
was used to represent the Reber Plan, which, thankfully, was never implemented. The
Schelling segregation model represents how a mere preference for not being too much in the minority among one's neighbors can lead to segregation, but, as we have mentioned, this doesn't mean that such a preference is in fact solely responsible for segregation. (It isn't.)
And some scientific models aim to explore possibilities that are even more distantly related
to real occurrences. Regardless, those models are used to represent scenarios of scientific
interest, and the knowledge gained from them concerns natural phenomena.
Second, all scientific models are used to learn about the world. Data models represent
data in forms that advance hypothesis-testing. By constructing and investigating models
of phenomena, scientists can reason about the targets they represent in hopes of gaining
new scientific knowledge. In both cases, the models are used as vehicles for learning about
natural phenomena investigated in science.
Third, all scientific models involve abstraction and idealization. Recall that models
bear not only similarities to their targets but also differences from them. The differences
come in at least two varieties: abstraction and idealization, which are not always easy to
distinguish neatly. Roughly, in representing a target system, you may leave things out, or
you may introduce features that the system clearly does not possess. Omitting or ignoring
certain known features of the system is abstraction; including features the target system
doesn’t have is idealization.
Abstraction and idealization serve different goals. Modelers often disregard many prop-
erties of their targets to focus on a limited set of features deemed important for the
purposes at hand. The Lotka-Volterra model, for example, abstracts away from proper-
ties of prey and predators, like their speed; their size; their capacity for camouflage; their
particular senses of smell, sight, and hearing; their location; and much else. Those features
aren’t essential to how predator-prey interactions influence population size and so have
been abstracted, or removed, from the model.
Like abstractions, idealizations are a way of simplifying a model, enabling scientists to
focus on the bare essentials of the phenomenon they’re interested in, without getting lost
in complicating details. But whereas abstraction involves leaving features of the target out
of the model, idealizations are properties of the model that the target doesn’t actually
have. We encountered the concept of an idealization earlier, when the Lotka-Volterra
model was first introduced. There we defined idealizations as assumptions made without
regard for whether they are true and generally with full knowledge they are false. In mod-
eling, this results in the misrepresentation of certain aspects of the system being studied.
For the Lotka-Volterra model, idealizations include the assumptions that prey can find
food at all times, that predators are hungry at all times, and that both predators and prey
are moving randomly through a homogeneous environment. Scientists don’t think these
assumptions are true. But, in many situations, the falseness of these idealizations doesn’t
interfere with the Lotka-Volterra model’s representation of the predator-prey dynamics.
To recap, the three features shared by scientific models are (1) they represent one or
more targets; (2) they are used to learn about natural phenomena under investigation
in science; and (3) they involve abstraction and idealization. These last two features are
also related to models’ representational purpose. Abstraction and idealization are features
of models that affect how they represent their targets, and the ways models represent
their targets partly determine what can be learned. Representation is, then, at the heart
of scientific modeling.
from how a mathematical model of fluid dynamics does. And both are different from
the computer model that eventually took over the work of the Bay Model. There’s no
one perfect model of a given phenomenon. Instead, the goodness of a model is judged by
considering what the modelers want to learn from and do with the model and, perhaps,
the ease of developing or using the model. Sometimes one model will be enough for
learning about a target system; other times, multiple models of the same target will be
necessary to gain knowledge.
Several features are desirable for models to have. These include accuracy (a model
realistically representing its target), generality (applying to a range of related target sys-
tems), precision (providing exact information), tractability (ease of use), and robustness
(stable behavior across different assumptions). Each of these features helps make a model
valuable. And each of these features comes in degrees. A model isn’t simply general or not,
or precise or imprecise; instead, models vary in the extent of their accuracy, generality,
precision, tractability, and robustness.
Attempting to create the perfect model by maximizing all of these features is futile,
since these features usually trade off against one another; gaining more of some desir-
able feature of a model often requires losing ground on some other desirable features.
For example, a model that is more general, applying to more target systems, is also often
less precise and accurate of any one target system. This is because targets differ from
one another in some regards, so tailoring a model to be precise and accurate of a specific
target makes it ill-suited to represent a different system. For related reasons, a model
that is more precise and accurate is often less tractable and robust. So, when construct-
ing models, scientists must decide which desirable features to emphasize and which to
compromise on. In the rest of this section, we elaborate on how the desirable features
of models trade off against one another. (See Levins, 1966, on the issue of trade-offs in
model-building in population biology.)
Accuracy
Models representing more actual features of a target system tend to be more descriptively
accurate, or realistic. A model representing all and only the actual components and features of its targets, just as the targets have them, would be maximally accurate.
But this ideal is seldom achieved, and it’s unnecessary for practical success; recall that
models are improved by some differences from their targets. For example, a mathemati-
cal model of drought-resistant landscaping is improved by accurately accounting for how
water-intensive different plantings are. But such a model would be unwieldy if it included
a parameter for the number of blades of grass in order to be more accurate. Even if such
parameters increased the model’s accuracy, this wouldn’t give any additional insight into
drought resistance. And it would come at a tremendous cost to tractability and generality.
Each time you had a different number of blades of grass, the model would work differently.
However, for mathematical models of which kinds of turf are the most water-intensive,
it may be entirely relevant to know how many blades of grass there are per square meter
of sod or (perhaps) differences in the water absorption rates.
So, which features are important for models to represent will depend on which
phenomena modelers are interested in. Think of the Bay Model again. The engineers
cared about salinity and how water moved in the bay but not about the color of the
bay floor or the exact number of water molecules. The features worth modeling accurately are those most relevant to the modelers’ interests. Models benefit from accuracy because it increases their similarity to their targets, which in turn makes findings about the model more certain to hold of the target as well. However, some
properties of a target are best excluded from a model because their exclusion has
compensatory benefits.
Generality
A model is more general when it applies to a greater number of target systems.
Generality is a desirable feature of models insofar as it enables models to be reused
in a variety of circumstances and, more significantly, because general models make it
possible for scientists to discern what a variety of phenomena have in common with
one another. This is a step toward formulating general theories or laws about phe-
nomena of interest.
Consider the prisoner’s dilemma model again. Because it can apply to humans, bacteria,
corporations, and many other entities, this is a general model with numerous applications.
That generality also reveals something which all those types of entities have in com-
mon: repeated interactions can enable cooperation to spontaneously emerge. However,
sacrificing some generality in a model can be worthwhile, if doing so enables the model
to more accurately represent its target. A general prisoner’s dilemma model might be
supplemented with information about, say, how natural selection favors bacteria that can
coexist in close proximity to one another (a form of cooperation). The resulting model
will give more insight into bacterial cooperation in virtue of this additional detail. But
it also will be less general—it will no longer apply to humans or corporations. Which is
better depends on the modelers’ aims.
Precision
A model is more precise to the extent that it more finely specifies features of the target.
For example, a climate model that allows scientists to predict how much warmer the
global average temperature will be in 30 years within a range of ±1° Celsius is more precise
than a model that allows them to predict a ±5° Celsius range of temperature increase in
20 years. Notice that precision is different from accuracy. Whereas accuracy is a matter
of a given value’s proximity to the true value, precision is a matter of the proximity of
values in a range. Think again of an archer loosing arrows at a target. Arrows that are
scattered all around the bull’s-eye but very near to it are accurate but imprecise. Arrows
that are tightly clustered together but off-center, away from the bull’s-eye, are precise
but inaccurate. See Figure 3.9 for an illustration of this.
Consequently, a model could be very precise but still inaccurate. For example, the
prediction enabled by the more precise climate model might turn out to be wrong.
Greater precision benefits a model by enabling it to give a more specific characterization
of its target and to make more specific predictions about that target. But increasing preci-
sion usually comes at the cost of a model’s generality, its tractability, and sometimes its
accuracy. Like generality, precision often trades off against accuracy. The more specific a
prediction is, the easier it is for that prediction to be incorrect.
Figure 3.9 Accuracy and precision, illustrated by arrows on an archery target
Tractability
Tractability is the ease of developing and using a model. This could involve different con-
siderations, for example, the time it takes to run a model on a computer, or whether the
equations of a mathematical model have exact solutions. It could even involve whether a
modeler happens to already be familiar with one approach but not another. More tractable
models are easier to construct, manipulate, or analyze.
Consider, for example, that the iterated prisoner’s dilemma involves agents having repeated
encounters, and so this model is less tractable than the original prisoner’s dilemma. One
consequence of this decreased tractability is that scientists know exactly what the pos-
sible outcomes are for the original prisoner’s dilemma, but they cannot in general predict
the outcomes for its iterated version. This is why Axelrod ran a computer tournament to
explore some of the possible outcomes. For obvious reasons, though, tractability is never simply maximized: the easiest thing to accomplish is usually nothing at all. And more complicated models can result in more accurate, precise, and insightful findings. The iterated
prisoner’s dilemma reveals how repeat encounters (in certain circumstances) can over-
come the dilemma entirely, making cooperation directly beneficial.
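Axelrod-style computational exploration can be sketched in a few lines. The payoff values below are the standard ones from the game theory literature (3 for mutual cooperation, 1 for mutual defection, 5 and 0 for a lone defector and a lone cooperator), and the two strategies shown, tit-for-tat and always-defect, are illustrative; this is a bare-bones sketch, not Axelrod's actual tournament code.

```python
# A minimal sketch of the iterated prisoner's dilemma with the standard
# payoffs: mutual cooperation earns 3 each, mutual defection 1 each,
# a lone defector 5, and a lone cooperator 0.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else 'C'

def always_defect(my_history, their_history):
    return 'D'

def play(strategy_a, strategy_b, rounds=10):
    """Return each player's total payoff over repeated encounters."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

# Two tit-for-tat players sustain cooperation; two defectors do far worse.
print(play(tit_for_tat, tit_for_tat))      # (30, 30)
print(play(always_defect, always_defect))  # (10, 10)
print(play(tit_for_tat, always_defect))    # (9, 14)
```

Even this toy version shows why repeat encounters matter: mutual tit-for-tat earns each player three times what mutual defection does.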
Robustness
A more robust model is one that changes less despite variation in its assumptions.
Consequently, robustness is a measure of insensitivity to the features that differ from
the target, including the model’s abstractions and idealizations. Normally, scientists don’t
want their models’ predictions to be sensitive to such features. To be trustworthy, the
predictions should be based as much as possible on known similarities between the model
and target. But limited robustness is inevitable. Models incorporate assumptions that are
needed for them to produce the desired information, so to some extent, those assump-
tions always matter. What scientists aim to avoid is over-reliance on specific assumptions
that are unlikely to be true or even known to be false.
Multiple models are sometimes used to determine how robust a model’s predictions
are. If different models, with different assumptions and details, all predict roughly the
same result, that prediction seems more trustworthy than if it had been generated by
just one model, with uncertain assumptions and parameters. Robustness analysis, which
was introduced in Section 3.1, capitalizes on this idea. Robustness analysis is possible
whenever multiple models are employed; it’s common in climate science, for example.
predictions; and general enough to be enlightening. The balance struck thus depends in
subtle ways on the phenomena under investigation, the scientists’ circumstances, and the
purposes to which the models are put.
EXERCISES
3.18 Describe the experimental use of models, and explain why models are well situated to
play this role. Then describe the theoretical use of models, and explain why models are
well situated to play that role. Can the same models play both roles? Why, or why not?
3.19 Think again about the use of the Bay Model in testing the Reber Plan. This bears
some similarity to an experiment, but it is conjectural in a way that directly experi-
menting on the actual San Francisco Bay would not be.
a. Characterize the experimental features of this use of the Bay Model: the inde-
pendent variable, the dependent variable, how the independent variable was
intervened upon, and what the findings were.
b. Describe at least one way in which the findings are less certain in their implica-
tions for the effects of the Reber Plan than an actual experiment would have been.
c. Describe at least three advantages to using this model instead of directly inves-
tigating the effects of the Reber Plan. You might consider the desirable features
of models described earlier in formulating your response.
3.20 What are the three features that we have said all models share? How do these three
features relate to one another?
3.21 In a paragraph, describe how models represent their targets. You should reference all of
the following: similarities, differences, social conventions, abstractions, and idealizations.
3.22 Define abstraction and idealization in your own words. What is the difference
between them?
3.23 Choose one of the models we have discussed in this chapter. Say which model
you’ll focus on and what target system(s) it represents. Then, formulate a list of the
abstractions involved in using that model to represent this system and a separate
list of the idealizations involved in using that model to represent this system. You’ll
need to think beyond what’s actually said about the model, considering especially
the differences between the model and its target(s).
3.24 Describe in your own words all five of the desirable features of models character-
ized in the last part of this section. Then, compare the classic game theory math-
ematical model of the prisoner’s dilemma and the computer model of the iterated
prisoner’s dilemma on each feature. For each feature, write down whether you think
one model is better and which one, if you think the two models tie, or if you don’t
have enough information to decide. In all cases, explain your answer.
3.25 Consider your answer to 3.24. Describe a purpose you think the classic game
theory model of the prisoner’s dilemma would serve better than the computer model
of the iterated prisoner’s dilemma. Then, describe a purpose you think the computer
model of the iterated prisoner’s dilemma would serve better than the classic game
theory model of the prisoner’s dilemma.
3.26 Scientists have constructed models of atoms, genetic lineages, economies, rational
decisions, traffic, forest fires, and climate change. Locate and investigate a scientific
model we have not discussed in this chapter.
a. Identify the type of model it is and what target system(s) it’s used to represent.
b. Describe how the elements of the model represent features of the target system(s).
c. Describe what scientists have learned about the target system(s) from the model.
d. Why is this model a helpful way for scientists to investigate this phenomenon?
In answering this question, think back to the challenges of experimentation
discussed in Chapter 2, the advantages of modeling discussed in 3.1, and the
desirable features of models discussed in 3.3.
FURTHER READING
For more on the use of models in science, see Weisberg, M. (2013). Simulation and similar-
ity: Using models to understand the world. Oxford: Oxford University Press.
For more on mechanistic models, see Glennan, S. (2005). Modeling mechanisms. Studies
in History and Philosophy of Biology and the Biomedical Sciences, 36, 443–464.
For a discussion of computer modeling and attention to climate change models, see Wins-
berg, E. (2010). Science in the age of computer simulation. Chicago: University of Chi-
cago Press.
For a more general discussion of computational methods in science, see Humphreys,
P. (2004). Extending ourselves: Computational science, empiricism, and scientific method.
Oxford: Oxford University Press.
For a classic treatment of scientific modeling, and especially models’ relationship to analo-
gies, see Hesse, M. (1963). Models and analogies in science. London: Sheed & Ward.
For more on how models represent target systems, see Giere, R. (2004). How models are
used to represent reality. Philosophy of Science, 71(Suppl.), S742–S752.
For an account of idealization and how it influences science, see Potochnik, A. (2017).
Idealization and the aims of science. Chicago: University of Chicago Press.
CHAPTER 4
Patterns of Inference
some other pre-existing substratum. Hence, if the universe is not eternal, an infinite regress
arises; the sequence of reasoning never terminates. Each purported material substratum
itself requires another substratum from which it comes. Aristotle concluded that matter
must be eternal and that the universe did not have any beginning.
From the early Middle Ages (roughly the 7th century) to the end of the Renaissance
(roughly the 16th century), scholars and theologians continued to engage with ques-
tions about the age of the universe. The structure of Aristotle’s reasoning was largely
kept, but the eternality of the universe was replaced by the eternality of God in order
to fit with various creation stories. The universe itself was often estimated to have come
into existence around 4,000 BCE (that is, 6,000 years ago). The estimate was derived
Mount Wilson Observatory in Southern California, Hubble discovered evidence that the
universe is much larger than people previously thought and that the universe is expanding.
Pointing the telescope toward the Andromeda Nebula, Hubble saw stars similar to
those nearer to Earth, only dimmer. One of those was a Cepheid variable, a star whose
brightness as seen from the Earth changes periodically. Hubble knew of the relationship
between the period of time it takes a Cepheid’s brightness to change and the luminosity
of the star, which is the total amount of energy it emits in one second. Thus, from the
period of the Cepheid, Hubble could calculate its luminosity, thereby determining how
much brighter it was than the Sun.
Light travels at a constant speed of about 300,000 kilometers per second. Over the
course of a year, light travels nearly 9.5 million gigameters (a gigameter is one billion
meters); this distance is one light-year. Furthermore, the apparent brightness of a star—
that is, how bright a star appears to be as seen from a distance—depends on the distance
to the star. Once this relationship is known, it can be used, along with knowledge of
the speed of light, to determine the distances to stars and faraway galaxies. Hubble did
just that: he used his knowledge of the relationships between light’s speed of travel, the
apparent brightness of a star, and its distance to calculate the Cepheid’s distance from
Earth. Based on the distance of that Cepheid variable, Hubble reasoned that Andromeda
was in fact a different galaxy from our galaxy, the Milky Way. This discovery, announced
in 1925, demonstrated that the universe is much larger than had been thought.
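The relationship Hubble exploited is the inverse-square law: a star's apparent brightness falls off with the square of its distance, so a known luminosity and a measured brightness fix the distance. The luminosity and brightness figures in the sketch below are invented for illustration, not Hubble's actual measurements.

```python
import math

# The inverse-square law behind Hubble's distance reasoning: apparent
# brightness b = L / (4*pi*d**2), where L is luminosity and d distance.
# Knowing L (from the Cepheid's period) and measuring b, d follows.
# The luminosity and brightness values below are invented illustrations,
# not Hubble's actual measurements.

def distance_from_brightness(luminosity_watts, brightness_w_per_m2):
    """Distance (meters) at which a source of the given luminosity
    would appear with the given apparent brightness."""
    return math.sqrt(luminosity_watts / (4 * math.pi * brightness_w_per_m2))

SUN_LUMINOSITY = 3.8e26   # watts
LIGHT_YEAR = 9.46e15      # meters (about 9.5 million gigameters)

# A hypothetical Cepheid 10,000 times as luminous as the Sun, observed
# with an apparent brightness of 3e-13 watts per square meter:
d = distance_from_brightness(1e4 * SUN_LUMINOSITY, 3e-13)
print(f"distance: {d / LIGHT_YEAR:,.0f} light-years")
```

With these invented numbers the star comes out on the order of a hundred thousand light-years away, which conveys the scale of reasoning involved: well beyond the Milky Way's disk.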
Hubble also demonstrated that the universe has not always been this large. It’s expand-
ing. His reasoning started from the claim that light, like sound, will change its frequency
depending on the relative movement of the object emitting it and the observer. An
example is the change in frequency of an ambulance siren as it moves toward, and then
away, from an observer. The siren sounds higher pitched as it approaches, and then lower
pitched once it has passed. This frequency change, called the Doppler effect, was discov-
ered in the mid-19th century by the Austrian physicist Christian Doppler (1803–1853).
It has proven useful in a number of scientific investigations. For Hubble’s purposes, the
important implication was that a star moving away from Earth appears redder, while a
star moving toward Earth appears bluer. The degree of redness of receding stars is called
redshift.
Using the technique of astronomical spectroscopy, Hubble discovered that the redshift
of starlight from any galaxy increased in proportion to the galaxy’s distance from Earth.
This indicates that galaxies are moving further and further away from Earth. In 1929,
Hubble announced these findings, which suggest that the universe is expanding. This is
now known as ‘Hubble’s Law’. According to recent estimates, the universe’s expansion
rate, known as ‘Hubble’s constant’, is about 70 kilometers per second per megaparsec
(km/sec/Mpc), where 1 megaparsec (Mpc) is approximately 3.26 million light-years—an extremely long distance!
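Hubble's constant also supports a back-of-the-envelope age estimate: if a galaxy at distance d recedes at speed v = H0 × d, then running the expansion backward at a constant rate gives an age of roughly 1/H0. The naive calculation below ignores how the expansion rate has changed over cosmic history, but it lands near the accepted figure.

```python
# Rough age of the universe from Hubble's constant: t ~ 1 / H0.
# This naive sketch assumes a constant expansion rate over cosmic history.
H0 = 70.0                 # km/s per megaparsec
KM_PER_MPC = 3.086e19     # kilometers in one megaparsec
SECONDS_PER_YEAR = 3.156e7

hubble_time_s = KM_PER_MPC / H0   # seconds to cover 1 Mpc at H0
age_gyr = hubble_time_s / SECONDS_PER_YEAR / 1e9
print(f"Hubble time: about {age_gyr:.1f} billion years")
# prints: Hubble time: about 14.0 billion years
```

That this crude estimate agrees so closely with independent evidence is itself a small instance of robustness across different lines of reasoning.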
So, Hubble showed that the universe is not only much larger than previously estimated
but also expanding. But how do these findings bear on the question of the age of the
universe? The answer again concerns the relationship between time and the movement
of starlight through space. The simple fact that astronomers like Hubble can observe stars
from very distant galaxies indicates something about the age of those stars and thus about
the age of the universe containing them too. No star can be older than the universe. So,
we can estimate the minimum age of the universe on the basis of the age of the most
distant stars we can observe. In this way, Hubble was able to show that the universe was
at least 10 billion years old. Currently, the furthest objects that deep space telescopes
have detected are approximately 13.8 billion light-years away. Therefore, the universe
must be at least 13.8 billion years old. This finding has also been supported by convergent
evidence from sciences like cosmological physics and geochemistry.
The previous three chapters have focused in part on the importance of empirical
evidence in science. And indeed, empirical evidence is essential for developing scientific
knowledge. But for observations to lead to knowledge, scientists must assess their sig-
nificance and implications, and the relationships among them. In other words, scientific
knowledge comes not from mere observation, but from reasoning about observations.
Aristotle sought to establish that the universe is eternal by showing that the denial of
this would lead to an absurd infinite regress. Hubble combined empirical observations
with calculations of light’s travel over distances and through time to support a precise
estimate of the universe’s age. Hubble appealed to empirical evidence in ways Aristotle
did not, but both reasoned their way to conclusions.
from this data set to the conclusion that the universe is expanding was also developed
and refined over time, with not just Hubble but many other astronomers contributing
(Kragh & Smith, 2003).
Scientific reasoning involves the application of broad reasoning skills to the concerns
and content of science: to greenhouse gases, light-years, molecules, ecosystems, and, as
in Kahneman’s work, even to reasoning processes themselves. We have already encoun-
tered many examples of scientific reasoning. These include, to name a few, reasoning
from large-scale carbon release during the last two centuries to the dramatic increase in
the average global temperature (Chapter 1); reasoning from the temperature of colored
lenses to the hypothesis that light colors vary in temperature (Chapter 2); and reasoning
from the results of modeling the San Francisco Bay to the rejection of the Reber Plan
(Chapter 3). Chapter 4 began by describing how scientists reasoned from the speed of
light and observation of distant astral bodies to the conclusion that the universe must be
at least 13.8 billion years old.
Deliberative scientific reasoning involves making and evaluating inferences, and
inferences are the backbone of any argument. An inference is a logical transition from
one thought to another that obeys abstract rules. Whereas reasoning, as we’ve char-
acterized it, is a psychological process, the features of inference are instead logical.
An argument is a set of statements (stated propositions) with inferential structure.
You might think of an argument as a set of instructions for performing inferences
to reason your way to some conclusion. This differs from the everyday use of the
word argument to mean bickering—a quarrel one might have with friends or family.
An important part of scientific work is reasoning from empirical evidence in ways
that involve logical inferences, and assembling arguments reflecting the structure of
those inferences.
Making inferences and assembling arguments requires being able to distinguish the
roles of premise and conclusion. The premises of an argument are statements that provide
rational support, the basis for inference. The conclusion of an argument is the statement
that is supported by the premises, the endpoint of an inference. For example, recall
Aristotle’s reasons for thinking that the universe is eternal. These can be reconstructed
into an argument as follows:
1. If the universe is not eternal, then the universe came into existence.
2. Everything that comes into existence requires some pre-existing material
substrate.
∴ 3. If the universe is not eternal, then some material substrate existed before the
universe came into existence.
4. It cannot be the case that some material substratum existed before the universe
came into existence.
∴ 5. The universe is eternal.
The argument is written as an ordered list of statements. The first four statements are
the premises of the argument; the argument’s conclusion is the last statement in the list.
Statements inferred from one or more premises are marked with the symbol ‘∴’, which
is notation symbolizing words like therefore, so, or hence. As this example shows, an argu-
ment may involve more than one inference. The inference to the third statement is made
from the first two premises, and the inference to the fifth statement—the argument’s
conclusion—is made from the third and fourth premises.
Scientific reasoning involves three main patterns of inference: deductive, inductive,
and abductive. An argument is a deductive argument when the relationship of its prem-
ises to its conclusion is purportedly one of necessitation: the premises should together
guarantee, or make necessary, the conclusion. Inductive and abductive inferences are
non-deductive; the premises do not guarantee the conclusion, but they still give reason
to infer the conclusion. Inductive and abductive reasoning play a more central role in
scientific reasoning than deductive reasoning. We discuss these patterns of inference in
Section 4.3, and they also relate to the main topics of Chapters 5–7. But for now, let’s
concentrate on deductive inference.
Conditional Statements
Statements of the form ‘if …, then …’ are crucial elements of inferential reasoning. These
if/then statements are called conditional statements because one circumstance is given
as a condition for another circumstance. As an intuitive example, imagine the parents
of a young child asking her to eat her vegetables in order to get dessert: ‘If you eat your
broccoli, then you can have dessert’. The child then knows the ticket to dessert—shovel-
ing down that broccoli!
The first circumstance, following the ‘if’, is called the antecedent. This is the condition
upon which the other circumstance is introduced. The second circumstance, following the
‘then’, is called the consequent. This is the condition that arises from or hinges upon the
introduction of the antecedent. The latter term is closely related to the word consequence,
and in the previous example, it is just that: getting dessert is a consequence the parents
commit to on the basis of the antecedent condition, eating the broccoli.
Antecedent means existing prior to, coming first in time, and also being logically prior.
But for conditional claims, only the last meaning is relevant. Nothing guarantees that
an antecedent will come before its consequent. For example, consider the conditional
statement, ‘If Piet is a dog, then Piet is an animal’. This is a true conditional, because
being an animal is a guaranteed consequence of being a dog. But unlike broccoli and
dessert, being a dog doesn’t come before being an animal. Instead, in this example, if the
antecedent is true, the consequent is simultaneously true. Time-ordering of the anteced-
ent and consequent can also be reversed. For example, ‘If you are hungry now, then you
must not have eaten enough dinner’. In this case, the consequent (not eating enough
dinner) happened before the antecedent (being hungry now). But the antecedent is
still logically prior: being hungry is the condition placed on not eating enough dinner.
A good way to think about the logical relationship between antecedents and conse-
quents is in terms of requirements and guarantees, or, more formally, in terms of necessary
and sufficient conditions. For a conditional statement to be true, the antecedent occurring
guarantees that the consequent also occurs. The antecedent is thus a sufficient condition
for the consequent. Consider again the conditional statement, ‘If Piet is a dog, then Piet
is an animal’. Piet’s being a dog guarantees that Piet is also an animal; being a dog is suf-
ficient for being an animal.
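This guarantee can be checked mechanically by treating the conditional as a truth function over the four possible combinations of truth values. A minimal Python sketch (the function and variable names are our own, not from the text):

```python
# A conditional 'if A, then C' is false in exactly one case:
# antecedent true, consequent false.
def conditional(antecedent: bool, consequent: bool) -> bool:
    return (not antecedent) or consequent

# 'If Piet is a dog, then Piet is an animal': whenever the conditional
# holds, a true antecedent (dog) suffices for a true consequent (animal).
for is_dog in (True, False):
    for is_animal in (True, False):
        if conditional(is_dog, is_animal) and is_dog:
            assert is_animal   # the antecedent guaranteed the consequent

# The conditional can still be true when the antecedent is false:
# Piet may be an animal, or not, without being a dog.
assert conditional(False, True) and conditional(False, False)
```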
This doesn’t work in reverse. For a true conditional statement, the consequent occur-
ring doesn’t guarantee the antecedent will occur. Piet might be an animal but not a
dog; the consequent might be true but the antecedent false. Instead, the consequent is
a requirement, or a necessary condition, for the antecedent. Piet's being an animal is a necessary condition for Piet's being a dog.
The conditional 'If A, then C' can also be expressed in a variety of non-standard forms, including:

• C if A
• A only if C
• A guarantees C
• Without C, A is not the case
• Not A unless C
And there are still further ways to express this same conditional relationship.
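Since each of these forms is a truth function of A and C, their equivalence to the standard 'if A, then C' can be checked exhaustively. Here is a minimal Python sketch (the names are our own, and reading 'Not A unless C' as 'if not C, then not A' is our interpretation of the standard logical usage):

```python
from itertools import product

def implies(a: bool, c: bool) -> bool:
    # Material conditional: false only when a is true and c is false.
    return (not a) or c

# Each listed form, rendered as a truth function of A and C.
forms = {
    "C if A": lambda a, c: implies(a, c),
    "A only if C": lambda a, c: implies(a, c),            # C is required for A
    "A guarantees C": lambda a, c: implies(a, c),
    "Without C, A is not the case": lambda a, c: implies(not c, not a),
    "Not A unless C": lambda a, c: implies(not c, not a),
}

# All five forms agree with 'if A, then C' on every truth assignment.
for a, c in product((True, False), repeat=2):
    for name, form in forms.items():
        assert form(a, c) == implies(a, c), name
```

Note that 'implies(not c, not a)' is the contrapositive, which always has the same truth value as the original conditional.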
One approach to navigating these non-standard forms is to understand the
meanings of their parts. Suppose somebody states that you have an identical twin
only if you have a sibling. What must be the case for this statement to be true?
What about for it to be false? Without a sibling, you can’t have an identical twin
sibling; having a sibling is necessary for having a twin. Consulting Table 4.1,
you’ll see that a necessary condition is a consequent. So, this statement was the
same as saying that if you have an identical twin, then you have a sibling. If it
were possible to have an identical twin without having siblings, then the statement—either in its original form or the if/then formulation—would be false.

If the universe were not at least 10 billion years old, it couldn't contain any objects that old. The universe being 10 billion years old is
thus a requirement for any star to be that old. Notice, in contrast, that finding out the
universe is a certain age would not guarantee that any star is that old. It is possible that
the universe is, say, 15 billion years old and all stars are younger. The universe having a
given age is a necessary condition for there to be a star of that age, but it is not a suf-
ficient condition.
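This asymmetry can be illustrated with a toy numeric check (the ages below are made-up values, in billions of years, used only for illustration):

```python
def star_age_possible(star_age: float, universe_age: float) -> bool:
    """No object can be older than the universe that contains it."""
    return universe_age >= star_age

# Necessary: a 10-billion-year-old star requires a universe at least that old.
assert not star_age_possible(star_age=10, universe_age=9)
assert star_age_possible(star_age=10, universe_age=13.8)

# Not sufficient: a 15-billion-year-old universe is consistent with
# every actual star being younger than 15 billion years.
universe_age = 15
star_ages = [4.6, 8.2, 11.0]   # made-up catalog, all younger
assert all(star_age_possible(s, universe_age) for s in star_ages)
assert not any(s >= universe_age for s in star_ages)
```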
Evaluating Inferences
Scientific reasoning can be evaluated as good or bad based on the abstract rules and formal
properties of the inferences involved. The study of the rules and patterns of good and bad
inference is called logic. Logic is a subject that can, and does, fill many textbooks. We’ll
keep our discussion here as brief as possible, but some basic ideas of logic are important
for understanding successful scientific reasoning.
The evaluation of both deductive and non-deductive inferences focuses on two main
questions. First, are the premises sufficient to rationally support the conclusion? And sec-
ond, are those premises true? The first question assesses the logical relationship between
premises and conclusion, the grounds for inference. The second question assesses the
status of the inference’s premises themselves. Good inferences answer both questions
affirmatively: there is good reason to believe that all premises are true, and together,
those premises provide sufficiently good reason to infer that the conclusion is true. The
premises of a good inference should together provide a logically compelling reason for
thinking the conclusion either must be true (in deductive inference) or is likely to be
true (in inductive and abductive inference).
When the truth of the premises of a deductive inference guarantees the truth of the
conclusion, the inference has the property of being valid. This term has several differ-
ent meanings. In one non-technical use, it simply indicates something is reasonable or
understandable. In Chapter 2, we discussed the external and internal validity of experi-
ments; this is another meaning of validity. Here, in the context of deduction, validity
has a technical definition different from these other meanings. A deductive inference is
valid just when the truth of the premises logically guarantees, or necessitates, the truth
of the conclusion. In a deductively valid inference, it is impossible for the conclusion to
be false provided that the premises are true. To assess whether a deductive inference is
valid, first suppose all of its premises are true. You should imagine those premises are
the only things you know about the world. Then, ask yourself whether there is any pos-
sible way the conclusion could be false. If there is any way for the conclusion to be false
while the premises are true, say, by imagining strange things about the world, then the
inference is invalid. If not, if the truth of the premises alone guarantees the truth of the
conclusion, the inference is valid.
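For simple propositional arguments, the test just described can be mechanized: enumerate every assignment of truth values and look for one that makes all the premises true and the conclusion false. A sketch in Python (the helper names are our own):

```python
from itertools import product

def implies(a: bool, c: bool) -> bool:
    # Material conditional: false only when a is true and c is false.
    return (not a) or c

def is_valid(premises, conclusion, num_vars: int) -> bool:
    """Valid iff no assignment makes every premise true and the conclusion false."""
    for values in product((True, False), repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False   # found a way for the conclusion to be false
    return True

# Affirming the antecedent: If A, then C; A; therefore C.
modus_ponens_premises = [
    lambda a, c: implies(a, c),   # 1. If A, then C
    lambda a, c: a,               # 2. A
]
assert is_valid(modus_ponens_premises, lambda a, c: c, num_vars=2)
```

The enumeration plays the role of "imagining strange things about the world": each truth assignment is one candidate way the world could be.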
Any deductive inference is either valid or invalid. A valid deductive argument can-
not be made more valid, or rendered invalid, by adding more premises. This property
of deductive reasoning is called monotonicity. Reasoning is monotonic if the addi-
tion of new information never invalidates an inference or forces the conclusion to be
retracted. For this reason, deductive arguments are rock-solid; you might be wrong
about a starting point—one or more of your premises might be false—but if you have
a valid inference, you can be absolutely certain that your premises (if true) guarantee
your conclusion.
Some patterns of deductive inference are common enough to have been given names.
For example, one of the most basic patterns of deduction is affirming the antecedent of
a conditional statement (also known by its Latin name modus ponens). This is when a
conditional statement and its antecedent are used as premises for concluding the conse-
quent must be true. For example,
1. If a star is more than 10 billion years old, then the universe must be more than
10 billion years old.
2. This star is more than 10 billion years old.
∴ 3. The universe must be more than 10 billion years old.
Another basic pattern of deduction is denying the consequent of a conditional statement (also known by its Latin name modus tollens). This is when a conditional statement and the negation of its consequent are used as premises for concluding that the antecedent must be false. For example,

1. If the universe is in a steady state, then astral bodies remain the same distance from one another.
2. It is not the case that astral bodies remain the same distance from one another.
∴ 3. It is not the case that the universe is in a steady state.
Each of the previous two arguments is deductively valid. The premises may not be true.
But if they were true, they would logically guarantee that the conclusion must also be
true. This holds for every other instance of these general patterns of inference. No matter
how long and deep you think, you will not be able to find an instance of either pattern
that is invalid.
Affirming the antecedent and denying the consequent, as general patterns of deductive
inference, can be expressed as follows. ('It is not the case that' can be indicated with the
negation sign '¬'.)

Affirming the antecedent:
1. If A, then C
2. A
∴ 3. C

Denying the consequent:
1. If A, then C
2. ¬C
∴ 3. ¬A
Keep in mind that to have a valid argument, it is not enough to start with all true premises
and to have a true conclusion. Rather, the truth of the premises must force the conclu-
sion to be true; there must be no way around having a true conclusion (if the premises
are true). Consider another example:
1. All cats are mammals.
2. All tigers are mammals.
∴ 3. All tigers are cats.
Both premises are true: cats and tigers are kinds of mammals. The conclusion is true as
well: tigers are one kind of cat. But this is an invalid inference. Even though every state-
ment comprising it is true, the truth of the conclusion isn’t guaranteed by the truth of the
premises. To see this, substitute in ‘dogs’ for ‘cats’ in the argument. (Remember you can
do whatever you want, other than making a premise untrue, to try to make the conclusion
come out false. If you can accomplish this, the argument is invalid.) With this substitu-
tion, the two premises are still true, but the conclusion is not. The inference is invalid.
Here’s one more argument:
This argument is valid. If both premises were true, then the conclusion must also be
true. There is no possible way for both premises to be true but the conclusion false. Of
course, the premises aren’t both true. Buenos Aires is in fact in South America; but the
age of the Earth is approximately 4.54 billion years. So, even though this is a valid argu-
ment, we don’t have good reason to believe the conclusion.
The previous two examples illustrate that valid arguments can have false premises and
conclusions and invalid arguments can have true premises and conclusions. The best deductive
inferences are those that combine both validity and truth. These inferences are sound. A sound
inference is a valid deductive inference with all true premises. Being valid rules out inferences
like the cats and tigers example, where the conclusion is only accidentally true. Having all
true premises rules out inferences like the Earth and Buenos Aires example, where the infer-
ence is valid but the conclusion is nonetheless false because one or more premises are false.
A sound deductive inference takes all the guesswork out of establishing proof for a claim.
If you know both that all the premises are true and that the inference is valid, then you
know that the conclusion must be true. No additional evidence or reasoning can change
that. If it does, then either you didn’t actually have a valid deductive inference, or you
didn’t actually know that all the premises are true. Thus, if scientists know some inference
is sound, they can be certain that the conclusion is true beyond a shadow of a doubt.
Whether an argument is persuasive is a matter of psychology. People can fall for bad arguments, or they may not be persuaded by good ones.
But whether a deductive inference is good or bad is simply a matter of logic and truth. The
two main criticisms that can be made of a deductive argument are that (i) its premises are
false and that (ii) the conclusion isn’t validly inferred from the premises. When evaluating
a deductive argument, one should determine whether either or both of these criticisms
apply. And it is here that psychological reasoning and logical inference intersect. If you
think an argument is faulty on one or both of these grounds, you should consider whether
it can be repaired by replacing any false premises with true ones or whether additional
premises could be supplied such that there is a valid argument for the conclusion.
The valid inference patterns involving conditional statements discussed earlier—affirming
the antecedent and denying the consequent—have related invalid inference patterns
that result from confusing the roles of necessary and sufficient conditions in conditional
statements. Denying the antecedent occurs when a conditional statement and the negation
of its antecedent are used as premises for concluding that the consequent must be false
as well. Here is an argument that commits the error of denying the antecedent:
1. If a star is more than 15 billion years old, then the universe is more than 15 billion
years old.
2. No star is more than 15 billion years old.
∴ 3. It’s not the case that the universe is more than 15 billion years old.
This is an invalid argument. Even if the first two premises are true, that doesn’t guarantee
the conclusion is also true. As we have seen, the age of the oldest star is just a minimum
age for the universe. The conditional statement in the first premise reflects this, as the
consequent (the age of the universe) is a requirement for the antecedent (the age of the
oldest star). The antecedent guarantees the consequent but not the other way around. So,
denying the antecedent, as the second premise does, provides no good reason to believe
that the consequent is the case, but it doesn’t demonstrate that the consequent is not
the case either.
Affirming the consequent occurs when a conditional statement and its consequent are
used as premises for concluding that the antecedent must also be true. Here is an argu-
ment that commits the error of affirming the consequent:
1. If the Andromeda Nebula is 13.8 billion light-years away, then the universe is at
least 13.8 billion years old.
2. The universe is at least 13.8 billion years old.
∴ 3. The Andromeda Nebula is 13.8 billion light-years away.
This is also an invalid argument. Both premises are true, but they don’t guarantee the
truth of the conclusion. Some specific astral body that we can view from Earth being 13.8
billion light-years away does guarantee the universe is at least 13.8 billion years old, but
this is not required for the universe to be that old. The conclusion here is in fact false,
since the Andromeda Nebula is in fact only about 2.5 million light-years away.
Situations that you can describe, whether real or imagined, in which the premises of
an argument are true but the conclusion is false are called counterexamples to the argu-
ment. Counterexamples demonstrate that an argument or inference is invalid.
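For propositional forms, counterexample hunting can likewise be automated: search the truth assignments for one with true premises and a false conclusion. A sketch in Python (continuing the truth-functional reading of the conditional; the names are our own):

```python
from itertools import product

def implies(a: bool, c: bool) -> bool:
    # Material conditional: false only when a is true and c is false.
    return (not a) or c

def find_counterexample(premises, conclusion):
    """Return an (A, C) assignment with all premises true and the
    conclusion false, or None if the form is valid."""
    for a, c in product((True, False), repeat=2):
        if all(p(a, c) for p in premises) and not conclusion(a, c):
            return {"A": a, "C": c}
    return None

# Affirming the consequent: If A, then C; C; therefore A.
print(find_counterexample(
    [lambda a, c: implies(a, c), lambda a, c: c],
    lambda a, c: a))
# → {'A': False, 'C': True}

# Denying the consequent is valid, so no counterexample exists.
print(find_counterexample(
    [lambda a, c: implies(a, c), lambda a, c: not c],
    lambda a, c: not a))
# → None
```

The counterexample found for affirming the consequent is exactly the Andromeda case: the consequent (the universe's age) can be true while the antecedent (the nebula's distance) is false.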
So far, the defects in reasoning we have seen are with the form of the inference.
But sometimes the problem with an inference is an empirical one, not a logical one.
Sometimes, even when an argument is valid, the world doesn’t cooperate with the state-
ments made about it. This is one place where the detective work of science often comes
in. Consider, for example, the following argument about atoms (recall also that the word
atom means indivisible, from the Greek a- + temnein, meaning not + to cut.)
This is a valid argument, which involves affirming the antecedent—a valid inference
pattern. Given premises 1 and 2, it follows that atoms are indivisible; and from the
conjunction of that claim with premise 3, it follows that atoms are the smallest type of
matter. The problem, of course, is that scientists discovered particles smaller than atoms
over a century ago. Electrons were discovered in 1897, followed by the subsequent dis-
coveries of protons, neutrons, neutrinos, positrons, muons, bosons, and hadrons, which are
all smaller than atoms. These discoveries show the conclusion to be false: atoms are not
the smallest type of matter. So, the argument is not sound. Because the argument is valid,
learning that the conclusion is false also tells us something about the premises: at least
one of the three premises is also false. Can you figure out which is to blame?
We have seen that reasoning can falter because of a defect in the form of the inference, or because an argument accidentally contains a false premise. In other cases, the defect in reasoning owes to an informal fallacy: a faulty inference pattern where the defect lies with the inference's content rather than its form, and which goes beyond merely having false premises. Unfortunately, there is no fully unified theory of informal fallacies, nor any universally agreed-upon definition (Walton, 1989/2008), and there are hundreds of such fallacious patterns. Here are a few that are unfortunately
common in debates about science.
The strawman fallacy involves caricaturing someone’s thoughts in order to criticize
the caricature rather than the actual thoughts. Here is an example:
however, had no expertise in any academic subject whatsoever. Appeals to his book are
poor grounds for scientific conclusions about well-being, mind, or the cosmos. It’s some-
times difficult to assess whether some authority is legitimate. For example, sometimes
genuine experts in one scientific field make pronouncements about other fields in which
they have no authority. Uncovering appeals to irrelevant authority thus can require careful
analysis of credibility. This relates to Chapter 1’s discussion of how politicians should not
be viewed as experts on climate change science and the broader issues about expertise
introduced there.
Finally, appeal to ignorance is another informal fallacy. Arguments that commit this
fallacy conclude that a certain statement is true because there is no evidence proving
that it is not true. For example,
1. There is no compelling evidence that the pyramids were not built by extrater-
restrial creatures.
∴ 2. The pyramids were built by extraterrestrial creatures.
Plainly, this is a bad inference. Indeed, there’s a slogan that ‘absence of evidence is not evi-
dence of absence’. In other words, not having evidence that something is true isn’t necessarily
reason to think it isn’t true. For this example, we can imagine things that might provide
evidence that the pyramids were built by extraterrestrial creatures, but it’s hard to even
imagine how we could provide evidence that they weren’t. More generally, a lack of empirical
evidence in support of some scientific claim is usually reason not (yet) to believe the claim
is true. But this is generally not grounds for declaring the claim false, for the lack of evidence
may say more about the limits of our scientific knowledge than how the world really is.
The fallacy of appealing to ignorance highlights three interesting features of reasoning.
First, it is generally easier to prove that something is the case than that it is not the case.
Perhaps it would be better to examine evidence for who did in fact build the pyramids
than to simply look for evidence that it wasn’t aliens. Second, the burden of proof, or
the obligation to provide evidence in support of a belief, generally lies with the person
who makes an assertion. So, if you assert that the pyramids were built by aliens or that
genetically modified foods are risky for human health, then you should be able to provide
evidence in support of your assertion when asked to do so. Third, the more extraordinary
a statement is, the more evidence it requires. When a chemist asserts that a solution must
be acidic because the litmus paper turned bright red, there is usually little need to ask
her how she knows that the color was red. Extraordinary claims, however, such as that
all life on Earth has evolved from a single common ancestor, require a lot of evidence.
The English naturalist, geologist, and biologist Charles Darwin (1809–1882) spent years
assembling evidence for his theory of evolution and common ancestry, and many scientists
following Darwin have added and improved upon that store of evidence.
EXERCISES
4.1 Define reasoning, inference, and argument, and describe how they are involved in
science (even though science is based on empirical evidence).
4.2 The following statements concern necessary and sufficient conditions. For each state-
ment, rephrase it in the form of a standard if/then conditional statement and say
whether it’s true or false.
1. Being a mammal is a sufficient condition for being human.
2. Being human is a sufficient condition for being an animal.
3. Being alive is a necessary condition for having a right to life.
4. Being alive is a sufficient condition for having a right to life.
5. Having a PhD is necessary if you want to be a scientist.
6. It’s sufficient for being awarded the Nobel Prize in immunology that one gener-
ates the cure for cancer.
4.3 Rephrase each of the following statements into standard conditional statements, and
then say whether they’re true or false.
1. P is a sufficient condition for Q if it is true that if P then Q.
2. It is true that if P then Q, but only if Q is a necessary condition for P.
3. It is true that P only if Q, but only if P is a sufficient condition for Q.
4. Not Q is a sufficient condition for P if it is true that P unless Q.
5. Something is a brother if and only if it is a male sibling. So, being a male sibling
is necessary for being a brother.
6. Something is a brother if and only if it is a male sibling. So, being a male sibling
is sufficient for being a brother.
4.4 Define deductive inference, validity, and soundness, and then answer the following
questions. Explain each answer.
a. Is every deductive argument valid?
b. Is every deductive argument sound?
c. Is every valid argument sound?
d. Is every sound argument valid?
4.5 Rewrite each of the following arguments in standard form, with numbered premises
and a conclusion. For each argument, say whether it is valid and whether it is
sound. Give reasons to justify each of your answers.
1. LeBron James must be mortal. After all, all humans are mortal, and LeBron
James is a human.
2. God is often characterized as the most perfect being. A perfect being must
have every trait or property that it would be better to have than not to have.
Since one of those properties is existence—that is, it is better to exist than not to
exist—then God exists.
3. The number 1 is a prime number, and 3 is a prime number. So too are 5 and
7. Therefore, all odd integers between 0 and 8 are prime numbers.
4. Real Madrid has won more than 17 games every year for the past 30 years.
So, you can safely bet Real Madrid will win more than 17 games this year.
5. The universe cannot be younger than 11 billion years old because the age of
the oldest known stars is 11 billion years old.
6. The term tachyon refers to a particle that travels faster than light. Therefore, it’s
not the case that nothing travels faster than light.
4.6 Come up with an example argument employing the inference pattern of affirming
the antecedent. Do the same for denying the antecedent, affirming the consequent,
and denying the consequent. For each argument, say whether it’s valid. For each
invalid argument, provide a counterexample. For each valid argument, say whether
it’s sound.
4.7 Describe the three informal fallacies outlined in this section. Give a new example of
each. Try to think of a real instance you’ve encountered, but if you can’t, it’s fine to
make up an example.
4.8 Review the passage about Hubble’s discoveries in the first part of this section. Sum-
marize the inferences that led Hubble to conclude that the universe is over 10 billion
years old.
4.9 Review the passage about Hubble’s discoveries in the first part of this section. Iden-
tify three conditional statements involved in Hubble’s inference that the universe is
over 10 billion years old. (These might not be written in the text in if-then form, and
some of the conditional claims involved in Hubble’s inference process might not
even be explicitly written out.) Write out the three statements in standard if-then form.
4.10 Review the passage about Hubble’s discoveries in the first part of this section. Sum-
marize the inferences Hubble made that led to the conclusion that the universe is
expanding. Then, put that argument into standard form, with numbered premises and
a conclusion. Are any premises needed for a valid deductive argument missing? If so,
add them, even if they weren’t explicitly stated in the description of Hubble’s reasoning.
4.11 Read the following passage, and try to understand the argument it makes.
c. Assess the author’s reasoning. What are good points or inferences? What
weaknesses are there in the author’s reasoning?
d. Assess the author’s conclusion. Do you think the conclusion is right? Has the
author given adequate grounds for believing the conclusion?
but we do know that if the hypothesis is true, then the expectation will be true. This
conditional statement can be thought of as an answer to the question: ‘If this hypothesis
is true, what must be the case about the world?’
After deductively inferring expectations from the hypothesis, scientists make observa-
tions, perhaps by conducting an experiment. Those observations are then compared with
the expectations. Here too the H-D method sees a role for deductive inference. If the
observation does not match the expectation, that is, if the expectation is not observed,
then this enables a deductive argument for the conclusion that the hypothesis is false.
The inference pattern is denying the consequent, which we’ve learned is always a valid
form of deductive inference:
Refutation
1. If H, then E
2. ¬E
∴ 3. ¬H
In this case, from the observations, we can deductively infer that the hypothesis is false.
In other words, the observations refute the hypothesis.
If instead the observations and expectations match, this enables the inference pattern
of affirming the consequent. Careful—that was an invalid form of deductive inference!
In this case, no deductive argument for or against the hypothesis is possible. A match
between expectations and observations is consistent with the truth of the hypothesis, but
it does not guarantee the truth of the hypothesis. If the evidence matches expectations,
the hypothesis is confirmed, but if not, it is refuted.
Confirmation
1. If H, then E
2. E
∴ 3. Probably or possibly H
Let’s work through a really simple example. Imagine the hypothesis is that all swans are
white. If it is true that all swans are white, then the swan you next observe will be white.
This is a true conditional claim: the antecedent guarantees the consequent. So, you go
out looking for swans, with the expectation that, if your hypothesis is true, you will see
a white one. Let’s say you instead encounter a black swan. This observation violates your
expectation; by denying the consequent, you’ve shown the antecedent (the hypothesis)
is false. Breaking news: it’s not the case that all swans are white! However, if the next
swan you see is white, then your observation matches the expectation. You haven’t proven
anything, but you do have a bit more evidence in favor of the hypothesis.
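The contrast in validity between these two patterns can be checked mechanically. The following sketch (illustrative only, not from the text) enumerates every assignment of truth values to H and E and tests whether any assignment makes all the premises true while the conclusion is false:

```python
from itertools import product

def implies(p, q):
    # Material conditional: 'if p then q' is false only when p is true and q is false.
    return (not p) or q

def valid(premises, conclusion):
    # An argument form is deductively valid iff no assignment of truth values
    # makes every premise true while the conclusion is false.
    for H, E in product([True, False], repeat=2):
        if all(prem(H, E) for prem in premises) and not conclusion(H, E):
            return False
    return True

# Refutation (denying the consequent): If H, then E; not-E; therefore not-H.
refutation = valid(
    [lambda H, E: implies(H, E), lambda H, E: not E],
    lambda H, E: not H,
)

# Confirmation read as a deductive argument (affirming the consequent):
# If H, then E; E; therefore H.
confirmation = valid(
    [lambda H, E: implies(H, E), lambda H, E: E],
    lambda H, E: H,
)

print(refutation)    # True: refutation is a valid form
print(confirmation)  # False: H false, E true is a counterexample
```

The counterexample the check finds for affirming the consequent is exactly the swan case: the expectation (a white swan) can be true even while the hypothesis (all swans are white) is false.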
There is, then, a crucial difference between refutation and confirmation. Refutation is a
valid deductive argument that demonstrates the hypothesis is false. In contrast, confirma-
tion is not a deductively valid argument. The truth of the premises does not guarantee
the conclusion is true. The argument scheme for confirmation shown here reflects this by
concluding not H but ‘probably or possibly H’. An observation matching what a hypoth-
esis leads us to expect generally is taken to provide some evidence for the hypothesis.
But this isn’t always so, and it’s surprisingly tricky to articulate how this works. We will
Copyright © 2018. Taylor & Francis Group. All rights reserved.
puerperal or childbed fever. (Puerperium refers to the postpartum period following labor
and delivery.) A puzzling observation was that the mortality rate in the 1st Maternity
Division was about three times higher than in the adjacent 2nd Maternity Division. These
rates are shown in Table 4.3.
Why was the rate of puerperal fever so much higher in the first clinic? An answer to
this question might provide some insight into how to decrease the incidence of puerperal
fever overall.
Semmelweis (1861) made several observations that seemed potentially relevant.
Women with dilation periods longer than 24 hours during delivery died of puerperal
fever much more often. He also observed that patients in the first clinic fell ill in a
sequential manner, one after another. The health of patients and the skill and care provided
by their caretakers did not seem related to the incidence of puerperal fever. Finally, not
only was the illness rate in the 2nd Maternity Division lower, but women who instead
TABLE 4.2 Valid inference patterns, invalid inference patterns, and informal fallacies

Valid patterns
  Affirming the antecedent: 1. If A, then C; 2. A; ∴ 3. C
  Denying the consequent: 1. If A, then C; 2. ¬C; ∴ 3. ¬A

Invalid patterns
  Denying the antecedent: 1. If A, then C; 2. ¬A; ∴ 3. ¬C
  Affirming the consequent: 1. If A, then C; 2. C; ∴ 3. A
Appeal to irrelevant authority: appealing to the views of an individual who has no expertise
in a field as evidence for some view
TABLE 4.3 Annual births, deaths, and mortality rates for all patients at the two clinics of
the Vienna maternity hospital 1841–1846
Year First Clinic Second Clinic
gave birth at home or elsewhere outside the clinic—even unattended on the street—were
unaffected by puerperal fever.
Semmelweis used these observations to rule out a number of proposed sources of the
illness. Puerperal fever wasn’t a city-wide epidemic. If it were, women who gave birth
outside the hospital would also suffer from the illness, but they didn’t. Nor was puerperal
fever triggered by psychological traumas during childbirth, like intense modesty from
being medically examined by male doctors (as had been proposed). If it were, surely
some women who gave birth in the streets would also experience puerperal fever, but
they didn’t. Most crucially, all proposed sources of the illness led to the expectation of
equal rates of the illness in the 1st and 2nd Maternity Wards. That expectation did not
match observations. So, reasoning in a way that is captured well by the H-D method of
refutation, Semmelweis rejected all these hypotheses about the cause of puerperal fever.
Semmelweis tried to develop hypotheses that were consistent with the observed differ-
ence in puerperal fever rates between the two maternity wards. One difference between
the wards was that the 1st Ward was staffed by male doctors and medical students, while
the 2nd Ward was staffed by female midwives. Women in the former gave birth on their
backs, women in the latter on their sides. Semmelweis changed procedures in the 1st
Ward so that all women there also gave birth on their sides. From the hypothesis that
giving birth on one’s back increases incidence of the illness, one can deductively infer the
expectation that changed birth position will decrease the incidence of the illness. Alas, this
expectation did not match Semmelweis’s observation: changing birth position in the 1st
Ward made no difference. Other hypotheses were similarly tested and similarly ruled out.
Then, at the end of March 1847, Semmelweis learned that his colleague Dr. Jakob
Kolletschka had died. Kolletschka was a professor of forensic medicine. He had been
performing an autopsy on a woman who had died from puerperal fever when a scalpel
had lacerated his finger. Kolletschka subsequently exhibited the same symptoms as the
FIGURE 4.2 Frieze at the Social Hygiene Museum in Budapest, honoring Ignaz Semmelweis
mothers and infants who had died of puerperal fever. Semmelweis was distraught by
his friend’s death, but he also saw the value of this information for the investigation of
puerperal fever. He hypothesized that the scalpel had contaminated Kolletschka’s blood
with ‘cadaverous particles’, and this caused the puerperal fever that led to his death.
Semmelweis also realized that this hypothesis was supported by the observed difference in
illness rates between the two wards: doctors and medical students performed autopsies,
whereas midwives did not.
Semmelweis reasoned that if the hypothesis that cadaverous particles caused puerperal
fever were true, then the illness could be prevented by eliminating the cadaverous par-
ticles. To test this hypothesis, he required all students and midwives to thoroughly wash
their hands in a solution of chlorinated lime prior to examining patients. If this made no
difference, then cadaverous particles weren’t to blame, and this new hypothesis would also
be refuted. But, instead, the mortality from puerperal fever began to decrease, and the
incidence in the 1st Ward dropped to a similar level as in the 2nd Ward. Semmelweis’s
hypothesis was confirmed.
This is a good illustration of the H-D method and in particular the difference between
refutation and confirmation. Recall that, on the H-D account, refutation is decisive, as
it is the result of a valid deductive inference, whereas confirmation is weaker. It turns
out that Semmelweis’s confirmed hypothesis was wrong. Cadaverous material wasn’t
responsible for puerperal fever; it was a bacterial infection of the uterus. Luckily, chlo-
rinated lime is an antibacterial agent. Semmelweis thought the prescribed handwash-
ing worked because it removed cadaverous material, but instead, it worked because it
removed bacteria.
Some other important instances of hypothesis-testing are also well described by the
H-D method. Another example, which we encountered in Chapter 2, is the case of Arthur
Eddington’s confirmation of Einstein’s theory of relativity from the 1919 solar eclipse.
This was also a refutation of Newton’s cosmological theory. Einstein’s theory of general
relativity, as you may recall, implies that light will bend around a massive object like
the Sun. Newton’s theory also predicts light will bend because of gravity. However, the
theory of general relativity implies that light will bend twice as much as the value pre-
dicted by Newtonian physics. Measuring how much light bends around the Sun allowed
Eddington to refute Newtonian physics and provide some confirmation of Einstein’s
theory of general relativity.
Auxiliary Assumptions
The H-D method seems to accurately capture something important about hypothesis-
testing in science, namely the distinctive power of refutation. Data that fit our expecta-
tions are well and good, but we can really learn something from data that contradict our
expectations. This also accords with the importance of hypotheses that are falsifiable, as
outlined in Chapter 1. The power of refutation is also what makes the idea of crucial
experiments compelling, as we discussed in Chapter 2 with the case of Newton’s prism
experiments. Yet, the H-D method also has its limitations. We’ll close this discussion by
describing one challenge to this account of hypothesis-testing; then, later in the chapter,
we will survey two powerful alternatives based on non-deductive patterns of inference.
The challenge to the H-D method is that the inference from a hypothesis to some
expectation is never truly deductive. Or, more precisely, additional claims are needed
in order to make a deductive inference from hypothesis to expectation valid. These
additional claims include background assumptions about how the world works, what in
Chapter 2 we called auxiliary assumptions. Lurking in the background of Semmelweis’s
inference about handwashing, for example, was the assumption that handwashing would
remove cadaverous material. Beyond Eddington’s refutation of Newtonian physics were
a number of assumptions about the behavior of instruments, the properties of light, the
location of certain astral bodies, and so on.
Such auxiliary assumptions often go unnoticed, either because they are assumed to
be true or, in some cases, simply because no one has noticed them. But because valid
deductive inference requires the premises to guarantee the conclusion, these auxiliary
assumptions are essential premises for the deductive inference from a hypothesis to some
empirical expectation, a key component of the H-D method. So, the schemes we identified
earlier for refutation and confirmation on the H-D account need to be adapted as follows:
Refutation
1. If H and A, then E
2. ¬E
∴ 3. ¬H

Confirmation
1. If H and A, then E
2. E
∴ 3. Probably or possibly H
In this new formulation, the letter A stands for statements of whatever auxiliary assump-
tions are required as additional premises to validly deduce E from H. Required auxiliary
assumptions may include background ideas about the phenomenon under investigation,
as well as assumptions about the reliability of experimental instruments and measure-
ment procedures.
Taking into account auxiliary assumptions in the H-D schemes more realistically cap-
tures the type of reasoning that underlies hypothesis-testing. But this also introduced a
new problem. The refutation scheme with ¬ H, or it’s not the case that H, as its conclusion
∴ 3′. ¬ (H and A)
This amounts to the statement that it’s not the case that both H and A are true. In
other words, taking into account auxiliary assumptions, all you can deductively con-
clude from observations not matching expectations is that either the hypothesis is
wrong or one or more auxiliary assumptions is wrong (or both). Because of the need
for auxiliary assumptions, the H-D method can't provide a deductive argument that
the hypothesis is false.
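The same truth-table method used for two-variable argument forms makes the Duhem-Quine point vivid. This sketch (illustrative only, not from the text) adds the auxiliary assumption A as a third variable and shows that the premises support ¬(H and A) but not ¬H:

```python
from itertools import product

def implies(p, q):
    # Material conditional.
    return (not p) or q

def valid(premises, conclusion):
    # Valid iff no assignment to H, A, E makes all premises true and the conclusion false.
    for H, A, E in product([True, False], repeat=3):
        if all(p(H, A, E) for p in premises) and not conclusion(H, A, E):
            return False
    return True

# Premises of the adapted refutation scheme: If (H and A), then E; not-E.
premises = [
    lambda H, A, E: implies(H and A, E),
    lambda H, A, E: not E,
]

# Concluding not-H alone is invalid: H true with a false auxiliary A
# makes both premises true while not-H is false.
print(valid(premises, lambda H, A, E: not H))          # False
# Concluding not-(H and A) is valid.
print(valid(premises, lambda H, A, E: not (H and A)))  # True
```

The counterexample the check finds is the scientist who keeps her hypothesis and blames an auxiliary assumption instead.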
This problem is known as the Duhem-Quine problem, named after the French physi-
cist, mathematician, and philosopher of science Pierre Duhem (1861–1916) and the
American philosopher and logician Willard van Orman Quine (1908–2000). One upshot
of the Duhem-Quine problem is that deductive logic alone is insufficient for successful
hypothesis-testing. In the face of refutation, scientists need to decide whether to give up
on a hypothesis or to question one or more of their auxiliary assumptions. It seems there’s
an element of choice. A scientist may well want to hold on to a hypothesis she likes and
look for another explanation for why the observations didn’t turn out as expected.
The hope of reasonably deciding whether to reject a hypothesis or an auxiliary assump-
tion isn’t entirely destroyed by the Duhem-Quine problem. Scientists typically have inde-
pendent evidence for many of their auxiliary assumptions. Instruments and measurement
procedures have been tested and employed in other circumstances, and background beliefs
about a phenomenon are often based on evidence. These considerations can be used to
help scientists decide whether, and when, to reject the hypothesis under investigation. Yet
the need for auxiliary assumptions limits the power of the H-D method of hypothesis-
testing. The Duhem-Quine problem makes clear that, just like confirmation, refutation
is messier than simple deductive inference.
Axiomatic Methods
Deductive inference plays a different kind of role in some fields of science. Progress in
scientific reasoning is sometimes achieved through formal axiomatization, a constructive
procedure by which statements are derived from foundational principles. The founda-
tional principles, called axioms, are accepted as self-evident truths about some domain.
The axioms are then used to deductively infer other truths about the domain, called
theorems.
The most venerable example of axiomatization comes from the Greek mathematician
Euclid, who lived between the 4th and 3rd centuries BCE. Book I of Euclid’s Elements of
Geometry begins with 23 definitions and five axioms. The five axioms are the following:
1. A straight line can be drawn from any point to any point.
2. A finite straight line can be extended continuously in a straight line.
3. A circle can be drawn with any center and any radius.
4. All right angles are equal to one another.
5. If two straight lines in a plane are met by another line, and if the sum of the internal
angles on one side is less than two right angles, then the straight lines will meet if extended
sufficiently on the side on which the sum of the angles is less than two right angles.
Together, these five axioms form the premises of Euclidean geometry. From these prem-
ises, one can validly deduce theorems about the congruency of figures, parallel lines, and
other results of Euclidean geometry. In turn, these theorems can be treated as premises
in new arguments aimed at validly deducing new theorems.
Euclid’s axiomatization of geometry was accepted as decisive for almost two millennia. It
was a clear example of rigorous scientific reasoning grounded in first principles, with the power
to systematize all existing knowledge of geometry. It deeply influenced Ibn al-Haytham’s
work in optics and Newton's physical theory of mechanics. Since the 19th century, however,
non-Euclidean geometries have been developed that diverge from Euclid's axiomatization. Just
as Euclid's geometry was central to earlier physics and astronomy, these non-Euclidean geom-
etries paved the way for Einstein's radical new theories of the relativity of space and time.
Einstein's theory implies that the geometry of physical space itself is not, in general, Euclidean.
Another example of an important use of the axiomatic method concerns the founda-
tions of arithmetic. Concerned with questions about the exact nature of numbers, the Italian
mathematician Giuseppe Peano (1858–1932) employed axiomatic reasoning to give a rigor-
ous foundation for the natural numbers (0, 1, 2, 3, 4, …). Peano’s axiomatization of natural
numbers began with three primitive concepts, that is, concepts that were not defined in terms
of other concepts. Peano thought these primitive concepts were self-evident: the set of natural
numbers, N; the number zero, a member of the set N; and the successor function S. This
successor function can be applied to any natural number, and it will yield the next number
after it. For example, S(6) = 7. Likewise, S(0) = 1. From here, Peano laid down several axioms:
1. Zero is a number.
2. If n is a number, then S(n) is a number.
3. Zero is not the successor of a number.
4. Distinct natural numbers have distinct successors.
5. If 0 is an element in a set of numbers and the successor of every number is in that set,
then every number is in that set.
Given these axioms, the basic properties of natural numbers could be described and
theorems about them, including the arithmetic operations like addition and subtraction,
could be deduced. To take a simple example, the supposition that there is a number
preceding zero (S(k) = 0) would contradict axiom 3. Accordingly, the theorem that zero
has no predecessor in N can be derived from axiom 3.
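Peano's construction can be made concrete in code. The sketch below (illustrative only, not from the text) represents zero and the successor function as data, then defines addition recursively from the axioms: m + 0 = m, and m + S(n) = S(m + n).

```python
# A minimal sketch of Peano-style natural numbers. Zero is a bare object;
# each application of S wraps a number in one more successor layer.

class Zero:
    def __repr__(self):
        return "0"

class S:
    # Successor: S(n) is the number after n (axiom 2: if n is a number, so is S(n)).
    def __init__(self, pred):
        self.pred = pred
    def __repr__(self):
        return f"S({self.pred!r})"

def add(m, n):
    # Addition defined recursively: m + 0 = m, and m + S(n) = S(m + n).
    if isinstance(n, Zero):
        return m
    return S(add(m, n.pred))

def to_int(n):
    # Count the successor layers to read off an ordinary integer.
    count = 0
    while isinstance(n, S):
        count += 1
        n = n.pred
    return count

two = S(S(Zero()))
three = S(two)
print(to_int(add(two, three)))  # 5
```

Note that axiom 3 holds by construction: no value of S(k) is ever the Zero object, so zero has no predecessor in this representation.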
EXERCISES
4.13 Summarize the H-D method. How does this method relate to hypotheses? How does
it relate to deductive reasoning? What’s the crucial difference between refutation
and confirmation?
4.14 There are supposed to be two applications of deductive inference in each H-D refuta-
tion. (a) What are those two deductive inferences, and how is each related to how we
have characterized hypothesis-testing in general? (b) Define the Duhem-Quine prob-
lem. Which application(s) of deductive inference does this problem interfere with?
4.15 Return to the description of Semmelweis’s investigation of puerperal fever. Identify
three inferences that can be described as uses of the H-D method (either refutation
or confirmation). For each, write out the inference as an argument in standard form
with premises and conclusion.
4.16 After reading the passage below, identify the hypothesis under investigation. What would
the researchers expect to find if the hypothesis is true? Finally, list five important auxiliary
assumptions required for a deductive inference from the hypothesis to the expectations.
4.17 Read the passage from exercise 4.16 above. Woodruff and Premack (1979) found that
after 120 trials, each of the four chimpanzees they tested showed a reliable tendency to
indicate the container with food in the presence of a cooperative human and an empty
container in the presence of the competitive human. Say whether the hypothesis under
investigation is confirmed, refuted, or neither by this evidence. Justify your claim.
4.18 Imagine you want to estimate a rock’s age using the technique of radiometric dat-
ing. This technique allows scientists to estimate age from the known decay rate of
radioactive materials, given that traces of radioactive materials were incorporated
when the rock was originally formed. (a) What are some auxiliary assumptions
that you think are involved in a test like this? (b) You hypothesize that the rock is
3.8 billion years old, but the test results do not match your expectations. What
are some possible reasons that you could have gotten this result, even if the rock
is actually 3.8 billion years old? List at least three.
4.19 Describe the axiomatic method in your own words. How has this method been used
in science?
FIGURE 4.3(a) Flint, Michigan water crisis, with numbers indicating parts per billion (ppb); (b)
Lee Anne Walters, the Flint citizen-scientist who initially requested water-testing
sampling results.
Inductive Inference
Imagine you go to the grocery store, hankering for some grapefruit. The grocer takes one
grapefruit from the top of one box, cuts it open, and offers you a slice to taste. It tastes
good! What you may not notice is that the grocer is tacitly expecting you to make the
following inference:
You draw three other grapefruits from the box at random. The three grapefruits look good,
like the grapefruit the grocer showed you. So, you buy that box of a dozen grapefruits.
What’s the inference you’re tacitly making?
1. The three grapefruits picked at random from this box are good.
∴ 2. The next nine grapefruits drawn from this box will also be good.
Neither of these is a valid deductive inference; the truth of the premises does not guarantee
the truth of the conclusions. Assuming the premises are true, the conclusions are at best
likely or probable. Accordingly, both inferences are inductive. An inference is inductive
when the inferential relationship from premises to conclusion purports to be one of
probability, not necessity. Even if the premise in each inference is true, the conclusion
may nonetheless be false. Perhaps not all grapefruits from the box are good, even if the
grapefruit the grocer showed you was good and even if you checked three other randomly
picked grapefruits from the box. For all you know, the rest of the grapefruits in the box
could be rotten.
Because the truth of the premises in inductive arguments does not guarantee the truth
of the conclusion, inductive inferences are always logically invalid. Nonetheless, reasoning
inductively is a primary form of making inferences in science and everyday life. Two com-
mon forms of inductive inference are generalizations and projections. Inductive generaliza-
tions are inferences to a general conclusion about the properties of a class of objects based
on the observation of some number of the objects in the class. If the conclusion applies
to all members of the class, the generalization is a universal inductive generalization. The
form of inductive generalizations is something like this:
Inductive generalization
1. O1, O2, O3, …, and On each have been observed to have property P.
∴ 2. All Os have property P.
The grocer’s inference was like this; it went from the premise that one grapefruit from the
box is good to the conclusion that all the grapefruits in the box are good. In contrast, inductive projections
(sometimes called next-case induction) are inferences to a conclusion about the feature of
some object that has not been observed based on the observation that some objects of
the same kind have that feature. The form of inductive projections is something like this:
Inductive projection
1. O1, O2, O3, …, and On each have been observed to have property P.
∴ 2. The next observed O will have property P.
Your argument at the market was like this; it went from the premise that each of three
grapefruits you observed is good to the conclusion that the next nine grapefruits will be
good. These two patterns of inference are similar. The difference is that generalization
makes a prediction about an entire class of entities, whereas projection makes a prediction
about particular entities that have not yet been encountered.
upon some 60-year-old unsecured vials of smallpox while cleaning out a storage closet
at the Bethesda campus of the National Institutes of Health in the USA. This discovery
undermined the reasonable inductive inference from the eradication of smallpox and
WHO’s strict control of remaining specimens to the conclusion that there were no other
smallpox specimens unaccounted for. It was a good inductive argument—until a premise
was added that directly contradicted its conclusion.
Third, and last, inductive inferences are of different strengths. The conclusion that
the grapefruits in the box are good would be stronger if the grocer had let you eat two
grapefruits from the box and both tasted good. Similarly, the conclusion that all of Flint’s
water is toxic was strengthened when the Virginia Tech team sampled water from nearly
300 homes, compared to the earlier inference based only on problematic water samples
from a single home.
Good inductive inferences are strong, that is, likely to preserve truth. This means that
true premises are grounds for inferring the conclusion is probably true as well. Deductive
arguments are either valid or not, but this is not so for the strength of inductive argu-
ments. Strength comes in degrees: two arguments might be strong, but one might be even
stronger than the other. Further, any inductive argument, no matter how strong, can be
additionally strengthened. The degree of strength of an inductive inference may be mea-
sured by the probability that the conclusion is true given that all the premises are true.
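A toy probability model can illustrate how inductive strength grows with evidence. The sketch below goes beyond the text and rests on stipulated assumptions: a box of 12 grapefruits, a uniform prior over how many of them are good, and sampling without replacement. It computes the posterior probability that all 12 are good after observing 1 versus 3 good samples:

```python
from math import comb

def p_all_good(k, n=12):
    # Posterior probability that all n grapefruits in the box are good, given
    # that k fruits drawn at random (without replacement) were all good.
    # Assumes a uniform prior over the number of good fruits, 0 through n.
    # P(k sampled all good | g good in box) = C(g, k) / C(n, k)
    likelihoods = [comb(g, k) / comb(n, k) for g in range(n + 1)]
    return likelihoods[n] / sum(likelihoods)

print(round(p_all_good(1), 3))  # 0.154: one good sample
print(round(p_all_good(3), 3))  # 0.308: three good samples, a stronger inference
```

Under these assumptions, checking three grapefruits rather than one roughly doubles the probability that the conclusion is true, which is exactly what "strength comes in degrees" amounts to.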
Strong inductive inferences may nonetheless have false conclusions, as the smallpox
example shows. To take a more famous example, until the 17th century, Europeans
believed that all swans were white. Their belief was supported by strong evidence: no
European had ever observed a black swan, and no one they’d ever consulted had either.
However, in 1697, the Dutch explorer Willem de Vlamingh returned to Europe with two
black swans he had captured on Australia’s Swan River. The strong inductive argument in
favor of all swans being white was undermined by this development, and the conclusion
was shown to be false.
We discussed the hypothesis that all swans are white in connection with the H-D method of
hypothesis-testing. The point there was to show the deductive force of refutation, or falsification, in
FIGURE 4.4 The black swan of the family (Black Australian swan surrounded by Bewick’s swans)
© Copyright Colin Smith and licensed for reuse under this Creative Commons License.
contrast to confirmation. The discussion of inductive inference here sheds additional light
on the process of confirming hypotheses. Earlier, we merely pointed out that confirmation
does not involve a valid deductive inference. What it does involve is inductive inference.
From this perspective, inferring that a hypothesis is true from some observation(s) can
be judged according to the inductive strength of the inference.
is justified requires showing that it is generally reliable, which requires nothing other than
inductive inference. So, looking to a non-deductive justification for inductive inference
leads to circular reasoning: we would need to prove inductive inference is reliable in order
to justify inductive inference. In other words, we would have to assume the reliability of
the method whose reliability we need to establish. Consequently, inductive inferences
cannot be justified using non-deductive reasoning, either. Given that deductive and non-
deductive reasoning exhaust the possibilities, Hume concluded that inductive reasoning
cannot be rationally justified.
Hume also noted that the justification for induction appears to depend on what he
called the uniformity of nature assumption. This is the idea that the natural world is suf-
ficiently uniform, or unchanging, so that we are justified in thinking our future experiences
will be consonant with our past experiences. The uniformity of nature assumption can’t
justify induction either, though, since this assumption is merely based in our past expe-
rience. We think nature is uniform because, so far, it has been. But what do we know
about tomorrow?
Philosophers of science have proposed several solutions to the problem of induction.
One possible solution begins from the observation that inductive inferences are intended
to warrant probable conclusions—not guarantees. And there are rational grounds for infer-
ring claims about the probability of something being the case on the basis of empirical
evidence. Perhaps, then, tools of statistical reasoning, which we focus on in Chapters 5
and 6, can justify some varieties of inductive inference. And statistical reasoning does have
a rational basis, provided by probability theory (an axiomatic theory).
A different approach to solving the problem of induction is simply to show that
inductive inference is the best we can reasonably hope for when it comes to making
reliable predictions (Reichenbach, 1938). Either nature is uniform, or it isn’t. If nature is
uniform and we want to make reliable predictions, then a non-inductive method like, say,
fortune-telling may or may not work. In contrast, inductive inference will clearly work.
(Remember the uniformity of nature assumption.) So, if nature is uniform, induction
will be more reliable than non-inductive methods. Now suppose nature is not uniform.
In that case, inductive inference will be unreliable, but so will any alternative methods.
Why is that? Well, suppose that fortune-telling were better than induction, that is,
that fortune-tellers were able to reliably predict the future. This success would imply
some kind of uniformity. But any uniformity in nature can be exploited by inductive
inference. You could, for example, inductively infer the future success of fortune-tellers
from their past successes. Consequently, whether or not nature is uniform, the best
approach one can take to making reliable inferences about the future or the unobserved
is inductive inference.
So, while the Duhem-Quine problem shows that deductive inference isn’t the full story
for hypothesis-testing, the problem of induction indicates inductive inference probably
isn’t the full story of scientific inference either. At the very least, both problems challenge
us to think more deeply about our grounds for inference. In the case of induction, we’ll
see in Chapter 6 how statistical inference may be able to support inductive reasoning
and make more precise its nature.
Abductive Inference
Copyright © 2018. Taylor & Francis Group. All rights reserved.
In 1915, the German scientist Alfred Wegener advanced a systematic proposal about
the geologic history of Earth. He proposed that a single landmass, named Pangaea, had
fragmented into the continents that we recognize today. Initially, Wegener’s hypothesis
was not widely accepted. At the time, most scientists accepted that the Earth’s molten
surface cooled billions of years ago and that the remnants of this cooling process are the
major landmasses that we recognize today. There were good reasons to accept the hypoth-
esis that, once encrusted, the Earth’s surface was relatively fixed and stable. But some
surprising geological features were left unaccounted for. For instance, if the continents
are fixed and stable and do not move, then the rough congruence of the shapes of some
continents (think of Africa and South America) is a puzzling coincidence; see Figure 4.5.
Further, some rocks that are now several thousands of kilometers apart have a variety of characteristics in common. And fossils of some early types of plants and animals were distributed across continents.
FIGURE 4.5 (a) The Earth's landmasses fit together a bit like puzzle pieces; (b) Marie Tharp and Bruce Heezen
Wegener hypothesized that the continents are not fixed on the surface of the Earth but
are instead very slowly drifting in relation to one another. If true, that hypothesis would
account for the puzzling observations that lacked an explanation if the Earth’s landmasses
are unchanging (see Wegener, 1929). In the 1950s, a few decades after Wegener’s initial
proposal, the American geologists Marie Tharp and Bruce Heezen were working to map
the ocean floor, when they made a fascinating discovery about the Mid-Atlantic Ridge,
an extensive mountain range running the whole length (north to south) of the Atlantic
Ocean, almost entirely underwater. They learned that at the top of that ridge, running its
full length, was a valley, and many earthquakes originated in this valley. This, too, fit with
the hypothesis of continental drift. They had, it seemed, discovered that the seafloor was
spreading, further separating the landmasses on either side of the Atlantic Ocean, the
edges of which were roughly congruent, a bit like puzzle pieces.
Continental drift, if true, would account for all of this evidence. Like the shape of the
continents on Earth, everything—all the observations—would then seem to fit together.
Various other kinds of evidence came to light in investigations carried out in a diversity
of fields of science, all supporting continental drift. Today, continental drift is part of the
accepted theory of plate tectonics.
What kind of inference pattern was used when scientists eventually reasoned, from a
variety of evidence, that the hypothesis of continental drift was true? This is clearly an
ampliative inference, in that it goes beyond what’s contained in the evidence. So, it’s not
deductive reasoning. But this doesn’t correspond very well to the pattern we’ve seen of
inductive inference either; it’s not a generalization or projection from an observation of
a certain kind, like the quality of some grapefruit or water, to the expectation of more
observations of that kind. There’s a bigger leap involved in the inference from premises
about geologic features to the conclusion that landmasses have separated and moved apart
over the course of Earth’s history.
This is an abductive inference, a type of non-deductive inference that attributes special
status to explanatory considerations. Abductive inference is also called inference to the best
explanation. The conclusion is not validly deducible from the premises, nor is it a gener-
alization or projection on the basis of the premises. Instead, in reasoning abductively to
some conclusion, one considers whether or not the conclusion, if true, would best explain
the premises. Suppose, for example, that you know your roommate Theresa had a seri-
ous accident yesterday while preparing dinner. This morning, you see her walking down
the hallway with stitches in her hand. The best explanation for the stitches seems to be
that Theresa was cut with a kitchen knife, and that, because of the severity of the cut,
she sought medical attention. So, you hypothesize that Theresa accidentally cut herself
and got the stitches from having gone to the local hospital. This conclusion might not be
true. But if it were, it would account for the available evidence. Thus, abductive infer-
ence is characterized by an appeal to explanatory considerations to conclude that some
hypothesis is true.
Reasoning that corresponds to the form of abductive inference is quite common in both
everyday reasoning and scientific contexts. Recall from earlier in this chapter Hubble’s
reasoning from empirical data to the conclusion that the universe is more than 10 billion
years old (now recognized to be at least 13.8 billion). Support for this hypothesis included
that the universe being at least 10 billion years old best explained a rich body of other
data. Among these data were the pattern Hubble observed in the redshift in the spectral
lines of distant galaxies, observations about the life cycle of stars, and the observation of
cosmic microwave background radiation. The hypothesis also coheres with fundamental
theories of physics, like the theory of general relativity, as well as various dating methods
in geochemistry. That agreement with other theories is also best explained by the truth
of this hypothesis about the age of the universe. Abductive inference from observations
and other scientific theories thus confirmed Hubble’s hypothesis.
The Dutch mathematician and scientist Christiaan Huygens (1629–1695) said of abductive reasoning,
One finds in this subject a kind of demonstration which does not carry with it so
high a degree of certainty as that employed in geometry; and which differs distinctly
from the method employed by geometers in that they prove their propositions by
well-established and incontrovertible principles, while here principles are tested by
the inferences which are derivable from them. The nature of the subject permits no
other treatment. It is possible, however, in this way to establish a probability which
is little short of certainty. This is the case when the consequences of the assumed
principles are in perfect accord with the observed phenomena, and especially when
these verifications are very numerous; but above all when one employs the hypoth-
esis to predict new phenomena and finds his expectations realized.
(1690/1989, p. 126)
The principles Huygens discussed are hypotheses about the nature of light that could
explain experimental results in optics. One way to interpret Huygens’s suggestion is that
hypotheses that provide good explanations of these results are probably true. The rule of
inference he is suggesting is something like the following: from a given set of observations,
infer the best explanation of those observations.
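As a rough illustration (not from the book), this rule can be sketched as choosing, among candidate hypotheses, the one that would account for the most of the puzzling observations. The hypotheses and 'accounting' relations below are simplified stand-ins based on the continental drift example:

```python
# Toy sketch of 'inference to the best explanation'. The observations, and the
# sets of observations each hypothesis would account for, are simplified
# stand-ins for the continental drift example discussed above.
observations = {
    "congruent coastlines",
    "matching rocks on distant continents",
    "shared fossils across continents",
    "rift and earthquakes along the Mid-Atlantic Ridge",
}

accounts_for = {
    "fixed, stable continents": {"matching rocks on distant continents"},
    "continental drift": {
        "congruent coastlines",
        "matching rocks on distant continents",
        "shared fossils across continents",
        "rift and earthquakes along the Mid-Atlantic Ridge",
    },
}

def best_explanation(observations, accounts_for):
    """Return the hypothesis accounting for the most of the observations."""
    return max(accounts_for, key=lambda h: len(accounts_for[h] & observations))

print(best_explanation(observations, accounts_for))  # → continental drift
```

Of course, real abductive inference weighs more than mere coverage of observations; what else it weighs is taken up in what follows.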
The American philosopher Charles Sanders Peirce (1839–1914), who coined the term 'abduction' for this inference pattern, characterized it this way:
I call all such inference by the peculiar name, abduction, because its legitimacy
depends upon altogether different principles from those of other kinds of infer-
ence. The form of inference is this: the surprising fact, C, is observed; but if A
were true, C would be a matter of course, [and hence], there is reason to suspect
that A is true.
(1903/1940, p. 151)
On its face, this abductive pattern resembles the fallacy of affirming the consequent, as well as the hypothetico-deductive method's scheme for confirmation. Recall that the confirmation scheme was not deductively valid; the role for deductive inference was in the refutation scheme.
There’s an extra element present in the abductive inference scheme characterized
here though, beyond what’s contained in the pattern of affirming the consequent or
H-D confirmation. This extra element is the reference to a level of surprise regarding
the observed fact. An abductive argument can’t be used to infer that any antecedent is
true simply from the fact that its consequent is. Instead, the idea is that if the anteced-
ent accounts for a consequent that would otherwise be left unexplained, then this is
grounds for believing the antecedent is true. The power of a hypothesis to explain what
is otherwise unexplainable is a reason to infer it is (probably) true.
Abductive inference thus differs from both deductive and inductive inference.
Abduction looks like a kind of deductive inference, but it is deductively invalid. Like
inductive inference, abductive reasoning is thus a form of non-deductive inference. It is
thus ampliative and non-monotonic, and the quality of arguments is a matter of degree.
But unlike induction, abductive inference does not generalize or project from what has
been observed. The special weight abductive inference accords to explanatory consider-
ations means that its conclusions are harder to predict from existing observations.
It’s not clear how to characterize the idea of some hypothesis best explaining some set
of observations. How should a hypothesis relate to the observations in order to explain
them? Abductive inferences seem to rely on an inferential ‘leap’—a leap in the reasoning
of one or more scientists having an ‘aha!’ moment, of seeing how some new idea about
the world might explain otherwise puzzling observations. Scientists employing abductive
inference in favor of a hypothesis need to hope that their audience grasps the connection,
that their audience sees how the hypothesis accounts for the observations. It’s not clear
whether there is anything definitive that can be said about what it takes for a hypothesis
to accomplish that task.
One suggestion is that a hypothesis best explains a set of observations if it predicts
the observations, that is, if it shows why the observations were to be expected. By
itself, this isn’t enough to make for a good explanation. Just saying that the observa-
tions in fact occurred is a way to make those observations unsurprising, but it doesn’t
explain anything. Explanations must also have some other qualities. Perhaps explana-
tions should also be simple, fit with other explanations we already accept, and generate
new expectations for what we will observe. These qualities seem to make an abduc-
tively inferred hypothesis—a best explanation—enlightening, as well as a ‘bold and
risky conjecture’. We have emphasized the value of the latter periodically throughout
this book. Indeed, qualities like simplicity, coherence with other explanations, and
fecundity of new ideas have been shown to play central roles in people’s assessment
of explanatory goodness.
Like inductive inferences, the goodness of abductive inferences comes in degrees. Given
the difficulty of pinning down the definition of a best explanation, it’s worth consider-
ing what features of abductive inferences contribute to their strength. First, it seems the
number and variety of surprising observations that a hypothesis explains contributes to its
strength. The abductive inference to continental drift became stronger over the decades,
as geological observations accumulated that would be expected if continental drift had
occurred and that would be surprising otherwise. Second, the degree of an observa-
tion’s surprisingness and the degree to which the hypothesis dispels the surprisingness
contributes to the strength of abductive inference. The finding of a rift down the center of the Mid-Atlantic Ridge with significant seismic activity is pretty shocking unless different parts of the Earth's crust are in (very slow) motion. Third, if appealing
to features of the hypothesis like simplicity, coherence, and fecundity is the right way
to characterize its value as an explanation, then the degree to which those features are
possessed by the hypothesis contributes to the strength of the grounds for the inferential
leap to the truth of this explanation.
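These three factors can be made vivid with a deliberately crude scoring sketch. Nothing here comes from the book: the function, the weights, and all the numerical ratings are invented purely to illustrate how the factors might combine:

```python
def abductive_strength(n_explained, surprise_dispelled, virtues):
    """Crude illustrative score for an abductive inference.

    n_explained: how many surprising observations the hypothesis explains
    surprise_dispelled: 0-1, how fully the hypothesis dispels the surprise
    virtues: 0-1 ratings for explanatory virtues (simplicity, coherence, ...)
    """
    virtue_bonus = sum(virtues.values()) / len(virtues)
    return n_explained * surprise_dispelled * (1 + virtue_bonus)

# Invented ratings for the two hypotheses about the Earth's crust:
drift = abductive_strength(4, 0.9, {"simplicity": 0.8, "coherence": 0.9, "fecundity": 0.9})
fixed = abductive_strength(1, 0.3, {"simplicity": 0.5, "coherence": 0.6, "fecundity": 0.2})
print(drift > fixed)  # → True
```

The point is not the particular formula, only that each factor pulls the overall strength of the inference up or down by degrees.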
An amazing discovery in Morocco illustrates how scientists can appeal to a hypothesis’s
explanatory virtues as evidence in support of the hypothesis. Before this discovery, fossils
from Ethiopia were commonly regarded as the first anatomically modern humans, early
representatives of our species Homo sapiens. These fossils indicated that humans evolved
relatively quickly in a specific region of Africa about 200,000 years ago. The discovery
of new fossils from an archeological site in Morocco, named Jebel Irhoud, challenged
this conclusion (Hublin et al., 2017). In Jebel Irhoud, archeologists and evolutionary
anthropologists found several specimens of stone tools and human bones, including a
remarkably complete jaw and skull fragments. The researchers used dating techniques to
determine that the remains were about 315,000 years old. If these were Homo sapiens,
this would push back the origin of our species by about 100,000 years. This would also
suggest that humans did not evolve only in eastern sub-Saharan Africa (modern Ethiopia)
but in multiple locations across the African continent.
The previously favored hypothesis that Homo sapiens evolved in eastern sub-Saharan
Africa around 200,000 years ago could explain the findings at Jebel Irhoud as remains
from some hominid species that lived prior to Homo sapiens, perhaps the Neanderthals.
The Jebel Irhoud findings also prompted a new hypothesis, though: that the Homo sapi-
ens species’ evolution was a pan-African process that occurred about 300,000 years ago.
This new pan-Africa hypothesis was simpler than the previously favored hypothesis, as
it doesn’t require positing an archaic hominid species in North Africa, later replaced by
Homo sapiens. The pan-Africa hypothesis also cohered with archeological and anatomi-
cal observations about Neanderthals and Homo sapiens. For example, the teeth found in
Jebel Irhoud better matched what would be expected for Homo sapiens than what would
be expected for Neanderthals. The morphology of the skull was almost indistinguishable
from that of anatomically modern humans. And the pan-Africa hypothesis is consistent
with geographical and ecological evidence that the Sahara was green, filled with rivers,
and hospitable around 300,000 years ago. Animals like gazelles and lions inhabiting the
East African savanna then also populated the Saharan region and migrated to northwest
Africa. In fact, remains of plants and animals indicate biological and environmental con-
tinuity between those regions.
Finally, the pan-Africa hypothesis explained a greater number of diverse observations
about human origins than the East Africa hypothesis, including the mix of anatomical
features seen in the Jebel Irhoud remains and in other Homo sapiens–like fossils from
elsewhere in Africa. It also better fits with genomic evidence collected in South Africa
that seems to indicate that the lineage split between archaic hominid species and anatomi-
cally modern humans occurred more than 260,000 years ago. Explanatory considerations,
including simplicity, coherence, and fecundity, thus favored the pan-Africa hypothesis. The
researchers involved in the Jebel Irhoud discovery concluded that ‘the Garden of Eden
in Africa is probably Africa—and it’s a big, big garden’ (Callaway, 2017).
Testimony
The testimony of others plays a central role in reasoning. Many of your beliefs actually
originate from what other people think is true, and this is the same in science. Belief
in others' testimony is a key component of the system of trust and skepticism
we’ve said is crucial for science. Suppose that you are a resident of Flint, Michigan. You
attend a community meeting, where the governor, Rick Snyder, reports that the city water
is safe to drink. To demonstrate this, he himself drinks some tap water. On the basis of
this testimony, you infer that the water isn’t toxic. Later, you learn about the results of
the water quality testing by the EPA and Virginia Tech scientists. This new information
undermines your earlier inference on the basis of the governor’s testimony; you no lon-
ger believe Flint’s water is safe to drink. Later still, Virginia Tech scientist Marc Edwards
reports that Flint’s water is getting better and is far less risky to drink if one uses a high-
grade water filter. You are willing, again, to update your beliefs based on testimony. You
probably wouldn’t take the governor’s word for it at this stage, but given Edwards’s role
as an outside scientific investigator, you take his word for it.
Because science is so collaborative, several scientists—sometimes even thousands of
scientists, like at CERN or NASA—typically conduct research together. In these cases,
they rely on the specific expertise and the honesty of collaborators. This trust in the testi-
mony of collaborators—believing that collaborators are also operating, like yourself, under
norms of sincerity and accuracy—is essential for many scientific projects. Reliance on the
EXERCISES
4.20 Decide whether each of the following inferences is deductive, inductive, or abduc-
tive. Provide a justification for each of your answers.
1. Disorder in a system will increase, unless energy is expended. Your home is a
system. So, disorder will increase in your home unless energy is expended.
2. The president says that human activities are not a cause of global warming.
Therefore, human activities are not a cause of global warming.
3. There is no such thing as drought in Australia. The town of Darwin is in Australia.
Therefore, the town of Darwin needn’t ever make plans to deal with drought.
4. Bread appears to grow mold more quickly in the bread bin than the fridge.
Therefore, temperature determines the rate of mold growth.
5. Over two million people on Twitter say that aliens are coming to Earth, which is
more than the number of people on Twitter who are not saying it. So, aliens are
coming to Earth.
6. All mathematicians like math. Jun is a mathematician. Therefore, Jun loves math.
7. Gravity has always operated in the universe. So, gravity will continue to oper-
ate in the universe.
8. The weather forecast indicates that tomorrow will be sunny. So, tomorrow will
be sunny.
9. My brother has black hair, as does my father. Therefore, everyone related to me
has black hair.
10. The library has millions of books. I have a book in my hand, and I just left the
library. Therefore, the book was borrowed from the library.
4.21 Assess the quality of each of the arguments in 4.20, using the proper standard for
its form (deductive, inductive, or abductive). Explain your reasoning. For any bad
arguments, assess whether they would be better arguments of a different inferential
pattern (inductive instead of deductive, for example). If so, reclassify those argu-
ments to be of the pattern they are better at achieving.
4.22 Decide whether each of the following inferences is deductive, inductive, or abductive. If you
aren’t 100% sure of your answer, you should also provide a justification for your decision.
1. Whenever it rains, the streets get wet. The streets are wet now. Therefore, it must
have rained.
2. Of the students interviewed, 65% say that they prefer Italian to French wine.
Therefore, all students prefer Italian wine.
3. A medical technology ought to be funded if it has been used successfully to treat
patients. Adult stem cells have been used to treat patients successfully. There-
fore, adult stem cell research and technology ought to be funded.
4. The murder weapon has Pat’s fingerprints on it. Therefore, Pat is the murderer.
5. Sociologists agree that global inequality has decreased because of economic
liberalization in China and India. Therefore, it must be true that global inequal-
ity has decreased.
6. Studies found a strong correlation between IQ scores and language competence.
Therefore, if a person has a high IQ score, that person has high linguistic competence.
7. The witness testified that a paisley yellow car caused the accident. Given how unmis-
takable paisley is, it’s very likely that a paisley yellow car did cause the accident.
8. These beans have been randomly selected from this 25-pound bag, and they
are black. So, it is likely that all the beans from this bag are black.
9. The best explanation of the acquisition of language is that we possess an innate
universal grammar. So we must possess an innate universal grammar.
10. Leaded gasoline and lead pipes were both used for a while but eventually dis-
continued. So, all lead products are toxic.
4.23 Assess the quality of each of the arguments in 4.22, using the proper metric for its
form (deductive, inductive, or abductive). Explain your reasoning. For any bad argu-
ments, assess whether they would be better arguments of a different inferential pat-
tern (inductive instead of deductive, for example). If so, reclassify those arguments
to be of the pattern they are better at achieving.
4.24 Define deductive inference, inductive inference, and abductive inference in your
own words, and give an example of each.
4.25 Consider each of the patterns of inference, deductive, inductive, and abductive,
as an account of hypothesis-testing. For each account, describe what features of
hypothesis-testing it captures well and at least one drawback or limitation it faces.
4.26 The conclusion that Flint’s water supply is toxic is based on substantial evidence.
Nonetheless, the inference to that conclusion is non-monotonic. What kinds of new
information could you learn that would undermine the inference to that conclusion?
Give three examples.
4.27 We have said that good inductive arguments are strong, but we haven’t said much
about what it takes for an inductive argument to count as strong. Consider what
we’ve learned about inductive inference and examples of inductive inference we’ve
encountered, as well as the features of experiments and other studies from Chapter 2.
List at least three features an inductive argument could have that contribute positively
to its strength.
4.28 Consider this passage from Darwin’s Origin of Species (1872: 421):
It can hardly be supposed that a false theory would explain, in so satisfactory a
manner as does the theory of natural selection, the several large classes of facts
above specified. It has recently been objected that this is an unsafe method of
arguing; but it is a method used in judging of the common events of life, and
has often been used by the greatest natural philosophers. The undulatory theory
of light has thus been arrived at; and the belief in the revolution of the earth
on its own axis was until lately supported by hardly any direct evidence. It is
no valid objection that science as yet throws no light on the far higher problem
of the essence or origin of life. Who can explain what is the essence of the
attraction of gravity? No one now objects to following out the results consequent
on this unknown element of attraction; notwithstanding that Leibnitz formerly
accused Newton of introducing ‘occult qualities and miracles into philosophy’.
What ‘method of arguing’ do you think Darwin had in mind? What objections to this
method of arguing does he consider, and how does he dispute those objections?
4.29 Thinking about whatever examples of science you want, from elsewhere in this book
or other sources, come up with a clear instance of inductive inference. (This should
be a more realistic example than grapefruit choosing.) Put the inference in standard
argument form with numbered premises and a conclusion (as best you can), and
then assess its strength. If you were a scientist focused on this inference, what kinds
of steps could you carry out to additionally support the conclusion?
4.30 Thinking about whatever examples of science you want, from elsewhere in this book
or other sources, come up with a clear instance of abductive inference. Why should
this be viewed as an abductive inference? Assess the explanatory strength of the
inference. If you were a scientist focused on this inference, what kinds of steps could
you carry out to additionally support the conclusion?
4.31 Describe one instance in which you would, or in fact have, taken someone’s word
for something, that is, used testimony as grounds for belief. Then, try to characterize
this as an abductive inference. Is abductive inference a good way to think about this
use of testimony? Why or why not?
4.32 Describe an instance in which you would not, or in fact have not, taken someone’s
word for something, that is, used testimony as grounds for belief. What was differ-
ent between this situation and the situation you described in your answer to 4.31?
Does consideration of the features of good abductive inferences account for the dif-
ference? Why or why not?
FURTHER READING
For more on conditional reasoning, see Nickerson, R. (2015). Conditional reasoning: The
unruly syntactics, semantics, thematics, and pragmatics of “if”. Oxford: Oxford Univer-
sity Press.
For an in-depth summary of the Flint water crisis, see www.cnn.com/2016/03/04/us/flint-
water-crisis-fast-facts/index.html
For Hume’s problem of induction, see Hume, D. (1748/1999). An enquiry concerning
human understanding, ed. T. L. Beauchamp. Oxford/New York: Oxford University
Press. Sections 4–6.
For a helpful guide to Hume’s problem, see Salmon, W. (1975). An encounter with David
Hume. In J. Feinberg (Ed.), Reason and responsibility (3rd ed., pp. 245–263). Encino:
Dickenson Publishing Co.
For a different version of the problem of induction, see Chapter 3, entitled 'The new riddle of induction', of Goodman, N. (1983). Fact, fiction and forecast. Cambridge: Harvard University Press.
For more on abductive reasoning, see Lipton, P. (2003). Inference to the best explanation.
New York: Routledge.
CHAPTER 5
Statistics and Probability
• Give three new examples of situations involving statistical reasoning and describe
how statistical reasoning is involved in each
• Characterize the difference between descriptive and inferential statistics
• Define probability theory and say how it relates to statistics
help decide whether the hypothesis is a good one. We’ve seen that determining exactly
what expectations follow from a hypothesis can be tricky. This is especially the case
whenever there is variation in how events unfold. Variation means that we get different
results when we repeat measurements. As the study of variation is central to statistical
reasoning, statistics provides tools to help determine what a hypothesis should lead us to
expect. It thus contributes to hypothesis-testing in science. Because statistical reasoning
can be construed as a kind of inductive reasoning, it also helps us extend from what we
think we know about the world to make predictions that we’re less certain of—as when
we predict the weather or driving time based on traffic conditions.
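As a small illustration of variation (the numbers below are invented), descriptive statistics summarize how repeated measurements of the same quantity scatter around a central value:

```python
import statistics

# Hypothetical repeated measurements of the same commute, in minutes.
times = [31, 29, 35, 30, 33, 28, 34]

center = statistics.mean(times)    # central tendency of the measurements
spread = statistics.stdev(times)   # sample standard deviation: typical scatter
print(f"mean {center:.1f} min, std dev {spread:.1f} min")  # → mean 31.4 min, std dev 2.6 min
```

The mean says what to expect on average; the standard deviation says how far any single measurement is liable to stray from it.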
All of this will become clearer as we dive into statistical reasoning in this and the next
chapter. For now, simply notice that, if all of this is true, then it’s no exaggeration to say
168 Statistics and Probability
that our health, wealth, and happiness all hinge upon understanding and communicating
statistics.
Up until the 18th century, the word statistics (from Latin for ‘state’) meant any data
relevant to running a nation or country. These data included demographic and economic
information relevant to the condition of the country—for example, about birth and death
rates, individual and national wealth, and level of employment. Today, statistical reasoning
is applied to virtually any kind of data—from data concerning the performance of bas-
ketball players to data about casinos, medical diagnoses, and issues of global importance
like anthropogenic climate change.
Consider these three scenarios:
1. Arguing about basketball: Your friend says that LeBron James is a better basketball
player than Michael Jordan. You disagree. You remind her that Jordan won the NBA
Championship for the last time in 1997–1998, playing 82 games. Over that season,
he scored a total of 2,357 points, rebounded 475 missed shots, and made 283 assists.
His free throw percentage was .784. No way LeBron is better than MJ! Your friend
responds that the first time LeBron won the NBA Championship in 2011–2012, he
played 79 games, scoring a total of 2,111 points, rebounding 590 missed shots, and
making 554 assists. LeBron’s free throw percentage was .759. And he’s a monster for
stealing balls and blocking shots.
As this example shows, people often appeal to statistical evidence in sports, perhaps to
support the claim that some sports player is best or to argue that some team is likely to
win the next game.
2. Playing roulette: Imagine that you’re at a casino in Monte Carlo, eager to play roulette.
The wheel includes 37 colored and numbered pockets, of which 18 are black, 18 are
red, and one is green. If you bet €10 on red, and the winning color is red, then you will
win €20—and likewise if you bet €10 on black and black wins. Now, imagine that the
winning color has been black 26 times in a row. You might bet on red, reasoning
that red should come up very soon since there have been so many black wins.
You are making a prediction based on past occurrences, and your prediction is based on
statistical reasoning. (This is also flawed statistical reasoning, as we’ll see.)
3. A medical test: You have a sore throat, so you go to the doctor. The doctor examines your
throat and calls for a ‘rapid strep test’. While you wait for her to return with the results,
you ponder how you should react. What if she tells you the test was negative? Does this
mean you don’t have strep throat? Not necessarily. It means there’s an approximately 95%
chance that you don’t have strep throat. If you have all the symptoms of this illness though,
your doctor may want to follow up with another test—a ‘strep culture’—to verify the neg-
ative result. There might be strep bacteria lurking there, undetected. What if your doctor
tells you the test was positive? Then you can be pretty certain you do have strep bacteria
in your throat. However, about 20% of people are carriers for strep. This means that even
if strep bacteria are present, there’s a chance this isn’t the cause of your sore throat.
The rapid strep test—like most medical tests—gives you statistical data. You and your
doctor then need to decide how to interpret the data, whether they are evidence for a
particular conclusion, and what steps to take next.
Each of these three cases involves the collection, presentation, analysis, and interpre-
tation of statistical information. Reasoning with statistical information is everywhere!
Learning to reason better with statistics can thus help you make good decisions about
questions concerning your health, wealth, and happiness—and basketball too.
question or how difficult it would be to collect data about all stars of the Milky Way
galaxy. For this reason, scientists regularly obtain data about a subset of the population
they are interested in. This subset is a sample of the population, and the data concerning
individuals in this subset are sample data.
randomness does not mean haphazard or lacking aim or purpose. Instead, randomness is
a measure of uncertainty of an outcome and applies to concepts of chance, probability,
and information.
The simplest examples of randomness are things like coin tosses and dice throws. In
a normal roll of a standard die, you can’t possibly know whether you’ll roll a one, two,
three, four, five, or six. But you do know that if you roll that die 500 times, or roll 500
dice, you probably won’t roll a six every time. The word probably is important there.
Probability theory actually enables us to calculate what that probability is; it can tell us
exactly how unlikely it is to roll a six 500 times in a row.
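Here is a minimal sketch of that calculation in Python (ours, not the book's). A probability this small underflows ordinary floating-point numbers, so the sketch uses exact rational arithmetic and reports only the order of magnitude:

```python
from fractions import Fraction
from math import log10

# Each roll is independent, so the probability of a six on every one
# of 500 fair rolls is 1/6 multiplied by itself 500 times.
p_all_sixes = Fraction(1, 6) ** 500

# The exact value is far too small to print usefully as a decimal,
# so report its order of magnitude: log10 of 6^-500.
magnitude = -500 * log10(6)
print(f"about 10^{magnitude:.0f}")  # about 10^-389
```

Exact fractions keep the arithmetic honest; converting to a float at any intermediate step would silently round the answer to zero.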
These kinds of probability calculations are put to work in statistical reasoning. For
example, suppose you use probability theory to work out the chance of all possible
different outcomes for 500 dice rolls. In working out these probabilities, you assume the
die is fair; that is, you assume each possible outcome—one, two, three, four, five, six—is
equally likely on each roll. If you then roll a die 500 times, and six comes up 200 times,
you can use those probabilities to infer that this is a somewhat improbable, or unlikely,
outcome. You can use statistics to decide, based on the level of improbability, whether
something’s fishy—whether, perhaps, your die isn’t fair after all.
So, statistical reasoning relies on mathematics and in particular on probability theory.
But it doesn’t just boil down to running calculations. It is much more important to
understand the meaning of the numbers, probabilities, and equations behind the statistics.
Acquiring this understanding will help make you a stats-savvy person, someone who can
critically examine claims based on statistical reasoning in science and in everyday life and
who can better handle the barrage of statistical information that fills our lives. This will
be our focus in this chapter and in Chapter 6 as well. In this chapter, we’ll work through
some basic concepts of probability theory, and then discuss descriptive statistics. Then, in
Chapter 6, we will turn our focus to inferential statistics.
EXERCISES
5.1 First, describe the difference between a sample and a population. Second, state
whether the following statements refer to a sample or to a population:
a. Researchers found that 2% of the Americans they interviewed believed they had
seen a UFO.
b. Based on their survey data, the researchers concluded that one in three of all car
crashes in the country are linked to alcohol impairment.
c. Two-thirds of the butterflies we observed were pink.
d. After reading four essays, the teacher expects that 85% of the class will pass the
exam.
e. Twenty-five percent of the planets in the Solar System have no moons.
f. More than one billion people in the world live on less than one dollar a day.
5.2 What is the difference between descriptive statistics and inferential statistics? Indicate
whether each of the following statements is based on descriptive or inferential statis-
tics, and explain why.
a. As of 2017, the director Quentin Tarantino has received a total of two Academy Awards.
b. Students with an undergraduate GPA of 3.00 are expected to have a starting
salary of $30,000.
c. In 2016, the population of São Paulo, Brazil, was 12,038,175.
d. The mean grade in the class was B+.
e. A study stated that British adults are nearly 12 kilograms (26 pounds) heavier
now than they were in 1960.
f. Economists say that mortgage rates may soon drop.
g. The gross national income per capita in South Sudan in 2013 was $2.
h. According to World Health Organization data published in 2015, life expec-
tancy in Bangladesh is 71.8 years.
5.3 Describe what probability theory is in your own words. Then, looking back at the defi-
nitions of descriptive statistics and inferential statistics, describe how you think statistics
relies upon probability theory. Name three everyday situations where probability
theory is used.
5.4 Find a news article or opinion column published in the past month that uses statistical rea-
soning of some kind. After citing the source, write a paragraph describing the following:
a. The main point of the article or column
b. What statistics are provided
c. How the author makes use of statistics in his or her reasoning
d. How good this use of statistical reasoning seems to be and why (or why not)
5.5 Statistical reasoning pervades our lives, often in ways we don’t realize. After reflect-
ing on your daily routine, write out a list of 10 ways in which variation, statistical
reasoning, and probability are part of that routine, either explicitly or implicitly.
• Define these seven terms: random variable, outcome space, mutually exclusive, collec-
tively exhaustive, total probability, statistical independence, and conditional probability
• Calculate the probability of multiple outcomes occurring (together or individually)
based on the probabilities of individual outcomes
• Calculate conditional probabilities
Random Variables
The number rolled on a die and whether a coin lands on heads or tails are both random
variables. Random variables have different values that are individually unpredictable
but predictable in the aggregate. You can’t predict whether a coin will land on heads or
tails, but you can predict that lots of coin tosses will give you roughly equal numbers
of heads and tails. The set of all values a random variable can have is called its outcome
space, or sample space.
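A quick simulation (an illustrative sketch, not from the text) shows this aggregate predictability: no single toss can be predicted, yet the fraction of heads over many tosses reliably settles near one half.

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility
n = 10_000

# Each individual toss is unpredictable...
heads = sum(1 for _ in range(n) if rng.random() < 0.5)

# ...but the aggregate frequency is predictable: close to 0.5.
print(heads / n)
```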
Let’s work through these ideas using the simple coin-toss example. The random variable involved in a coin toss is the figure shown on the top of the coin. We can refer to
this variable with a capital letter, say, X. The set of possible values of X—its outcome
space—is heads and tails: these are all the values our random variable can possibly take.
To distinguish the variable from its possible values, we will refer to the values of a ran-
dom variable with small letters, in this case, say, h and t. We can now define the outcome
space of a coin toss as follows:
X = {h; t}
(The symbols ‘{’ and ‘}’ are curly braces, the conventional notation used to indicate a set, that is, any abstract grouping of items.)
Random variables are the building blocks of probability theory and, in turn, of statisti-
cal reasoning. Probabilistic reasoning begins with the observation of how probable it is
for a random variable to take on any given value. For our coin-toss example, there’s a
100% chance that the coin lands on either heads or tails (since this is the whole outcome
space). Probabilities vary between 0 (maximally improbable) and 1 (maximally probable),
so we write this as follows:
Pr(X=h or t) = 1
No matter how many values a random variable can have, that whole set of values—its
whole outcome space—has a probability of 1. This means it’s guaranteed that the variable
will take on one of those values. The total probability of an outcome space is always 1.
The outcomes in any outcome space have two important properties: they are mutually
exclusive and collectively exhaustive. Mutually exclusive outcomes occur when no more
than one of the outcomes can occur at any given time. On a single coin toss, you might
get heads or tails, but you will never get both. Heads and tails are mutually exclusive
outcomes. Collectively exhaustive outcomes occur when at least one of the outcomes
must occur at any given time. For a successful coin toss, the coin must land heads up
or tails up—there is no third option. This means that heads and tails are collectively
exhaustive outcomes.
Now, if the coin is fair, then the probability of the coin landing on heads will equal
the probability of it landing on tails. That is, for a fair coin, Pr(X = h) = Pr(X = t). Since
we already know the probability of the whole outcome space together is 1, and there are
two equally probable outcomes in that outcome space, we can calculate that:
Pr(X = h) = Pr(X = t) = ½
Because there are two equally probable outcomes, each outcome has a probability of ½
(.5 or 50%). That’s just the total probability for the outcome space (which is always 1)
divided by the number of possible outcomes (which is two, in this case). To generalize,
for any random variable with equally probable outcomes, the probability of one of those
outcomes is one divided by the number of possible outcomes. So, for a fair, six-sided die,
the probability of rolling any one number is one divided by six, or 1⁄6.
A random variable that is not fair is biased in favor of one or more outcomes. This
means one or more outcomes are more likely—have a higher probability of occurrence—
than other outcomes. French roulette is fair, but American roulette is not—at least not
in the statistical sense of fairness. This is because a French roulette wheel has 37 pockets,
numbered zero through 36, whereas an American roulette wheel has 38, two of which
are zeroes. In the latter case, the roulette is biased toward zero, because zero will occur
more often than any other number on the wheel if we spin the roulette over and over
again. More precisely, in American roulette, the probability of getting any number from
one to 36 is 1⁄38, while the probability of getting zero is 2⁄38, or 1⁄19.
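These pocket counts translate directly into probabilities. A small sketch (ours, not the book's) using exact fractions:

```python
from fractions import Fraction

# An American roulette wheel: pockets 1-36 plus two zero pockets
# ('0' and '00'), 38 in all, each pocket equally likely on a fair spin.
pockets = [str(n) for n in range(1, 37)] + ["0", "00"]

def pr_label(label):
    """Probability that the winning pocket carries the given label,
    counting the two zero pockets together under '0'."""
    hits = sum(1 for p in pockets if p == label or (label == "0" and p == "00"))
    return Fraction(hits, len(pockets))

print(pr_label("17"))  # 1/38
print(pr_label("0"))   # 1/19: the bias toward zero
```

Treating the two zero pockets as a single outcome is exactly what makes the wheel biased toward zero.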
There is another way in which a roulette wheel, or any series of outcomes, might be unfair.
A series of outcomes might have ‘memory’, in the sense that previous outcomes might
influence future outcomes. In one of the scenarios described at the beginning of the
chapter, we imagined a person who thought roulette wheels work in this way. This person
reasoned, ‘Red should come up very soon since there have been so many black wins. So, I’ll
bet on red!’ If the roulette were unfair because it had memory, this might be good rea-
soning: the roulette might change to red because there had been lots of black wins. But
fair roulette spins have independent outcomes: the probability of each outcome is not
influenced by past outcomes. So, in order to be fair, roulette spins and any other random
variables must be independent of one another.
To summarize, a random variable must be unbiased and its outcomes must be inde-
pendent for the random variable to be fair. Coin tosses, dice throws, and French roulettes
are all examples of fair random variables.
Lots of random variables are unfair. For example, LeBron James’s free throw success
is a random variable. Let’s call this variable Y. There are only two possible outcomes:
LeBron either misses the free throw or scores. So, this random variable has an outcome
space of Y = {miss; score}. So far, this is simple. The problem is that the chance of LeBron
scoring versus missing is probably not 50⁄50. There is a bias in favor of the outcome of
scoring; for LeBron James, this is more likely than missing. The outcomes might also fail
to be independent: missing a shot might make LeBron more, or less, likely to score on
the next free throw.
It’s much more difficult to calculate probabilities for unfair variables like free throw
success. So, for now, we’ll stick with fair random variables, like coin tosses and dice throws.
These probabilities can be found using simple addition. Consider the example of rolling an even number on a roll of a die, D. This can be expressed as: Pr(D = 2 or D = 4 or D = 6). We already
know that each of those three outcomes has a probability of 1⁄6. The probability that
any of those outcomes occurs on a given roll is just the probability of each outcome, all
added up together as follows:
Pr(D = 2 or D = 4 or D = 6) = 1⁄6 + 1⁄6 + 1⁄6 = 3⁄6 = ½
Beware! Adding probabilities in this way only works for mutually exclusive outcomes.
If we wanted to ask about the probability of rolling an even number or a five, we could
just add in another 1⁄6, yielding 4⁄6 or 2⁄3 as the probability. But this doesn’t work if someone
asked us about the probability of rolling an even number or a six. Because six is one of
the even numbers on the die, the outcomes of rolling a six and rolling an even number
are not mutually exclusive. You can’t simply add up the different probabilities to find
the answer. In the case of rolling an even number or a six, the probability is the same as
it was for rolling an even number (since rolling a six is one way to roll an even number).
The probability is still ½.
This way of calculating probabilities is called the addition rule. This rule says that
the probability that any of a series of outcomes will occur is the sum of their individual
probabilities. It’s very important to ensure that the requirement of mutually exclusive
outcomes is met. If not, addition will lead you astray.
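The rule and its mutual-exclusivity requirement can be encoded directly. In this sketch (ours, not the book's), die events are modeled as sets of faces, so exclusivity is just set disjointness:

```python
from fractions import Fraction

DIE = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability that one fair die roll lands in `event`, a set of faces."""
    return Fraction(len(event & DIE), len(DIE))

def pr_any(*events):
    """Addition rule: valid only for mutually exclusive (disjoint) events."""
    for i, a in enumerate(events):
        for b in events[i + 1:]:
            if a & b:
                raise ValueError("outcomes are not mutually exclusive")
    return sum(pr(e) for e in events)

print(pr_any({2}, {4}, {6}))   # 1/2: rolling an even number
print(pr_any({2, 4, 6}, {5}))  # 2/3: an even number or a five
# pr_any({2, 4, 6}, {6}) raises: six is already one of the even numbers
```

Making the function refuse overlapping events mirrors the warning in the text: when the requirement fails, simple addition gives the wrong answer.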
Suppose we want the probability of rolling a six on each of two dice. The two rolls are independent, so we multiply the individual probabilities: Pr(D1 = 6 and D2 = 6) = 1⁄6 × 1⁄6 = 1⁄36.
The probability of 1⁄36 is a lot closer to zero than to 1⁄6. That’s why rolling two sixes, or two ones (‘snake eyes’), is exciting. It seldom happens!
Beware though! There’s an important condition for multiplying probabilities as well.
They must satisfy the independence condition. This means that the probability of each
outcome must be independent from one another. Each outcome must not influence the
probability of the other outcomes. Think of it this way. If, instead of calculating the prob-
ability of rolling two sixes on two dice, we wanted to calculate the probability of rolling a six
on one die roll but also a one on the very same die roll—Pr(D1 = 6 and D1 = 1)—we can’t just
multiply 1⁄6 × 1⁄6. These outcomes aren’t independent. In fact, they are mutually exclusive:
if one occurs, the other is guaranteed not to occur. This means the probability in question is
maximally improbable: it’s zero. So, we can only use multiplication to find the probabilities
of a series of outcomes all occurring if the outcomes in question are independent.
According to the multiplication rule, the probability that all of a series of outcomes
occurs is the result of multiplying their individual probabilities. Again, if the requirement
of independent outcomes is not met, then multiplication will lead you astray. When two
events are not independent, the probability that both happen depends on the nature of the
connection between the events. Simple multiplication won’t work.
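A sketch of the rule in code (ours, not the book's). Note that independence is a promise the caller must make: it cannot be checked from the probabilities alone.

```python
from fractions import Fraction
from math import prod

def pr_all(*probs):
    """Multiplication rule: the probability that a series of INDEPENDENT
    outcomes all occur is the product of their individual probabilities."""
    return prod(probs, start=Fraction(1))

# Two sixes on two separate dice (independent rolls):
print(pr_all(Fraction(1, 6), Fraction(1, 6)))  # 1/36

# Three heads on three coin tosses:
print(pr_all(Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)))  # 1/8
```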
Let’s take a moment to compare the multiplication rule with the addition rule. We
saw that the addition rule is used to calculate the probability of any of a series of mutu-
ally exclusive outcomes occurring. You could ask about the probability of getting a six
or a one on a given roll. (They have to be on the same roll to be mutually exclusive
outcomes.) To calculate this, we would add 1⁄6 and 1⁄6 to get 2⁄6, or 1⁄3. The multiplication
rule is instead used to calculate the probability of all of a series of independent outcomes
occurring. You could ask about the probability of getting a six on one roll and a one on
a different roll. (They have to be different rolls or different dice to be independent
outcomes.) To calculate this, we would multiply 1⁄6 and 1⁄6 to get 1⁄36.
TABLE 5.1 Addition, multiplication, and subtraction rules and their conditions
Rule             Calculates the probability that…       Keyword   Required condition
Addition         any of a series of outcomes occurs     or        mutually exclusive outcomes
Multiplication   all of a series of outcomes occur      and       independent outcomes
Subtraction      an outcome does not occur              not       collectively exhaustive outcomes
In these two examples, notice that the addition rule led to a larger probability (closer
to 1) and the multiplication rule led to a smaller probability (closer to 0). This will always
happen. Addition will always increase probability, and multiplication will always decrease
probability. This is because probabilities are always positive numbers between 0 and 1,
and multiplying two numbers in that range (such as two fractions) always yields a smaller
number while adding two positive numbers of any kind always yields a larger number.
This can provide a quick way to remember when to add and when to multiply. Do
you expect the probability to get larger or smaller for the occurrence you’re calculating,
compared to the outcomes that generate it? It’s easier (more probable) to get any of a
one, two, or three on a die roll than each one of these numbers individually: use addi-
tion. Any, or, addition, and larger probabilities go together. And the outcomes linked with
the word or need to be mutually exclusive. It’s harder (less probable) to get a six on all of
the first, second, and third rolls than on a single roll: use multiplication. All, and,
multiplication, and smaller probabilities go together. And the outcomes linked with the
word and need to be independent. This is all summarized in Table 5.1.
The subtraction rule says that the probability of an outcome not occurring is 1 minus the probability that it occurs, and it is only for collectively exhaustive outcomes. Just as with the requirements placed on the addition and multiplication rules, the requirement of collectively exhaustive outcomes is crucial. This is what makes probabilities sum to 1. This requirement is most easily satisfied with the use of the word not—rolling a two and not rolling
a two, rolling an even number and not rolling an even number, rolling two sixes in a row
but not rolling two sixes in a row. Each of these pairs is collectively exhaustive; any pos-
sible outcome would fall in one or the other category.
So, the main word to prompt you to use the subtraction rule is not, which is
one way of guaranteeing collectively exhaustive outcomes. This is summarized in
Table 5.1.
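The subtraction rule gives the easiest route to 'not' and 'at least one' probabilities. A sketch (ours, not the book's):

```python
from fractions import Fraction

# 'Two sixes in a row' and 'not two sixes in a row' are collectively
# exhaustive, so their probabilities must sum to 1.
p_two_sixes = Fraction(1, 6) * Fraction(1, 6)  # multiplication rule: 1/36
p_not = 1 - p_two_sixes                        # subtraction rule: 35/36
print(p_not)

# Same idea for 'at least one six in two rolls': subtract the
# probability of its collectively exhaustive partner, 'no sixes at all'.
p_no_sixes = Fraction(5, 6) * Fraction(5, 6)
print(1 - p_no_sixes)  # 11/36
```

Computing 'at least one six' directly by addition would require carefully avoiding double-counting; subtracting the complement sidesteps that entirely.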
Conditional Probability
There’s one final probability concept we need to discuss: conditional probabilities.
Sometimes it can be useful to know how the probability of some event changes in light
of other events occurring. The conditional probability of an event is the probability of
its occurrence given that some other event has occurred. In the notation we’ve been
developing, we can write the conditional probability of a random variable Y taking the
value y, given that a variable X takes the value x as Pr(Y=y | X=x). The symbol ‘|’ can be
read as given that.
Notice that, for two independent events, the conditional probability of one event given
the other’s occurrence will be the same as the original probability of the event. Indeed,
the concept of conditional probability enables us to more exactly articulate what inde-
pendence amounts to. Two random variables X and Y are statistically independent when
Pr(Y=y | X=x) = Pr(Y=y) and Pr(X=x | Y=y) = Pr(X=x). This means that the outcome x
occurring doesn’t make the outcome y any more or less likely, and the outcome y occur-
ring doesn’t make the outcome x any more or less likely.
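This definition can be checked mechanically on a finite outcome space. Here is a sketch (ours, not the book's) for two fair coin tosses, where independence holds by construction:

```python
from fractions import Fraction
from itertools import product

# Joint outcome space of two fair coin tosses; all four pairs equally likely.
outcomes = list(product("ht", repeat=2))

def pr(pred):
    """Probability that a joint outcome satisfies pred."""
    return Fraction(sum(1 for o in outcomes if pred(o)), len(outcomes))

p_y = pr(lambda o: o[1] == "h")  # Pr(second toss = heads)
p_y_given_x = pr(lambda o: o == ("h", "h")) / pr(lambda o: o[0] == "h")

# Statistical independence: conditioning on the first toss changes nothing.
print(p_y == p_y_given_x)  # True
```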
If an event y is not statistically independent from an event x, then the probability of
y occurring goes up or down if x occurs. In extreme cases, one event can result in the
probability of another event becoming 1 or 0. For example, the probability of a die roll
resulting in an even number is ½. But the probability of an even number given that you
roll a two is 1, since rolling a two is one way of rolling an even number. The probability
of an even number given that you roll a three is 0, since three is odd. (In both cases,
we’re assuming there’s only one roll of the die.) That is:
Pr(D1=2 or 4 or 6 | D1=2) = 1
Pr(D1=2 or 4 or 6 | D1=3) = 0
In other cases, the statistical dependence is subtler. The probability of an event might
be raised or lowered by the occurrence of another event, but not all the way to 0 or 1.
Consider again the probability of getting two sixes when two dice are rolled, which we
previously calculated to be 1⁄36. We can also ask what the probability of getting two sixes
on two rolls is, given that the first roll yielded a six. The chance of getting two sixes has
gone up if one roll is a six, but it’s still not guaranteed.
Figuring out the conditional probability in cases like this one requires calculation. For
x and y, the values of two random variables, the probability of y occurring given that the
other event x occurs can be calculated using the following conditional probability formula:
Pr(Y = y | X = x) = Pr(Y = y & X = x) / Pr(X = x)
This calculation only works when the probability of x is greater than 0. Think of this for-
mula as a two-step procedure for finding the probability of y given x. First, you limit your
attention only to cases when x occurs. This is the role of Pr(X = x) as the denominator
(the bottom) of the equation. Second, you look within those cases of x occurrences for
occurrences of y. This is the role of Pr(Y = y & X = x) as the numerator (the top) of the
equation. The basic idea is that if the outcomes are restricted to only those cases when
x occurs, this becomes the new outcome space for the variable Y.
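That two-step procedure translates directly into code. In this sketch (ours, not the book's), events are predicates over a finite, equally likely outcome space:

```python
from fractions import Fraction

DIE = range(1, 7)

def conditional(pred_y, pred_x, space=DIE):
    """Pr(Y | X): restrict the space to cases where x occurs (step one),
    then count occurrences of y within those cases (step two)."""
    x_cases = [o for o in space if pred_x(o)]
    if not x_cases:
        raise ZeroDivisionError("conditioning event has probability 0")
    y_and_x = [o for o in x_cases if pred_y(o)]
    # The x-cases have become the new outcome space for Y.
    return Fraction(len(y_and_x), len(x_cases))

# Even, given that a two was rolled: certain.
print(conditional(lambda d: d % 2 == 0, lambda d: d == 2))  # 1
# Even, given that a three was rolled: impossible.
print(conditional(lambda d: d % 2 == 0, lambda d: d == 3))  # 0
```

The guard against an empty conditioning event mirrors the requirement in the text that the formula only works when the probability of x is greater than 0.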
Let’s try this out to find the probability of getting two sixes in two dice rolls, given
that the first roll is a six. (To make the scenario more intuitive, perhaps imagine that you
decide to roll the dice one at a time and you’ve rolled the first but not yet the second.)
Plugging this example into the formula gives us:
Pr(D1 = 6 & D2 = 6 | D1 = 6) = Pr((D1 = 6 & D2 = 6) & D1 = 6) / Pr(D1 = 6)
Before moving on, take a moment to figure out why this equation is the right version of
the formula for calculating conditional probabilities.
We can solve this equation by plugging in the probabilities we already know and doing
some simple math. Notice that Pr(D1 = 6 & D2 = 6) and Pr((D1 = 6 & D2 = 6) & D1 = 6)
will be the same probability; in the second, D1 = 6 is just listed twice. The reason why it
shows up twice is because the first roll had to be six (D1 = 6) in order for it to be pos-
sible for both rolls to be sixes. So, plugging in the probabilities:
Pr(D1 = 6 & D2 = 6 | D1 = 6) = (1⁄36) / (1⁄6) = 1⁄6
One nice thing about starting with this simple example is that we can check the answer.
What is the probability of rolling two sixes given that you’ve already rolled one six? This
is the same as the probability of getting a six on one roll, since that’s exactly what needs
to happen if you are to get two sixes, given that you already have one six. And we know
the probability of getting a six—or any other number, one through six—in a single die
roll is 1⁄6. So, our calculation of the conditional probability gave us the right answer.
Let’s try our hand at finding a slightly more difficult conditional probability for dice
throws. What’s the probability that you roll a number that is less than four on a die throw,
given that you roll an odd number on that throw? This is the same as asking about the prob-
ability of rolling a one, two, or three (the outcomes less than four) given that you roll a one,
three, or five (the odd outcomes). Applying our conditional probability formula, this yields:
Pr(D = 1, 2, or 3 | D = 1, 3, or 5) = Pr((D = 1, 2, or 3) & (D = 1, 3, or 5)) / Pr(D = 1, 3, or 5)
Notice that the probability of rolling a one, two, or three and rolling a one, three,
or five is the same as the probability of rolling a one or three. Why? Because those
are the only two ways of rolling both an odd number and a number less than four;
see Figure 5.1. Using the addition rule, the probability of rolling a one or three is
2⁄6. We can also use the addition rule to get the probability of rolling a one, three,
or five; this is 3⁄6. Plugging these into the formula above yields (2⁄6)/(3⁄6), which is
equivalent to 2⁄3. The probability of rolling a number less than four given that you’ve
rolled an odd number is 2⁄3; consulting Figure 5.1 should help you convince yourself
that this is the right answer.
FIGURE 5.1 Visualization of the conditional probability of rolling a number less than four given
that you roll an odd number
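The same answer falls out of a direct enumeration (a sketch, ours, not the book's): restrict the die's outcome space to the odd faces and count the favorable cases.

```python
from fractions import Fraction

faces = range(1, 7)
odd = [d for d in faces if d % 2 == 1]   # [1, 3, 5]: the new outcome space
favorable = [d for d in odd if d < 4]    # [1, 3]: odd AND less than four
p = Fraction(len(favorable), len(odd))
print(p)  # 2/3
```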
Conditional probabilities are a core part of statistical reasoning. Recall the medical test
example at the beginning of this chapter; this involved conditional probabilities. We talked
about the chance that you have strep throat given that the rapid strep test was positive
and also the chance given that the test was negative. We stressed that a positive result
doesn’t guarantee strep throat, and a negative result doesn’t guarantee its absence. So:
Pr(strep | test = positive) < 1
Pr(no strep | test = negative) < 1
We don’t know these probabilities exactly, so we must estimate them from data. This
is because the variables of test (rapid strep test result) and strep (having strep or not)
are like LeBron James’s free throws: the outcomes aren’t equally likely, so the variables
aren’t fair. For this reason, we can’t know the probabilities of the outcomes for rapid
strep tests, having strep throat or not, and free throw success without crunching the data.
But conditional probabilities still give us a way of thinking about your chance of having
strep throat given your test result.
Conditional probabilities also give us a way of formulating our earlier question about
whether LeBron’s free-throw outcomes are independent. We can ask whether this is so
by determining whether the following equality holds:
Pr(Y2 = score | Y1 = score) = Pr(Y2 = score | Y1 = miss)
This equation states that the probability of LeBron scoring on a second free throw is the
same whether or not he scored on his first free throw. This may not be true; the success
of earlier free throws may well influence the success of later free throws.
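One way to picture this check, with entirely made-up counts (the text reports no actual free-throw data), is to estimate the two conditional probabilities from a table of paired attempts:

```python
from fractions import Fraction

# Hypothetical counts of free-throw pairs; NOT real data for LeBron James.
pairs = {("made", "made"): 70, ("made", "missed"): 15,
         ("missed", "made"): 10, ("missed", "missed"): 5}

def pr_second_made_given_first(first):
    # Pr(second made | first outcome) = joint count / marginal count for `first`
    made = pairs[(first, "made")]
    marginal = pairs[(first, "made")] + pairs[(first, "missed")]
    return Fraction(made, marginal)

p_given_made = pr_second_made_given_first("made")      # 70/85 = 14/17
p_given_missed = pr_second_made_given_first("missed")  # 10/15 = 2/3
# Independence would require these two conditional probabilities to be equal.
print(p_given_made == p_given_missed)  # False for these made-up counts
```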
We are now set up with some basic concepts and tools for thinking about probabilities.
This is pretty much all you need to know about probability theory in order to understand
the basics of how statistical reasoning works.
EXERCISES
5.6 Define the following seven terms in your own words: random variable, outcome
space, mutually exclusive, collectively exhaustive, total probability, statistical inde-
pendence, conditional probability.
5.7 The outcome space for a standard die, D = {1, 2, 3, 4, 5, 6}, is both collectively
exhaustive and mutually exclusive. For each of the following sets of outcomes, indi-
cate whether the set is mutually exclusive, collectively exhaustive, both, or neither
(for a single die roll).
a. The outcomes 1 and 6
b. The outcomes even and not 6
5.11 Bag #1 contains two white marbles, bag #2 contains two black marbles, and
bag #3 contains one white marble and one black marble. You pick one of the bags at random and take out one
marble. It is white. Given this, what is the probability that the remaining marble from
the same bag is also white? Explain your reasoning.
5.12 Consult the definition of statistical independence. For each of the following pairs of
events, indicate whether you expect them to be statistically independent, and say why.
a. One roulette spin and the roulette spin that comes next
b. Rolling a six and, on that same roll, getting an even number
c. What you ate for breakfast today and what Kate Middleton ate for breakfast
today
d. What you ate for dinner last night and what you will eat for dinner tonight
• Define and calculate for a data set three measures of central tendency: mode, median,
and mean
• Define and calculate for a data set three measures of variation: range, variance, and
standard deviation
• Describe and draw a visualization of a data set based on raw data or from information
about its shape, central tendency, and variation
• Evaluate the direction and strength of correlation using a scatterplot, regression
analysis, or correlation coefficient
shapes, weather changes, and biological populations vary in countless ways. Such varia-
tion is, ultimately, what makes statistics vitally important for understanding our world.
In Chapter 2, we defined a variable as anything that can vary, change, or occur in
different states and that can be measured. By measuring a variable, we obtain the value
of the variable. A variable can have different values in different instances. For example,
different people can have different weights, and one person can have different weights
over time.
Many variables have values that are numbers, amounts, or magnitudes; these are called
quantitative variables. The values of quantitative variables like height, temperature, and
points scored in a season can be expressed with numbers, or ‘scores’. For example, the
number of points scored by a basketball player in an NBA season might be 0, 99, or
4,029. Quantitative variables can be discrete or continuous. The values of discrete vari-
ables, like points scored in a season, can be counted. One can score 99 points or 100
points in a basketball season but not 99.5 points. Variables like temperature or height are
continuous, because the set of possible values they can take consists of all real numbers
in some interval. The temperature might be somewhere between 36 and 37º Fahrenheit;
the only limitation on the number of decimal places is how precise the readings of your
thermometer are.
Not all variables have values that can be captured numerically. Instead, some must be
described with labels and categories. The values of such qualitative variables, such as
gender, blood type, and sport, are descriptive categories. For example, some values of the
variable blood type include A-positive, O-positive, and AB-negative. What are some values
of the variable sport or dietary restriction?
Collecting data is the process of eliciting information from the target phenomenon—
usually by observing or experimentally intervening—and then recording the values of
variables. In our variable world, data can be collected by measuring, counting, interview-
ing, and in many other ways. The proper method of data collection depends on whether
a variable is quantitative or qualitative, along with other considerations. Data about dif-
ferent individuals (people, firms, countries, bacteria, and so on) can give us insight into
differences among those individuals. In contrast, data collected at different times enable
researchers to study changes in the value of a variable over time. Both are important.
We might study how point-scoring varies among NBA players or how LeBron James’s
performance changed over the years.
Notice that it’s possible to get quantitative data about all variables, even qualitative
variables. This is because you can count how many individuals or how often over time a
variable takes on some value or falls within some range of values. Blood type is a qualita-
tive variable, but it’s a quantitative observation that 38% of people are O-positive (the
most common blood type). Philosophy major is another qualitative variable, the value
of which can be either yes or no. But we can go on to consider what percentage of all
students are philosophy majors.
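For instance, a quick count over a small hypothetical sample of a qualitative variable (the blood types below are invented for illustration):

```python
from collections import Counter

# Hypothetical blood types for a sample of ten people (illustrative only).
blood_types = ["O+", "A+", "O+", "B+", "O+", "AB-", "A+", "O+", "O-", "O+"]

counts = Counter(blood_types)                       # tally each category
share_o_positive = counts["O+"] / len(blood_types)  # quantitative summary
print(f"{share_o_positive:.0%} of this sample is O-positive")  # 50%
```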
Quantitative data about how often a variable takes on different values enable it to be
treated as a random variable. Recall from earlier in this chapter that we can use prob-
abilities to reason about random variables. The different values a variable can take on
constitutes its outcome space, and we’ve already learned how to reason about the prob-
abilities of different outcomes within that outcome space. Further, recall that the values
of random variables are individually unpredictable but predictable in the aggregate. Such
a variable might be actually random, in the sense that it’s purely a matter of chance
what value it has in a given instance; perhaps coin tosses are like this. Or a variable may
instead be random just in the sense that we don’t know why it has the value it does in
a given instance. No one thinks that LeBron’s scoring so many points is pure chance, for
example, but we can still treat the number of points scored by an NBA player in a season
as a random variable.
FIGURE 5.2 (a) Pie chart of a coffeeshop’s sales: americanos 36%, cappuccinos 24%, smoothies 19%,
chai tea lattes 11%, caffè macchiatos 7%, cortados 3%; (b) Bar chart of per capita national beer
consumption: Czech Republic 142, Seychelles 114, Austria 105, Germany 105, Namibia 104, Poland 98,
Ireland 97
(For a pie chart, the categories need to add up to a whole pie, but bar charts are not
limited in this way.)
When dealing with a quantitative variable with values that don’t obviously fall into
discrete categories, a histogram is better. Like bar charts, histograms are graphical displays
of data that use bars of different heights. Unlike bar charts, the bars of a histogram are not
distinct categories. Instead, the values are grouped into numeric intervals. One example
of a variable for which this works well is the height of students in your class. It is up to
you to decide what numeric intervals to use—whether to group together 5 foot (152.4
centimeters) to 6 foot (182.88 centimeters) tall or to consider 20 centimeter groupings,
like 150–170 centimeters tall. Your decision will partly depend on the range of values, that
is, the difference between the largest and smallest values you consider. If everyone in your
class is between 5 and 6 feet tall, grouping that range together will result in an uninforma-
tive histogram. As in a bar chart, the height of each bar may reflect either the total number
of values in each interval or their percentage.
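The binning decision described above can be sketched in a few lines. The heights are hypothetical; the 20-centimeter interval width and 150-centimeter starting point mirror the text’s example:

```python
from collections import Counter

# Hypothetical class heights in centimeters (illustrative only).
heights = [152, 158, 161, 163, 168, 171, 172, 175, 178, 183, 189]

def bin_label(h, width=20, start=150):
    # Assign a height to its numeric interval, e.g. 161 -> "150-170".
    lo = start + ((h - start) // width) * width
    return f"{lo}-{lo + width}"

histogram = Counter(bin_label(h) for h in heights)  # counts per interval
print(histogram)
```

A wider interval (say, the whole 5-to-6-foot range) would lump everyone into one bar; narrower intervals spread the same data across more bars.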
Histograms and bar charts display visually some important features a data set can
have. If a histogram has a single peak, this shows that one value is the most common.
The most common value is called the mode, and a data set with just one such peak is
called a unimodal distribution. If there are two different peaks of similar heights, corresponding
to the two most common values of the variable, then the histogram displays
a bimodal distribution. A histogram for class grade percentages nicely illustrates the
difference between these distributions. A common distribution of grades is unimodal:
it has one peak where the most common grades occur—often somewhere in the range
of B to C. Math and logic courses tend to have bimodal distributions instead: they
have two peaks, one near the top of the grading scale and the other in the middle or
lower part of the scale. See Figure 5.3 for example histograms of unimodal and bimodal
FIGURE 5.3 (a) Histogram of a unimodal grade distribution; (b [below]) Histogram of a bimodal
grade distribution
distributions. Finally, if the height of the different bars in the histogram is the same
for all values, then it shows a uniform distribution where all values are equally likely.
Grading distributions are rarely uniform. In contrast, 1,000 dice throws should have an
approximately uniform distribution across the individual outcomes of one, two, three,
four, five, and six.
Apart from examining the number of peaks in a histogram or bar chart, it can be
useful to determine whether the graph is symmetric—that is, whether its right and
left portions are the same. Symmetric graphs can have (1) a uniform distribution, (2) a
U-shape, or (3) a ∩-shape. A U-shape is a bimodal distribution, where large and small
values are the most common. A ∩-shape is a unimodal distribution with the most com-
mon values clustered around the middle, with decreasingly common outcomes as the
values get higher and lower. This ∩-shaped distribution, called a bell curve or normal
distribution (also known as Gaussian distribution), is especially important in statistics.
We’ll discuss normal distributions further in Chapter 6. For examples of these three
shapes of symmetric distributions, as well as three examples of asymmetric distributions,
see Figure 5.4.
FIGURE 5.4 Examples of (a) uniform, (b) U-symmetric, and (c) ∩-symmetric distributions;
(d [below]) Examples of asymmetric distributions
There are also two ‘tails’, the values that are less and less common the fur-
ther away they are from the most common group in the middle. The distribution of
occurrences across all these values, even the uncommon ones, is the variability of the
data set.
The central tendency and variability of a data set can both be measured. This enables
us to summarize the main features of a data set with just a few simple numbers. Let’s
consider measures of central tendency first.
Imagine this situation. Your instructor has just returned the class’s first quiz. You
see that your grade is 6⁄10; your percentage of correct answers is 60%. How should
TABLE 5.2 Imagined data set and central tendencies for 17 student scores on 10-point Quiz 1
#9 7.5
you react? Perhaps you judge that you performed poorly—a 60% is quite low, isn’t
it? Yet another reaction might be to withhold judgment until you have additional
information. You might want to compare scores with your classmates or inquire how
the class performed as a whole or ‘on average’. This additional information about the
distribution of scores would help you know whether you did poorly on the quiz and,
if so, how poorly.
Imagine the students’ grades are as shown in Table 5.2. Your instructor can provide
you with three different answers to the question of how the class on average did on the
quiz. These correspond to three different measures of the central tendency of a distribu-
tion (of grades, or anything else). These measures are: the mode, median, and mean of
the distribution.
The mode is the most frequent or most numerous value in the data set. As you can
see from the data in Table 5.2, the mode of the class’s scores is 7⁄10. Four students scored
a 7, which makes this more common than any other score.
The mode can be informative, and for some qualitative variables, it may be the only
measure of central tendency that can be employed. However, even for a unimodal bell
curve, the mode may not reflect the central tendency of a distribution well. Notice
from the list of ordered scores that a 7, although the most frequent score, is lower
than more than half of the students’ grades. It is also possible for there to be more
than one mode in a distribution, which also limits the ability of a mode to capture
the central tendency. What if four students also had scored a 9? Finally, if all values
were different, then none would be more common than any other; such a distribution
would have no mode at all.
The median is the middle value in a distribution when the values are arranged from
the lowest to the highest (or from highest to lowest). The median value splits the dis-
tribution exactly in half: half of the values are on one side, the other half on the other
side. In our example, the median will be whatever score was earned by the student who
did the ninth best/worst. There were 17 students total, and so eight scored above that
ninth student while eight scored below. Student #9 earned a 7.5 on the quiz; this is the
median quiz score. When the distribution has an even number of values, the median is
halfway between the two middle values.
The median is often the preferred measure of central tendency when the distribution
is not symmetrical. This is because the median is not strongly affected by outliers,
that is, by data values that are remarkably different from the rest (like student #1,
who scored a 0 on the quiz). But that strength is also a weakness, depending on the
nature of the information you want to capture. You might want the central tendency
measure to be different when some students bombed the quiz instead of all having
scores grouped around the middle value. Unlike the mode, the median cannot be
identified for qualitative variables, since the values of these types of variables cannot
be ordered from lowest to highest. (It makes little sense to ask whether cappuccino is
lower or higher than cortado.)
The mean, also called the average, is the sum of all values in the data set divided by
the number of outcomes. All together, the students’ scores sum to 121.5; dividing that
sum by 17, the number of students, gives us a mean grade of 7.1 on the quiz. Like the
median, the mean cannot be calculated for categorical or qualitative data, as such values
cannot be used in addition. Unlike the median, the mean is affected by outliers; the
mean is pulled in the direction of the distribution’s longer tail. The student who scored
a 0 on the quiz pulled down the mean score of 7.1 by nearly half a point, compared to
the median score of 7.5.
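The three measures can be computed with Python’s statistics module. The scores below are hypothetical, chosen only to be consistent with the summary figures quoted in the text (mode 7 scored by four students, median 7.5, sum 121.5, lowest score 0, highest 10); they are not the book’s Table 5.2 data:

```python
import statistics

# One hypothetical set of 17 quiz scores consistent with the text's summary
# figures; the book's actual Table 5.2 is not reproduced here.
scores = [0, 4, 4.5, 5.5, 7, 7, 7, 7, 7.5, 8, 8, 8.5, 9, 9, 9.5, 10, 10]

mode = statistics.mode(scores)            # 7: the most frequent score
median = statistics.median(scores)        # 7.5: the 9th of 17 ordered scores
mean = round(statistics.mean(scores), 1)  # 121.5 / 17, which rounds to 7.1
print(mode, median, mean)  # 7 7.5 7.1
```

Note how the single 0 drags the mean below the median, just as described above.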
When a distribution is unimodal and perfectly symmetrical, the mode, median, and
mean coincide, and they are all exactly in the middle of the distribution. Asymmetric
and multimodal distributions can lead these measures of central tendency to be radically
different from one another.
Like central tendency, variation can be measured. Measures of variation
provide us with a summary of the ‘spread’ of the values in a data set—that is, the degree
to which they vary.
Information about variation can differentiate data sets that have the same mean, median,
and mode. Let’s return to our simple imaginary example of quiz grades. Suppose the next
quiz has the exact same mode, median, and mean as the scores shown in Table 5.2. This
suggests the class did equally well on this next quiz. And, on average, they did. But this
isn’t the whole story; compare the quiz grades in Table 5.3 to those in Table 5.2. What
differences do you notice?
TABLE 5.3 Imagined data set and central tendencies for 17 student scores on 10-point Quiz 2
#9 7.5
This second data set has the same mean, median, and mode as the quiz scores from
Table 5.2 but with much less variation in score. There is more variation in the grades on
Quiz 1 than Quiz 2. Consider that the lowest score on Quiz 1 was a 0 and the highest
grade a 10, while the lowest score on Quiz 2 was a 4 and the highest grade a 9. Visualizing
the two data sets with a histogram makes it easier to spot the difference; see Figure 5.5.
As this illustrates, measures of central tendency don’t capture all the information about
a distribution; you also need measures of variability.
There are three primary measures of variability: the range, variance, and standard
deviation of a distribution. The range is the difference between the smallest and largest
values in a data set. For Quiz 1, the range was 10, since the lowest score was 0 and
the highest score was 10; for Quiz 2, the range was 5, since the lowest score was 4 and the
highest score was 9.
Range does not take outliers into account very well, since it doesn’t specify anything
about the distribution of scores within the range. In other words, range won’t tell you
whether the distribution’s tails are skinny or thick—the ‘spread’ of the data. This can be
done with a measure of the distance of values from the mean, which is what the measures
of variance and standard deviation do. These measures of variation summarize the spread,
or how close the various values are to the mean.
Population variance (σ²) is the average of the squared differences of values from the
mean, that is:

σ² = ∑(value − mean)² / n

The capital Greek letter sigma (‘∑’) indicates that you should sum all instances, and n is the
number of values in the data set. Let’s discuss this formula in more detail.
FIGURE 5.5 (a) Histogram of the Quiz 1 grade distribution in Table 5.2; (b) Histogram of the
Quiz 2 grade distribution in Table 5.3
Notice that calculating variance requires knowing the mean of a data set. After cal-
culating the mean, the first step to finding the variance is to find the difference of each
value from that mean; this is the distance between the mean and each value in the data
set. Finding this difference will show whether the values tend to vary a lot or only a little
from the mean. Next, each difference is squared. (Otherwise the differences on either side
of the mean would cancel each other out, since the difference for values greater than the
mean is positive and the difference for values less than the mean is negative.) Finally, find
the average of those squared differences by adding them together (∑) and then dividing
by the number of values (n).
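These steps translate directly into code. A minimal sketch, using an illustrative data set rather than the quiz scores from the tables:

```python
import math

# Population variance, following the recipe above: square each value's
# difference from the mean, then average the squared differences over n.
def population_variance(values):
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(v - mean) ** 2 for v in values]
    return sum(squared_diffs) / n

# Standard deviation is the square root of the population variance.
def population_stdev(values):
    return math.sqrt(population_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5
print(population_variance(data), population_stdev(data))  # 4.0 2.0
```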
Let’s find the variance for the scores on Quiz 1 and Quiz 2. For each, the mean is 7.1
(rounded to one decimal place). Working through the formula gives a population variance
of 5.0 for Quiz 1 and 1.9 for Quiz 2. Comparing the variances for the two quizzes makes
it clear that the scores on Quiz 1 had more variation than those on Quiz 2.
The final measure of variation we will discuss is standard deviation (σ), which is
calculated directly from the variance. The standard deviation is just the square root of
the population variance:
σ = √[∑(value − mean)² / n]
The standard deviation for Quiz 1 is 2.2 (the square root of 5.0). For Quiz 2, it is 1.4
(the square root of 1.9).
The standard deviation provides us with a sort of yardstick for measuring variation.
It is a number against which you can assess individual values or groups of values to see
how far they are from the mean, relative to total variation in the data set.
If the histogram describing our data set is bell-shaped (unimodal and roughly sym-
metric), then around 68% of the values fall within one standard deviation of the mean,
and around 95% of the values fall within two standard deviations of the mean—that
is, fall within the distance that’s twice as long as the standard deviation value. And
virtually all of the values lie within three standard deviations of the mean. Look at
Figure 5.6. This shows the locations of one, two, and three standard deviations for a
probability distribution with a bell-shaped histogram. The standard deviation distances
will, of course, change depending on the spread of the data. The standard deviation
value reflects this by being a relatively large number (lots of spread) or a relatively
small number (little spread).
The scores in our quiz examples are roughly normally distributed. So, using the stan-
dard deviations we have calculated, we can say that, for Quiz 1, roughly 68% of the quiz
scores are between 4.9 and 9.3 (7.1, the mean, ± 2.2, the standard deviation). Roughly
95% are between 2.7 (the mean minus two standard deviations) and 10 (the maximum
score, which is less than the mean plus two standard deviations). For Quiz 2, in contrast,
FIGURE 5.6 Standard deviation in a normal distribution; the values within one standard devia-
tion of the mean account for 68.27% of the values in the data set, while those within two
standard deviations account for 95.45%, and those within three standard deviations account
for 99.73%
roughly 68% of the scores are between 5.7 and 8.5, 95% between 4.3 and 9.9. This is a
smaller range than for Quiz 1. The standard deviation nicely captures the difference in
score variation between these two quizzes.
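The 68/95/99.7 figures quoted for Figure 5.6 can be checked against the standard normal distribution, for instance with Python’s `statistics.NormalDist`:

```python
from statistics import NormalDist

# The share of values within k standard deviations of the mean, for a
# normal distribution, is cdf(k) - cdf(-k) on the standard normal.
z = NormalDist()  # mean 0, standard deviation 1
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)
    print(f"within {k} sd: {share:.2%}")
# within 1 sd: 68.27%
# within 2 sd: 95.45%
# within 3 sd: 99.73%
```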
Mean and standard deviation are the most commonly reported summary statistics for
a data set. Together, the mean and standard deviation capture the central tendency and
variability around that central tendency in a way that is informative and—as we will see
in the next chapter—central to statistical inference.
Correlation
Most research in the natural and social sciences is concerned not just with variables but
also with the relationships among them. For instance, some years ago, French researchers
studied whether people drink more alcohol when they hang out in loud bars. They found
a positive correlation between the variable decibel level in bar and the variable alcohol
consumption. If you ask whether level of marijuana consumption is different in different
states in the US, you are interested in the relationship between the variable marijuana
consumption and the variable state. Or if you wonder whether being able to read at a
younger age predicts salary level in adulthood, you are again asking about the correlation
between the values of two variables.
Recall our earlier definition of statistical independence. When two variables are sta-
tistically independent, the value of one variable does not raise or lower the probability
of the other variable taking on any given value. Variables that are not statistically inde-
pendent are correlated variables: the value of one raises or lowers the probability of the
other having some value. For example, the correlation found by those French researchers
between loud bars and alcohol consumption means that a person going into a loud bar
is more likely to have, say, five alcoholic drinks than is a person going into a quiet bar
(Guéguen et al., 2008).
When greater values for one variable are related with greater values for a second
variable, these are said to be directly or positively correlated. The decibel level of a bar
and alcohol consumption were found to be positively correlated. When greater values
for one variable are related with smaller values for a second variable, these are said to
be indirectly or negatively correlated. Perhaps level of alcohol consumption on a given
evening is negatively correlated with waking up early the following morning; the more
alcohol someone drinks, the less likely that person is to wake up early.
For quantitative variables, scatterplots can provide a visual representation of whether
they are correlated and how. Scatterplots are graphs in which the values of one variable
are plotted against the values of the other variable. For example, the horizontal axis of
the plot, the X-axis, may indicate the decibel level in different bars, and the vertical axis,
the Y-axis, the average number of drinks consumed in those different bars, such as in
Figure 5.7.
A scatterplot that shows a positive correlation between variables will have dots that
tend to form an upward-sloping line from left to right. As the values of one variable
get larger, the values of the other variable also tend to get larger. However, there can be
exceptions—dots that vary from that general pattern. Some very quiet bars may serve a
lot of drinks, and some very loud bars may serve few drinks. But this doesn’t eliminate
the general correlation between decibel level and alcohol consumed.
FIGURE 5.7 An imagined scatterplot of the relationship between alcohol consumption and
decibel level in bars
A scatterplot that shows a negative correlation between variables will have dots
that tend to form a downward-sloping line from left to right. As the values of one
variable get larger, the values of the other variable tend to get smaller. Of course, there
can be dots that vary from this pattern as well without interfering with the negative
correlation.
What would you expect for a scatterplot of two variables that aren’t correlated? Well,
there won’t be an upward sloping line, and there won’t be a downward sloping line. What
you usually see are dots all over the place, with no pattern between the values of one
variable and the values of the other variable.
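These three patterns can be sketched with a small simulation. All of the numbers and coefficients below are made up for illustration; numpy's `corrcoef` function summarizes each pattern with a single number that is positive for an upward-sloping cloud of dots, negative for a downward-sloping one, and near zero when there is no pattern:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 200

# Positive correlation: louder (hypothetical) bars serve more drinks on
# average; the noise term produces exceptions to the general pattern
loudness = rng.uniform(40, 110, n)
drinks = 0.05 * loudness + rng.normal(0, 1, n)

# Negative correlation: as speed increases, accuracy tends to decrease
speed = rng.uniform(0, 10, n)
accuracy = 10 - 0.8 * speed + rng.normal(0, 1, n)

# No correlation: the two variables are generated independently
x = rng.normal(0, 1, n)
y = rng.normal(0, 1, n)

r_pos = np.corrcoef(loudness, drinks)[0, 1]
r_neg = np.corrcoef(speed, accuracy)[0, 1]
r_none = np.corrcoef(x, y)[0, 1]
print(r_pos, r_neg, r_none)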
Measures of Correlation
One way to summarize the relationship between variables is called regression analysis.
The basic idea is to find the best-fitting line through the points on a scatterplot.
Modern regression analysis was invented by Sir Francis Galton (1822–1911). Galton
was the half-cousin of Charles Darwin, and he had many interests: he was a geographer,
meteorologist, tropical explorer, inventor of fingerprint identification, eugenicist, and best-
selling author. Galton was obsessed with measurement (he even tried to measure physical
beauty). In 1875, Galton began to investigate heredity: Why do successive generations
remain alike in so many features? And how do offspring vary from their parents? One of
his projects was to measure the diameter and weight of thousands of mother and daughter
sweet pea seeds (see Table 5.4).
After plotting his results, Galton hand-fitted a line to his data as best as he could. He
wanted to find the line that best fit his data. Intuitively, this is the line that runs closest
to the points scattered on a plot. Galton aimed to draw a line that minimized the sum
of the distances of the points on the plot from that line, while still maintaining a straight
TABLE 5.4 Average diameter of parent/offspring sweet pea seeds measured in 1/100ths
of an inch (Adapted from Galton, 1889, p. 226)

Plant   Mother seed diameter   Daughter seed diameter
#1      15                     15.3
#2      16                     16.0
#3      17                     15.6
#4      18                     16.3
#5      19                     16.0
#6      20                     17.3
#7      21                     17.5
FIGURE 5.8 A regression analysis of Galton's data on the diameter of sweet pea seeds
(X-axis: diameter of mother sweet pea seeds, 14–24; Y-axis: diameter of daughter seeds,
15.0–18.5)
line. In this sense, it can be considered the best fit. Figure 5.8 shows the best-fitting
straight line for Galton’s data.
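Where Galton fitted his line by hand, the least-squares line can now be computed directly. A minimal sketch using the Table 5.4 values; numpy's `polyfit` with degree 1 returns the slope and intercept of the best-fitting straight line:

```python
import numpy as np

# Table 5.4: seed diameters in 1/100ths of an inch
mother = np.array([15, 16, 17, 18, 19, 20, 21])
daughter = np.array([15.3, 16.0, 15.6, 16.3, 16.0, 17.3, 17.5])

# polyfit(degree=1) returns the least-squares slope and intercept
slope, intercept = np.polyfit(mother, daughter, 1)
print(f"best-fitting line: daughter = {slope:.3f} * mother + {intercept:.2f}")
```

The slope works out to about 0.34: positive (an upward-sloping line), but less than 1, which already hints at the regression toward the mean discussed below.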
As with the dots of a scatterplot, when there is a positive correlation, the best-fitting
line will have an upward-sloping trajectory as it moves right, and when there’s a nega-
tive correlation, the line will have a downward-sloping trajectory as it moves right. The
size of parent and offspring sweet pea seeds are positively correlated: the slope of the
line goes from the bottom left to the top right of a scatterplot. In contrast, speed and
accuracy in carrying out a task are negatively correlated: as speed increases, accuracy
decreases. In this case, the slope of the line goes from upper left to lower right of the
scatterplot.
A regression analysis also gives information about the correlation strength: how
predictable the values of one variable are based on the values of the other variable.
The closer the dots are to the best-fitting line, the stronger the correlation, that is, the
more linked the values of the two variables. (Notice that the slope of the line is not
related to correlation strength; the slope only gives information about how the values
of the variables tend to relate to each other.) A maximum strength correlation, often
called a perfect correlation, will have all the dots directly on the regression analysis
line. A very weak correlation will have dots that almost look uncorrelated; they fall
all over the place, far from the line, but there’s just a hint of a relationship between
the values of the two variables. In Figure 5.9, you can see examples of very strong
and weaker correlations with the same relationship among variables and so identical
regression analysis lines.
From his regression analysis, Galton saw that as the size of a mother sweet pea seed
increased, so did the size of its daughter sweet pea seed. However, the daughter seeds
tended to be less extreme in size compared to their mother peas: they ‘regressed’ back
toward average pea size. Extremely large mother seeds grew into plants whose daughter
seeds tended not to be as extremely large, and extremely small mother seeds grew into
plants whose daughter seeds tended not to be as extremely small. Galton called this loss
of extremity the regression to the mean. It can be explained as just an effect of variability:
if a variable has an extreme value, then most other values that variable can have are
less extreme. So, even though mother and daughter pea sizes are positively correlated,
extreme-sized peas tend to have less extreme-sized daughter peas (but even this is something
that can vary). The same also holds true in reverse: extreme-sized daughter peas usually have
less extreme-sized mother peas.
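A quick simulation illustrates regression to the mean as a pure effect of variability. All numbers here are hypothetical: daughters inherit only part of their mother's deviation from the mean, plus independent noise, and as a result the daughters of extreme mothers sit closer to the average than their mothers do:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000

# Hypothetical mother seed sizes; daughters inherit only part (0.34) of
# the mother's size, plus independent variability
mothers = rng.normal(17.0, 1.5, n)
daughters = 10.1 + 0.34 * mothers + rng.normal(0.0, 0.5, n)

# Take the most extreme (largest 5%) mothers...
cutoff = np.quantile(mothers, 0.95)
extreme = mothers >= cutoff

# ...and compare how far each group sits above its own mean,
# in standard-deviation units
z_mothers = (mothers[extreme].mean() - mothers.mean()) / mothers.std()
z_daughters = (daughters[extreme].mean() - daughters.mean()) / daughters.std()
print(z_mothers, z_daughters)  # daughters of extreme mothers are less extreme
```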
Galton also determined a correlation coefficient for mother and daughter pea size. A
correlation coefficient provides information about the direction and strength of correla-
tion. It has two parts: a positive (‘+’) or a negative (‘−’) sign to indicate positive or nega-
tive correlation respectively and a number between 0 and 1 to indicate the strength of
the correlation. This is a measure of the dispersion of the points on the scatterplot. The
stronger the relationship between the two variables, the closer the correlation coefficient
is to 1, when the value of one variable is a perfect predictor of the value of the other
variable. A value of 0 means that the points on the plot are randomly scattered, and the
two variables are statistically independent: the value of one gives no information about
the value of the other.
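For the Table 5.4 data, the correlation coefficient can be computed with numpy's `corrcoef`:

```python
import numpy as np

# Table 5.4: seed diameters in 1/100ths of an inch
mother = np.array([15, 16, 17, 18, 19, 20, 21])
daughter = np.array([15.3, 16.0, 15.6, 16.3, 16.0, 17.3, 17.5])

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation coefficient between the two variables
r = np.corrcoef(mother, daughter)[0, 1]
print(r)  # positive sign, and close to 1: a strong positive correlation
```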
evidence was fabricated. These and other methodological problems were related
to an even bigger issue. Galton’s findings tended to confirm the superiority of
white, wealthy Englishmen. Women were omitted from his genealogical analysis
because he maintained that notable achievement was principally a male preroga-
tive. The shortcomings of his methods allowed Galton’s science to simply confirm
his biases and expectations—that white, upper-class men are superior to the poor,
to people of color, and to women.
Worse yet, this shaky research was used to promote unethical programs. Galton
coined the term eugenics in 1883 (from the Greek for ‘well-born’), which was
initially a social philosophy aiming to improve the genetic pedigree of societies
by pairing certain individuals and not others. Galton advocated eugenic marriages
and the use of social incentives to encourage ‘able’ couples to have children. In
the 20th century, eugenics movements in the US, Britain, and other countries
adopted various policies that restricted human liberties while threatening human
dignity. These included forced birth control, marriage restrictions, racial segrega-
tion, compulsory sterilization, and even genocide (see Gillham, 2001). Distress-
ingly, this is just one example from a long string of scientific research that has
been misused to justify racism, sexism, and classism. The role of values in science
and the role of science in society are a main focus of Chapter 8.
Summary
Let’s end with a summary of the measures of central tendency and variation introduced in
this section. You calculate a population's mean by summing all the values, xᵢ, and dividing
by the total number of outcomes, n, in your data set:

mean = ∑xᵢ / n
To find the mode or the median, you should begin by ordering all the values in the data
set from smallest to largest. To find the mode, count how many times each value occurs;
the value that occurs most often is the mode. There is no mode if no value appears more
often than any other, and there are two (bimodal) or more modes if two or more values
occur most often. To find the median, search for the value in the very middle of the list;
the middle value is the median. If there is an even number of outcomes, then the average
of the two values in the middle is the median.
To find the range, a measure of variation, you also need to begin by ordering the values
from smallest to largest. Then, simply subtract the smallest value from the largest value
in the data set.
You can calculate the population variance, σ², by first finding the mean. Then, for each
value in the data set, subtract the mean from it and square the result. Finally, calculate
the average of those squared results (that is, sum the results and divide by the number
of values):
σ² = ∑(value − mean)² / n
To find a population’s standard deviation, σ, compute the square root of the population
variance:
σ = √[∑(value − mean)² / n]
For a normal or bell-shaped distribution, 68% of outcomes fall within one standard deviation
of the mean, and 95% of the outcomes fall within two standard deviations. Virtually all
(99.7%) fall within three standard deviations.
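These recipes can be checked against Python's built-in `statistics` module, whose `pvariance` and `pstdev` functions compute the population versions (dividing by n). The data set below is made up for illustration:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # a small made-up data set

mean = sum(data) / len(data)                    # 40 / 8 = 5.0
mode = statistics.mode(data)                    # most frequent value: 4
median = statistics.median(data)                # middle of the sorted list: 4.5
value_range = max(data) - min(data)             # 9 - 2 = 7
# Population variance: average of the squared deviations from the mean
variance = sum((x - mean) ** 2 for x in data) / len(data)   # 4.0
std_dev = variance ** 0.5                       # square root of variance: 2.0

print(mean, mode, median, value_range, variance, std_dev)
```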
EXERCISES
5.14 Define the concepts of central tendency and variability in your own words, and
describe the importance of each.
5.15 List three measures of central tendency and three measures of variability. For each
measure, describe its advantages and any drawbacks or limitations.
5.16 Divide the following list into qualitative variables and quantitative variables. For
each quantitative variable, say whether it is discrete or continuous.
a. The height of a mountain
b. The color of starfish
c. The breed of a dog
d. The winner of Wimbledon
e. The population of a city
f. The outcome of a throw of a die
g. The GDP (gross domestic product) of a country
h. Type of pizza
i. The number of pizzas one person eats per week
j. The amount of salt in the Atlantic Ocean
5.17 Label the type of visualization found in each of the following figures as a bar chart,
scatterplot, pie chart, or histogram. Then, for each, describe the data portrayed,
including variable(s), characteristics of the distribution, and anything notable or
surprising about the data.
5.18 Look back at Table 5.4, a data set used by Galton in his studies of heredity. Calcu-
late the mean, median, and mode of this data set. Next, calculate the range, vari-
ance, and standard deviation.
FIGURE 5.11
(a) Average expenditure per dollar of Indiana property tax, 2013 (pie chart segments:
Schools 42%, City/Town 19%, County 17%, TIF 8%, Special District 7%, Library 4%,
Township 3%)
(b) Composite score GRE and academic major based on college graduates who tested
08/01/11–04/30/14
Chart 2015 Philosophy at University of New Orleans; Data 2014 ETS
[Figure for exercise: bar legend for team A, team B, and team C; axis label: number of
digs performed]
using each of a pie chart, bar chart, histogram, and scatterplot. Then, choose which
approach is best for this data and draw that type of visualization of the data.
Explain your choice of visualization type.
5.23 Answer the following questions based on the data from Table 5.5 and/or your visu-
alization of the data in exercise 5.22.
a. What is the percentage of survivors for each class, gender, and age group?
Note that you’ll need to find the mean for one grouping across the other group-
ings to calculate these percentages.
b. Which group had the highest mortality rate in this disaster? Which group was
most likely to survive?
c. Write out the different values for each of the variables: class, gender, and age
group. Order these from those that correlated most with survival (either posi-
tively or negatively) to those that correlated the least with survival. For each,
indicate whether the correlation was positive or negative.
d. Can you guess from the data anything about the code of conduct on the Titanic
for who should be saved first in a life-threatening situation?
<https://en.wikipedia.org/wiki/RMS_Titanic#Survivors_and_victims>
FURTHER READING
For more on the importance of probability and statistics in your life, see Gigerenzer, G.
(2002). Calculated risks: How to know when numbers deceive you. New York: Simon &
Schuster.
For an introduction to probability theory, see Olofsson, P. (2007). Probabilities: The little
numbers that rule our lives. Hoboken: Wiley & Sons.
For a concise discussion of graphs that badly visualize a data set with several real-life
examples, see the website URL = <www.statisticshowto.com/misleading-graphs/>
For more on misleading or ‘spurious’ correlations, see Vigen, T. (2015). Spurious cor-
relations. New York: Hachette Books. Several fun examples of spurious correla-
tions are also available at the following website URL = <www.tylervigen.com/
spurious-correlations>
For additional information about Francis Galton and the birth of eugenics, see Gillham,
N. W. (2001). Sir Francis Galton and the birth of eugenics. Annual Review of Genetics,
35, 83–101.
CHAPTER 6
Statistical Inference
data from these pre-election polls. For instance, we have represented them numerically
using percentages. They also can be represented visually—say, in a bar chart showing the
comparative proportion of votes for Clinton, Trump, and third-party candidates in a given
poll or averaged across polls.
But this wasn’t really what anyone ultimately wanted to know. No one cared much
about the voting intentions of the individuals who happened to be polled or about how
those potential voters felt two weeks before the election. What everyone was interested
in knowing was something about all US voters: how they would actually cast their bal-
lots. Everyone wanted to use the data collected in these pre-election polls not just to
describe but to make predictions about how the election itself would turn out. But we
can’t get answers like that using descriptive statistics. It requires looking beyond the data
we have, using it to make inferences about a larger group or about new observations. For
these kinds of interests—predicting the future and generalizing from a sample—we need
inferential statistics.
Inferential statistics is an important form of inductive reasoning that extends the reach
of descriptive statistics with the use of probability theory. Some of the basic principles and
concepts of probability theory were discussed in Chapter 5. Recall that random variables
have values that can’t be predicted individually but that can display patterns over many
instances. Despite the name, the values that random variables take on may not truly be
random. Coin tosses, dice throws, LeBron’s free throws, voting intentions, temperatures
on the days of September, and the decibel level in a bar can all be treated as random
variables. For any of these variables, inferential statistics allows us to analyze relevant
data sets to predict yet-to-be-measured values of those variables. For example, one might
assess from a sequence of heads and tails whether the coin is fair, predict from LeBron’s
past record whether his free throw success will improve over time, infer the efficacy of
a medical drug from observed treatment effects, or predict from an opinion poll which
candidate will win an election.
In brief, statistical inference is a form of inductive inference that employs probability
to better understand the real-world phenomenon underlying a known data set. It allows
scientists to formulate expectations about what they would observe in a new data set or
in the larger population and to assess how confident they can be about those expectations.
frequencies of each color. This frequency distribution can be turned into a relative fre-
quency distribution as in Table 6.1b, which displays the proportions of the colors of the
M&M’s in your bag. These proportions can also be put as percentages. Insofar as every
M&M is one of six colors, the six proportions shown in Table 6.1b should sum to one, that
is, to 35⁄35; equivalently, the percentages of the different colors add to 100%.
Relative frequency distributions can be used to estimate the probability distribution
for the variable—that is, how probable it is for different values to occur in general. For
example, the relative frequency distribution of the colors of M&Ms in your bag can be
used to estimate the probability distribution for the colors of M&Ms in any bag of M&Ms.
Based on your sample bag of M&Ms, you may estimate that if you take a different bag of
M&Ms, open it, and choose one M&M at random, the probability of getting a blue M&M
TABLE 6.1a Bag of 35 M&Ms

Color    Frequency
Blue     1
Orange   3
Yellow   4
Red      5
Green    5
Brown    17

TABLE 6.1b Bag of 35 M&Ms

Color    Proportion   Percentage
Blue     1/35         2.86%
Orange   3/35         8.57%
Yellow   4/35         11.43%
Red      5/35         14.29%
Green    5/35         14.29%
Brown    17/35        48.57%
is about 3%. Your estimate may or may not be very good. It may be that distributions of
M&M colors are quite similar across bags, or they may deviate quite a bit.
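The estimation step here is just arithmetic on the relative frequencies. A sketch using the Table 6.1 counts:

```python
# Frequencies from Table 6.1a (a bag of 35 M&M's)
frequencies = {"Blue": 1, "Orange": 3, "Yellow": 4, "Red": 5,
               "Green": 5, "Brown": 17}

total = sum(frequencies.values())  # 35
# Relative frequencies, used as estimates of the probability distribution
estimated_probs = {color: count / total for color, count in frequencies.items()}

print(estimated_probs["Blue"])        # 1/35, about 3%
print(sum(estimated_probs.values()))  # the proportions sum to 1
```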
As this simple example illustrates, probability distributions indicate the probabilities
coin toss comes up tails, then there are zero instances of heads. What is the probability
of that ever happening, assuming the coin is fair? Recall that the probability of getting
tails on a given throw is always .5, or ½, and that we multiply when calculating the prob-
ability of multiple independent events all occurring. The result would be a really, really
tiny number: ½ × ½ × … × ½ for all 100 throws, or ½100. This is also the same as the
probability that heads comes up 100 times.
In between 0 and 100, the calculation for the probability of each value of the number
of times heads comes up is much more complicated. We won’t carry out those calcula-
tions, but considering how they would go gives us a sense for how the probability changes
for intermediate numbers of heads. To begin, notice that there is only one way to get
zero heads and only one way to get 100 heads; in the former case, the coin never lands
on heads, and in the latter case, the coin lands on heads 100 times in a row. In contrast,
there are 100 different ways to get heads on only one toss; it might be the first toss or
the second toss or the third toss or the 37th or any other single toss. So, Pr(heads = 1) is
equivalent to Pr[(heads1 and tails2 and tails3 and … tails100) or (tails1 and heads2 and tails3
and … tails100) or …] and so on, up until the circumstance of getting heads only on the
100th toss. Using our calculation from earlier, and because we add when calculating the
probability of one of several mutually exclusive events occurring, this is ½100 + ½100 + … +
½100, or 100 × ½100. This is still a really, really tiny number, but it’s 100 times bigger
than the probability of heads coming up no times. Notice also that the same calculation
gives us the probability that heads comes up 99 times. So we’re building our probability
distribution from both ends at the same time.
There are even more ways for heads to come up twice (or 98 times) and even more
ways than that for heads to come up three times (or 97 times). Each time we add another
outcome of heads, the calculation becomes more complicated, and the probability of get-
ting that number of heads increases. Further, the increasing probability of each of these
outcomes isn’t linear; the increase gets bigger each time.
We already know the distribution is symmetric, since the calculation is the same
whether the number of heads = 0 or 100, whether the number of heads = 1 or 99, and
so forth. The middle of the distribution, the most probable outcome, is thus 50: that you
get heads on 50⁄100, or ½ of the coin tosses. Figure 6.1a shows a histogram of the whole
probability distribution. Notice that the shape of the histogram approximates a bell curve.
With even larger numbers of coin tosses, the distribution becomes closer and closer to a
bell curve. A bell-shaped curve or normal distribution, briefly introduced in Chapter 5, is
a perfectly symmetric, unimodal distribution for continuous variables, like in Figure 6.1b.
Again, this is also called a Gaussian distribution, after the German mathematician Carl
Friedrich Gauss (1777–1855).
Normal distributions—unimodal and symmetric distributions—are especially impor-
tant for statistical reasoning. As with coin tosses, the aggregate behavior of random
variables over many repeated, independent trials tends to have a probability distribution
that is normal. This result depends on what is known as the central limit theorem, the
statistical theorem that the sum or average of a large enough number of independent
trials approximates a normal distribution, whatever the distribution of the individual
trials (Le Cam, 1986). As a result, the probability distribution of such a random variable
is a normal distribution or bell curve. What varies for different random variables is the
central tendency and variability of the normal distribution, which—as we saw in Chapter 5—
can be described with mean and standard deviation. Whereas the mean value of heads
FIGURE 6.1 (a) Probability distribution of heads for 100 coin tosses—the mean of this distribution
is 50 and its standard deviation is 5
on 100 coin tosses is 50, the mean value of 6s on 100 dice rolls is 16.67 (1⁄6 × 100).
The standard deviation is also influenced by the number of trials. You are more likely to
get none or all heads in five coin tosses than in 100 coin tosses; relative to the number of
tosses, the standard deviation is larger for the former.
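For a fair coin these quantities have simple closed forms: the mean number of heads in n tosses is n × ½, the standard deviation of the count is √(n × ½ × ½), and the standard deviation of the proportion of heads is √(½ × ½ / n). A sketch:

```python
from math import sqrt

def count_mean_sd(n, p):
    """Mean and standard deviation of the number of successes in n trials."""
    return n * p, sqrt(n * p * (1 - p))

mean_heads, sd_heads = count_mean_sd(100, 0.5)   # 50.0 and 5.0
mean_sixes, _ = count_mean_sd(100, 1 / 6)        # about 16.67 sixes

def proportion_sd(n, p):
    """Standard deviation of the proportion of successes in n trials."""
    return sqrt(p * (1 - p) / n)

# The proportion of heads is more spread out for fewer tosses
print(proportion_sd(5, 0.5), proportion_sd(100, 0.5))
```

The count mean of 50 and standard deviation of 5 match the distribution in Figure 6.1a.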
Probability distributions for coins, dice, and roulette wheels (so long as they are fair) can be
calculated directly from the probabilities of the individual outcomes, as sketched earlier
for 100 coin tosses. This is not so for variables like success rate with respect to free throws.
This is why the frequency distribution observed in some data set, as characterized by its
mean and standard deviation, is important for many random variables (such as success at
free throws or number of blue M&Ms in a bag). In these and many other cases, the relative
frequency distribution can be used to estimate the probability distribution. The predicted
probability distribution can then connect the frequency distribution of an observed data
set to expectations for some new, relevantly similar data set.
the probability distribution associated with the feature of interest in the population. The
sample mean is the most likely average value of the feature in the population; in other
words, the sample mean is the estimate of the population mean. It is called sample mean
because this estimate is based on the mean of the observed sample. In other less strict
contexts, someone might use the term sample mean to refer to the mean of a sample,
but again, the sample mean is really a prediction about a feature of the population. The
predicted mean might not in fact turn out to be the mean value in the population, but it's
the most likely value and thus our best guess.
Imagine scientists have a sample of 100 university students, and they want to use that
sample to estimate the range of political views among all students at that university. They
might administer a questionnaire to the individuals in the sample, with each individual’s
TABLE 6.2 Questionnaire scores for a sample of 100 university students

Score   Number of students
1       0
2       2
3       5
4       7
5       10
6       15
7       22
8       18
9       13
10      8
responses scored between 1 and 10, where 1 is most politically conservative and 10 is
most politically liberal. Imagine the questionnaire scores are as shown in Table 6.2. From
this data, scientists can estimate the mean degree of liberalness (or, equally, conservative-
ness) in the full population of university students.
Now, remind yourself of how to calculate the mean. You can do this by adding all
the scores (or, multiplying each score by the number of students who got that score and
adding those up) and then dividing by the total number of students. It turns out that
the mean score is 6.82—college students tend to be a rather liberal bunch, on average.
This score is the mean value of the sample. If the 100 students in the sample have been
appropriately selected—an idea we’ll unpack later—this score is also the sample mean:
it’s most likely to be the average value in the population of university students.
The sample mean plus the individual scores and the sample size n can be used to
calculate the standard deviation. In Chapter 5, we characterized the standard deviation
as a measure of the ‘spread’ of the values of some variable within a data set and defined
σ = √[∑(value − mean)2 / n]
This would be what we would use to find the standard deviation from data about an entire
population. But, with estimation from a sample, scientists have data from a sample only
and not from the entire population. So, they estimate the population standard deviation
from the standard deviation of an observed sample. This estimate is called the sample
standard deviation (s instead of σ) and is calculated in a slightly different way:
s = √[∑(value − mean)² / (n − 1)]
The important change in this formula is n − 1 instead of n, where n is the number of data
points. This is a way to correct for systematic underestimation of the population variance.
We won’t ask you to perform this calculation here, but the sample standard deviation of
scores on the questionnaire works out to be 1.98.
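Python's `statistics.stdev` uses exactly this n − 1 formula (while `pstdev` divides by n), so the 1.98 figure can be reproduced:

```python
import statistics

# Expand Table 6.2 into a list of 100 individual scores
scores = {1: 0, 2: 2, 3: 5, 4: 7, 5: 10, 6: 15, 7: 22, 8: 18, 9: 13, 10: 8}
data = [score for score, count in scores.items() for _ in range(count)]

# statistics.stdev divides by n - 1 (sample standard deviation, s);
# statistics.pstdev divides by n (population standard deviation, σ)
s = statistics.stdev(data)
print(round(s, 2))  # 1.98
```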
Like sample mean, the terminology of sample standard deviation might be confusing.
In descriptive statistics, mean and standard deviation are used simply to summarize the
central tendency and variability of an actual frequency distribution. In inferential statistics,
sample mean and sample standard deviation instead provide estimates of the central ten-
dency and variability of the probability distribution for a random variable. The probability
distribution is the ‘middleman’, enabling a prediction of the characteristics of interest in
the population. So, the sample mean and sample standard deviation are not descriptive
measures of the sample but predictions about the population.
In making these predictions, a helpful rule of thumb for getting a rough probability
estimate of a characteristic of interest is called the 68–95–99.7 rule. This rule can be used
to remember the percentages of values expected to lie within a certain range around the
mean in a normal distribution. It says that about 68%, 95%, and 99.7% of the values lie,
respectively, within one, two, and three standard deviations of the mean (Pukelsheim,
1994) (the other 32%, 5%, and 0.3% being equally scattered on either side of these ranges).
Applying the rule to our example of political views of university students indicates
that any given student at the university has a 68% probability of having a score between
4.84 and 8.80 on our conservative/liberal scale. This is the mean (6.82) ± one standard
deviation, calculated by subtracting and adding the standard deviation (1.98) to the mean
(6.82). Given that 5.00 is the dividing line between liberal and conservative, a student
thus has roughly a 68% chance of being more liberal than conservative. Any given student
has a 95% probability of having a score of 2.86 to 10.00 (within two standard devia-
tions of the mean). We can be much more confident that some student will fall within
this range, but it is also a wider and less informative range. The only thing this tells us
is that most (95%) college students are predicted to be outside the most conservative
part of the scale.
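The interval arithmetic above can be checked with a short Python sketch (the 6.82 mean and 1.98 standard deviation are the sample values from our example; the code is illustrative, not part of the original analysis):

```python
# 68-95-99.7 rule: shares of a normal distribution falling within k standard
# deviations of the mean, using the sample values from the running example
mean, sd = 6.82, 1.98

for k, share in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = mean - k * sd, mean + k * sd
    print(f"about {share} of scores between {low:.2f} and {high:.2f}")
# The one-standard-deviation interval is 4.84 to 8.80, matching the text;
# the scale itself tops out at 10.00, which caps the wider intervals.
```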
Let’s return to the polls leading up to the 2016 US presidential election. The well-
regarded statistical blog FiveThirtyEight determined that of 22 pre-election polls, the
mean gap between the two main candidates was a 5.3% lead for Clinton, with a 3.6%
standard deviation (Enten, 2017). This means, assuming there are no underlying problems
with the polls and no voters changed their minds in the remaining two weeks before the
election, there was a 68% chance that Clinton would get between 1.7% and 8.9% more
votes than Trump. There was a 95% chance that Clinton would get somewhere between
1.9% fewer votes than Trump and 12.5% more votes. Clinton in fact got 2.1% more of
the popular vote than Trump, comfortably within the 68% interval. Nothing about the
election outcome was (statistically) surprising. This oversimplifies the situation a bit, and
there is much more to know about polling and statistical analysis of the 2016 election. But
this gives you a rough sense for how inferential statistics was employed in this context.
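The same computation for the poll numbers, as a sketch (the 5.3 mean lead and 3.6 standard deviation are the FiveThirtyEight figures quoted above):

```python
# One- and two-standard-deviation intervals for Clinton's polled lead over Trump
lead, sd = 5.3, 3.6

print(f"~68% interval: {lead - sd:+.1f}% to {lead + sd:+.1f}%")        # +1.7% to +8.9%
print(f"~95% interval: {lead - 2*sd:+.1f}% to {lead + 2*sd:+.1f}%")    # -1.9% to +12.5%
# Clinton's actual popular-vote margin, +2.1%, lies inside both intervals
```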
There is one more complication we should mention about the sample mean as a
prediction of the population: the sample mean can vary from sample to sample. In our
example of the questionnaire about political views, we sampled a specific 100 students
and found a sample mean of 6.82. If we sampled a different 100 students, we may get
a slightly different sample mean just by sheer chance. We wouldn’t be surprised to get
a sample mean of 6.56 for this new group, but we might be baffled if our new sample
mean were 4.22. Inferential statistics gives us a way to think about these possibilities
as well.
Imagine we repeatedly invite samples of 100 students to take this questionnaire about
political views, and we calculate the sample means for each. This results in the sampling
distribution of the sample mean. The standard deviation of this distribution provides an
estimate of the variation of the sample means. This estimate, the standard deviation of
the sampling distribution of the mean, is called the standard error and is calculated from
the sample standard deviation and sample size as follows:
SE = s / √(sample size)
The standard error is a measure of the precision of the sample mean, or the uncertainty
about the estimate of the mean of a population. The standard error, and hence uncer-
tainty about the sample mean, decreases as the sample size increases. This is because a
large sample size helps control for chance variation in the traits of the sample. You can be
more certain about average political views among all students with a sample of 100 than
with a sample of 10.
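That claim can be illustrated with a minimal sketch, assuming the 1.98 sample standard deviation from our questionnaire example:

```python
import math

def standard_error(s, n):
    """Standard error of the mean: sample standard deviation over sqrt(sample size)."""
    return s / math.sqrt(n)

s = 1.98  # sample standard deviation from the questionnaire example

# Uncertainty about the population mean shrinks as the sample grows
print(standard_error(s, 10))   # n = 10
print(standard_error(s, 100))  # n = 100: roughly a third of the n = 10 value
```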
Representative Samples
There’s a lingering issue about estimation from a sample that we’ve brushed under the
rug. Recall that inferences about the sociopolitical views of a student population depended
on whether the sample of students interviewed was ‘appropriately selected’. The issue
is that sound estimation via statistical inference requires that the sample be representa-
tive: the sample should accurately reflect the target features in the general population.
Samples chosen in ways that make some individuals in a population less (or more) likely
to be included than others will introduce bias in the inferences made about the popula-
tion based on the sample. This bias may lead to incorrect predictions. A poll that only
solicited the views of Republicans couldn’t accurately predict the outcome of a general
election. Similarly, if you are interested in studying the political opinions of the general
population of India, studying only the wealthiest Indian individuals may well lead you
astray, as this group may have some political views that are much less common in India’s
entire population.
Incorrect conclusions resulting from non-representative sampling are called sampling
errors. Here’s a historically significant case of a serious sampling error. In 1936, a magazine,
Literary Digest, sent out 10 million postcards asking Americans how they would vote in
that year’s presidential election. They received almost 2.3 million back, which is a very
large sample. In that sample, Alfred Landon had a decisive lead over Franklin Roosevelt:
57% to 43%. The Digest did not gather information that would allow it to judge the representativeness of its sample. A young pollster, George Gallup, made his estimate from a much smaller sample of 50,000 (still larger than most modern political polls). His
sample was representative, and it predicted Roosevelt winning by a landslide. That was,
of course, the eventual outcome of the election. The Literary Digest closed down soon
after, and Gallup’s name lives on in the well-known Gallup poll approach to measuring
public opinion based on surveying a sample (Squire, 1988).
The requirement of representativeness was discussed in Chapter 2 in the context of
forming experimental and control groups with similar ranges of values for any extrane-
ous (or confounding) variables. Also discussed there was how representative groups can
be achieved by random assignment to groups. The similar step for statistical inference
is called random sampling, where the individuals composing the sample are selected
randomly from the population. This protects against bias. Our discussion about estima-
tion has presumed random sampling. In Chapter 2 we also discussed the importance of
sample size for representativeness. The upshot was that larger samples can be expected to
be more representative than smaller samples, helping to control for the unwanted effects
of possible confounds. As we saw in the above discussion of standard error, the tools of
statistical inference enable us to explicitly take this into consideration.
Statistical analysis presumes random sampling, but truly random sampling is difficult to
accomplish in many circumstances. For example, in a telephone poll of voter preference
prior to an election, the phone numbers dialed can be randomly selected. But who picks
up the phone, whether a person hangs up immediately or answers the questions, and even
who has a phone and who doesn’t are all non-random influences on the people sampled.
And those influences may be confounding variables, since they might correlate with voter
preferences. As we’ve seen, the incorrect conclusions that can result are called sampling errors.
There’s evidence of sampling error in polls leading up to the 2016 US presidential
election. Here’s one example. In general, Trump did particularly well with white voters
without college degrees. The FiveThirtyEight blog showed that Trump performed better
than the polls predicted in states with a greater concentration of voters in that demo-
graphic. This would seem to suggest that white voters without college degrees were under-
represented in pre-election polls. FiveThirtyEight also showed how Clinton did significantly
worse in the Midwest than polls predicted. This might in part be due to sampling error,
such as the underrepresentation of white voters without college degrees in the polls. But
there are also other factors. For one thing, election results in midwestern states are highly
correlated across those states. So, a statistical error affecting one of those states is likely
to impact the others, magnifying its effect on the election.
Using statistical inference to estimate features of a population from existing data about
a sample is a powerful extension of statistical description. We have characterized some
of the central features and uses of this form of reasoning. As with much else in science,
there are ideal methods that can be described in the abstract—in this case, involving large,
random samples—and then there are the real-world considerations that regularly lead to
deviations from these ideals. The tools of inferential statistics can also help us in assessing
the impact of such deviations.
EXERCISES
6.1 Define frequency distribution, and describe in your own words how mean and standard
deviation in descriptive statistics relate to frequency distributions. Then, define probabil-
ity distribution, and describe in your own words how inferential statistics makes use of
probability distributions, including how mean and sample standard deviation relate.
6.2 In class, or with a group of classmates, find a coin for each person and carry out
the following steps, with each person recording the answers individually. To instead
do this exercise individually, perform multiple series of four coin tosses on your own,
recording the outcomes of each series of tosses separately.
a. Determine your expectations about how frequently any given coin will land heads
up. How many heads would you expect on four coin tosses?
b. Each person should toss his or her coin four times, recording each result as either
heads or tails. Summarize your individual result as the ratio of heads to tails. This
will be either 0:4, 1:3, 2:2, 3:1, or 4:0.
c. Record how many people in your group got each of the possible ratios: 0:4, 1:3,
2:2, 3:1, and 4:0. Draw a histogram showing these results.
6.3 Convert the following data about the outcomes of 11 rolls of a die to a relative fre-
quency distribution:
First roll: 4; second roll: 3; third roll: 4; fourth roll: 6; fifth roll: 2; sixth roll: 1; seventh
roll: 6; eighth roll: 4; ninth roll: 5; 10th roll: 5; 11th roll: 6
Now, draw a histogram showing this relative frequency distribution.
6.4 From what you know about probabilities for fair die rolls, construct a histogram show-
ing the probability distribution for how many times a four is rolled on five die throws.
Show the calculations you used to generate the histogram (which will involve the ad-
dition and multiplication rules of probability).
6.5 Draw a histogram depicting a normal distribution. Draw a small line on the x-axis
where the mean is. Then, draw a second histogram depicting a non-normal distribu-
tion. Describe what feature(s) of the non-normal distribution distinguish it from the
normal distribution.
6.6 Figure 6.3 shows four histograms depicting normal distributions and four pairs of
means and standard deviations. Match each histogram with the mean and standard
deviation it depicts, and briefly explain your choices.
1. mean = 4, standard deviation = 2
2. mean = 4, standard deviation = 0.5
3. mean = 23, standard deviation = 10
4. mean = 50, standard deviation = 5
6.7 Consider each of the inferences 1–4, then answer the following questions (a–d) about each.
1. Almost all Italian football players are good, so those two Italian football players
are probably good.
2. All Italian football players I’ve seen have been good, and I’ve seen at least 10.
So the next Italian football player I see will be good.
3. Approximately 12.4% of women will be diagnosed with breast cancer sometime
during their lifetimes. Of a group of 100 randomly selected women, it’s likely that
approximately 12 will develop breast cancer at some point during their lives.
4. Among your classmates, 89% have seen the most recent Tarantino movie. So
almost all people in town must have seen that movie.
a. Is the inference from sample to population, from population to sample, or
from sample to sample?
b. What is the exact conclusion of the inference?
c. Describe the sample size and representativeness of the sample.
d. Assess the quality of each inference, attending to the strength of the conclu-
sion, the sample size, and how representative the sample seems to be.
6.8 There are 3,000 people at a party. (It’s a very large party!) 100 are interviewed at
random, and it is discovered that 80 are philosophers, 10 are geologists, and 10 are
artists. The sample standard deviation is ±12%.
a. What’s the percentage of philosophers in this sample of 100 party guests?
b. What’s the probability that the percentage of philosophers at the party is in the
range of 68–92%? (Hint: consult the discussion about the 68–95–99.7 rule.)
c. Within what range does the percentage of philosophers lie with 95% probabil-
ity? How about with 99.7% probability?
6.9 Consider the following statements in light of the data provided in Exercise 6.8. For
each, say whether the data support the conclusion. Describe your reasoning for
each answer with reference to the information provided and what you know about
statistical estimation.
a. It’s highly probable that the majority of party guests are philosophers.
b. Eighty percent of the people at the party are philosophers.
c. It’s more likely than not that at least 8% of the guests are non-philosophers.
d. It’s highly likely that the geologists are outnumbered at this party.
e. It’s highly probable that most people in the world are philosophers.
6.10 Describe in your own words how statistical methods are used to make estimates
about a population from sample data and some ways in which this can go wrong.
Come up with a simple example to illustrate these ideas.
6.11 Find an article in a newspaper, magazine, or reputable online source that draws
conclusions from a poll. Alternatively, your instructor may provide one article for the
whole class to use for this exercise. Answer the following questions; if you can’t find
the answer, say so, and provide your best guess if possible. If you selected your own
article, please submit a copy or printout of it with your responses.
a. What variable was under investigation? What were the researchers interested
to know?
b. What was the sample size? How was the sample selected?
c. Is the sample likely to be representative? Why or why not?
d. What data did the researchers collect about the sample?
e. What conclusions about the population did the researchers draw from the sample?
f. Assess the poll, the results, and the researchers’ conclusions. Are there any
problems with any of these? How could the poll or the conclusions be improved?
protons (one kind of particle in the nucleus of atoms). These collisions produce a shower
of new particles, most of which are unstable and decay into other particles in a tiny fraction
of a second. Decay products can give insight into the particles that had been created,
as each particle type has its own signature decay products. In particular, the trajectory,
energy, and momentum of these new particles can be detected in this way. Detecting
their mass is particularly relevant to distinguishing between different types of particles.
However, because subatomic particles are so small, it’s a challenge to distinguish the
signature properties of new particles from background events.
In the summer of 2012, scientists recorded a ‘bump’ in their data corresponding to
a particle with a mass between 125 and 127 GeV/c² (one GeV/c², or 1.783 × 10⁻²⁷ kg). This is about 133 times heavier than a proton. It was thought that this recorded
‘bump’ could provide evidence of a new particle—perhaps of the long-sought Higgs
boson. The data were consistent with hypothesized properties of a boson, but consistency
alone is not strong enough justification. Using statistical reasoning, the scientists calcu-
lated that this bump would occur by chance, emerging from only background events in
the collider without the presence of a boson, only once in three million trials. So, the
scientists rejected the idea that the bump occurred by chance. Instead, they concluded
that the data indicated the discovery of the Higgs boson.
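For a rough sense of how such a probability is computed: physicists express discovery thresholds in standard deviations ('sigma'), and a one-in-millions chance corresponds to a far tail of the normal distribution. Here is a sketch using only Python's standard library (the 'five sigma' discovery convention is background knowledge, not something stated in this passage):

```python
import math

def one_sided_p(z):
    """One-sided tail probability of a standard normal beyond z standard deviations."""
    return math.erfc(z / math.sqrt(2)) / 2

# Particle physics' conventional 'five sigma' discovery threshold
p = one_sided_p(5)
print(f"p = {p:.2e}, i.e. about 1 in {1 / p:,.0f}")
```

This comes to roughly 3 × 10⁻⁷, about one in 3.5 million, the same order of magnitude as the figure quoted above.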
Scientists all over the world were thrilled with this news. Its discovery could lend
additional support to the ‘Standard Model’ of particle physics. The discovery of a Higgs
FIGURE 6.4 Fabiola Gianotti, project leader and spokesperson for the ATLAS experiment at
CERN involved in the discovery of the Higgs boson in July 2012
CERN Creative Commons http://cds.cern.ch/record/1326962
boson indicates the existence of the Higgs field, an invisible energy field postulated by
the Standard Model as present throughout the universe and the source of other particles’
mass. The hypothesized Higgs boson was supposed to be like the glue of the universe—
what joined everything together and gave it mass. And now, it seems, this hypothesis was
tested and confirmed (Chatrchyan et al., 2012).
The groundbreaking discovery of the Higgs boson is just one example illustrating
how fundamental statistical reasoning is to scientific inquiry. Inferential statistics lever-
ages probability theory to enable sophisticated forms of inductive inference. In the last
section, we discussed its use in estimation. Another primary use is in hypothesis-testing,
that is, in deciding whether the available evidence confirms or disconfirms a hypothesis.
This was the form of statistical inference used when scientists rejected the possibility that
the bump in data was due simply to chance and instead posited the presence of a Higgs
boson as the reason those data were observed.
have if the null hypothesis were true. This is where statistics comes in. Inferential
statistics can be used to generate a probability distribution for possible outcomes on
the basis of the null hypothesis. The scientists at CERN set out a protocol for sta-
tistical analysis before gathering any data that specified, for any bump in data they
might observe, how probable it would be given the null hypothesis that no boson
was responsible.
The third step is to gather data about the outcome of the random variable in question
using experiment or observation. These data can be used to evaluate whether and to what
degree the observed data violate the probabilistic expectations based on the null hypoth-
esis. The CERN scientists observed an unexpected bump in data, and they used their
statistical analysis protocol to determine that the probability of this observation would
be extremely low (about one in three million) if the null hypothesis were true—that is,
if no boson were responsible.
So, probabilistic expectations are developed from the null hypothesis with the use of
inferential statistics, and actual data are compared with those expectations. The fourth
and final step of statistical hypothesis-testing is to draw a conclusion from that com-
parison. This final step is always a judgment call. Scientists have to decide how unlikely
the data should be, given the null hypothesis, before they have sufficient grounds to
reject the null hypothesis. If the data are not too far from what is expected given the
null hypothesis, then scientists have no reason to reject the null hypothesis, the default
expectation, in favor of the alternative hypothesis. If, however, the observations do violate
expectations, then this provides a reason to reject the null hypothesis in favor of the
alternative hypothesis. This was the exciting scenario encountered at CERN: faced with
a bump in data that would have been exceedingly unlikely without a boson responsible
for it, the scientists rejected the null hypothesis and declared that they had evidence
for the alternative hypothesis. That is, they declared that they had discovered the long-
sought Higgs boson.
These steps of statistical hypothesis-testing are summarized in Table 6.3. There, we
have emphasized how these steps conform to the basic recipe of developing a hypothesis,
Step            Procedure
Expectations    Determine probability distribution for the range of outcomes expected if the null hypothesis is true
Conclusion      Evaluate whether the actual outcome is unlikely enough given expectations from the null hypothesis to provide grounds for rejecting the null hypothesis
likely to do each. So, on to the second cup of tea. Your friend guesses correctly again,
and this is only slightly more impressive. The probability of guessing correctly purely
by chance for both cups is the probability of a correct guess for the first and a correct
guess for the second, so .5 × .5, or .25 (by the multiplication rule for probabilities
introduced in Chapter 5).
Notice that we have been describing steps 2 and 3 from Table 6.3 in tandem: first, we’ve
described an observation (your friend’s guess), and then we’ve determined the expecta-
tion given the null hypothesis by calculating the probability of obtaining the observation
if your friend is merely guessing at random. So far, the observations conform perfectly
to the alternative hypothesis that your friend really can tell the order the tea and milk
were added—she has guessed right twice in a row. But this isn’t sufficiently unlikely to
rule out the null hypothesis that she’s merely guessing at random.
At this point, you decide to make the test more rigorous by finding a way to make it
very unlikely for your friend to guess correctly by chance. You prepare eight new cups
of tea at once, tossing a coin to determine milk-first or tea-first for each. You put the
cups of tea in front of your friend and ask her to say of each whether the milk or the
tea was added first.
What does the null hypothesis lead us to expect? If your friend is merely guessing
at random, she is most likely to be right about four of the eight cups. This is the mean
expected outcome. The way to think about the calculation of this is the number of trials
multiplied by the probability of success on each trial:

mean = number of trials × Pr(success) = 8 × .5 = 4
In this context, the mean is the most likely outcome. If your friend were to make repeated
guesses about the eight cups, and the null hypothesis were true, the most common out-
come would be for her to be right about four cups and wrong about the other four. Since
your friend is only making guesses about one series of eight cups, the mean indicates the
most likely outcome. But this outcome is not assured, even if the null hypothesis were
true. By sheer luck, she might still guess correctly more or less often than that, just as you might
happen to get more or fewer heads over a series of coin tosses.
Just as we can determine the probability distribution for different numbers of heads
over 100 coin tosses, we can also determine the probability distribution for different
numbers of correct guesses out of eight trials on the null hypothesis that your friend has
no exceptional tea-tasting discriminatory ability. The probability of each number of cor-
rect guesses can be calculated based on the .5 probability of success on each guess and
the number of different ways to get each number of correct guesses. (There’s only one
way to guess none or all eight correctly, but eight different ways to guess one or all but
one correctly, 28 ways to guess two or all but two correctly, and so on.) This probability
distribution is shown in Figure 6.6. It indicates what data we should expect if your friend
is guessing randomly.
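The counts and probabilities behind this distribution can be generated directly from the binomial formula; a sketch:

```python
from math import comb

n, p = 8, 0.5  # eight cups; .5 chance of a correct guess on each, under the null

# Probability of exactly k correct guesses:
# (ways to pick which k cups are right) x p^k x (1 - p)^(n - k)
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

for k, prob in pmf.items():
    print(f"{k} correct: {comb(n, k):>2} ways, probability {prob:.4f}")
# Four correct is the most likely outcome (70 ways, probability ~0.27);
# all eight correct has probability 1/256, about .0039
```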
But instead of calculating the probability distribution in this way, we could instead
simply find the mean outcome and standard deviation of the probability distribution. In
many instances of statistical hypothesis-testing, this will be the only feasible approach to
developing the probability distribution. We already know the mean outcome. Your friend
is most likely to have four correct guesses out of eight if she is guessing randomly. The
FIGURE 6.6 Probability distribution of the number of guesses your friend will get correct if she is randomly guessing (histogram; x-axis: number of correct guesses, 0 to 8; y-axis: likelihood)
standard deviation for the probability distribution can be calculated using the following
formula:
σ = √[mean × (1 − Pr(O = success))]
For the tea-tasting experiment, the standard deviation is √[4 × (1 − .5)] = √2 = 1.414.
Notice that this standard deviation formula is very different from the other standard
deviation formulas we have encountered in this chapter and Chapter 5. In this case, the
task is to calculate the probability distribution of getting the outcome or set of outcomes
you’re interested in, ranging from that happening in no trials to it happening in every trial,
assuming the null hypothesis of chance outcomes (true if your friend is merely guessing).
In the formula, 1 is the total probability, and Pr(O = success) is the probability of success
(here, guessing correctly) in a single trial. Multiplying that by the mean number of suc-
cesses yields the variance; the square root of that number is, then, the standard deviation.
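A quick numerical check of this formula for the tea-tasting case:

```python
import math

n, p = 8, 0.5
mean = n * p                    # 8 x .5 = 4 expected correct guesses
sd = math.sqrt(mean * (1 - p))  # sqrt(4 x .5) = sqrt(2), about 1.414
print(mean, round(sd, 3))
```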
Now, armed with the probability distribution or simply its mean and standard deviation,
we can establish the significance level we’ll require as the line for rejecting the null hypothesis.
This is a decision about how improbable, given the null hypothesis, an experimental result
must be to warrant rejecting the null hypothesis. Different levels of significance can be used
in hypothesis-testing, but .05 is the most common choice. This is a convenient choice because
of the 68–95–99.7 rule: there’s a .95 probability of your friend guessing correctly within two
standard deviations of the mean if she’s guessing randomly. So, there’s only a .05 chance of
her guessing outside that range if she’s guessing randomly. Another convenient choice would
be to set .003 as the significance level, which corresponds to getting a result outside three
standard deviations of the mean value. We’ll say more about the decision of what significance
level to require later. For now, let’s stick with the customary threshold of .05.
collecting the data. You ask your friend to judge ‘tea-first’ or ‘milk-first’ for all eight cups
of tea. She correctly judges all eight cups! Given this data, would you be tempted to
conclude that you were wrong and your friend was right, that maybe she can perceptually
discern something about tea and milk order? Deciding this is the fourth step of statistical
hypothesis-testing.
There are two possibilities: (1) the null hypothesis is true, which means that your friend
has no exceptional ability, or (2) the null hypothesis is false, and the alternative hypoth-
esis that your friend can discriminate between milk-first and tea-first cups of tea is true.
The goal of statistical hypothesis-testing is to do our best to decide whether (1) should
be rejected in favor of (2). This is the question of whether you should believe that your
friend can actually distinguish the order tea and milk were added to a cup.
From the probability distribution developed in step 2 and the outcome of your experi-
ment, when your friend correctly judged all eight cups of tea as tea-first or milk-first, we
can calculate a p-value. This is the probability of the observed data assuming the null
hypothesis is true. The smaller the p-value, the more unlikely the data if the null hypoth-
esis is true. Whether one should reject the null hypothesis is determined by comparing the
p-value and the significance level selected in step two. If the p-value is less than or equal
to the significance level, we can reject the null hypothesis with reasonable confidence. If
the p-value is greater than the significance level, we can’t rule out the null hypothesis.
In the tea-tasting experiment, there are actually two different ways of establishing how
the p-value relates to the significance level. One way is to consult the probability distribu-
tion shown in Figure 6.6. We can see from that figure that there is only a very small chance
of guessing all eight cups correctly via random guesswork, so the p-value of our data is very
low. The precise p-value can be calculated from the probability of guessing all eight cups
correctly. This is easy to find with the multiplication rule; it’s just the probability of the
first guess being correct and the probability of the second guess being correct, and so on,
so it’s 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5, or 0.5⁸, which is .0039. The p-value,
the probability of your friend guessing correctly on all eight cups by random guesswork,
is only .0039. This means there’s only a 0.39% chance of this happening.
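This calculation is easy to verify; here is a minimal sketch in Python:

```python
# Probability of guessing all eight cups correctly by chance:
# each guess is an independent 50/50 call, so multiply eight 0.5s.
p_value = 0.5 ** 8

print(p_value)          # 0.00390625, i.e. about .0039, or 1 in 256
print(p_value <= 0.05)  # True: significant at the .05 level
```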
Put another way, if your friend tasted many series of eight cups of tea, she could get
this outcome by guessing randomly only about once in every 256 series. If she guessed
all eight correctly by sheer luck on her first try, she's really lucky!
And, indeed, the p-value of .0039 is lower than our chosen significance level of .05. The
outcome of this experiment is thus statistically significant: unlikely enough if the null
hypothesis were true that it provides grounds for rejecting the null hypothesis.
The second way to establish how the p-value relates to the significance level for the
tea-tasting experiment is by using the mean and standard deviation rather than the prob-
ability distribution. By the 68–95–99.7 rule, outcomes that are two standard deviations
away from the mean are the threshold for statistical significance at the significance level
of .05. So, we can check whether the observed value falls inside or outside that range.
Two standard deviations in this case is 2.828 (or 1.414 × 2), so outcomes outside the
range of 4 (the mean) ± 2.828 are statistically significant. That range is 1.172 to 6.828.
If your friend guessed zero or one cups correctly or seven or eight cups correctly, this
would be grounds for rejecting the null hypothesis. Since she in fact guessed all eight
cups correctly, you should reject the null hypothesis. It looks like you need to believe in
your friend’s tea-tasting superpower!
We’ve compared the p-value of the experimental outcome to the significance level
using both the probability distribution and the mean and standard deviation, but either
one of these is enough on its own. This comparison provides a simple way to decide
whether to reject the null hypothesis on the basis of the data. Nonetheless, as we remarked
earlier, there is a role for choice in exactly what level of statistical significance to require.
One can always ask whether this outcome is unlikely enough to reject the null hypothesis.
This is a version of the more general decision we’ve seen elsewhere in this book regarding
when there is sufficient evidence to accept some hypothesis.
As we have said, it’s common to draw the line at a significance level of .05. Observed
results with a probability of less than .05 given the null hypothesis are said to be sta-
tistically significant at the .05 level. One can abbreviate this: p < .05. This is, of course,
true of the outcome of our tea-tasting experiment, which is why we rejected the null
hypothesis. Notice that we could still be wrong; it’s always possible that our friend really
was just extraordinarily lucky. But if we instead decided to play it safe and not reject
the null hypothesis, we could be wrong about that as well. We might have then failed to
detect our friend’s tea-tasting superpower. By its very nature, statistical hypothesis-testing
gives no guarantees.
The risk of erroneously rejecting the null hypothesis when it is true is called a type I
error. The risk of erroneously failing to reject the null hypothesis when it is false is called
a type II error. These are the two different ways you could be wrong, and one or the
other is always a risk. The choice of significance level indicates the degree to which you're
willing to accept the risk of a type I error versus a type II error. Requiring a more stringent
significance level (that is, a lower probability threshold for statistical significance) reduces
the chance of a type I error, but it simultaneously increases the chance of a type II error.
Relaxing the significance level to a less stringent threshold reduces the chance of a type II
error, but it increases the chance of a type I error. So, whatever significance level you
settle upon, you might draw the wrong conclusion from your data (cf., Benjamin et al.
2018; Lakens et al. 2018).
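This trade-off can be made concrete with a small simulation (a sketch, not from the text; the 90% success rate for a hypothetical genuinely skilled taster is an assumed number, and for simplicity we reject only when the count of correct guesses is high):

```python
import random

random.seed(0)
n_cups, trials = 8, 10_000

def rejection_rate(p_correct, threshold):
    """Fraction of simulated 8-cup experiments that clear the rejection threshold."""
    rejections = 0
    for _ in range(trials):
        correct = sum(random.random() < p_correct for _ in range(n_cups))
        if correct >= threshold:
            rejections += 1
    return rejections / trials

results = {}
for threshold in (7, 8):  # a stricter threshold means a more stringent significance level
    type1 = rejection_rate(0.5, threshold)      # null true but we reject: type I error rate
    type2 = 1 - rejection_rate(0.9, threshold)  # skill real but we fail to reject: type II
    results[threshold] = (type1, type2)
    print(threshold, round(type1, 3), round(type2, 3))
```

Tightening the threshold from 7 to 8 correct guesses drives the simulated type I error rate down while the type II error rate goes up, just as the text describes.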
Scientists sometimes adjust the conventional .05 line for statistical significance in light
of whether type I or type II errors are riskier. Imagine a new drug is being tested. If the
drug is for a life-threatening illness with no treatment options otherwise—say, pancreatic
cancer or Ebola—and experiments regarding the efficacy of the drug find it works better
than a placebo with a p-value of .055, just missing the line for statistical significance,
researchers may still be inclined to bring the drug to market or at least continue testing.
In contrast, if scientists are thinking about announcing a new particle, and they know
their colleagues will scrutinize their findings, they may require much greater statistical
significance. Recall that the Higgs boson discovery was announced after the finding that
the probability of the data observed was only one in three million without a boson pres-
ent. This significance level is so close to zero it’s difficult to even display numerically; it
just rounds down to zero.
A certain gene has been found to be correlated with the chance of someone smoking
cigarettes (Thorgeirsson et al., 2010). If you have this gene, you are more likely to
smoke cigarettes. Can researchers tell from your genes
whether you have smoked, or will smoke? As the researchers acknowledged, absolutely
not: there was only a very weak relationship. This, too, can be an advantage but also a
drawback. It can be useful to detect subtle statistical relationships, but weak statistical
relationships are often uninteresting or unimportant. It’s also possible to take too seriously
a statistical relationship that actually has a very small effect size.
Armed with this account of statistical hypothesis-testing and taking into consideration
features like effect size and sample size, you will be better able to critically assess sci-
entific findings based on statistical reasoning, and to distinguish the truly surprising and
important discoveries from the confused and inconsequential results.
EXERCISES
6.12 In your own words, define the terms statistical significance, p-value, type I error, type
II error, and effect size.
6.13 (a) Give an example of a finding that is or would be statistically significant but is
not important. (b) Give an example of a factor that you expect would have a small
effect size.
6.14 Consider a new version of the tea-tasting experiment. Suppose that your friend
samples 10 cups of tea, among which five had the tea poured first and five had the
milk poured first.
a. Calculate the mean outcome and the standard deviation, showing your work.
b. Find the range that defines the .05 significance level.
c. Imagine your friend correctly identified exactly three of the cups. Is the p-value
of this outcome smaller or larger than the .05 significance level?
d. Should you reject the null hypothesis? Why or why not?
e. Do you risk making a type I error or a type II error? Why?
6.15 It’s estimated that 10% of the general population is left-handed. Imagine testing whether
your group of friends contains an unusually large number of left-handed people. Let’s
say that, in this age of social media, you have 75 friends, 14 of whom are lefties.
a. Write out your null hypothesis and alternative hypothesis, making clear which
is which.
b. Calculate the mean and standard deviation for how many of your group of
friends would be expected to be lefties if the null hypothesis were true.
c. From the information you calculated in (b), set an appropriate significance level
for your test and say why you chose that level.
d. Based on the information you calculated in (b) and the significance level you set
in (c), decide whether to reject the null hypothesis, and justify your decision with
statistical considerations, including citing p-value and/or statistical significance.
6.16 You find yourself wondering whether left-handed people have an unusually high
chance of accidental injury. Recall from Exercise 6.15 that 14 of your 75 friends are
left-handed. Of those 14 left-handed friends, seven have been injured in accidents
of one kind or another in the past two years.
a. Write out your null hypothesis and alternative hypothesis, making clear which
is which.
6.17 Each of the following is a bold conjecture that can serve as an alternative hypoth-
esis. For each, (a) formulate the null hypothesis, (b) describe what a type I error
would be and what a type II error would be, and (c) say which kind of error would
be more serious and why.
a. Adding water to toothpaste helps protect against cavities.
b. This man is guilty of murder.
c. The use of social media makes users depressed.
d. The new drug is more effective than the old drug.
e. The new drug is more dangerous than the old drug.
f. Reading books promotes happiness.
6.18 Scientific journals tend to publish statistically significant results much more often than
they publish findings of statistical insignificance. Why do you think this might be?
Considering the earlier discussion about power, type I and II errors, and effect size,
can you think of any potential problems with this practice?
6.19 Classify each of the following statistical techniques as belonging to descriptive sta-
tistics, statistical estimation, or statistical hypothesis-testing. Give your rationale for
each answer.
a. Displaying a data set in a chart
b. Surveying a group about their pizza preferences to decide if they have an
unusual preference for anchovies
c. Surveying a group about their pizza preferences in order to place an order
d. Calculating the sample mean and sample standard deviation
e. Surveying a group about their pizza preferences in order to guess what all
Canadians’ pizza preferences are
f. Finding the mean level of preference for anchovies on pizza among a group
and the standard deviation in that level of preference
g. Rejecting a null hypothesis on the basis of data
h. Finding the correlation coefficient of a data set
6.20 In Chapters 5 and 6, we have seen different formulas for mean and standard devia-
tion and different uses for these as well. Write out the proper mean and standard
deviation formulas for each of the following: (a) representing the frequency distribu-
tion of a data set, (b) estimating mean and standard deviation in a population from
a sample, and (c) establishing the probability distribution for outcomes given the
null hypothesis. Describe how each relates to observed frequency distributions and
probability distributions. Finally, describe how the differences among the formulas
Perhaps your friend was extraordinarily lucky, or maybe she had a way of cheating.
Regardless, perceptually discerning tea-first or milk-first by taste seems nearly
impossible. In contrast, a friend who
could discern two different types of wine in a blind taste test wouldn’t really be that
surprising. The same success rate may thus lead us to want to reject the null hypothesis
of random guessing for wine-tasting but not reject the null hypothesis of random guessing
for tea-tasting. That is, we may have different expectations—prior to any experimenta-
tion—regarding different hypotheses. The trouble is that statistical hypothesis-testing as
we have described it doesn’t have any way to take account these different expectations
(Lindley, 1993).
Finally, here’s a third problem. The probability of the observation given the null
hypothesis, the p-value, doesn’t directly relate to the alternative hypothesis at all. This only
tells you something about the relationship between the null hypothesis and the observed
data. And yet, the alternative hypothesis, the bold and speculative conjecture, is what
scientists are truly interested in knowing about. How likely is the alternative hypothesis
to be true? This is the million-dollar question in hypothesis-testing. But classical statistics
gives us no way to answer that question.
Pr(H) is called the prior probability of the hypothesis; Pr(H|O) is called the posterior
probability of the hypothesis. This is because Pr(H) is our rational degree of belief before
making the observation, that is, prior to the observation, while Pr(H|O) is our rational
degree of belief after (posterior to) making the observation. Taking into account the
prior probability of a hypothesis enables us to hold implausible hypotheses to a higher
standard of evidence than plausible hypotheses are held to. We’d look for more support
before agreeing that our friend can tell the order in which milk and tea were added to
her cup than we would before agreeing that our friend can tell the difference between
the tastes of two different kinds of wine. Our prior probability for the former is lower
than it is for the latter.
Bayes’s theorem takes three things as input: the prior probability of the hypothesis
under investigation, Pr(H); the probability of the observation given the hypothesis (in
other words, if the hypothesis is true), Pr(O|H); and the probability of the observation
under all possible hypotheses, Pr(O). When these numerical values are available—a
major source of controversy for Bayesian statistics—we can use them to calculate the
probability of the hypothesis given the observation that has been made (or the data
gathered). And this, again, is the main thing scientists want to discover from statistical
hypothesis-testing.
If Pr(H|O) > Pr(H), then we say that the observation O confirms hypothesis H.
That is, an observation confirms a hypothesis if the probability of the hypothesis,
a rational degree of belief that the hypothesis in question is true, goes up once
the observation has been made. So, comparing the prior and posterior probabilities
shows us whether an observation confirms or disconfirms a hypothesis and by how
much. A big increase in probability implies a large degree of confirmation, and a
small increase implies a small degree of confirmation; a big decrease in probability
implies a large degree of disconfirmation, and a small decrease implies a small degree
of disconfirmation.
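As a minimal numerical illustration (the .2, .9, and .6 values here are made-up numbers for the sketch, not from the text):

```python
prior = 0.2        # Pr(H): hypothetical prior degree of belief
likelihood = 0.9   # Pr(O|H): probability of the observation if H is true
pr_obs = 0.6       # Pr(O): overall probability of the observation

posterior = likelihood * prior / pr_obs  # Bayes's theorem: Pr(H|O)
print(round(posterior, 3))               # 0.3

# O confirms H exactly when the posterior exceeds the prior.
print(posterior > prior)                 # True: O confirms H, raising Pr(H) by .1
```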
the observation that’s been made. Unlike classical statistics, this provides a comparative
approach to hypothesis-testing.
Another approach is a kind of shortcut. This approach is to compare not posterior
probabilities of different hypotheses but the probability of the observation given each
hypothesis, or Pr(O|H1) versus Pr(O|H2). These probabilities are usually easier to find
than posterior probabilities.
Consider this example. Imagine that Lasha and Janine are interested in public opinion about
the theory of evolution. Based on their separate research, Lasha believes that 70% of the public is
convinced by the theory of evolution, while Janine believes that 60% of the public is convinced.
They decide to query 100 randomly selected people about their opinions. Based on their differ-
ent hypotheses, Lasha and Janine can predict what they will observe: Lasha predicts that about
70 / 100 will say they believe the theory of evolution; Janine’s prediction puts that number
at about 60 / 100. In fact, using tools of inferential statistics described earlier in this chapter,
we can find the probability distribution each predicts for this random sample of 100 people.
As it turns out, of the 100 people in the sample, 62 said they are convinced by the
theory. According to the probability distribution based on Lasha’s hypothesis of 70% belief
in evolution, this observation has a probability of .02, that is, Pr(O|H1) = .02. According
to the probability distribution based on Janine’s hypothesis of 60% belief in evolution,
this observation has a probability of .08, that is, Pr(O|H2) = .08.
An observation favors one hypothesis over a second hypothesis to the degree that the
first hypothesis predicts the observation better than the other hypothesis. Given Janine’s
hypothesis, the observed result is much more likely than it is given Lasha’s hypothesis.
This can be expressed numerically with the Bayes factor, which is the ratio of the prob-
ability of the observation given the first hypothesis to the probability of the observation
given the second hypothesis, that is Pr(O|H1) / Pr(O|H2). The Bayes factor expresses the
discriminatory power of the evidence with respect to the two hypotheses. In this case,
the Bayes factor is .08 / .02, or 4. This means that the result of the survey favors Janine’s
hypothesis over Lasha’s by a factor of four.
Here’s a shorthand method for calculating the Bayes factor in circumstances like this
(random sampling, independent outcomes, and different hypotheses about the distribution
of the values of a binomial random variable). If Lasha’s hypothesis is right, each individual
has a 0.7 probability of saying he or she believes the theory of evolution; if Janine’s hypoth-
esis is right, each individual has a 0.6 probability of saying he or she believes the theory of
evolution. The Bayes factor can be found by comparing these probabilities. In particular:

Bayes factor = [Pr(yes|H1)^(# of yes answers) × Pr(no|H1)^(# of no answers)] /
[Pr(yes|H2)^(# of yes answers) × Pr(no|H2)^(# of no answers)]
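Both the exact likelihoods and the shortcut ratio can be checked in a few lines of Python (a sketch; the 62 yes / 38 no split comes from the survey described in the text):

```python
from math import comb

yes, no = 62, 38

# Exact binomial likelihoods of observing 62 "yes" answers out of 100
pr_O_H1 = comb(100, yes) * 0.7**yes * 0.3**no  # Lasha: 70% believers
pr_O_H2 = comb(100, yes) * 0.6**yes * 0.4**no  # Janine: 60% believers
print(round(pr_O_H1, 3), round(pr_O_H2, 3))    # roughly .02 and .08, as in the text

# Shortcut: the binomial coefficients cancel in the ratio,
# leaving only the per-answer probabilities.
bayes_factor = (0.6**yes * 0.4**no) / (0.7**yes * 0.3**no)
print(round(bayes_factor))                     # about 4, favoring Janine
```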
Bayesian conditionalization shows us how. These updated beliefs are our new prior prob-
abilities for hypotheses, which are then the basis for assessing how to respond to the next
observation. There’s even a slogan for this: ‘Today’s posteriors are tomorrow’s priors’.
Here’s an example. Around age 40, most women undergo routine mammography
screening. Mammograms are x-ray photographs of the breast tissue, which can be used
to screen for breast cancer in women who otherwise have no signs or symptoms of the
disease. Suppose you are a doctor and that one of your patients is a 50-year-old woman
with no symptoms who is participating in routine mammography screening. She tests
positive. She is alarmed for obvious reasons and immediately wants to know from the
doctor—you—whether she has breast cancer. You can’t tell her that without more test-
ing, but you can tell your patient the probability that she has breast cancer given the
positive test result and the probability that the result was a false positive. That is, you
can calculate Pr(H1|O) and Pr(H2|O), where the first hypothesis is that she has breast
cancer and the second hypothesis is that it was a false positive. You need three pieces of
information for the calculation:
1. The probability that a 50-year-old woman has breast cancer is around 1%.
2. If a woman has breast cancer, the probability that she tests positive is around 90%.
3. If a woman does not have breast cancer, the probability that she tests positive anyway
is around 9%.
Given this data set (which is always being updated; visit <www.cancer.gov/types/breast>
for current statistics), how should you answer the patient’s questions in light of the
screening results?
Here, again, is Bayes’s theorem: Pr(H|O) = Pr(O|H)Pr(H) / Pr(O). This theorem can
be rewritten in a form that’s easier for the task at hand:
Pr(O|H) and Pr(O|not-H) are the probabilities of the observation given a specific hypothesis
and given the negation of that hypothesis, just as with the alternative and null hypotheses.
We have this setup in this example. The two hypotheses under consideration are that your patient has
breast cancer (H1) and that the test was a false positive (H2), which is another way of saying
that your patient doesn’t have breast cancer. This version of Bayes’s theorem simplifies the
calculation by eliminating the need to find Pr(O), the overall probability of the observation.
Your patient is looking for Pr(H1|O) and Pr(H2|O), so we’ll need to use Bayes’s theo-
rem on each hypothesis. To start, for each, we need to find the prior probability of the
hypothesis in question and the probability of the observation given the hypothesis. From
these numbers, we can calculate the posterior probability of each hypothesis, given the
observation of the positive test result.
For the first hypothesis, that your patient has breast cancer, the prior probability, Pr(H1),
is the rate of breast cancer in the general population (#1 above). Before the exam, the
rational degree of belief in the hypothesis that your patient has breast cancer is just the
disease’s incidence in the population, so Pr(H1) = .01. The likelihood of the positive test
result given the first hypothesis (that is, if it’s true that your patient has cancer) is 90%
(#2 above). So, Pr(O|H1) = .90.
For the second hypothesis that the test was a false positive, the prior probability,
Pr(H2), is the rate in the general population of not having breast cancer, which is 100%
of the population minus the 1% that does have breast cancer, or 99%. So, Pr(H2) = .99.
The likelihood of the positive test result given the hypothesis of a false positive is 9%
(#3 above). So, Pr(O|H2) = .09.
Now we can calculate both Pr(H1|O) and Pr(H2|O):

Pr(H1|O) = (.90 × .01) / [(.90 × .01) + (.09 × .99)] = .009 / .0981 ≈ .0917
Pr(H2|O) = (.09 × .99) / [(.90 × .01) + (.09 × .99)] = .0891 / .0981 ≈ .9083
We are imagining that your patient has just tested positive for breast cancer. We have
found that, given this positive result, she has a .0917, or 9.17%, chance of having breast
cancer and a .908, or 90.8%, chance of getting a false positive on the test. It’s true your
patient should be concerned; her chance of breast cancer just increased from 1% to more
than 9%. But she shouldn’t be as concerned as she no doubt is: there’s no guarantee she
has breast cancer, and in fact, there’s over a 90% chance that she does not.
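The mammogram arithmetic can be reproduced with the expanded form of Bayes's theorem (a sketch; the 1%, 90%, and 9% figures come from the text):

```python
prior_cancer = 0.01     # 1% of 50-year-old women have breast cancer
prior_healthy = 0.99
pr_pos_cancer = 0.90    # Pr(positive | cancer)
pr_pos_healthy = 0.09   # Pr(positive | no cancer), the false-positive rate

# Denominator of the expanded Bayes's theorem: sum over both hypotheses
pr_pos = pr_pos_cancer * prior_cancer + pr_pos_healthy * prior_healthy

posterior_cancer = pr_pos_cancer * prior_cancer / pr_pos
posterior_false_pos = pr_pos_healthy * prior_healthy / pr_pos
print(round(posterior_cancer, 4))     # 0.0917: about a 9.17% chance of cancer
print(round(posterior_false_pos, 4))  # 0.9083: over a 90% chance of a false positive
```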
Prior probabilities influence posterior probabilities, and so subjective starting points can find
their way even into conclusions based on data. This possibility seems to undermine the
objectivity of Bayesian reasoning. This is perhaps the main challenge facing Bayesian
statistics, and it’s received a lot of attention.
Some responses to this challenge about subjectivity have been to develop rules for
how prior probabilities should be established. A different kind of response is to argue that
the variability in prior probabilities is a good thing. Different people often have different
background beliefs, and one might think these different background beliefs should be
taken into account. Different choices of priors make it transparent how two scientists’
judgments differ. So instead of lurking in the background, with unclear influence on sci-
ence, different background beliefs and how they influence scientific judgment are brought
into explicit consideration by Bayesian statistics. What’s more, this transparency in prior
beliefs enables rational disagreement. Scientists should be able to provide justification for
particular choices of prior probabilities, articulating what sorts of theoretical or empirical
considerations informed their choice. In this respect, Bayesian and classical statistics are in
similar situations. When testing hypotheses or making general inferences, scientists using
classical statistics must decide on sample size, which kind of statistical test to employ, and
so forth. These decisions are also open to criticism, and scientists making these decisions
should be able to justify them.
However, some remain unconvinced by this argument for subjectivity. The choice
of prior probabilities in Bayesian statistics is a kind of direct influence of background
beliefs on scientists’ beliefs about hypotheses under investigation, which many scien-
tists are uncomfortable with (Gelman & Hennig, 2017). And so far, no rule for how
prior probabilities should be established is both broadly applicable and enjoys broad
support.
A second problem for Bayesian statistics is that it’s not obvious that Bayesian con-
ditionalization, in which one updates one’s belief in accordance with posterior prob-
abilities, is always the right thing to do. Some have suggested that abductive reasoning,
or inference to the best explanation, is a better alternative. Recall from Chapter 4 that
when people engage in abductive reasoning, they use explanatory considerations as
evidence to support one hypothesis over others. You see cheese crumbs, small drop-
pings, and some chewed-up paper, and so you might reason that a mouse resides in
your kitchen. But does that inference follow Bayesian conditionalization? It’s not clear
it does. The kind of work and reasoning performed by some scientists, such as paleon-
tologists, is akin to CSI-style forensic work. They gather different pieces of evidence
from several fields, and on the basis of that evidence and explanatory considerations,
they weed out implausible hypotheses and develop the most plausible hypothesis
about the distant past of life on Earth. Bayesian conditionalization may not capture
this explanatory leap.
There is no universal method for statistical inference. There are different approaches
to classical statistics, an alternative framework of Bayesian statistics, and even differ-
ent approaches to Bayesian statistics. All of these offer tools that scientists can use
in hypothesis-testing, depending on the type of hypothesis to be tested, the type of
experiment or observational study that will be run, and the nature of the relevant back-
ground knowledge. Bayesian statistics is perhaps a better guide to belief than classical
statistics when prior probabilities can be reliably estimated, as with medical diagnoses
EXERCISES
6.21 In your own words, describe three problems for the classical statistics approach to
hypothesis-testing.
6.22 Write out the mathematical formula for Bayes’s theorem, then state what it means in
your own words. Write out the definition of conditional probability from Chapter 5.
Bayes’s theorem can be derived from this definition; describe anything you notice
about how the two relate.
6.23 Describe two ways in which Bayes’s theorem can be used in inferential statistics,
including what you can accomplish with each. Illustrate each of these two main uses
of Bayes with a simple example (imaginary or real).
6.24 Suppose that you are being screened for a disease that affects about one person in
1,000. You have no symptoms, and the test is accurate 90% of the time. That is, if you
actually have the disease, then the test result is positive with 90% probability, and if
you do not actually have it, the test result is negative with 90% probability. After several
anxious minutes, the test results come back: positive! How worried should you be?
a. Find the prior probability of the hypothesis that you have the disease, the prior
probability of the hypothesis that you don’t have the disease, the probability of
the test result given the hypothesis that you have the disease, and the probability
of the test result given the hypothesis that you don’t have the disease.
b. Use Bayes’s theorem with these probabilities to calculate your chance of having
the disease given your positive test result. Describe how concerned you think
you should be in light of your positive test result.
c. Consider that, out of 1,000 people, 100 will test positive. About how many of
those people will actually have the disease? Does this consideration change
your reasoning in (b)?
6.25 A small company has bought three software packages to solve an accounting prob-
lem. These packages are called Fog, Golem, and Pear. On first trials, Fog crashes
10% of the time, Golem 20% of the time, and Pear 30% of the time. Of 10 employ-
ees, six are assigned to Fog, three are assigned to Golem, and one is assigned to
Pear. Jan was assigned a program at random. It crashed on the first trial.
What is the probability that Jan was assigned Pear? You can answer this question
by finding the posterior probability of Jan being assigned to Pear given that the
program crashed from the prior probability of Jan being assigned to Pear and the
c. Do you find the change in Bayes factor in this alternative scenario surprising?
Why or why not?
6.27 Imagine that you are a lawyer with a client who has been accused of commit-
ting a heinous crime. Your client’s DNA matches some of the traces found on
the victim. This is the only piece of evidence against her, but it is a serious one.
The court is told that the probability that this match occurred by chance is one in
100,000 (or 0.001%). Do you believe this proves your client is guilty? Why, or why
not? (Hint: consider what the numbers mean in terms of frequencies. Out of every
100,000 people, one will show a match. If you live in a city with two million people,
for example, how many will have DNA matching the trace on the victim?)
6.28 In your own words, describe (a) three different types of problems for the Bayesian
approach to statistics and (b) three different advantages that the Bayesian has over
the classical approach to statistical testing.
FURTHER READING
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge.
CHAPTER 7
Causal Reasoning

After reading this chapter, you should be able to:
• Describe the difficulty about causal claims that worried David Hume
• Give three reasons why correlation and probabilistic dependence don’t guarantee
causation
• Describe the physical process and difference-making accounts of causation
• Indicate how each of the following informs the investigation of causal relationships:
spatiotemporal contiguity, correlation, probabilistic dependence, causal background
• Analyze whether a cause is necessary, sufficient, or probabilistically related to an
effect, and gauge the strength of a probabilistic causal relationship
FIGURE 7.1 Annual seismic activity in Oklahoma 1978–2017
Sources: USGS-NEIC ComCat & Oklahoma Geological Survey; Preliminary as of July 4, 2017
FIGURE 7.2 USGS map showing locations of wells related to seismic activity 2014–2015
The answer may seem obvious. How could there be such a dramatic rise in earthquakes
in Oklahoma if fracking weren’t the cause of them? Lobbyists and other advocates for the
US oil and gas industry are quick to remind everyone that correlation does not guarantee
causation. Just because fracking has increased and seismic activity has increased—that is,
just because these are correlated—doesn’t mean that one caused the other. Some other
unknown cause might be responsible for the earthquakes. However, while not all cor-
related types of events are causally related, correlation does raise the question and can
even be the proverbial ‘smoking gun’ for causation. We need to look more closely to
know whether fracking causes earthquakes or if there is an alternative explanation for
the correlation. And, in fact, the answer is a bit subtle.
There is evidence of some type of relationship between fracking in particular and the
uptick in seismic activity in Oklahoma. But how are they related? Since 2009, most of
the Oklahoma earthquakes have been located very close to fracking wells, which pump
massive volumes of liquid up to the surface. The spatial correlation of wells and earth-
quakes adds some support for the idea that fracking is, in some sense, involved in the
rising numbers of earthquakes.
However, geologists, hydrologists, and the other scientists involved in the US Geological
Survey—a federal agency devoted to the scientific study of the American landscape and
the natural hazards that can threaten it—have concluded that earthquakes resulting
directly from fracking tend to be relatively minor in Oklahoma. So fracking operations
are highly correlated with the dramatic increase in seismic activity in Oklahoma, but they
are not directly responsible for causing many of those earthquakes. Instead, wastewater
injection from both fracking and non-fracking wells appears to be more directly respon-
sible for the increase in earthquakes in Oklahoma.
During hydraulic fracturing, some of what’s pumped up is oil, while some is the by-
product of fracking: salty, sandy, chemically treated wastewater. After capturing oil and
gas, corporations inject large volumes of this wastewater back into disposal wells. Doing
this raises the pressure within the pores of a hydrocarbon reservoir over large areas, which
tends to shift subterranean stress. And shifting stress tends to destabilize preexisting faults.
This subterranean stress from wastewater injections back into the Earth’s sedimentary
formations has been implicated as one cause of seismic activity. The US Geological Survey
results identifying wastewater injection as the primary cause of the significant uptick in
Oklahoma earthquakes are fairly conclusive.
Of course, energy industry operations involved in fracking are still the culprit. Even if
fracking does not directly cause earthquakes, it is not causally unrelated either. Cease all
fracking activity, and the volume of wastewater injected back into the earth will signifi-
cantly diminish; so too will the risk of earthquakes. So while not solely to blame, fracking
is one of several oil and gas operations that are together causing increased seismic activity.
Over the last decade, Canada has seen a similar increase in seismic activity, which has
also been tightly correlated in time and space with fracking. Researchers documented
more than 900 seismic events near shale drilling sites in northwest Alberta and observed
a pattern between the timing of fracking operations and the timing of earthquakes (Bao
& Eaton, 2016). In this case, scientists found that both the increase in pressure during
fracking operations and the increase in pressure from wastewater injection induced seis-
mic activity. The triggers for induced seismicity in Alberta may be different from those
in Oklahoma.
Thus, existing evidence indicates that fracking does play some causal role in producing
seismic activity but that the role it plays may not be simple. Fracking's causal role may
be modulated by other factors, such as the local geology of Alberta versus Oklahoma,
and the extent to which it plays that causal role may be changed by other causes of
earthquakes, like wastewater injections back into the earth. It is an important task for
seismologists, geologists, and hydrologists to clarify the complex web of causal relation-
ships that lead to earthquakes. More generally, unravelling the causes underlying complex
phenomena like polio or cholera epidemics, global warming, or economic crises is usually
a tricky process; scientific investigation is our best hope for doing so.
Causal knowledge is practically important because it informs how we can make things happen—and prevent things from happening—in the world. Besides
the effects of fracking on seismic activity and other features of our environment, causal
reasoning is also crucial for inferring the effects of economic policies like tax rates, for
inferring medical conditions from symptoms, and for establishing legal responsibility or
liability, among many other things. Good causal reasoning thus can be an urgent matter
of scientific and practical importance.
It’s no wonder, then, that causal reasoning is a central feature of science. This chapter
explores how the scientific tools encountered thus far in this book—especially experi-
mentation and observational studies, modeling, and inference using logic, probability, and
statistics—are used to identify causal relationships. This chapter thus refers to several ideas
from earlier chapters that are helpful to clarify what’s involved in good causal reasoning.
Psychological research suggests that causal perception depends on spatial and temporal information. If two events—for
example, pressing a piano key and hearing B sharp—are spatially and temporally contigu-
ous, that is, if they happen at the same time and place, then we perceive them as causally
related without requiring repeated exposure to those events. When there is a spatial or
a temporal gap between two events, we are much less likely to perceive the one event
as causing the other.
Although spatiotemporal cues can be an important element of the perception of causal
relations, they are not always a reliable guide. Sometimes they mislead us. It can be
mistaken to conclude that one event causes another simply because the events occur in
succession close to each other. A child in Oklahoma might stamp her foot right before
an earthquake, but we know the stamp couldn’t have caused the quake. The mistake of
reasoning from spatiotemporal succession to causation is named, from Latin, the post hoc,
ergo propter hoc fallacy (‘after this, therefore because of this’).
So, spatiotemporal contiguity doesn’t guarantee causation. It’s not necessary for cau-
sation either. Many causes are separated from their effects in time and even space. For
instance, when you hang out with a friend who has the flu, you may begin to feel ill a
few days later. In this case, your friend’s flu caused your own illness, despite an interven-
ing delay. And when you play a video game, pressing buttons on a remote control causes
changes in the game, even though the two events happen in different places.
Indeed, many of the cause-effect relationships investigated in science, and important
for everyday life, are spatiotemporally separate to some degree. Sometimes the degree
of temporal separation is used to distinguish among the causes of an event. Proximate
causes are those that occurred more closely in time and place to the event that was
caused, while distal causes occurred further back in time or place from their effects. For
example, when asked about the cause of your illness, you may cite your friend’s recent
case of the flu. Or you might instead reply that we’re in the midst of flu season, and this
year’s seasonal flu has spread extensively. The former cause is proximate, the latter distal.
As the fracking example illustrates, identifying a cause of some event doesn’t imply
that you have identified the cause or have ruled out other causes. The distinction between
proximate and distal causes shows that any event may have multiple causes. One way to
think about this is in terms of ‘chains’ of causation—like neural firings that contracted the
muscles, that moved the hand, that pushed the cue stick, that hit the cue ball, that hit
the 8-ball into the corner pocket. Such causal chains go back and back and back. Another
way to think about the multiple causes of some event is in terms of complex webs and
networks: all the different factors that contributed to bringing about some outcome. The
8-ball’s moving was caused not only by my cue ball hitting it, but also by my choosing
to go to the pool hall and picking up the cue stick, the 8-ball resting where it in fact
was, the cue stick being chalked by the previous player, the billiard cloth having a certain
smoothness, and so on. Whether you think of causal relationships in terms of chains or
in terms of contributing factors, it seems clear that causal relationships are everywhere.
Two variables are correlated when the variation in their values shows some trend. If the values of two variables are correlated, then we may wonder if one causes the other. For example, imagine you have
always observed that whenever the price of a beer at your local pub is $5, there are fewer
customers than when the price is $3. This is a correlation. You may wonder, based on
this, whether the increased price of the beer decreases demand for beer. This is a causal
claim. You may think this is so even if the timing doesn’t match; maybe customers only
start to trail off a while after the price of beer has gone up.
While correlation is a guide to causation, it’s an imperfect guide. Correlation can
exist when causation does not. For one thing, correlation is symmetric: if an event A
correlates with another event B, then B correlates with A as well. But causation isn’t
symmetric. Having cancer correlates with death, and death correlates with having cancer,
but cancer causes death and not the other way around. In other cases, neither correlat-
ing event causes the other, but they share a common cause—a third event that causes
both. Ice-cream consumption and homicide rates are famously correlated, but eating
ice cream is not a cause of murder, nor does committing murder cause ice-cream eat-
ing. Instead, there is some evidence that hot days increase both ice-cream consumption
and homicide rates.
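The common-cause pattern can be illustrated with a small simulation (a hypothetical sketch with invented numbers, not real data): temperature independently drives both variables, neither causes the other, and yet the two end up strongly correlated.

```python
import random
random.seed(0)  # fixed seed so the sketch is reproducible

# Temperature is the common cause; it influences ice-cream sales and homicide
# counts independently. There is no causal arrow between the two effects.
temps = [random.gauss(20, 8) for _ in range(1000)]
ice_cream = [0.5 * t + random.gauss(0, 2) for t in temps]
homicides = [0.3 * t + random.gauss(0, 2) for t in temps]

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(round(corr(ice_cream, homicides), 2))  # strong positive correlation
```

The correlation is also symmetric, as the text notes: corr(ice_cream, homicides) equals corr(homicides, ice_cream), even though no causal claim runs in either direction between them.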
There are also spurious correlations, where two types of events happen to be corre-
lated but are not related in any interesting way, causally or otherwise. For example, from
2000 to 2009, data from the US Dairy Association regarding per capita cheese consump-
tion and data from the Centers for Disease Control regarding the numbers of people who
died by becoming tangled in their bedsheets were highly correlated (see Figure 7.3), but
obviously there’s no causal relationship connecting these variables.
Causal relations between events can also exist even when they don’t seem to be
correlated. Philosopher Nancy Cartwright has suggested the following example in
which counterbalanced causal relationships cancel each other out. Smoking cigarettes
is well established as a cause of heart disease. It’s also the case that adequate exercise
prevents heart disease. If, for whatever reason, smoking is strongly correlated with
exercise, then a well-established cause of heart disease will also be strongly correlated
with its prevention, and smoking and heart disease will not generally correlate. But,
smoking would remain a cause of heart disease (Cartwright, 1989). Here’s another
example. Pregnancy is a cause of thrombosis, which involves blood clots forming inside
blood vessels. Since taking contraceptive pills reduces the chance of pregnancy, one
might hope that taking contraceptive pills indirectly prevents thrombosis. However,
taking contraceptive pills is also a cause of thrombosis. So contraceptive pills prevent
thrombosis by reducing the chance of pregnancy, while also causing thrombosis. If
these opposed influences exactly cancelled each other out, then thrombosis and taking
contraceptive pills would not exhibit a statistical correlation even though the two events
are related causally (Hesslow, 1976).
FIGURE 7.3 Visualization of the correlation between per capita consumption of cheese and
number of people who died from getting tangled in their bedsheets
Reproduced under Creative Commons, <http://tylervigen.com/>.
So, while correlation is a guide to causation, causation doesn’t just boil down to corre-
lation. There must be something more to causation, Hume’s skepticism notwithstanding.
If the occurrence of a cause doesn't guarantee the occurrence of the effect, then the cause is not a sufficient cause.
Some causes are needed for an effect to occur but may not by themselves guarantee the
effect. To say ‘oxygen causes combustion’ is to say that combustion never occurs without
oxygen present, although oxygen is often present in the air without causing fires. This is
a necessary cause: the causal condition must be present for the effect to occur, but the
cause might sometimes occur without bringing about the effect. If the occurrence of a
cause isn’t required for the occurrence of the effect, then the cause is not a necessary cause.
So, sufficient causes guarantee their effects, while necessary causes are required for their
effects. This should bring to mind the discussion of necessary and sufficient conditions
from Chapter 4. It can be useful to keep in mind the difference between necessary and suf-
ficient causes. Knowledge of sufficient causes empowers us to bring about desired effects.
If we introduce the causes that are sufficient to bring about an effect, we’re guaranteed
that effect will occur. To have healthy teeth, for example, it’s ordinarily sufficient to brush,
floss, and visit the dentist regularly. Knowledge of necessary causes enables us to prevent
some effects from happening. If we remove just one necessary cause, this will eliminate
the effect. For example, spaying or neutering one’s pets prevents unwanted kittens or
puppies, regardless of what other conditions occur. This is because intact reproductive
systems are necessary for reproduction. And abstaining from excessive drinking prevents
hangovers and liver cirrhosis, because significant alcohol consumption is necessary for
both of these health conditions.
Although there’s a useful distinction between necessary and sufficient causes, matters
are often not so simple. For many putative necessary conditions, alternative causes can be
found. For example, having sex is usually necessary for sexual reproduction, but it isn’t
always; in vitro fertilization is an alternative. Likewise, for many putative sufficient causes,
exceptions can be found too, when the cause doesn’t bring about the effect as expected.
Raising the price of goods, like beer, does not always decrease demand. Sometimes instead,
demand is sustained by institutions that regulate the market.
The exceptions to sufficient and necessary causal relationships hint at the importance
of background conditions for causal relationships, what we might call the causal back-
ground. The causal background of two events comprises all the other factors that actually
do, or in principle might, causally influence these two events, thereby also potentially
affecting the causal relationship between the two events. Oftentimes causal background
is ignored when causal claims are made, but it’s actually crucial for causal relationships
to occur as expected.
Revisiting a couple of our previous examples shows that causes only count as sufficient or necessary assuming a given causal background. Brushing, flossing, and visiting the dentist regularly are sufficient to ensure healthy teeth only if your dentist is qualified and (say) you haven't already had all your teeth removed. And spaying and neutering one's pets prevents unwanted kittens and puppies only because intact reproductive systems are necessary for reproduction, and only assuming that in vitro fertilization isn't employed and no stray kittens or puppies show up at your house.
Some causal relationships may even have exceptions within a given causal background.
Consider again the example of fracking causing earthquakes. If this is true in Alberta but
not in Oklahoma, this may well be because of different causal backgrounds in those two
locations, perhaps having to do with geological features. If fracking causes an earthquake
at one site in Alberta but not at another, is this also due to different causal backgrounds
in those two locations, or is it pure chance? Occurrences of a cause do not always lead to
occurrences of its effect, either because causation itself is probabilistic or because causal
backgrounds vary. Here’s another example. There are people who smoked two packs of
cigarettes a day without ever getting cancer, even though smoking does cause cancer.
Is this because smoking causes cancer probabilistically or because some feature of the
causal background prevents some people from getting cancer? This is a matter of debate.
A factor that increases the likelihood of an event occurring despite being neither neces-
sary nor sufficient for the effect is called a contributing cause or partial cause. Contributing
causes are much more common than truly necessary or sufficient causes. For this reason,
it is useful to think about causation probabilistically. Usually, a cause raises the probability
of its effect. This idea can be formalized in terms of conditional probabilities, which we
discussed in Chapter 5. For a cause C and an effect E,

Pr(E|C) > Pr(E|not-C)
The effect is (usually) more likely to occur if the cause occurs than if the cause doesn’t
occur. This idea is deeply related to correlation as a guide to causation.
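This probability-raising condition, Pr(E|C) > Pr(E|not-C), can be checked against frequency data. A minimal sketch in Python (the counts below are invented for illustration):

```python
# Hypothetical frequency data: how often the effect E occurred in cases where
# the suspected cause C was present versus absent.
with_cause = {"effect": 30, "no_effect": 70}      # 100 cases where C occurred
without_cause = {"effect": 10, "no_effect": 90}   # 100 cases where C did not

p_e_given_c = with_cause["effect"] / sum(with_cause.values())
p_e_given_not_c = without_cause["effect"] / sum(without_cause.values())

# C raises the probability of E: Pr(E | C) > Pr(E | not-C)
print(p_e_given_c, p_e_given_not_c, p_e_given_c > p_e_given_not_c)  # 0.3 0.1 True
```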
The probabilistic relationship that generally holds between causes and their effects
can also be exploited beyond the observation of correlations. Recall the difference-
making account of causation. Well, if researchers bring about some event and observe
a resulting increase in the frequency of a different event, this is some evidence that
the first causes the second. Even better if this intervention is carried out when extrane-
ous variables are controlled directly or indirectly. This enables the causal background
to be held fixed or to vary randomly, leaving the intervention on the suspected cause
as the only difference between the circumstances in which the suspected effect does
and doesn’t occur. This relates deeply to our discussion of experimental design in
Chapter 2. If you suspect that playing video games causes violent behavior, you might
ask one group of people to play several hours of video games and another group of
people to do something else like read books, and then query them about their moods
and dispositions afterward. If more video game players are agitated or aggressive or
disposed to act violently, this may point at the video games as the culprit—the cause
of violent behavior.
Thinking about causation in terms of conditional probabilities also provides a way to
define the strength of a causal relationship. If Pr(E|C) = 1 and Pr(E|not-C) = 0, then the
cause is both necessary and sufficient for the effect, in any causal background(s) where this
is true. When the cause occurs, so does the effect; when the cause is absent, so is the effect.
For probabilistic causal relationships, the stronger they are, the closer they will be to this
ideal. You can judge the strength of a causal relationship with the following calculation:

Pr(E|C) - Pr(E|not-C)
Notice that a necessary and sufficient cause will result in the maximum value of 1. If,
at the other extreme, there is no difference in the probability of E when C is present
(holding fixed the causal background), then the occurrence of C is causally irrelevant to
the occurrence of E. For the video gaming and violence example, this would correspond
to the finding that the experimental and control groups do not differ in their levels of
violent behavior. The strength of most causal relationships is somewhere in between the
two extremes of perfect guarantee and irrelevance.
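As a sketch, the strength measure Pr(E|C) - Pr(E|not-C) that the text describes can be written as a one-line Python function (the probability values passed in below are invented for illustration):

```python
def causal_strength(p_e_given_c, p_e_given_not_c):
    """Pr(E | C) - Pr(E | not-C): 1 for a necessary-and-sufficient cause,
    0 for causal irrelevance (holding the causal background fixed)."""
    return p_e_given_c - p_e_given_not_c

print(causal_strength(1.0, 0.0))  # 1.0  necessary and sufficient cause
print(causal_strength(0.4, 0.4))  # 0.0  causally irrelevant
print(causal_strength(0.7, 0.2))  # intermediate strength (about 0.5)
```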
We have already discussed that causation doesn’t just boil down to correlation. The
same goes for probabilistic dependence. Changes in causal backgrounds can interfere with
probabilistic dependence. Beyond this, probabilistic dependence may change in different
causal backgrounds and only hold in some causal backgrounds. Smoking may not raise
the probability of someone getting heart disease if the person also starts a serious exercise
regime at the same time. And smoking does not raise the probability of someone getting
cancer if the person already has cancer. Also, probabilistic dependence, like correlation,
doesn’t distinguish among causes, effects, and events that are correlated but not causally
related.
All of these are reasons why causation isn’t just probabilistic dependence. These are
also reasons for ensuring good experimental design when looking for probabilistic dependence. Intervention is a way of isolating the suspected cause, which avoids mistaking an effect for a cause, or mistaking events that merely share a common cause for cause and effect. And having a control group is a way of controlling for the influence of the causal background.
These steps enable researchers to determine which events truly make a difference to the
occurrence of other events.
In one famous case, graduate admissions data appeared to show bias against women, who were rejected at a higher overall rate. But more women had applied to highly competitive programs with low admission rates, whereas more men applied to less competitive programs with higher admission rates. The positive correlation between rejection and being
a woman was thus due not to gender itself but to a correlation between gender
and the competitiveness of the program applied to. This is an instance of Simpson’s
paradox, described in 1951 by the British statistician Edward Simpson. Simpson’s
paradox demonstrates the importance of considering the causal background. A
correlation between two types of events can disappear, or be reversed, when
data are grouped in a different way, because different groupings take into account
different factors in the causal background (here: the competitiveness of different
graduate programs).
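Simpson's paradox can be reproduced with small invented counts (a sketch, not the historical data): within each program, women are admitted at the higher rate, yet pooling the programs reverses the comparison because more women applied to the competitive program.

```python
# Invented admissions counts: (admitted, applied) per group, per program.
data = {
    "less competitive": {"men": (48, 80), "women": (14, 20)},
    "competitive":      {"men": (4, 20),  "women": (20, 80)},
}

def rate(admitted, applied):
    return admitted / applied

for program, groups in data.items():
    m, w = groups["men"], groups["women"]
    print(program, rate(*m), rate(*w))  # women's rate is higher in each program

# Pool the programs: sum admitted and applied across programs for each group.
men_total = [sum(x) for x in zip(*(g["men"] for g in data.values()))]
women_total = [sum(x) for x in zip(*(g["women"] for g in data.values()))]
print(rate(*men_total), rate(*women_total))  # 0.52 0.34 -- reversed when pooled
```

Grouping by program takes the competitiveness of the program (a causal-background factor) into account; pooling discards it, and the correlation reverses.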
EXERCISES
7.1 Describe Hume’s worry about causal reasoning in your own words. Evaluate the merits
of his concern, taking into account the main points of discussion throughout this section.
7.2 Define correlation, and give three examples of events that you believe are correlated.
7.3 Describe how correlation and probabilistic dependence relate to causation. Give
an example of a causal relationship that results in straightforward correlation, an
example of a causal relationship that in some contexts does not seem to result in cor-
relation, and an example of a correlation that is not due to a causal relationship.
7.4 Describe each of the following scenarios as a causal claim put in terms of difference-
making, and then as a causal claim put in terms of chains of physical processes. You
might need to invent some details about these causal relationships to give a thorough
answer—feel free to get creative.
a. The high tide washing ocean debris up to a certain point on the beach
b. Your pickup basketball team winning its game yesterday
c. Smoking causing lung cancer
7.5 What do you think the advantages are to thinking about causation in terms of differ-
ence-making? How about the disadvantages? What are the advantages and disad-
vantages of thinking about causation in terms of physical processes?
7.6 Describe what each of the following is and how each informs, or is taken into ac-
count, in the investigation of causal relationships: spatiotemporal contiguity, correla-
tion, probabilistic dependence, and causal background.
7.7 Give a novel example of each of the following:
a. A causal relationship that violates spatial contiguity
b. Events at the same place that are not causally related
c. A causal relationship that violates temporal contiguity
d. Events at the same time that are not causally related
e. A causal relationship and causal background in which the cause is not correlated
with the effect
f. Two correlated events that are not cause and effect
7.8 Define proximate causes and distal causes. Then, for each of the following events,
describe a more proximate cause and a more distal cause. You might need to invent some
details about these causal relationships to answer this question; feel free to be creative.
7.10 Write down the formula regarding conditional probabilities that gives the strength
of causal relationships. Then, considering that formula, order the following causal
relationships from strongest to weakest:
a. Brushing your teeth, flossing, and visiting the dentist prevents cavities.
b. Frequent smiling increases well-being.
c. Eating pizza prevents getting the flu.
d. Consuming anabolic steroids improves physical strength.
e. An increase in the minimum wage produces higher attendance at football games.
f. Warmer summers lead to longer periods of drought.
7.11 For each of the causal relationships in 7.10, name one feature of the causal back-
ground that would make the causal relationship stronger and one feature of the
causal background that would make the causal relationship weaker. It might help
to consider the conditional probability relationship that gives the strength of causal
relationships.
The idea of difference-making is much more useful for causal analysis in many fields
of science. Scientists have at least two methods to go beyond statistical information about
correlation to uncover difference-making relationships. One method is to run an experi-
ment—ideally, a perfectly controlled double-blind experiment, as detailed in Chapter 2.
Another method, when experimentation isn’t feasible, is to construct a causal model
and rely on statistical information about variables of interest to make causal inferences.
This section discusses how experiments can be used to uncover causal relationships;
causal modeling will be addressed in the next section. We have covered topics related
to testing causal hypotheses and causal modeling earlier in the book, including Chapter
2’s discussion of experimentation, Chapter 3’s discussion of modeling, and Chapter 6’s
discussion of statistical hypothesis-testing. But let us reconsider these topics now with an
eye to how they relate to causation in particular.
Let’s suppose that you are a farmer and you are interested in finding out whether
using a new fertilizer will increase your crop yield. This involves a causal hypothesis.
How would you test it?
One way would be to try out the fertilizer on your crops this year and see what kind
of a yield you get. But the causal background might vary from last year to this year in
a way that affects crop yield. You wouldn’t be able to distinguish that influence from
the specific effect of the fertilizer on the yield. What you want to know is whether the
fertilizer makes a difference to crop yield.
A better approach would be to divide your field into different plots of equal size. You
can then use the new fertilizer on some of the plots but not on the others. After some
time, go to your field and compare the crop yield from the fertilizer plots to the crop yield
from the other plots. If the plots treated with the new fertilizer produced, on average,
a larger crop yield than the other plots, then the fertilizer made a difference. If the two
groups of plots yielded about the same amount of crop, then the new fertilizer is probably
useless (or no better than your old fertilizer, if that’s the comparison you were studying).
If the fertilizer plots do worse, the fertilizer makes a difference—but the wrong kind!
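This comparison of average yields can be sketched in a few lines of code. The yield figures below are invented purely for illustration:

```python
from statistics import mean

# Hypothetical yields per plot; both the numbers and the number of
# plots are made up for the sake of the example.
fertilizer_yields = [52.1, 49.8, 55.3, 51.0, 53.6]
control_yields    = [47.2, 50.1, 46.8, 48.9, 47.5]

difference = mean(fertilizer_yields) - mean(control_yields)
print(f"Mean difference in yield: {difference:.2f}")
```

On these made-up numbers the fertilizer plots average about 4.26 more per plot. Whether such a difference is larger than chance variation is a further, statistical question, taken up later in the chapter.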
Let’s redescribe this scenario using concepts from Chapter 2. The farmer has created an
experimental group of plots (to which the new fertilizer is applied) and a control group
of plots (which is handled according to the farmer’s past practices). The application of
fertilizer to plots in the experimental group is an intervention (or treatment). In causal
terms, the farmer is intervening on a suspected cause in order to see whether this makes
a difference to the suspected effect. The suspected cause is the independent variable, and
the suspected effect is the dependent variable.
In testing causal hypotheses like this, sometimes the aim is to establish whether there
is a causal relationship. Other times, the aim is to clarify the nature and strength of a
causal relationship. For example, some drug trials simply seek to establish safety—that a
drug won’t have negative effects. Others seek to establish efficacy—that a drug will have
the expected positive effect. And still others aim to determine whether some drug is
more effective than another drug already on the market, that is, to establish the relative
strength of a causal relationship already identified.
By introducing an external influence on a system, interventions disrupt ordinary func-
tioning in a way that can help to disentangle causal relations. That’s in part why the
suspected cause is called an independent variable—the intervention independently deter-
mines its value, which eliminates the possibility that the suspected cause is affected by
the causal background. Other features of experimental design, such as having a control
group, are used to minimize the chance that changes to the suspected effect are due to
the causal background instead of the intervention. Altogether, these features help scientists
test causal hypotheses, identifying which particular factor is a genuine difference-maker.
Mill’s Methods
The English philosopher and social scientist John Stuart Mill (1806–1873) emphasized
the role of both observation and experimentation in discerning causal relationships (Mill,
1893). Mill identified five methods (see Table 7.1) used in the science of his day—and
before the development of statistics—to evaluate hypotheses concerning cause and effect.
(Scholars have suggested that some of these methods were discussed by scientists and
philosophers well before Mill—for instance by the Persian polymath Avicenna, whom you
encountered in Chapter 1.) Mill’s methods have proven to be a helpful way to think about
how observation and experiments, even nowadays, are used to identify causal relationships.
Let’s start with what Mill called the method of concomitant variations. This method
begins with the observation of correlation: that the values of two variables change in the
same circumstances. Mill noted that when one variable varies together with another, we
may infer a causal connection of some kind between them, although we won’t yet know
just how they are causally related. More specifically, we won’t yet know whether the two
variables are cause and effect, or share a common cause, or are related in some other way.
So, for example, we might see that people who play more video games than average are
also more violent than average. But while these attributes may be causally related, we
cannot tell just from their concomitant variation whether propensity to violence causes
an interest in video games, whether people become more violent by virtue of video game
exposure, or whether there is some indirect relationship between them, like a love of
excitement causing both a propensity to violence and an interest in video games.
The other methods Mill identified help get to the bottom of that question, and they
do so in ways that suggest the importance of intervention and randomization or other
forms of variable control. According to the method of agreement, one begins with cases
that agree in effect, and then scrutinizes them to learn what possible cause they have in
common—some way in which they agree. If in all instances when an effect occurs there
is one prior event or condition common to all of those cases, then one may infer that the
event is the cause of the effect. To use this method, one might let the causal background
vary while keeping the suspected cause the same. If the suspected effect still occurs in
those different instances, this is evidence that the suspected cause is indeed responsible
for the effect. If the causal background is varied sufficiently, this rules out a common
cause or other circuitous causal relationship.
The opposite approach is the method of difference. It begins with cases that differ in
effect, and then scrutinizes them to learn whether there’s some other respect in which
they differ. If in one case an effect is observed and in another case that effect is not
observed, and the only difference is the presence of a single event or condition in the first
case that is absent in the second case, then one may infer that this event is the cause of
the effect. An instance in which the suspected effect occurs is compared to an instance in
which the suspected effect does not occur. If the suspected cause is the only factor present
in the former but not the latter, this suggests the suspected causal relationship obtains.
The method of difference can also be employed when agreement has been discovered;
this is called the joint method of agreement and difference. We can consider cases where
the suspected effect occurs and see what they have in common and consider also cases
where the suspected effect does not occur and see what those have in common. If the
suspected cause is the only difference between the two sets of cases, then this affirms
a causal relationship between the suspected cause and the suspected effect. Imagine
interviewing people with a record of violence and people without such a record. If the
only distinguishing feature we find is that those in the former group play a lot of video
games and those in the latter group do not, this result would indicate a causal connec-
tion between video games and violence. This joint method of agreement and difference
provides more evidence of the causal relationship than either the method of agreement
or the method of difference by itself.
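As a rough sketch, the joint method can be mimicked with set operations: treat each case as the set of conditions present, and look for a condition common to every case with the effect but absent from every case without it. All of the factor names below are invented for illustration:

```python
# A toy sketch of Mill's joint method of agreement and difference.
# Each case is represented as the set of conditions present; the cases
# are grouped by whether the effect (violence) occurred.
effect_cases = [
    {"video_games", "city_dweller", "male"},
    {"video_games", "rural", "female"},
    {"video_games", "city_dweller", "female"},
]
no_effect_cases = [
    {"city_dweller", "male"},
    {"rural", "female"},
]

# Candidate causes: conditions common to every case with the effect...
common_to_effect = set.intersection(*effect_cases)
# ...minus any condition that also appears where the effect is absent.
present_without_effect = set.union(*no_effect_cases)
candidates = common_to_effect - present_without_effect

print(candidates)  # {'video_games'}
```

Of course, as the text goes on to explain, this alone cannot settle the direction of causation.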
None of these methods—the method of agreement, the method of difference, or the
joint method of agreement and difference—eliminates the possibility that the suspected
effect is instead the cause. From the investigation described above, we have established a
causal relationship between video games and violence, but we can’t know whether video
games cause violence or the other way around. To resolve this, we can perform an interven-
tion on an experimental group with the joint method, with the added element of exter-
nal influence on the independent variable. If we randomly choose groups of participants
(thereby eliminating any pre-existing differences between people in the groups) and ask
one group to play a lot of video games, then we’ve eliminated the possibility of violent
tendencies causing video-game-playing. In observational studies (see Chapter 2), the joint
method of agreement and difference can be supplemented not with an intervention but by
using other forms of causal analysis. So, for example, we might ask our interview subjects
not just how much gaming they do but also for how many years they’ve played video
games. For each person, we can compare that with when his or her violent behavior began.
Finally, the method of residues is a way to apportion causal responsibility. With this
method, one traces all other effects to their causes and looks for the causal variable that
remains. If scientists have learned that some causal factors bring about certain effects, and
some of those causes present by themselves bring about some but not all of the effects,
then the missing cause(s) should be taken to be responsible for the absent effect(s).
This is a way of taking into account the causal background in order to focus on some
specific cause and determine the difference it makes. Imagine we’ve learned that obesity
and smoking cause diabetes, heart disease, and lung cancer. From our knowledge that
obesity causes diabetes and heart disease but not lung cancer, we can infer that smoking
causes lung cancer. A limitation of this form of causal reasoning is that it assumes causal
relationships are simpler than they often are. What if, for example, the combination of
obesity and smoking together causes lung cancer, but neither does by itself? The method
of residues can’t evaluate this possibility.
Consideration of Mill’s methods is in part of interest because causal hypothesis-testing
in today’s science inherits some of the features of these methods. These include a focus on
similarities among like situations, differences among unlike situations, and causal appor-
tioning. Mill’s methods also illustrate the difficulty of establishing the direction of causa-
tion, the importance of intervention, and the limitations of apportioning causal influence.
With Mill’s methods in the background, let’s now move on to these and other topics
regarding causal hypothesis-testing.
TABLE 7.1 Mill's five methods

1. Method of agreement: Start with cases that agree in the effect, and find a possible cause they have in common.
2. Method of difference: Start with cases that differ in the effect, and find a possible cause on which they differ.
3. Joint method of agreement and difference: Compare cases that agree in the effect to cases that agree in not having the effect, and find if there is one possible cause that cases in the former group have in common but cases in the latter group do not.
4. Method of residues: Trace all known causes to their effects, and find a possible cause and possible effect that are left over.
5. Method of concomitant variations: Find a possible cause that varies (directly or inversely) with the effect.
being infected by ‘cadaverous particles’ from doctors who didn’t wash their hands thoroughly
enough after performing autopsies. Both of these were steps toward the germ theory of disease.
Compare Pasteur’s and Koch’s causal hypothesis about some microorganisms causing
disease to the example of a farmer testing how a new fertilizer influences crop yield. In
the latter case, the farmer’s hypothesis not only posits a causal relationship and its direc-
tion but also something about the strength of the relationship. In particular, the farmer
is interested to know whether the fertilizer increases crop yield by at least enough to
justify the additional cost of purchasing and applying the fertilizer.
Hypotheses about causal relationships, their direction, and their strength are used to
develop specific expectations regarding how dependent variables will change in response
to changes to independent variables. Based on the germ theory of disease, Koch expected
that healthy mice infected with the proper bacteria would develop anthrax, an infec-
tious disease. The farmer was evaluating the expectation that the fertilizer plots of land
produced enough additional crop yield to offset the increased costs of supplies or labor.
But expectations based on causal hypotheses inherit all the complications of causal
reasoning in general. Should Koch expect all treated mice to develop anthrax? We have
seen that many causes aren’t sufficient by themselves but only increase the probability of
their effects. So, if not every mouse, how many should Koch expect to develop anthrax?
And, in what conditions should we expect this to happen? Other features of the causal
background might interfere with this causal relationship, even if the germ theory of
disease is true. We don’t expect the application of fertilizer to increase crop yield if the
crops aren’t watered, after all. These are a few of the complications in determining what
expectations we should generate from a causal hypothesis.
These complications with causal reasoning make some features of experiments and
observational studies particularly significant. To start, we have seen that control groups
provide a way to eliminate differences in the causal background, keeping them from
becoming confounding variables. In Koch’s experiments, he inoculated some mice with
blood taken from the spleens of farm animals that had died of the anthrax disease. He
inoculated other mice with blood from the spleens of healthy animals. The only (rel-
evant) difference between these groups of mice was thus their exposure to blood from
an animal that died from anthrax (Ullman, 2007). Random assignment to groups is also
important to control variation in the causal background. Our farmer’s investigation of the
new fertilizer won’t be very illuminating if all the fertilizer plots are in an arid, low-yield
part of the farm and the control plots aren’t.
Statistical hypothesis-testing is also crucial for testing causal hypotheses. As we saw in
Chapter 6, statistical hypothesis-testing involves developing specific expectations
about the probability distribution of a random variable on the assumption that the
null hypothesis is true. This is important for hypotheses that predict probabilistic causal
influence. Causal hypotheses play the part of the alternative hypothesis in statistical
hypothesis-testing. The null hypothesis is, usually, simply that the posited cause does not
actually influence the phenomenon of interest. So, for our farmer, the null hypothesis
is that the fertilizer is causally inefficacious: the range of crop yield from the fertilized
plots of land will only differ from the range of crop yield from the other plots by chance
variation. Taking into account the number of plots of land and the average crop yield for
the plots in the control group, the farmer can determine how high a crop yield from the
fertilizer plots would be sufficiently unlikely, given the null hypothesis, to warrant rejecting the null hypothesis.
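One simple way to make this null-hypothesis reasoning concrete is a permutation test: if the fertilizer is causally inefficacious, the group labels are arbitrary, so random relabelings of the plots should fairly often produce a difference as large as the observed one. The yields below are invented, and a permutation test is just one of several tests that could serve here:

```python
import random

random.seed(0)

# Invented plot yields, purely for illustration.
fertilizer = [52.1, 49.8, 55.3, 51.0, 53.6]
control    = [47.2, 50.1, 46.8, 48.9, 47.5]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(fertilizer) - mean(control)

# Under the null hypothesis, the labels "fertilizer" and "control" are
# arbitrary, so we count how often a random relabeling produces a
# difference at least as large as the observed one.
pooled = fertilizer + control
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    if mean(pooled[:5]) - mean(pooled[5:]) >= observed - 1e-9:
        count += 1

p_value = count / trials
print(f"observed difference: {observed:.2f}, p = {p_value:.4f}")
```

On these numbers the shuffled differences rarely match the observed one, so the null hypothesis would be rejected at conventional significance levels.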
EXERCISES
7.12 Describe how you might apply each of Mill’s methods to test the causal hypothesis
that not getting enough sleep makes you (you in particular) hungrier the next day.
7.13 Describe the ideal experiment, looking back to Chapter 2 if helpful. You should
reference experimental and control groups, random assignment, independent and
dependent variables, extraneous variables, and intervention. Then, articulate the
significance of each of the features of the ideal experiment for testing causal hypoth-
eses in particular. Your response should discuss causal background, distinguishing
causes and effects, common causes, and spurious correlation.
7.14
a. Describe how statistical hypothesis-testing can be used to investigate a causal
hypothesis—say, that the death penalty prevents crime. (Look back to Chapter
6 if this is helpful). Make sure you specify the null hypothesis and describe, in
general, what is needed in order to reject it.
b. Write out the formula for determining the strength of a probabilistic causal
relationship (from 7.1). What is the relationship between the two sides of this
equation if C does not influence E, that is, if C is not a cause of E?
c. Considering your answers to (a) and (b), answer the following questions. What
would the process of statistical hypothesis-testing show if C is not a cause of E
(and there is no type I or type II error)? If one causal relationship (CR1) is proba-
bilistically stronger than another causal relationship (CR2), is there a greater
chance of a type I error with CR1 or with CR2? How about a type II error?
7.15 Headlines in popular media often misrepresent the scientific studies they discuss.
One way this happens is that many headlines suggest a causal relationship where
the evidence provided by the scientific study only supports a correlation. Consider
the following headlines. For each, (a) identify whether it makes either a causal or a
correlational claim; (b) rewrite any headline using causal language so that it reads
as a correlational study; and (c) suggest a possible explanation for each correlation
that is not the posited or suspected causal relationship.
1. ‘Lack of Sleep May Shrink Your Brain’, CNN, September 2014
2. ‘To Spoon or Not to Spoon? After-Sex Affection Boosts Sexual and Relationship
Satisfaction’, Science of Relationships, May 2014
3. ‘Daytime TV (Soap Operas) Tied to Poorer Mental Scores in Elderly’, Reuters,
March 2006
4. ‘Study Suggests Attending Religious Services Sharply Cuts Risk of Death’, Medi-
cal Xpress, November 2008
5. ‘Facebook Users Get Worse Grades in College’, Live Science, April 2009
6. ‘Texting Improves Language Skill’, BBC, February 2009
7. ‘Study Suggests Southern Slavery Turns White People into Republicans 150
Years Later’, Think Progress, September 2013
8. ‘Dogs Walked by Men Are More Aggressive’, NBC News, November 2011
9. ‘Want a Higher GPA? Go to a Private College’, New York Times, April 2010
10. ‘Sexism Pays: Men Who Hold Traditional Views of Women Earn More Than
Men Who Don’t, Study Shows’, Science Daily, September 2008
7.16 Choose three of the headlines listed in Exercise 7.15, and then, for each, look up
the text of the popular media report. Write a paragraph evaluating the strength of
the evidence cited in the media report supporting the claim (causal or correlational)
in the headline. Try to note both positive features and negative features.
7.17 For each of the following claims, identify three possible confounding variables in
the causal background that may impact the relationship. Say whether each possible
confounding variable would be an alternative cause, contributing cause, common
cause of both stated cause and stated effect, or something else.
a. Watching pornography leads to committing sex crimes
b. Eating pizza promotes immunity to flu
c. Ice-cream consumption raises the probability of drowning deaths
d. Being an American scientist raises the chance of having a scientific paper
published
e. Volcanic eruptions cause tsunamis
7.18 Describe an experiment you could use to determine whether smoking marijuana is
a cause of schizophrenia. Address how extraneous variables are to be controlled.
Finally, identify the expectations given the hypothesis, that is, what finding would
enable you to conclude that smoking marijuana is a cause of schizophrenia.
7.19 Psychologists have long studied the causes of altruistic behavior. In a classic psycho-
logical study by Darley and Latane (1968), participants walked down an alley on
their way to another experiment. Some were told they were late for the experiment,
others were told they were on time. Each passed by a confederate slumped in a
corner. Darley and Latane found that time pressure decreased helping behavior.
Describe the specific causal hypothesis and the features the experimental design
must have had to adequately test this hypothesis.
7.20 Economists have taken a different approach to studying altruistic behavior. They
have investigated it using experimental paradigms, such as the ultimatum game
encountered in Chapter 2—a task in which one player is given a real sum of money
and decides how to split that money with a partner, then the partner can decide
only whether to accept or reject the offer. The finding was that people offered fairer
divisions than self-interest predicts, and they rejected divisions they deemed unfair even
though rejecting meant winning no money. The researchers concluded that people sacrifice
some self-interest to promote fairness. What are some important differences and
similarities between this approach and the experiment described in 7.19? Evaluate
each approach for how well it can investigate the causes of altruism.
• Describe the advantages of causal modeling and when this approach is called for
• Define causal Bayes nets and say what they are good for
• Specify the kinds of assumptions embedded in causal Bayes nets and discuss their
significance and limitations
Suppose you are interested in the relationship between three variables: vaccination
(V ), immunity (I), and autism (A). All three variables have two possible values: true and
false, or yes and no. You already know that vaccination causes immunity. But—worried
about what you’ve heard about potential side effects of vaccination—you make three
hypotheses about the dependency between autism and vaccination.
The first hypothesis is that vaccination causes immunity, which in turn causes autism.
This structure can be graphically represented as a straightforward chain: V → I → A. The
second hypothesis is that vaccination is a common cause of immunity and autism. Using
arrows pointing from a cause to its effect, you can graphically represent this structure
as: I ← V → A. If this is right, then vaccination is a way to become immune to various
FIGURE 7.4 Generic causal graph with nodes representing variables of interest and arrows
representing direct causal relationships
diseases, but it also has some chance of inducing autism. Your third hypothesis is that
autism isn’t causally related to either vaccination or immunity. To evaluate these hypoth-
eses, scientists would collect data about the values of the three variables of interest in
different patients. What should you expect to find if each of these hypotheses were true?
Consider the first hypothesized causal structure: V → I → A. This hypothesis states
that immunity causally depends on vaccination and that autism depends on immunity.
If this is right, then an intervention on immunity (say, due to decreases in the levels of
antibody that protect from acquiring a disease) will decrease the chance of autism but will
not affect whether one was vaccinated. This intervention would set the variable immunity
to the value false and disrupt causal links from vaccination to immunity. And, if this first
hypothesis were true, then intervening on immunity in this way would interfere with
any correlation between vaccination and autism, making the variables vaccination and
autism statistically independent, or uncorrelated. Put another way, on this hypothesis, if
you consider everyone, vaccination would be correlated with autism, but if you consider
only patients who are immune to a disease, then patients having autism would be uncor-
related to patients having been vaccinated. Vaccination would have no effect on autism
beyond its influence on immunity.
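This screening-off pattern can be illustrated with a small simulation of the chain structure. Every probability below is invented purely for the sake of the example; in reality, as discussed shortly, vaccination does not cause autism:

```python
import random

random.seed(1)

# Simulate the chain V -> I -> A with invented probabilities.
def sample():
    v = random.random() < 0.5                   # vaccinated?
    i = random.random() < (0.9 if v else 0.1)   # immunity depends on V
    a = random.random() < (0.3 if i else 0.05)  # A depends only on I (by stipulation)
    return v, i, a

data = [sample() for _ in range(100_000)]

def prob(event, given):
    pool = [d for d in data if given(d)]
    return sum(1 for d in pool if event(d)) / len(pool)

# Unconditionally, V and A are correlated, via the chain through I:
p_a_v     = prob(lambda d: d[2], lambda d: d[0])
p_a_not_v = prob(lambda d: d[2], lambda d: not d[0])

# But among the immune, V carries no further information about A:
p_a_v_i     = prob(lambda d: d[2], lambda d: d[0] and d[1])
p_a_not_v_i = prob(lambda d: d[2], lambda d: not d[0] and d[1])

print(p_a_v, p_a_not_v)      # noticeably different
print(p_a_v_i, p_a_not_v_i)  # approximately equal
```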
Consider the second hypothesis, that vaccination is a common cause of both immunity
and autism: I ← V → A. What should the data look like to support the conjecture? You
should find two correlations, one between the variables vaccination and immunity and
another between vaccination and autism. Generally, if you find a correlation between
two variables, then this dependence may result from one variable causing the other, but
it is also possible that there is some third variable, a common cause, that causes the val-
ues of both variables and explains their correlation. Given the common cause structure
associated with our hypothesis, you should also find that altering the value of the vari-
able autism will not affect the value of immunity and that altering the value of immunity
will not affect the value of autism. Holding fixed the value of vaccination makes autism
probabilistically independent from, or uncorrelated with, immunity. So, if this hypothesis
is true, then examining only people who are vaccinated (or aren’t vaccinated) would result
in no correlation between immunity and autism.
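The common-cause structure can be simulated in the same way: holding V fixed should make I and A uncorrelated. Again, every number here is invented purely for illustration:

```python
import random

random.seed(2)

# Simulate the common-cause structure I <- V -> A with invented probabilities.
def sample():
    v = random.random() < 0.5                   # vaccinated?
    i = random.random() < (0.9 if v else 0.1)   # immunity depends on V
    a = random.random() < (0.2 if v else 0.05)  # A depends directly on V (by stipulation)
    return v, i, a

data = [sample() for _ in range(100_000)]

def prob(event, given):
    pool = [d for d in data if given(d)]
    return sum(1 for d in pool if event(d)) / len(pool)

# I and A are correlated overall, since both track V:
p_a_i     = prob(lambda d: d[2], lambda d: d[1])
p_a_not_i = prob(lambda d: d[2], lambda d: not d[1])

# But holding V fixed (here: among the vaccinated), the correlation vanishes:
p_a_i_v     = prob(lambda d: d[2], lambda d: d[0] and d[1])
p_a_not_i_v = prob(lambda d: d[2], lambda d: d[0] and not d[1])

print(p_a_i, p_a_not_i)      # noticeably different
print(p_a_i_v, p_a_not_i_v)  # approximately equal
```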
In actuality, there is no evidence that vaccination of any kind causes autism. Both
hypotheses are false. Before delving into that, let’s first consider where the practice of vac-
cination came from. Vaccination has been practiced for three centuries. In the 1700s, there
was some recognition that survivors of certain infectious diseases would become immune
to future exposure, and researchers began a primitive form of inoculation by infecting
themselves with a disease to gain immunity. The risk of sickness and death with these
primitive forms of inoculation was high. Then, the English physician and scientist Edward
Jenner (1749–1823) discovered that if he infected people with the cowpox virus, related
to smallpox but less dangerous, they had far lower mortality rates from smallpox. Vaccine
research advanced significantly again almost a century later, when Louis Pasteur identified
bacteria as a major cause behind several diseases; this knowledge led to the germ theory of
disease we discussed earlier in this chapter, and to the first synthetically made vaccination.
The 1900s saw the introduction of several successful vaccines, including those against
diphtheria, measles, mumps, and rubella. As vaccines became more common, their causal
mechanism became well understood. Basically, vaccines train the immune system to iden-
tify and combat pathogens, either viruses or bacteria. Certain molecules from the pathogen,
but not necessarily the whole pathogen, must be introduced into the body to trigger an
immune response. So, many modern vaccines have no chance of making you sick from
the pathogen, since they don’t even contain the full viruses or bacteria.
But despite increased understanding of how vaccines work and drastically increased
vaccine safety, misconceptions remain. The myth that vaccines cause autism originated
with a study published in a prestigious medical journal in 1998. The study linked the
measles, mumps, and rubella (MMR) vaccine to increasing autism in British children. This
was a correlation. Several other studies were independently conducted to test whether
this correlation was due to a causal relationship; none found a causal relationship between
vaccination and autism. In fact, several studies couldn’t even replicate the correlation
between vaccination and autism. In the meantime, several other researchers pointed out
that there were several methodological errors in the original study, that the authors had
financial conflicts of interest, and that the study was ethically problematic. The article
was eventually retracted from the journal. While the causes of autism are unclear, it has
been definitively shown that vaccination is not among them.
From this, we can conclude that any data you gathered would not confirm either your
first or second hypothesis about a causal pathway from vaccination to autism. Vaccination
and immunity are strongly correlated with each other because vaccination is one of
the major causes of immunity. But vaccines have undergone extremely
extensive safety testing with huge groups of test subjects, and none has shown a cor-
relation with autism. And scientists now believe there are physiological signs of autism
even in utero, well before exposure to vaccination. Neither vaccination nor immunity are
of expectations stemming from the three different hypotheses for how vaccination may
causally relate to autism. Once the assumptions about causal relationships are explicit, a
causal model can simply represent dependencies between different variables in the model,
and precise expectations can be formed about what would happen if you changed the
value of a variable in the model. Patterns of statistical information can be used to test
these expectations and, thus, the causal hypotheses behind them. Using causal models,
scientists can make a fine-grained evaluation of whether correlational evidence supports
a causal hypothesis; they can identify what manipulations to perform when conducting
an experiment to assess a causal connection; and they can better recognize what factors
in the causal background must be controlled.
Causal models are used across many different fields of science, from epidemiology to
economics. While there are several different approaches to causal modeling, the leading
approach to causal learning and reasoning is the causal Bayes nets approach. The rest of
the chapter will survey this approach.
FIGURE 7.5 Causal graph of the relationships between posting copyrighted material on your
Facebook page, a friend reporting you, and your Facebook page being shut down.
of each causal relationship. The strength of those relationships matters; it determines
whether, on balance, posting copyrighted material increases or decreases the chance of
your Facebook page being shut down. Suppose that all three variables have two pos-
sible values, true and false, and that their conditional probability relationships are given
in Table 7.2.
Causal Bayes nets like this one can be used to make probabilistic and causal inferences
and to learn about causal relationships. Because they are complete models for specified
variables and their relationships, they can be used to answer questions about the prob-
ability that a certain variable takes on a specific value. For example, the causal Bayes net
model outlined in Figure 7.5 and Table 7.2 can be used to determine the probability
that you’ve been reported, given that your Facebook page has been shut down, but you
posted no copyrighted material. Another use: when the values of certain variables are
observed, the network can infer the values of other variables by computing their
posterior probabilities using Bayesian conditioning.
Bayes nets can also be used to estimate causal relationships that are related to statisti-
cal features of our observations—for example, the negative correlation between copy-
right infringement and being reported by a friend. And they can be used to predict
the effects that potential interventions on some variables would have on the values of
other variables—for example, to predict what would happen if you posted copyrighted
material on your page.
TABLE 7.2 Conditional probabilities for the causal graph in Figure 7.5

Pr(Copyright infringement):
  T: 0.20    F: 0.80

Pr(Reported = T | Copyright infringement):
  Copyright infringement = T: 0.01    (Reported = F: 0.99)
  Copyright infringement = F: 0.40    (Reported = F: 0.60)

Pr(FB page shut down = T | Copyright infringement, Reported):
  T, T: 0.99
  T, F: 0.80
  F, T: 0.90
  F, F: 0.00
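The kind of computation involved here can be sketched in a few lines of Python. The numbers below are the Table 7.2 probabilities; the function names and the enumeration strategy are our own illustration, not a standard library API.

```python
# Inference by enumeration on the Figure 7.5 net. Structure: Copyright
# infringement (C) -> Reported (R); C and R -> Page shut down (S).
# The numbers are the conditional probabilities of Table 7.2.

P_C = {True: 0.20, False: 0.80}                      # Pr(C)
P_R_given_C = {True: 0.01, False: 0.40}              # Pr(R = T | C)
P_S_given_CR = {(True, True): 0.99, (True, False): 0.80,
                (False, True): 0.90, (False, False): 0.00}  # Pr(S = T | C, R)

def joint(c, r, s):
    """Joint probability, factorized along the arrows of the causal graph."""
    pr = P_R_given_C[c] if r else 1 - P_R_given_C[c]
    ps = P_S_given_CR[(c, r)] if s else 1 - P_S_given_CR[(c, r)]
    return P_C[c] * pr * ps

def prob_reported(s, c):
    """Pr(R = T | S = s, C = c), by Bayesian conditioning."""
    num = joint(c, True, s)
    return num / (num + joint(c, False, s))

def prob_shutdown(c):
    """Pr(S = T | C = c), summing out R."""
    return sum(joint(c, r, True) for r in (True, False)) / P_C[c]

# Page shut down, but no copyrighted material posted: a report is certain,
# since an unreported, non-infringing page is never shut down.
print(prob_reported(True, False))   # 1.0
print(prob_shutdown(True))          # ≈ 0.80
print(prob_shutdown(False))         # ≈ 0.36
```

Because C has no causes in this model, conditioning on C coincides here with intervening to set it, so the last two lines also predict what would happen if you posted, or refrained from posting, copyrighted material.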
To better understand how scientists use Bayes nets to learn about causal relationships,
consider this scenario:
Suppose that a patient has been suffering from shortness of breath (called dyspnoea)
and visits the doctor, worried that he has lung cancer. The doctor knows that other
diseases, such as tuberculosis and bronchitis, are possible causes of this symptom, as
well as lung cancer. She also knows that other relevant information includes whether
or not the patient is a smoker (increasing the chances of cancer and bronchitis) and
what sort of air pollution he has been exposed to. A positive x-ray would indicate
either TB or lung cancer. (Korb & Nicholson 2010, p. 30 ff.)
There’s plenty of causal information here, but how that information relates to the case
at hand is tricky to figure out. Constructing and using a causal Bayes net is one effective
way to assist the doctor in making a medical diagnosis. To construct such a model, the
first thing to do is to identify the relevant variables. As in the previous example, each
variable will be represented with a node. There’s no uniquely right way of setting up the
causal Bayes net, but it helps to choose nodes that let us represent the relevant, known
aspects of the situation in enough detail to perform the desired reasoning. One possible
modeling choice is shown in Table 7.3. In this case, the variables include dyspnoea,
smoker, pollution exposure, x-ray result, and lung cancer.
The second step of constructing a causal Bayes net is to specify the causal structure of
the system by drawing arrows between the nodes. Smoking and living in a polluted area
are two factors affecting the patient’s chance of having lung cancer. In turn, having lung
cancer is a factor affecting the result of an x-ray, and the patient’s difficulty in breathing,
that is, the patient’s suffering from dyspnoea. If this is the structure of the situation, then
we may draw the graph pictured in Figure 7.6.
Several forms of causal relationships can be represented in a causal Bayes net. A cause
can increase or decrease the probability of some variable taking on a given value, causes
can influence themselves, or there can be a feedback loop where two or more variables
influence one another in a cyclical way. Most of the time, however, Bayes nets are assumed
to be directed acyclic graphs (sometimes abbreviated DAG), which means that all the
causal relationships are taken to go in one direction without feedback loops. This means
TABLE 7.3 Variables for the dyspnoea example and their possible values

Variable       Values
Dyspnoea       {T, F}
Smoker         {T, F}
Pollution      {low, high}
X-ray          {positive, negative}
Lung cancer    {T, F}
FIGURE 7.6 Causal graph for the dyspnoea example, with arrows from pollution and
smoker to lung cancer, and from lung cancer to x-ray and dyspnoea.

TABLE 7.4 Conditional probability of lung cancer given pollution and smoker

Pollution   Smoker   Pr(Lung cancer = T)
High        T        0.050
High        F        0.020
Low         T        0.030
Low         F        0.001
that earlier causes are assumed not to also be later effects. You can see from Figure 7.6
that our graph satisfies this assumption; no arrows form circles like X → Y → Z → X,
and no arrow is bidirectional like X ↔ Y.
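Checked mechanically, the no-cycles assumption amounts to a standard graph test. Here is a minimal sketch, with the Figure 7.6 arrows encoded as an adjacency list; the encoding and the function are illustrative, not from the text.

```python
# The Figure 7.6 graph: Pollution and Smoker -> Lung cancer -> X-ray, Dyspnoea.
graph = {
    "Pollution": ["Lung cancer"],
    "Smoker": ["Lung cancer"],
    "Lung cancer": ["X-ray", "Dyspnoea"],
    "X-ray": [],
    "Dyspnoea": [],
}

def is_acyclic(g):
    """Depth-first search with three colors: reaching a gray node again
    means we have followed arrows back to where we started, i.e. a cycle."""
    color = {v: "white" for v in g}
    def dfs(v):
        color[v] = "gray"
        for w in g[v]:
            if color[w] == "gray":
                return False
            if color[w] == "white" and not dfs(w):
                return False
        color[v] = "black"
        return True
    return all(dfs(v) for v in g if color[v] == "white")

print(is_acyclic(graph))                       # True: Figure 7.6 is a DAG
print(is_acyclic({"X": ["Y"], "Y": ["X"]}))    # False: X -> Y -> X is a cycle
```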
Having specified the nodes and their structure, we must now specify the strength of the
relationships between connected nodes. To do so, one defines a probability distribution
for each node, conditional on any node(s) that causally influence it. In the dyspnoea
case, statistical information from medical studies or observed frequencies can be used to
specify these probability distributions. For variables for which no such information is
available, initial probabilities can be based on intuition, a guess, or an estimate. These
are exactly like the prior probabilities from the discussion of Bayesian
statistics in Chapter 6. It turns out that Bayes nets can be accurate in the long run even
if they start off with imprecise or inaccurate initial probabilities.
Let’s take a look at the variable lung cancer in Figure 7.6. The variables that causally
influence it are pollution and smoker, each of which can take two possible values for a
total of four combinations of values: {<high, T>; <high, F>; <low, T>; <low, F>}. We can
specify the conditional probability of having cancer in each of these four cases. One way
to represent these conditional probabilities is in a table, as in Table 7.4.
Once all the conditional probability distributions are determined, our causal Bayes
net captures all of the relevant knowledge available. Now we can start to reason with
it. Reasoning with a Bayes net amounts to the task of computing a posterior probability
distribution for one or more variables of interest given the values of variables that you have
information about. These computations are governed by Bayesian conditioning. Think
of this as updating your beliefs about a variable based on changes to your beliefs about
other variables. The arrows connecting nodes in the causal Bayes net show the paths that
probability distribution changes follow.
Belief updating can happen either from cause to effect, based on information about the
value of a cause variable, or from effect to cause, based on information about the value
of an effect variable. For example, if we’re certain that the patient has dyspnoea, and her
x-ray results are negative, then we can update our diagnosis about whether the patient
has cancer, a causal influence on both dyspnoea and x-ray results. In turn, updating our
diagnosis of cancer will affect our beliefs about whether the patient is a smoker and lives
in an area with high levels of pollution, proceeding up the chain of causal influence. Or
if we are certain that the patient is a smoker, we can update our beliefs about her chance
of having lung cancer accordingly, which is causally influenced by smoking status. This
also influences our expectations of the x-ray result.
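Both directions of updating can be reproduced by brute-force enumeration over the joint distribution. In the sketch below, the lung-cancer probabilities are those of Table 7.4, but the priors for pollution and smoker and the distributions for x-ray and dyspnoea are invented for illustration; they are not given in the text.

```python
# Belief updating in the dyspnoea net by enumeration. Cancer probabilities
# come from Table 7.4; all other numbers are assumed for illustration only.
from itertools import product

P_pol_high = 0.10                  # assumed prior Pr(Pollution = high)
P_smoker = 0.30                    # assumed prior Pr(Smoker = T)
P_cancer = {("high", True): 0.050, ("high", False): 0.020,
            ("low", True): 0.030, ("low", False): 0.001}   # Table 7.4
P_xray_pos = {True: 0.90, False: 0.20}   # assumed Pr(X-ray = positive | Cancer)
P_dysp = {True: 0.65, False: 0.30}       # assumed Pr(Dyspnoea = T | Cancer)

def joint(pol, smoker, cancer, xray_pos, dysp):
    """Joint probability, factorized along the Figure 7.6 arrows."""
    p = P_pol_high if pol == "high" else 1 - P_pol_high
    p *= P_smoker if smoker else 1 - P_smoker
    p *= P_cancer[(pol, smoker)] if cancer else 1 - P_cancer[(pol, smoker)]
    p *= P_xray_pos[cancer] if xray_pos else 1 - P_xray_pos[cancer]
    p *= P_dysp[cancer] if dysp else 1 - P_dysp[cancer]
    return p

def p_cancer_given(xray_pos=None, dysp=None):
    """Pr(Cancer = T | evidence), summing out the unobserved variables."""
    num = den = 0.0
    for pol, s, c, x, d in product(["high", "low"], [True, False],
                                   [True, False], [True, False], [True, False]):
        if xray_pos is not None and x != xray_pos:
            continue
        if dysp is not None and d != dysp:
            continue
        p = joint(pol, s, c, x, d)
        den += p
        if c:
            num += p
    return num / den

print(p_cancer_given())                           # prior, ≈ 0.012
print(p_cancer_given(dysp=True))                  # rises with dyspnoea
print(p_cancer_given(dysp=True, xray_pos=False))  # falls again with a negative x-ray
```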
A different type of reasoning with causal Bayes nets regards the relationship between
two causes that compete to explain an observed effect. In our case, smoker and pollution
are two such causes. They compete to explain the value of the variable lung cancer, which
they both influence. Suppose we learn that the patient has cancer. This new piece of infor-
mation raises the probability of both possible causes. Suppose that we learn further that
the patient lives in a badly polluted city. Something interesting would now happen in our
causal Bayes net. This new piece of information both helps explain the patient’s cancer
and lowers the probability that the patient is a smoker. Although the variables smoker
and pollution are initially probabilistically independent, once we know that the patient
has cancer, learning that he lives in a highly polluted area drives the probability that he
is a smoker down. The pollution exposure accounts for the lung cancer and thereby
weakens the evidential link between lung cancer and smoking. Put another way, we don’t
need to speculate that the patient was a smoker in order to explain the lung cancer.
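Explaining away can be checked numerically on the pollution/smoker/cancer fragment of the net. The cancer probabilities are from Table 7.4; the priors for pollution and smoker are, again, invented for illustration.

```python
# Explaining away: learning about pollution lowers Pr(smoker | cancer).
# Cancer probabilities are from Table 7.4; the two priors are assumed.

P_pol_high = 0.10                  # assumed prior Pr(Pollution = high)
P_smoker = 0.30                    # assumed prior Pr(Smoker = T)
P_cancer = {("high", True): 0.050, ("high", False): 0.020,
            ("low", True): 0.030, ("low", False): 0.001}   # Table 7.4

def joint(pol, smoker, cancer):
    p = P_pol_high if pol == "high" else 1 - P_pol_high
    p *= P_smoker if smoker else 1 - P_smoker
    p *= P_cancer[(pol, smoker)] if cancer else 1 - P_cancer[(pol, smoker)]
    return p

def p_smoker_given_cancer(pol=None):
    """Pr(Smoker = T | Cancer = T), optionally also given Pollution."""
    num = den = 0.0
    for pv in (["high", "low"] if pol is None else [pol]):
        for s in (True, False):
            w = joint(pv, s, True)
            den += w
            if s:
                num += w
    return num / den

print(p_smoker_given_cancer())            # ≈ 0.83: cancer makes smoking likely
print(p_smoker_given_cancer(pol="high"))  # ≈ 0.52: pollution explains it away
```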
In the simple cases we’ve considered, a Bayes net is fully specified, and then used to
make causal inferences and predictions. In some scientific applications, in contrast, causal
Bayes nets are incomplete in two respects. First, there are many other variables that could be
added to the model; variables that precede, mediate, or follow the variables that are explicitly
represented. Second, information might be lacking about the causal relationships between
variables represented in the model. In this case, the structure of the network and the relevant
probabilistic dependencies must be learned from data as the model is developed.
Cognitive neuroscientists, for example, are interested in the causal relationships between
brain areas that support the same cognitive capacity. To find out about these causal rela-
tionships, they often rely on brain imaging data, where subjects perform tasks that tap the
cognitive capacity of interest while having their brain activity recorded. Neuroscientists
already have some background knowledge about which brain regions might be involved
in a task, so they often focus their attention on recorded activity from only a few regions
of interest, each one of which can be treated as a variable and represented as a node in
a causal Bayes net. The challenge is then to discover the causal structure of these regions
of interest—to determine the nature of the arrows.
Machine learning algorithms help neuroscientists to tackle this challenge. One of these
algorithms searches the brain imaging data set to find the causal structure that best
helps scientists explain observed statistical dependencies between the variables of interest.
Roughly, this search procedure begins with a graph with no arrows. Arrows are added
sequentially, based on how well they would help account for observed correlations. When
no further addition of arrows can improve the account of observed correlations, the pro-
cedure moves to eliminating arrows until the account is as simple as it can be while still
matching the observed correlations. The resulting causal structure is invoked as the best
explanation of the observed data (Glymour, 2007).
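A toy version of this add-then-prune search can be written in a few dozen lines. The sketch below scores candidate structures with a BIC score (log-likelihood minus a complexity penalty) on synthetic data generated from a known chain; it is a simplified illustration of the general procedure, not the algorithm actually run on imaging data.

```python
# Greedy score-based structure search: add the arrow that most improves
# a penalized log-likelihood (BIC), then prune. Illustration only.
import math
import random
from collections import Counter

def bic(data, parents):
    """BIC of a discrete net: log-likelihood minus a complexity penalty."""
    ll, n_params, n = 0.0, 0, len(data)
    for var, pars in parents.items():
        counts = Counter((tuple(row[p] for p in pars), row[var]) for row in data)
        totals = Counter()
        for (cfg, _), c in counts.items():
            totals[cfg] += c
        for (cfg, _), c in counts.items():
            ll += c * math.log(c / totals[cfg])
        n_params += 2 ** len(pars)
    return ll - 0.5 * n_params * math.log(n)

def creates_cycle(parents, child, new_parent):
    """Adding new_parent -> child closes a loop exactly when child is
    already an ancestor of new_parent."""
    stack, seen = [new_parent], set()
    while stack:
        v = stack.pop()
        if v == child:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def greedy_search(data, variables):
    parents = {v: [] for v in variables}
    score = bic(data, parents)
    improved = True
    while improved:                           # phase 1: add arrows
        improved, best = False, None
        for c in variables:
            for p in variables:
                if p == c or p in parents[c] or creates_cycle(parents, c, p):
                    continue
                parents[c].append(p)          # try the arrow p -> c
                s = bic(data, parents)
                parents[c].remove(p)
                if s > score and (best is None or s > best[0]):
                    best = (s, p, c)
        if best:
            score, p, c = best
            parents[c].append(p)
            improved = True
    improved = True
    while improved:                           # phase 2: prune arrows
        improved = False
        for c in variables:
            for p in list(parents[c]):
                parents[c].remove(p)
                s = bic(data, parents)
                if s >= score:                # removal doesn't hurt: keep it out
                    score, improved = s, True
                else:
                    parents[c].append(p)
    return parents

# Synthetic data from the chain A -> B -> C: the search should recover
# arrows between A and B and between B and C, but none between A and C.
random.seed(0)
data = []
for _ in range(2000):
    a = random.random() < 0.5
    b = random.random() < (0.9 if a else 0.1)
    c = random.random() < (0.8 if b else 0.2)
    data.append({"A": int(a), "B": int(b), "C": int(c)})
print(greedy_search(data, ["A", "B", "C"]))
```

Note that the search can only recover the arrows up to Markov equivalence: from observational data alone, A → B and B → A score identically.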
The causal Markov condition assumes that whether the patient has a positive x-ray is
influenced by whether he has cancer but, taking into account whether he has cancer, not
by whether he is a smoker or by whether he lives in a high-pollution area.
The idea is that cancer causes a positive x-ray result, whether the cancer was caused by
smoking or by pollution.
The causal Markov condition indicates which variables will be probabilistically inde-
pendent conditional on other variables. This enables scientists to reason from probabilistic
information to causal relationships. If the causal Markov condition holds, then a Bayes net
can correctly represent the absence of a direct causal relationship by the conditional
independence of two variables. Our reasoning about vaccination, immunity, and autism
relied on this reasoning. The causal Markov condition might fail if the set of variables
included in a Bayes net is incomplete in certain ways. But, here too, there are sophisticated
machine learning techniques for causal discovery that work reliably.
The third assumption of causal Bayes nets we’ll discuss is faithfulness. While the Markov
condition indicates which variables in a Bayes net will be probabilistically independent,
faithfulness specifies which variables will be probabilistically dependent conditional on
other variables. In the dyspnoea example, if having cancer is causally related to tuber-
culosis, then TB and cancer in our Bayes net should be probabilistically dependent. The
basic motivation for the faithfulness condition is that a causal relationship between two
variables entails, almost always, a probabilistic dependence between those variables. This
implies that the probabilistic influences of different causal pathways from one cause to
an effect will not exactly cancel each other out.
However, faithfulness doesn’t always hold. In Section 7.1, we discussed two examples
of events that are causally related but uncorrelated. If smoking causes heart disease, but
also causes exercise, and exercise prevents heart disease, then the causal influence may
exactly cancel out. Here the faithfulness assumption fails. Failures of faithfulness don’t
compromise causal inference as seriously as failures of the causal Markov condition.
Conditions where faithfulness fails are much better understood than conditions where
the causal Markov condition fails, and more techniques for causal discovery can do
without faithfulness than can do without the causal Markov condition.
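Exact cancellation is easy to engineer with made-up numbers, which shows how faithfulness can fail. In the sketch below, smoking raises heart-disease risk directly but also promotes exercise, which lowers it, and all parameters are invented precisely so that the two paths cancel.

```python
# A parameterization in which faithfulness fails: the direct path
# Smoking -> Disease and the indirect path Smoking -> Exercise -> Disease
# exactly cancel, leaving smoking and disease uncorrelated.

P_smoke = 0.5
P_exercise = {True: 0.8, False: 0.2}        # Pr(Exercise = T | Smoker)
P_disease = {(True, True): 0.30, (True, False): 0.70,   # Pr(Disease = T | Smoker, Exercise)
             (False, True): 0.10, (False, False): 0.45}

def p_disease_given_smoker(s):
    """Pr(Disease = T | Smoker = s), summing out exercise."""
    pe = P_exercise[s]
    return pe * P_disease[(s, True)] + (1 - pe) * P_disease[(s, False)]

print(p_disease_given_smoker(True))   # ≈ 0.38
print(p_disease_given_smoker(False))  # ≈ 0.38: no correlation, despite two causal paths
```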
There are many more assumptions underlying reasoning with causal Bayes nets, beyond
modularity, the causal Markov condition, and the faithfulness condition. As we have said
of causal modeling in general, specifying these assumptions, and seeing where they fail
to hold, is an important step toward making causal claims transparent. Understanding
how causal modeling works when some of these assumptions fail, and what kinds of errors
such failures may introduce, is one of the most important challenges at the forefront of current causal
modeling approaches.
EXERCISES
7.21 Describe what causal modeling can be used for. What are some advantages and
limitations compared to other strategies we have seen for learning about causal
relationships?
7.22 For each of the following cases, (a) indicate the causal hypothesis, explicitly distin-
guishing the cause from the effect; (b) offer another plausible cause for the effect;
and (c) draw a simple causal model to help you assess whether the reasoning
described in the case is good or bad.
1. You have eaten your birthday dinner at your favorite pizzeria in town for the
past 10 years. This year, you got sick. This was also the first time your uncle
Sam was there. You conclude you got sick because uncle Sam was there.
2. Every time Felipe goes to see Real Madrid play, they lose. Whenever he is not
there, they win. If I want Real Madrid to win, I had better not let Felipe go to
any more games.
3. Eryka normally goes to bed at midnight and gets up by 7:00 a.m. each morn-
ing. She usually runs two kilometers after having some breakfast. This morning,
however, she ran only half a kilometer and had to stop, as she was so tired.
She recalled that she had gone to sleep unusually early the night before and
concluded that too much sleep made her too tired to run.
4. In Albystown, there are two kinds of students: those who own a diary and those
who own a smartphone. A first-grade teacher in Albystown noticed that all the
students who consistently failed exams owned a smartphone. He concluded that
those students who own a smartphone are intellectually inferior to those who
own a diary, and that’s why they failed more exams.
5. Phineas Gage’s moral character changed dramatically after an explosion blew
a tamping iron through his head. Gage was leading a railroad construction
crew near Cavendish, Vermont, when the accident occurred. ‘Before the acci-
dent he had been a most capable and efficient foreman, one with a well-bal-
anced mind, and who was looked on as a shrewd smart business man.’ After
the accident, he became ‘fitful, irreverent, and grossly profane, showing little
deference for his fellows. He was also impatient and obstinate, yet capricious
and vacillating, unable to settle on any of the plans he devised for future action’.
7.23 Causal reasoning involves various types of probabilistic inferences: predictive infer-
ences (from causes to effects); diagnostic inferences (from effects to causes); and rea-
soning about interventions (what would happen if you manipulated a certain feature
of a system). For each of the following situations, (a) indicate whether you would
make a predictive or a diagnostic inference to find out about the events described;
(b) describe what intervention you would carry out to find out about the events
described; and (c) explain why you would make those inferences and interventions.
1. You are a physician working at a hospital, and you notice that some patients
have been infected with influenza.
2. You notice that you have a runny nose, body aches, and a sore throat.
3. You notice that there is an unusual smell coming from the engine of your car, while
the needle on the temperature gauge creeps up quickly past the normal limit.
4. Every morning, you notice a continuous tinkling noise coming from the kitchen
in your apartment.
5. You notice that the countryside of your town has more animals than the site
could support for a grazing season.
7.24 Describe the important elements of a causal Bayes network and what each represents.
7.25 A group of psychologists is interested in how intrinsic motivation of university students
affects their exam results. They believe that intrinsic motivation affects both class atten-
dance and home preparation (reading the textbooks, doing the assignments, and so on).
They also believe that both class attendance and home preparation affect exam results.
They do not believe that there are any further causal interactions. All relevant variables
(intrinsic motivation, class attendance, home preparation, and exam results) have two
values: high and low for intrinsic motivation, class attendance, and home preparation
and pass and fail for exam results. The psychologists observe the following frequencies:
1. 40% of all students have a high intrinsic motivation.
2. 90% of all highly motivated students attend classes regularly, as opposed to
60% of all students with low motivation.
3. 70% of all highly motivated students prepare well, as opposed to 20% of all
students with low motivation.
4. 80% of all students who prepare well and attend class regularly pass the exam.
5. 60% of all students who prepare well and do not attend class regularly pass the exam.
6. 45% of all students who do not prepare well and do attend class regularly pass
the exam.
7. 40% of all students who do not prepare well and do not attend class regularly
pass the exam.
Draw the causal Bayes net that corresponds to the story. Then, suppose that the uni-
versity implements a new policy that forces students to attend class. Assume that all
students comply with this policy. From the causal Bayes net and the frequencies given
above, determine the probability that students pass the exam after this intervention.
7.26 Construct causal Bayes nets for simple examples of causal relationships with (a) a
common cause structure, (b) a common effect structure, and (c) a chain structure.
7.27 Tillbourg College admits students who are either brainy or sporty (or both). Let C
denote the event that someone is admitted to Tillbourg College, which is made true
if they are either brainy (B) or sporty (S). Suppose in the general population, B and
S are independent. Draw a causal Bayes net to represent this situation, defining all
relevant variables and probabilities. If you learn that all students at Tillbourg College
are sporty, what can you infer about the value of S? Explain your reasoning.
7.28 Give an example of explaining away, a situation in which discovering one causal
relationship diminishes the probability of some presumed cause.
7.29 Suppose that we measure the variables storm (S), barometer reading (B), and atmo-
spheric pressure (A). You find that storm and barometer reading are probabilisti-
cally dependent, as are barometer reading and atmospheric pressure, and storm
and atmospheric pressure. Furthermore, you find that storm and barometer reading
given atmospheric pressure are independent. From these constraints alone (assum-
ing the causal Markov condition and faithfulness hold), what underlying causal
structures can you infer? For each, provide a causal Bayes net.
FURTHER READING
For more on the psychology of causal reasoning, see Sloman, S., & Lagnado, D. (2015).
Causality in thought. Annual Review of Psychology, 66, 223–247.
Pasteur’s influence on the history and sociology of medicine is described in more detail in
Latour, B. (1993). The pasteurization of France. Cambridge: Harvard University Press.
For an account of the difference-making view of causation and its importance in scientific
explanation, see Woodward, J. (2003). Making things happen: A theory of causal expla-
nation. Oxford: Oxford University Press.
For a pluralist view of the nature of causation and discussion of causal analysis, includ-
ing causal Bayes nets, see Cartwright, N. (2007). Hunting causes and using them:
Approaches in philosophy and economics. Cambridge: Cambridge University Press.
For advanced treatments of causal modeling, see Pearl, J. (2009). Causality: Models, rea-
soning, and inference, 2nd edition. New York: Cambridge University Press. Also see
Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, prediction, and search, 2nd
edition. Cambridge: MIT Press.
CHAPTER 8
Explaining, Theorizing,
and Values
will sell more of something when the price is high than when the price is low. Taxi
drivers in particular will tend to sell more of their labor hours—that is, they will
tend to work longer—when wages are higher than when they are lower. In other
words, they will work more, when it really pays off, and cut out early on bad days
when it doesn’t.
The law of supply—along with its counterpart, the law of demand—is one of the
most fundamental and intuitive explanatory principles in economics. Assuming people
strive to do what is in their best interest, economists invoke general principles like the
laws of supply and demand to explain how people set the prices of goods and services
and how people allocate resources like their time. When an employer pays higher over-
time hourly rates, the number of hours employees are willing to work increases. When
consumers are willing to pay more for a slice of pizza than for a cupcake, bakeries will
increase their production of pizza and reduce the production of cupcakes. The law of
supply captures the relationship between price changes and suppliers’ behavior, as in
these examples.
Psychologists give a different answer to the taxi-driver question. There is a theory about
daily income, called the ‘daily-income-targeting theory’, that appeals to two psychological
tendencies. One tendency is that, when confronted with multiple related decisions over a
period of time, people often consider the merits and weaknesses of only a single decision
at a time, instead of considering the consequences of all decisions at once. The second
psychological tendency is loss aversion: people dislike losing money or other resources
more than they enjoy gaining similar amounts. Applied to taxi drivers, these tendencies
suggest that their decisions about how much to work are made day by day instead of all at
once and that they generally will resist quitting until they reach their daily target income.
This predicts that taxi drivers will work longer hours on low-wage days and quit early on
high-wage days. This is, of course, the opposite of what economists’ law of supply predicts.
A group of economists and psychologists tested these competing predictions by car-
rying out a field study, where they analyzed data about New York taxi drivers’ behavior
from the years 1988, 1990, and 1994 (Camerer et al., 1997). Their data indicated that
less-experienced drivers tend to work more hours on bad days, when working does not
pay off, and to clock off too early on good days. The income-targeting theory explains this
apparently irrational behavior in a simple way: inexperienced taxi drivers use a simple rule
of thumb—a heuristic—that guides them to aim for a certain amount of earnings over a
certain period of time. If they are falling behind that rate, they work longer to catch up,
and if they are ahead, they quit early.
The data showed that more experienced taxi drivers don’t display this pattern of
behavior. To figure out why, the researchers evaluated their data sets with an eye to other
possible explanations. Two plausible ones: taxi drivers may learn with experience to resist
the temptation to quit early on good days, or they may simply learn that driving a fixed
number of hours each day is more efficient than aiming for a certain
amount of money. Neither of these possible explanations appeals to general economic
principles. Taxi drivers, inexperienced or experienced, don’t seem to act in accordance
with the law of supply.
the daily-income-targeting theory does a better job at accounting for drivers’ behavior.
This explanation seems to help us understand taxi drivers’ decisions about how long to
work each day, and it may be a promising start for explanations of other, similar human
behavior (for a nice example see Camerer, 1997).
To say that science aims to produce a special kind of knowledge is not to say that scien-
tific explanations are entirely different from ordinary, everyday explanations. The explana-
tory knowledge produced in science is a special kind of knowledge, explicitly supported by
evidence through the use of methods discussed in this book. But there’s significant overlap
between scientific and everyday forms of explanation. All of us sometimes notice things
that cry out for explanation. We routinely ask questions such as: ‘how much does drinking
corrode the liver?’, ‘why did the economic crisis happen?’, ‘why do colleges and universi-
ties have vastly more highly paid administrators than they used to, given steep declines in
public funding for higher education?’, and, of course, ‘how did the dinosaurs go extinct?’
Even children regularly engage in this pursuit of explanatory knowledge. Many have
wondered why the sky is blue. A parent might quickly answer that the sky is blue because
it looks that way to us or because that’s just the way the sky is. Such answers don’t explain
why the sky is blue; they offer no insight into why or how the phenomenon is the way it
is. A satisfying explanation of why the sky is blue relies on some sophisticated scientific
theorizing: sunlight travels in straight lines unless some obstruction either reflects it, like a
mirror; bends it, like a prism; or scatters it, like the molecules of gas in the Earth’s atmo-
sphere. Because blue light has shorter wavelengths, it is scattered more than other colors
in the spectrum. That’s why we normally see a blue sky. In contrast to most parents’ quick
answers to this question, this explanation appeals to other facts about the world and scien-
tific laws or theories in order to give a deeper understanding of the phenomenon in question.
Generating explanations serves a variety of cognitive roles. It facilitates learning and
discovery, and plays a central role in confirmation and reasoning. As we discussed in
Chapter 4 in relation to abductive reasoning—also known as ‘inference to the best expla-
nation’—explanatory considerations can be used as evidence in support of a hypothesis,
making the hypothesis more credible. With respect to learning, generating explanations
to oneself or to others facilitates the integration of new information into existing bodies
of knowledge and can lead to deeper understanding; this is called the self-explanation
effect. Performance on a variety of reasoning tasks, including logical and probabilistic tasks,
can be improved when one is asked to explain. This is why explaining the study mate-
rial and responding to explanatory questions is such a good way to learn new material
encountered in a course. Instructors and tutors, who must explain material to others,
often learn it faster and with more depth than their students do.
Copyright © 2018. Taylor & Francis Group. All rights reserved.
Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook
Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122.
Created from purdue on 2021-08-29 21:53:53.
278 Explaining, Theorizing, and Values
Psychologist Alison Gopnik (1998) once likened understanding to orgasm. Sex evolved to
feel good because it leads to babies, which a species needs in order to continue. Similarly,
Gopnik reasoned, understanding is enjoyable because explanations are tremendously help-
ful to people getting around in the world. And so, the desire to satisfy our curiosity has
led humans to ever more sophisticated and accurate theories about our world.
The satisfaction of curiosity is no guarantee of a good explanation, though. People
can have a sense that they understand something without genuinely understanding it—
explanations can be wrong. People also often fall prey to an illusion of explanatory
depth, believing they understand the world more clearly and in greater detail than they
actually do. We all regularly overestimate our competence and depth of knowledge; recall
our discussion in Chapter 1 of the cognitive errors, like confirmation bias, which science
is designed to correct for.
The public reception of climate change research illustrates how one can be dangerously
misled by the feeling of understanding. As you may recall, climate change was originally
called 'global warming'. But this terminology misled many
people about what they should expect to experience. When a season was not warmer
than usual in some particular location, some people were tempted to doubt the reality
of climate change—it seemed to them like things weren’t getting warmer after all. But
climate change does not produce warmer temperatures in every location at every point
in time. Instead, it produces a global increase in average temperatures and increasingly
extreme weather and storms along the way.
Unfortunately, some people—including some politicians who shape how nations
respond to climate change—still disregard scientific knowledge of climate change because
of apparent conflicts with the daily weather they experience. Figure 8.1 pictures Oklahoma
Senator James Inhofe speaking before the US Congress in Washington D.C. in February
2015. Inhofe brought a snowball to illustrate that it was (he claimed) unseasonably cold
outside. In fact, it was not unusually cold in D.C., and meanwhile, the West Coast of
the United States was unusually warm. The year prior, 2014, had the warmest average
temperatures in recorded history, and the Earth has continued to warm in the years since.
FIGURE 8.1 Oklahoma Senator James Inhofe speaking before the US Congress in 2015 while
brandishing a snowball
Reproduced from C-SPAN
Another example of the illusion of explanatory depth concerns public reception
of neuroscientific information. Experimental data suggest that people are often mis-
led into judging bad psychological explanations as better than they really are when
accompanied by completely irrelevant neuroscientific information. This ‘seductive
allure’ of neuroscientific explanations might interfere with people’s ability to criti-
cally evaluate the quality of an explanation (Weisberg et al., 2008). Coupled with
an illusion of explanatory depth, this interference can have negative practical effects
when, for example, it is exploited by advertisements for ‘brain training’ that promise
brain enhancement ‘proven by neuroscience’. This is the opposite of the climate change
case. Instead of scientific expertise being disregarded because of personal experience,
scientific credibility is misapplied to get people to believe something there’s not actu-
ally sufficient evidence for.
Given the centrality of explanation to the scientific enterprise and the potential for
all people, including scientists, to feel like they understand something even when they
do not, it’s an important task to clarify the nature of scientific explanation. If we can say
what features good explanations must have, then we will be better able to judge whether
something counts as an adequate explanation.
One simple idea is that explanations are just true answers to why or how questions,
such as ‘why is the sky blue?’ or ‘how do bicycles move?’ But we have suggested that
some true answers to the question of why the sky is blue, like ‘because that’s the way
it is’, don’t count as explanations. So, we need a way to determine when a true answer
to a why- or how-question is a good explanation. What features should good answers to
why- or how-questions have?
Philosophers of science and some scientists have thought long and hard about this
question. The possible answers relate to other topics we have discussed in this book. Some
have suggested that explanations should cite laws in order to account for phenomena,
either deductively or probabilistically. Another idea is that explanations should show how
phenomena fit into patterns. Others have suggested that explaining is a kind of causal
reasoning and that explanations should say what causes a phenomenon.
According to the nomological conception of explanation, championed by the philosopher
Carl Hempel, explanations cite one
or more scientific laws or principles that, together with background conditions, make it
so that the phenomenon to be explained was to be expected. So, according to Hempel,
nomological explanations have a form like this:
1. L1, …, Ln
2. C1, …, Cn
∴ 3. E
In this scheme, L1, …, Ln are statements of general laws, such as the laws of supply and
demand in economics. C1, …, Cn are statements of background conditions, such as the
actual price and quantity of a good in some market at some time. And E is a statement of
the phenomenon to be explained, like a dramatic decrease in the number of people taking
taxis over the past year. Hempel believed that knowing the law and background condi-
tions would lead people to realize the phenomenon in question was to be expected. By
rendering phenomena expectable, scientific explanations reveal our world to be ordered,
proceeding in accordance with general laws.
Thus, if you want to explain why people are taking fewer taxis, you may begin by
stating the law of demand: all other factors being equal, as the price of a good increases,
the quantity of goods demanded by consumers decreases, and as the price of a good
decreases, the quantity demanded increases. Then you may point out an increase in ride-
share programs and cycling incentives and the advent of companies like Uber and Lyft.
(See Figure 8.2 for some relevant data.) From these background conditions and the law
of demand, it follows that taxi rides have gotten comparatively more expensive. And so,
as the law of demand predicts, many people who previously bought taxi rides are now
doing so less often; they can instead use other, cheaper forms of transportation. That's
why fewer people take taxis.
FIGURE 8.2 Ridership data for Yellow Taxis and Uber in New York City 2015–2017, based on
data from reports by the New York City Taxi & Limousine Commission
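The law-of-demand explanation above can be sketched in code. The toy demand function and every number in it are invented for illustration; they are not drawn from the New York City ridership data.

```python
# A toy sketch of the law of demand: all else equal, as the relative price of a
# good rises, the quantity demanded falls. The functional form, the base trip
# count, and the elasticity below are all hypothetical.

def taxi_trips_demanded(relative_price: float, base_trips: float = 450_000,
                        elasticity: float = 1.2) -> float:
    """Daily trips demanded at a given price relative to alternative transport."""
    return base_trips * relative_price ** -elasticity

# Background condition: ride-share apps make taxi rides comparatively pricier.
before = taxi_trips_demanded(relative_price=1.0)
after = taxi_trips_demanded(relative_price=1.5)

assert after < before  # the phenomenon to be explained follows from law + conditions
```

The structure mirrors Hempel's scheme: the function encodes the law, the price arguments encode the background conditions, and the decrease in trips is the phenomenon E that follows from them.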
Hempel thought that some nomological explanations were valid deductive arguments,
while others were strong inductive arguments. (See Chapter 4.) As in the preceding
explanation scheme, the premises must include at least one statement of a scientific
law—a general pattern or regularity. The premises also must have empirical content, so
they can be tested.
Many scientific explanations fit this nomological conception of explanation. Consider
how scientists might explain the increase in the average global temperature of Earth’s
atmosphere. One can begin by pointing out that atmospheric density changes in propor-
tion to the permeability of the atmosphere to solar radiation and that the permeability
of the atmosphere to radiation is directly correlated with average surface temperature.
These are law-like generalizations that describe patterns and regularities in nature. Next,
note that the atmospheric density on Earth has increased (because of greenhouse gases).
This is a background condition, a fact about current circumstances. Together, these claims
deductively imply the conclusion that the Earth’s average temperature has increased. This
argument is deductively valid with all true premises, so we have a simple nomological
explanation of global warming.
Just as phenomena can be explained by laws, scientific laws themselves can be explained
by appealing to other, more comprehensive laws. For example, consider Galileo’s law that
bodies fall toward Earth at a constant acceleration. This law can be deductively derived
from the Newtonian law of gravitation. The Newtonian force of gravity explains the
constant acceleration of bodies falling toward Earth. Newtonian laws, in turn, can be
explained by appealing to the principles of the more comprehensive general theory of
relativity developed by Einstein. The Earth’s gravity is explained as a distortion of space
caused by the Earth’s mass. Objects speed up as they fall toward Earth, just as a ball
rolling from the edge to the center of a bowl speeds up.
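The derivation of Galileo's law from Newton's can be checked numerically. A short sketch, using standard approximate values for the gravitational constant and the Earth's mass and radius, shows why the acceleration is effectively constant near the surface:

```python
# Galileo's constant acceleration follows from Newtonian gravitation,
# g = G * M / r^2, because r barely changes near Earth's surface.
# Constants are standard approximate values.

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # mass of Earth, kg
R_EARTH = 6.371e6    # radius of Earth, m

def g_at_height(h_meters: float) -> float:
    """Gravitational acceleration at height h above Earth's surface."""
    return G * M_EARTH / (R_EARTH + h_meters) ** 2

surface = g_at_height(0)        # roughly 9.8 m/s^2
airliner = g_at_height(10_000)  # nearly the same at cruising altitude

print(f"{surface:.2f} vs {airliner:.2f} m/s^2")
```

Even ten kilometers up, the acceleration differs from its surface value by only a few hundredths of a meter per second squared, which is why treating it as constant, as Galileo's law does, is such a good approximation.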
The idea of explaining scientific laws with reference to other, more general laws intro-
duces a second conception of explanation. According to the pattern conception, expla-
nations fit particular statements about phenomena into a more general framework of
laws and principles. This has been called a unification conception of explanation, since
explaining in this way reduces the number of separate assumptions and beliefs needed to
account for phenomena. Phenomena, and laws as well, are unified by uncovering the
basic patterns that govern them.
One advantage of the pattern conception over the nomological conception is that
there’s no requirement of citing laws. Pattern explanations can cite regularities that may
not qualify as laws. In place of the law requirement, there’s an emphasis on fitting the
phenomenon to be explained into a wider pattern, to see it as one instance of a more
general regularity of the world that has been identified.
Earlier, we described the simple explanation of decreased taxi ridership as a nomologi-
cal explanation, but it can also be construed as a pattern explanation. The phenomenon of
decreased taxi ridership is explained as one instance of the general pattern whereby higher
prices drive decreased demand and vice versa, a pattern that also applies to purchases as
different from taxis as pizza, pomegranates, and tickets to the cinema.
Many scientific explanations fit the pattern conception of explanation rather well.
Consider evolutionary theory. This theory explains a great many phenomena involving the
traits of organisms and the relationships among them with reference to a simple pattern
that plays out in a multitude of ways. The pattern at the heart of evolutionary theory is
that natural selection acts on variation among organisms to produce cumulative change in
species. The theory of evolution is not a single, general law of nature; it recognizes many
different influences on evolution besides natural selection and random variation, which
proceed in various ways depending on various factors. Many evolutionary explanations
are thus not productively viewed as nomological explanations. But they do fit the pattern
conception rather well.
The ideas behind the nomological and pattern conceptions of explanation—that expla-
nations make phenomena less surprising by referencing laws or by showing how they fit
into a wider pattern—are undoubtedly important. These ideas describe important features
of many good scientific explanations. But explanation also seems to be asymmetric: causes
can explain their effects, while effects cannot explain their causes. Yet the nomological
and pattern conceptions of explanation don't recognize this asymmetry. Consider the
nomological account. Suppose that, in general, weather alerts are
sent out when storms are approaching. Then, from this generalization and the premise
that a storm is approaching, you could explain why you received a weather alert. This is
a valid argument as required for nomological explanation. But the generalization about
weather alerts being sent when storms are approaching and the premise that you received
a weather alert can also be used to deductively infer that a storm is approaching. There
is a valid deductive argument whether the weather alert or the approaching storm plays
the part of background information.
And yet, the storm is a good explanation for why you received a weather alert, but the
weather alert is no explanation for why the storm is approaching. You can do a lot with
your mobile phone, but you can’t usually bring about a storm. The problem is similar
for the pattern conception of explanation. There is a general pattern relating weather
alerts to approaching storms. What’s to say that this pattern can explain weather alerts
but can’t explain storms? That difference isn’t accounted for by the pattern conception.
A second problem for the nomological conception is that many good explanations
don’t appeal to any laws. We have already suggested that some evolutionary explanations
are successful without appealing to laws. Here’s another example. Why dinosaurs went
extinct some 65 million years ago is explained by one of two hypotheses: either there
was a massive bout of volcanism or an enormous asteroid hit the Earth. Either event
would have had dire consequences on Earth’s climate and on dinosaurs’ ecosystems, and
whichever occurred caused dinosaurs’ extinction because of those consequences. But
neither of these explanations involves a general law of nature. We can’t say, in general,
what to expect on the basis of a volcano or asteroid collision. This depends on numerous
circumstances related to the nature of the catastrophic event, the organisms in question,
and other factors.
There is a related second problem for the pattern conception. The pattern conception
focuses on explanations that fit a phenomenon into a wider pattern. But some explana-
tions seem to be highly specific. Consider the explanation for how the human heart pumps
blood. This explanation may not apply to the function of the hearts of other kinds of
organisms. This is because hearts and other organs vary across species, and their differences
are more significant the more distantly related organisms are. Something similar is true
for the explanation for why dinosaurs disappeared. Both the volcanism explanation and
the asteroid explanation are highly specific. They rely on particular conditions on Earth
over 65 million years ago to help account for this extinction event. Nothing guarantees
that these circumstances will ever recur; the explanation might account for only this one
phenomenon, ever. So, they do not describe general patterns. Still, it seems like whichever
of these is true is a good explanation.
Here’s one final concern with the nomological and pattern conceptions. Discussion of
laws has been decreasing in science. The decline is perhaps most evident in the psychologi-
cal sciences. Psychologists are spending less and less time discovering and appealing to laws
in their research. A bibliometric study of abstracts from the PsycLit database—indexing
psychological research papers and journals—during the last century (1900–1999) looked
at over 1.4 million abstracts and found 3,093 citations of law—an average of 22 cita-
tions per 10,000 entries (Teigen, 2002). As shown in Figure 8.3, the average number of
such references significantly dwindled over time. Further, the laws psychologists are most
concerned about or familiar with were discovered long ago, with the most commonly
cited laws discovered from 1834 to 1957.
FIGURE 8.3 Occurrence of the word law in PsycLit abstracts per 10,000 entries
(Teigen, 2002)
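Teigen's headline rate is easy to verify from the figures just quoted:

```python
# Checking the reported rate: 3,093 law citations across 1.4 million
# PsycLit abstracts comes to roughly 22 citations per 10,000 entries.
citations = 3_093
abstracts = 1_400_000
rate_per_10k = citations / abstracts * 10_000
print(round(rate_per_10k))  # about 22
```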
If psychology is any guide, the nomological conception of explanation is in trouble.
And the pattern conception’s emphasis on broad patterns might be plagued with similar
difficulties. Over this same period of time, psychologists have established very few general
relationships between empirically measured variables—that is, very few general patterns.
Causal Explanation
Many laws and patterns in phenomena are also called effects. For example, in psychol-
ogy, there is the Garcia effect: an aversion to a particular taste or smell associated with
a negative reaction. This is why you might have trouble ever again eating whatever food
you had right before a bout of stomach flu. There are the primacy and recency effects,
according to which people recall more easily items at the beginning (primacy) and items
at the end (recency) of a list. And there is the self-explanation effect, described earlier
in this chapter. This is where explaining something to yourself boosts your learning and
helps you integrate new knowledge with existing knowledge.
The convention of referring to certain patterns as effects isn’t limited to psychology.
Consider the Larsen effect in acoustics. A public-address (PA) system has several major
components: a microphone, a mixing console and soundperson, an amplifier, and a
loudspeaker. If the soundperson registers that the broadcast is inaudible to the audience,
she can adjust the volume level via the mixing console to increase the microphone’s
input sensitivity, so it can pick up the speaker’s vocalizations more effectively, which the
loudspeaker puts out via the amplifier. This system is a basic homeostatic mechanism
involving feedback. But if volume levels increase beyond optimal values, the loudspeaker
can emit an unpleasant, high-pitched, runaway squeal. This feedback pattern is called the
Larsen effect.
The Larsen effect can be invoked to explain why there’s a squeal when a soundperson
adjusts the volume on a PA system. But this effect is also something that stands in need
of explanation. The explanation of the Larsen effect is that the microphone pickup locks
on to, or couples with, the natural vibration produced by the loudspeaker, which causes
them to begin resonating together. This pure tone resonance, or ‘ring’, causes the loud-
speaker to further increase in efficiency, and the microphone picks it up again and relays
it back to the loudspeaker. The coupling process is repeated at the speed of sound, and
as the set-points for minimum and maximum volume are exceeded, the resonance seizes
the system with abnormal levels of gain. This transition occurs very suddenly, temporarily
arrests the broadcast, and is dangerous to the system (including the ears of people in the
audience) if left unattended.
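The runaway character of this causal explanation comes down to loop gain: each pass of the signal through the microphone, amplifier, and loudspeaker multiplies its amplitude. A minimal sketch, with purely illustrative numbers rather than acoustic measurements:

```python
# Feedback instability in miniature: each trip through the
# microphone-amplifier-loudspeaker loop multiplies the signal by the loop
# gain. Gain below 1 dies away; gain above 1 runs away (the squeal).
# The gains and cycle count here are illustrative only.

def amplitude_after(cycles: int, loop_gain: float, start: float = 1.0) -> float:
    """Signal amplitude after repeated passes through the feedback loop."""
    return start * loop_gain ** cycles

quiet = amplitude_after(cycles=50, loop_gain=0.9)   # decays toward silence
squeal = amplitude_after(cycles=50, loop_gain=1.1)  # grows explosively

assert quiet < 1.0 < squeal
```

Because the coupling repeats at the speed of sound, the cycles accumulate almost instantly, which is why the transition to the squeal seems so sudden.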
Consider a second pattern that also involves homeostasis, that is, a stable equilibrium
among interdependent elements. The scientific explanation of an organism’s regulation of
blood sugar appeals to homeostatic systems that use pancreatic endocrine hormones to
maintain blood sugar within a certain range (≈70–110 milligrams of glucose per 100 mil-
liliters of blood). If blood sugar decreases below this range, pancreatic alpha cells secrete
glucagon, which causes the liver to release stored glucose. If blood sugar increases above
the range, pancreatic beta cells secrete insulin, which causes adipose tissue to absorb glu-
cose from the blood. This explanation is also part of the explanation of diabetes, which
is a disorder characterized by the pancreas producing insufficient amounts of insulin.
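The homeostatic mechanism just described can be sketched as a simple control loop. Only the 70–110 range comes from the text above; the hormone response sizes and step logic below are invented for illustration.

```python
# A toy homeostatic controller for blood sugar: alpha cells secrete glucagon
# below the healthy range, beta cells secrete insulin above it. Units are
# mg of glucose per 100 mL of blood; the correction step is hypothetical.

LOW, HIGH = 70, 110  # approximate healthy range

def pancreatic_response(blood_sugar: float) -> str:
    """Which hormone, if any, the pancreas secretes at this level."""
    if blood_sugar < LOW:
        return "glucagon"  # liver releases stored glucose
    if blood_sugar > HIGH:
        return "insulin"   # adipose tissue absorbs glucose from the blood
    return "none"

def regulate(blood_sugar: float, step: float = 10.0) -> float:
    """One correction step toward the 70-110 range."""
    response = pancreatic_response(blood_sugar)
    if response == "glucagon":
        return blood_sugar + step
    if response == "insulin":
        return blood_sugar - step
    return blood_sugar

level = 140.0
while pancreatic_response(level) != "none":
    level = regulate(level)  # 140 -> 130 -> 120 -> 110
```

In these terms, diabetes corresponds to the insulin branch of the controller failing to fire strongly enough, so high levels are not corrected.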
These patterns, or effects, seem to be explained by describing their causes. The Larsen
effect is caused by a coupling between the microphone pickup and the loudspeaker’s
vibration, and this explains the volume feedback. According to the causal conception,
explanations appeal to causes that bring about the phenomenon to be explained. The
causal conception seems to account well for many explanations in science, including
especially in fields that do not deal with laws. As we emphasized in Chapter 7, knowledge
of causes enables prediction and manipulation of phenomena, via intervention on causal
factors. It’s plausible that explaining those causal factors is also central to understanding.
In one variety of causal explanation, the focus concerns how causal factors regularly
combine into complex systems that produce the target phenomenon. The blood sugar
regulation example exhibits this nicely. Pancreatic hormones, liver tissue, and blood sugar
ordinarily work together in complex ways to maintain blood sugar levels within a nar-
row range. Some call this variety of causal explanation mechanistic. The search for causal
mechanisms seems to play an especially important role in some parts of the social and
life sciences.
A causal conception of explanation can address the concerns raised earlier with the
nomological and pattern conceptions. First, causal explanations are automatically asym-
metric: causes explain their effects, but effects cannot explain their causes. This solves the
symmetry problem of nomological and pattern accounts. The reason why appealing to
the storm explains your mobile phone’s weather alert, but appealing to the weather alert
doesn’t explain the storm, is that the storm’s approaching is a causal factor in producing
your mobile phone’s alert; but the alert didn’t cause the storm.
Second, some causal relationships occur in very general patterns or are law-like in
nature, but others do not. If you heat ice, it will melt or evaporate. There are virtually no
exceptions to this. If you heat chocolate, it will usually melt—but if you heat it too quickly,
it gets thick and lumpy instead. This is a general pattern, but it has some exceptions. In
contrast, perhaps the volcanic or asteroid episode that led to the dinosaurs’ extinction was
an event that will happen only once in the Earth’s history. Perhaps background conditions
had to be just right for such an event to cause a major extinction. But all of these, from
the law-like to the highly particular, are still cause-effect relationships.
This resolves the second concern with the nomological and pattern accounts: causal
explanations can range from the highly general to the highly specific. The third concern
raised with the other conceptions of explanation is that laws and general patterns seem
to be of decreasing importance in science. In contrast, we suggested in Chapter 7 that
causal reasoning is absolutely central to science.
Yet the causal conception of explanation faces its own difficulties. First, as we also
surveyed in Chapter 7, there is no consensus about the nature of causation. So, there are
sometimes disagreements about whether a given explanation captures genuine causes.
For example, are we sure that the economic law of demand is the kind of thing that can
causally explain a decrease in taxi use? Some people respond to this concern by adopting
a very inclusive view of causation. Others think that some explanations cite causes, and
others cite other kinds of regularities, like mathematical regularities.
A second difficulty with the causal conception of explanation stems from the observa-
tion that phenomena often have many causes. For this reason, causal explanations may
come too easily. Causal explanations often cite only one or a few causal influences, when
we know there are many causal influences on the phenomenon that’s explained. How is
this enough to explain the phenomenon? Some respond to this challenge by saying that
the more causal information you can give, the better explanation you have. Others seek
another principle to decide what causal information belongs in an explanation.
A third difficulty for the causal conception of explanation results from simply pointing
out its difference from the nomological and pattern conceptions. If it is right that general
patterns and scientific laws help us understand the world, at least sometimes, then the
causal conception of explanation is lacking. This is because the causal conception doesn’t
give us a way to recognize the explanatory value of general laws or patterns.
So far, we have talked about these three conceptions of explanation as if one is right
and the others wrong. But it’s possible that each conception captures certain elements
of what helps us understand the world. One initial reason to think this might be so is
that each of these conceptions of explanation aptly characterizes some, but not all, of
the examples of successful explanation we have discussed. Perhaps laws, patterns, and
causes all can contribute to our understanding, and so any of these can be an ingredient
of explanation.
EXERCISES
8.1 First, rate your knowledge and familiarity with bicycles on a scale from 1 (‘I know little
or nothing about how bicycles work’) to 7 (‘I have a complete understanding of how
bicycles work’). Figure 8.4 is a partial sketch of a bike; you will notice that it’s missing
some parts. Try to finish the drawing, adding in your own sketch of the pedals, chain,
and the missing pieces of the frame.
FIGURE 8.4 A partial sketch of a bicycle, with the frame, pedals, and chain labeled
Rate once again your knowledge and familiarity with bicycles on a scale from
1 (‘I know little or nothing about how bicycles work’) to 7 (‘I have a complete
understanding of how bicycles work’). Did your rating go up, down, or stay the
same? (Adapted from Lawson, 2006).
8.2 Describe the illusion of explanatory depth in your own words. Then, think through
possible explanations for this illusion. Describe the possible explanation that you
think is most promising, and say what might help you avoid the illusion of explana-
tory depth if your explanation is correct. Finally, describe how you might be able to
test that explanation to see whether it’s correct.
8.3 From your background knowledge and the information provided in this section, do
your best to answer each of the following questions.
a. Why is the sky blue?
b. Why is December cold in Sweden but warm in Australia?
c. How do earthquakes happen?
d. How does cancer kill an organism?
e. Why do objects fall when dropped?
What are the common features of the explanations you’ve given? What are
some differences, and what do you think accounts for them?
8.4 For each of your explanations in 8.3, identify what conception(s) of explanation fits it best
and say why. Then reflect on all of your answers together, and describe what you notice.
For instance, if you answered in the same way about all or most of the explanations, why
do you think that’s so? If you answered in different ways, what do you think accounted for
the difference? Is there any general form—any common features—to your explanations?
8.5 After looking back at the box on scientific laws, consider the following argument: if crite-
ria for lawfulness are necessary criteria, then something must satisfy them all to count as
a genuine law of nature. Research in psychology, biology, and other disciplines does not
satisfy all these criteria. So, there are no genuine laws in psychology, biology, and other
disciplines. But if scientific explanation is nomological, it requires genuine laws. Thus,
there are no explanations in psychology, biology, and perhaps other fields of science.
We’re pretty confident this conclusion is false, but the argument is deductively valid.
So, at least one of its premises must be false. Decide which premise you think is
mistaken and develop an argument defending your view.
Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook
Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122.
Created from purdue on 2021-08-29 21:53:53.
Explaining, Theorizing, and Values
Scientific Theories
Consider the ground we've covered in the chapters of this book so far. In Chapter 1,
we considered what is distinctive about science. Chapter 2 focused on experiments and
other ways of testing hypotheses with observation. Chapter 3 looked at modeling, another
way of investigating hypotheses. Chapters 4–7 have all been about aspects of this same
process of subjecting hypotheses to empirical tests: deductive, inductive, and abductive
patterns of reasoning in scientific arguments; the role of statistics in representing data and
testing hypotheses; and the significance of causal hypotheses. All of this fits in some way
with the basic ingredients of recipes for science we laid out in Chapter 1: developing a
hypothesis, formulating expectations on the basis of the hypothesis, and testing expecta-
tions against observations. At the same time, there is also remarkable variation in how
science proceeds—recipes, not a single recipe—and we have tried to also give a sense for
that in how each of these topics has been addressed.
Still, recipes focused on hypotheses, expectations, and observations are not all there is
to science. We have already seen in this chapter that a central aim of science is explaining
our world. Scientists aren’t simply accumulating a list of confirmed hypotheses, the facts
we know about our world and ourselves. The project is bolder: scientists are charged with
helping us understand why and how things happen. And scientists are asked to furnish us
with tools for predicting and changing the world around us. Scientists also create scientific
theories, which are large-scale systems of ideas about natural phenomena, more general
and more elaborate units of knowledge than individual hypotheses typically are, and with
much more evidence to support them. Scientific theories thus provide bigger and more
powerful insights into the world.
Theories often go beyond what is readily observable. The Darwinian theory of evolu-
tion by natural selection is a grand theory about the origins of all the diverse life forms
on Earth, and Einstein’s theory of relativity is a grand theory about the very nature of
space and time. To be clear, empirical evidence has been central to testing and confirming
both of these theories. But the content of these and other theories is usually taken
to be larger than their readily observable implications. Evolutionary theory, for example,
indicates what happened in the earliest years of life on Earth. Relativity theory tells us
what would happen if we travelled at the speed of light, and it also gives us a reason for
believing nothing but light will ever travel that fast.
In common usage, that an idea is a ‘theory’ sometimes indicates that it hasn’t been
tested out. Scientific theories are not like that. Quite the opposite, they are important
human accomplishments, as both the Darwinian theory of evolution and Einstein’s the-
ory of relativity illustrate. Yet, because scientific theories have implications that are not
immediately observable, they are never taken to be true beyond a shadow of a doubt,
no matter how much empirical data support them. Scientists have excellent justification
for the theories of evolution, relativity, and, say, the atomic composition of matter. Even
so, the possibility is held open that someday one or another of these theories, or another
theory among our prized scientific achievements, will be replaced by a better theory. This
possibility is intrinsic to the open and self-correcting nature of science.
Just as scientific theories go beyond the readily observable, theories also come about
not simply by extrapolating, or generalizing, from observations. Instead, there is usually
a significant conceptual shift, some feat of imagination, that gives rise to a new way
of thinking about observations. Darwin wondered whether the similar forms of life he
observed across continents might not suggest they dispersed from some ancient, common
ancestor. And he was inspired by an economist, Thomas Robert Malthus (1766–1834),
who wrote about the pressures to survive created by population increases. Einstein was
inspired to reconsider the very nature of space and time by the puzzle of how to synchronize
distant clocks and by the way observers' experiences vary depending on whether they are
in motion. In both cases, extensive observations were subsequently
obtained to empirically support the theories. But the initial idea was a kind of spark of
insight, a different way of thinking about what was already known about the world.
Scientific Breakthroughs
No scientific theory is set in stone, and theories are sometimes replaced by successors. The
differences between a theory and its successor can be minor, or they can amount to radi-
cal shifts. The most significant scientific breakthroughs have been changes in worldview;
they involve comprehensive revision to how background or auxiliary assumptions, data,
and ideas are combined, and thus which scientific theory is supported.
Consider again theories of our universe and the bodies within it. The worldview
that arose with Aristotle (384–322 BCE) had great scope and logical coherence. The
Aristotelian theory of falling bodies claims that heavy bodies fall faster than light ones,
and its geocentric conception of the universe placed Earth at the center, which fit with
most everyday observations of the world. But over time, observations were made
that the Aristotelian worldview couldn’t easily accommodate. Eventually, it was replaced
by a Copernican conception of the universe, followed by a Newtonian conception, with
the Earth not a fixed center but a planet in motion around the Sun.
Because of the dramatic change in worldview, astronomers from the 4th century BCE
and the 17th century would have agreed about the positions of the stars in the sky, but
they would have had radically different interpretations of those observations. Similar
observations provided clues to constructing explanatory theories, but the differences between
those theories were vast. This is the Scientific Revolution of the 16th and 17th centuries,
discussed also in Chapter 1.
Additional radical shifts followed on the heels of the rejection of the Aristotelian
worldview, and with these changes came radical revisions to accepted ideas about the
position and movement of Earth, the shape of orbits, and the nature of universal forces. In
general, each new theory accounted for some body of evidence better than its predecessor.
Still, most of these transitions also involved radical changes in perspective. The same is true of
the later replacement of Newtonian mechanics with Einstein’s theory of relativity, when
universal forces were replaced by non-Euclidean geometries of space-time.
Scientific breakthroughs have periodically occurred in other fields of science as well.
This is as you’d expect if scientists are truly open to revising or replacing any theory when
doing so is warranted by the available evidence. And breakthroughs seem rewarding and
significant; there’s a sense that, after a scientific breakthrough, we more clearly understand
our world. An initial spark of insight leads to a conceptual shift that reinterprets existing data
to support a new theory, and then more data are discovered that confirm this new theory.
From another perspective though, the idea of scientific breakthroughs is also troubling.
What happened to our scientific knowledge from before the breakthrough—were scien-
tists just altogether wrong? How do we know that our current best theories won’t suffer
the same fate and also be rejected for new and better theories? Can we trust our current
scientific theories at all then? These are deep and troubling questions that strike right at
the heart of science. But let’s postpone that discussion until later in this section, after we
have a fuller picture of what scientific breakthroughs are like and how and why they occur.
In his 1962 book The Structure of Scientific Revolutions, the philosopher Thomas Kuhn
(1922–1996) gave the notion of revolution a central place in his account of science. He
suggested that scientific revolutions have occurred and will continue to occur periodically
as an important part of science. In his view, these revolutions prevent science from
proceeding in a straight line, steadily accumulating an ever-growing body of knowledge
and an expanding store of explanations. Kuhn thinks science instead proceeds
in phases. We’ll first describe these phases; then we’ll work through how they apply to
a specific scientific revolution.
Kuhn called the earliest phase of science pre-paradigmatic. This is characterized by
the existence of different schools of thought that debate very basic assumptions, including
research methods and the nature and significance of data. Data collection is unsystematic,
and because the theories are inchoate, or undeveloped, they can easily be adapted in
different ways to accommodate new observations. There are many puzzles and problems
but not very many solutions.
Kuhn’s second phase is the normalization of scientific research. One school of thought
begins to solve puzzles and problems in ways that seem successful enough to draw
adherents away from other approaches. Kuhn called this period normal science, because
widespread agreement about basic assumptions and procedures allows scientific research
to become stable. Scientific practices become organized. Laboratories or other workspaces
may be set up, experimental techniques and methods become widely accepted, and
agreed-upon measurement devices are improved.
During normal science, scientific developments are driven by adherence to what Kuhn
called a paradigm. Broadly conceived, a paradigm is just a way of practicing science. It
supplies scientists with a stock of assumptions about the world, concepts, and symbols
that they can use to more effectively communicate. It also involves methods for gathering
and analyzing data, as well as habits of scientific research and reasoning more generally.
Science students learn, and come to tacitly accept, the paradigm associated with a period
of normal science through textbooks. Containing little historical insight into the dynamics
of scientific change, textbooks encapsulate the tenets of the paradigm and provide
students with shared exemplars of good science.
Kuhn thought that, during a period of normal science, each field of science is governed
by a single paradigm. But scientists in the grip of some paradigm have often ended up
with observations that are at odds with the paradigm or that lead to worrying puzzles
called anomalies: deviations from expectations that resist explanation by the reigning
theory. Usually, anomalies are just noted and set aside for future research. But anomalies
can accumulate, and this creates a kind of increasing tension for the accepted scientific
theory. Scientists begin to worry that the theory might not be right after all.
The accumulation of anomalies sets science up for a crisis. A crisis occurs when
more and more scientists lose confidence in the reigning theory in the face of mounting
anomalies. For Kuhn, a paradigm is rejected only if a critical mass of anomalies has led
to crisis and there’s also a rival paradigm to replace it. Another theory has been developed
by some renegade scientists, and the problems with the existing paradigm mean that this
new theory—together with its auxiliary assumptions, methods, and so on—can finally get
attention. If this is so, a crisis might be followed by a scientific revolution. In this period
of science, all the elements of the accepted paradigm are up for negotiation. Data, inter-
pretations of data, auxiliary assumptions, methods, and technical apparatus—any and all
might be rejected, replaced, or reinterpreted from the perspective of the new paradigm.
This four-stage view of scientific change is summarized in Table 8.1.
Table 8.1 Kuhn's stages of scientific change

Stage | Features
Pre-paradigm science | Competing schools of thought debate basic assumptions; data collection is unsystematic; theories are inchoate
Normal science | A single accepted paradigm guides stable, organized puzzle-solving research
Crisis | Accumulating anomalies erode scientists' confidence in the reigning paradigm
Scientific revolution | A rival paradigm replaces the old one; data, methods, and assumptions are renegotiated
The German physician and chemist Georg Ernst Stahl proposed that every combustible material
contains a universal fire-like substance, which he named phlogiston (from Greek, meaning
flammable). Combustible materials, like wood, tend to lose weight when burned, and
Stahl explained this change in terms of the release of phlogiston from the combustible
material to the air. When the air becomes saturated with phlogiston or when a combus-
tible material releases all its phlogiston, the burning stops. Stahl believed that the residual
substance left behind after a metal burns is the true substance of the original metal,
which lost its phlogiston during combustion. This residue, which was called metal calx
(what we now know to be an oxide), has the form of a fine powder. Both metal calx and
the gases produced during combustion could be captured, measured, and experimentally
manipulated.
Unlike gases and calx though, phlogiston was an utter mystery. Nobody had isolated it,
and nobody had found a way to experimentally manipulate it. In fact, phlogiston seemed
to have properties that were inconsistent with Stahl’s theory. Stahl believed phlogiston
had a positive weight. When you burn a piece of wood, the remaining ash loses phlogiston,
and it weighs less than the original log. But in other cases, for example, when magnesium
or phosphorus burn, the residue left behind weighs more than the original material. If
phlogiston was released during the burning process, why was there a gain in weight in
these cases? This is an anomaly.
Intrigued by this, the Lavoisiers experimented with a variety of metals and gases to inves-
tigate why and how things burn. They observed that lead calx releases air when it is heated.
This suggested that combustion and air were, somehow, linked. Explaining the link was a
difficult task, however, because at that time, little was known about the composition and
chemistry of air. Meeting the English theologian and polymath Joseph Priestley (1733–1804)
helped. Priestley had discovered a gas he called dephlogisticated air, which was released by
heated mercury calx. This gas was thought to greatly facilitate combustion because, being
free from phlogiston, it could absorb a greater amount of the phlogiston released by burning
materials. Candles burning in a container with dephlogisticated air would burn for much
longer, for example. This gas, Priestley observed, facilitated respiration too: mice in contain-
ers with dephlogisticated air lived longer than mice placed in containers without this gas.
The Lavoisiers tried to replicate Priestley’s experiments, and based on their own results
and observations, they elaborated a new theory of combustion. The central idea was that
combustion was the reaction of a metal or other material with the ‘eminently respirable’
part of air. Believing (incorrectly) that this kind of air was necessary to form all sour-
tasting substances, or acids, Lavoisier called it oxygène (from the two Greek words for
acid generator). According to this new theory, combustion did not involve the removal
of phlogiston from the burning material, but rather, the addition of oxygen.
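The oxygen theory explains both weight changes with simple mass bookkeeping. The sketch below uses modern molar masses for magnesium (which were, of course, unavailable to the Lavoisiers), and the wood figures are hypothetical round numbers chosen purely for illustration:

```python
# Mass bookkeeping under the oxygen theory of combustion.
# Approximate molar masses in grams per mole (modern values).
MG, O2, MGO = 24.3, 32.0, 40.3  # 2 Mg + O2 -> 2 MgO

# Burning 2 mol of magnesium absorbs 1 mol of O2 from the air:
solid_before = 2 * MG   # 48.6 g of metal
solid_after = 2 * MGO   # 80.6 g of residue (calx)
assert solid_after > solid_before                      # residue GAINS weight
assert abs(solid_before + O2 - solid_after) < 1e-9     # exactly the oxygen's mass

# Wood appears to lose weight only because the combustion products
# (carbon dioxide and water vapor) escape as gases; counting them,
# mass is conserved.  Hypothetical figures for burning ~1 kg of wood:
wood, o2_consumed = 1000.0, 1070.0   # grams entering the reaction
ash, gases = 20.0, 2050.0            # ash remains; CO2 + H2O escape
assert abs((wood + o2_consumed) - (ash + gases)) < 1e-9  # the books balance
```

Seen this way, Stahl's anomaly dissolves: whether the visible residue gains or loses weight depends only on whether the oxygen-bearing products stay solid or escape as gas.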
This emerging rival paradigm set the basis for the revolution from which modern
chemistry emerged. In the 1780s, the Lavoisiers and other scientists adopted the idea of
a chemical element and of compounds composed of simpler elements. This new system
of chemistry was set out by Antoine-Laurent Lavoisier in a textbook in 1789. As Kuhn
would expect, this book described not just the theory but also the other elements of
a paradigm. The book explained the effects of heat on chemical reactions, the nature
of gases, and how acids and bases react to form salts. It also described the technological
instruments used to perform chemical experiments. And it contained a 'Table of Simple
Substances’—the first modern listing of chemical elements.
After the publication of this textbook, most young chemists adopted Lavoisier’s theory
and abandoned phlogiston. ‘All young chemists’—Lavoisier wrote in 1791—‘adopt the the-
ory, and from that I conclude that the revolution in chemistry has come to pass’ (Donovan,
1993, p. 185). From a Kuhnian perspective, the next phase of normal science had begun.
Not every episode of scientific change fits Kuhn's pattern so neatly, however. The science
of biology since the Darwinian revolution has not simply consisted in the application of
Darwin's ideas, as Kuhn would have us expect for a period of normal science. Rather, our
understanding of evolution has been in continual
development. The so-called Modern Synthesis in the early 20th century integrated the
existing knowledge of genetics and Darwinian evolution, which had previously been seen
as competing theories. Other elements of evolutionary theory have been revised since,
like the recognition of non-genetic influences on traits and how significantly organisms
shape their environment, thereby affecting how natural selection acts on themselves and
nearby organisms of other species.
Another point in support of non-revolutionary scientific change is that theory change
doesn’t always involve the rejection of existing theories. Sometimes, it comes from the
joining of theories, as in the Modern Synthesis, and other times, it can come from new
methods. American biologist James Watson and English physicist Francis Crick, for exam-
ple, reached their groundbreaking conclusion that the DNA molecule exists in the form
of a double helix by applying a new modeling approach to data that had been gathered
by Rosalind Franklin. Using cardboard cutouts to represent the chemical components of
DNA, Watson and Crick tried to make different arrangements, as though they were solv-
ing a jigsaw puzzle. Through this concrete model-building, the double-helical structure of
DNA was identified. This had enormous consequences for subsequent biological research.
Mathematics and even philosophy can drive scientific theory change too. The develop-
ment of a new kind of geometry, non-Euclidean Riemannian geometry, paved the way for
Einstein’s theory of relativity. Einstein’s theory adopted this geometry as a description of
physical space-time. One basic difference between Euclidean and non-Euclidean geom-
etry concerns the nature of parallel lines. In Euclidean geometry, there’s only one line
through a given point that is parallel to another line. In some non-Euclidean geometries,
there are infinitely many lines through a point that are parallel to another line, and in
others, there are no parallel lines. This development in mathematics made it possible
for Einstein to wonder whether the geometry of our own universe might actually be
non-Euclidean.
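The contrast the paragraph describes is often stated via Playfair's form of Euclid's parallel postulate; the following summary is ours, using standard terminology:

```latex
\begin{align*}
&\text{Given a line } \ell \text{ and a point } P \notin \ell:\\
&\quad \text{Euclidean geometry:} && \text{exactly one line through } P \text{ is parallel to } \ell;\\
&\quad \text{hyperbolic geometry:} && \text{infinitely many such lines exist;}\\
&\quad \text{elliptic geometry:} && \text{no such line exists.}
\end{align*}
```

General relativity goes further still: it models spacetime with a Riemannian-style geometry whose curvature varies from point to point, depending on the distribution of matter and energy.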
Scientific Progress
Earlier, we raised worries about how scientific breakthroughs may undermine our con-
fidence in scientific theories. If some well-confirmed theories were eventually rejected,
who’s to say our current theories won’t also be rejected? And do such scientific break-
throughs make it so that science isn’t really making progress at all? Let’s consider these
questions in a bit more depth and isolate a few important considerations. But we’ve
entered deeper philosophical waters now, and this discussion won’t be decisive. There
are lots of interesting questions here about the nature and significance of science, even if
science is unquestionably our best way to gain knowledge about our world.
When scientific theories change, do we have reason to think that the new theory is an
improvement on the last one and that science is progressing toward truth? This question
is complicated by two features of theories and theory-change. First, theories often appeal
to phenomena that cannot be directly observed. Examples we have encountered in this
book include the Higgs boson, the first moments of the universe’s existence after the big
bang, and the original common ancestor of all life on Earth. How can we ever be sure we
are right about these and other phenomena like them? Second, at least some instances of
theory-change have been radical: scientists rejected phlogiston, decided they were wrong
about the placement of Earth in the universe, and much more recently decided Pluto
wasn’t a planet after all. How can we ever be sure that our scientific findings are on a
path to truth, when the next radical revision could be right around the corner?
There’s at least one influential argument suggesting that, despite all this, we have reason
to believe that our best scientific theories are true. This argument—sometimes called the
no miracles argument—is an abductive inference, or inference to the best explanation,
from the success of science. It begins with the observation that our best scientific theories
are extraordinarily successful; they enable scientists to make empirical predictions, to
explain phenomena, to design and build powerful technologies. What could explain this
success? One possible explanation is that our best scientific theories are approximately
true. And if these theories were not approximately true, then the fact that they are so
successful would be astonishing. So, it seems, the best explanation for the success of
science is that our best theories are true, or at least on the path to truth and getting closer.
Yet some believe that this conclusion is overly optimistic. Here’s an inductive argument
for the opposite conclusion. If we examine the history of scientific theories in any given
field, we find a regular turnover of older theories rejected in favor of newer ones. So, most
past theories are plainly false. Therefore, by generalizing from these cases, most scientific
theories are false. It seems this would include our current theories too. This suggests our
current theories stand a good chance of eventually being replaced and regarded as false.
The upshot of this argument—sometimes called the pessimistic meta-induction—is that we
do not have a strong reason to think our current best scientific theories are true.
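Setting the two arguments side by side makes clear that they have quite different logical forms, one abductive and one inductive; the schematic below is our own summary, not the authors' notation:

```latex
\begin{align*}
&\textbf{No miracles argument (abductive):}\\
&\quad \text{P1: Our best theories are extraordinarily successful.}\\
&\quad \text{P2: Their approximate truth is the best explanation of that success.}\\
&\quad \text{C: So our best theories are approximately true.}\\[6pt]
&\textbf{Pessimistic meta-induction (inductive):}\\
&\quad \text{P1: Most successful past theories were eventually rejected as false.}\\
&\quad \text{C: So, probably, our current successful theories will be rejected too.}
\end{align*}
```

Notice that each argument takes the same historical record of scientific success and draws an opposite moral from it, which is part of why the debate remains unsettled.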
This argument raises questions about how certain we can be about our current scientific
theories. But we want to emphasize that science is the single most successful project for
generating knowledge that humans have ever embarked on. Science as a set of methods
for investigating our world has persisted for centuries and is unlikely to be surpassed,
even if individual scientific theories are sometimes abandoned.
EXERCISES
8.11 Write a list of the primary features of scientific theories based on the discussion
from early in this section. How do theories differ from hypotheses and laws? What
features do they have in common?
8.12 What do scientific theories add to science, beyond the processes of hypothesis-
testing we have mostly focused on in this book? How does theorizing relate to
hypothesis-testing?
8.13 Look back at the Higgs boson discovery discussed in Chapter 6. This discovery was
additional confirmation of the so-called Standard Model in physics. Investigate this
theory, then answer the following questions about it as best you can.
a. What is the theory a theory of—that is, what phenomena is it supposed to be about?
b. What are the central concepts featuring in the theory?
c. Are some claims made about things that we can’t directly observe? What kinds
of things?
d. What do scientists explain, predict, and describe with the theory?
e. What are some of the considerations that sparked the development of the theory?
f. Has the theory undergone any changes over time? Which one(s) and why?
8.14 Describe the features of each of Kuhn’s expected stages in your own words: pre-
paradigm science, normal science, crisis, and scientific revolution. Illustrate each
stage by describing how it applies to the chemical revolution.
8.15 Consider again the Copernican revolution, chemical revolution, and the Darwinian
revolution. Evaluate the merits of Kuhn’s view of scientific change. What do you think
are strengths of his view? Do you think there are any weaknesses or ways it is limited?
Support your points by referencing these episodes of scientific change. (You’re also
welcome to appeal to other scientific breakthroughs discussed earlier in this book.)
8.16 How does the existence of scientific breakthroughs, or revolutions, challenge the ideas
of scientific truth and scientific progress? Motivate the concern as well as you can.
Then, evaluate the merits of the concern, thinking back to all you’ve read in this book.
8.17 Think of the case of Lavoisier and the chemical revolution and the case of Darwin’s
evolutionary theory. In each of these episodes, in what sense did the breakthrough
represent progress? In what ways did chemistry and biology improve? More gener-
ally, what standards do you think we should use to identify progress and advances
in science?
After working through this section, you should be able to:

• Describe three examples of how science has been influenced by its social and historical
context
• Articulate how exclusion and marginalization based on race and ethnicity, national-
ity, gender, sexuality, and other factors have negatively influenced both society and
science
• Define the value-free ideal for science and give an example of when values have
influenced science in a problematic way
• List five ways in which values influence science in legitimate ways and give an ex-
ample of each
• Characterize the main contemporary challenges to science’s trust and objectivity
The social and historical context of scientific activity significantly influences the nature
of science. Even as science aspires to produce knowledge that is not limited by a specific
perspective, scientific theories are also creatures of the times, places, and people who cre-
ated them. Recall how Darwin’s ideas about evolution were influenced by the economist
Malthus, for instance. Some have also suggested that how Darwinian evolutionary theory
dealt with sexual reproduction and the differences between male and female animals
was strongly influenced by Victorian moral sentiment (Knight, 2002). Darwin took it for
granted that, throughout the animal kingdom, male animals are promiscuous and aggres-
sive and female animals are ‘coy’ and selective. This looks suspiciously like human gender
norms—in Darwin’s Victorian England and, to some extent, in many cultures today. While
Darwinian evolutionary theory was certainly a tremendous step forward for biology, it
was also influenced by the time and place of its creation and perhaps by features of the
individuals who created it.
So, science seems to be shaped by its social and historical context. Science is also
regularly used to promote particular social aims. The difficult truth is that, throughout
history, science has regularly been used to promote objectionable social aims and, at times,
has even been pursued in ways that incorporate morally repugnant social views. Science
has been used to expand power over others, to invent nuclear and chemical weapons for
the purpose of mass casualties, and to amass wealth for the few, as with research for the
fossil fuel industry. Science has also been used to promote misinformation, as when the
Ethyl Corporation paid Robert Kehoe to vouch for the safety of lead in gasoline (recall
from Chapter 1) or when tobacco corporations paid scientists to present cancer research
in a way calculated to mislead the public. Science has also directly abused people from
marginalized groups, as when the Nazis ran deeply cruel experiments on the prisoners of
concentration camps and when the US Public Health Service ran the Tuskegee Syphilis
Experiment. In this clinical study, researchers withheld treatment from 399 impoverished,
rural African-American men who had syphilis. They never informed the participants that
they had syphilis or that there was a cure for the disease.
In this section, we take a hard look at the relationship between science and society.
We will consider how the participants in science and the social context of science influ-
ence the development of science. We will investigate the roles moral and political values
should and should not play in science. And we will also raise some of the most pressing
challenges to science and scientific authority in the contemporary world.
Participation in Science
Let’s first explore the idea that the traits of scientists might influence the nature of the
scientific endeavor itself. Here’s one way in which this seems to be so. A negative social
influence on science is the exclusion or marginalization of individuals from the scientific
community because of their gender, race and ethnicity, sexuality, or social and cultural
background. The English polymath Alan Turing (1912–1954) did groundbreaking research
in computer science, formal logic, mathematics, cryptography, and morphogenesis. During
World War II, he helped crack the code used by the Nazis to protect their military com-
munication, an achievement that many historians believe was the single greatest contri-
bution to the Allied victory. Turing was also a visionary of artificial intelligence. You may
have heard of the Turing machine and Turing test, which he invented; he anticipated that
human intelligence would one day be matched by machines. Turing was also gay, and at
the time, this was illegal in Britain. Despite his groundbreaking scientific contributions,
Turing was arrested and chemically castrated by the British government. Humiliated and
resentful, he killed himself at the age of 41.
Being outed as gay in mid-20th-century Britain was perilous; matters were also bleak
for women in science for most of history. British-American astronomer Cecilia Payne-
Gaposchkin's 1925 dissertation Stellar Atmospheres became a cornerstone of modern
astrophysics; for this, she was rewarded with low-paying adjunct teaching work for
the next 20 years. Rosalind Franklin (1920–1958) was an English chemist and x-ray crys-
tallographer, whom we mentioned earlier for her important contributions to the under-
standing of DNA’s structure. Among other contributions, Franklin was responsible for an
x-ray diffraction image that was shared with Watson and Crick without her knowledge
(pictured in Figure 8.6b). After seeing that image, Watson and Crick developed their
physical model of DNA. They went down in history as having discovered DNA's double
helix structure, eventually winning the Nobel Prize for this work. In contrast, Franklin's
enormous contributions to the discovery were recognized only after her death.

FIGURE 8.6 (a) Rosalind Franklin; (b) Franklin's x-ray diffraction image that famously inspired
Watson and Crick's double-helix model of DNA

A similar
story is that of British neuroscientist Kathleen Montagu (est. 1877–1966), who published
her discovery of the neurochemical dopamine in the human brain in 1957. Her work
was largely overshadowed by a very similar discovery three months later by Swedish
neuropharmacologist and Nobel Prize winner Arvid Carlsson and colleagues.
This is a common enough phenomenon that it has been named. The Matilda effect
is the bias against recognizing women scientists’ achievements. These women’s work is
often uncredited or else attributed to their male colleagues instead. Societal prejudice has
made it more difficult for not only women but also racial and ethnic minorities, people
from developing nations, and other marginalized groups, to be supported in their scientific
work and even to become scientists in the first place.
When only certain kinds of people participate in science, the value of science is lower.
For one thing, science squanders or loses out entirely on the contributions of people who
are excluded. For another, when many different kinds of people participate in science,
the differences among them contribute to the range of questions, richness of ideas, and
ultimately the success of science. If instead only certain kinds of people dominate science,
like middle-class and wealthy white men from developed countries, the kinds of questions
posed and ideas generated may reflect the limitations of the scientists.
As an illustration of how diversity contributes to scientific success, consider Temple
Grandin, an American researcher in animal behavior. Grandin explicitly brought her
experiences as an autistic woman to bear in ways that led to dramatic improvements
in the efficiency of slaughterhouses and in the ethical treatment of the animals there. Another example
is Barbara McClintock (1902–1992), who significantly advanced our understanding of
genetic inheritance by discovering 'jumping genes', or transposons: DNA sequences that
can change their position within the genome, moving from one chromosomal location to
another. McClintock's great insight arose from a simple decision
about how to study genes, and it has been suggested that this decision was motivated
in part by her outsider status in science and her identity as a woman (Keller, 1983).
McClintock departed from the well-established practice of focusing on fruit fly genes
by instead studying the genes of corn, or maize. Fruit flies are targeted in such studies
because they are genetically simple, with only four chromosome pairs. Maize, in contrast,
has ten chromosome pairs. This added complexity makes them more difficult to study,
and McClintock was criticized for her decision. But this also made it possible to observe
jumping genes in action.
The issue is more than just who is recognized for what discovery, how breakthroughs
are received, and who gets to be a scientist. Different people bring different styles of
reasoning to the table, and scientific progress often demands creativity and seeing things
anew. For these reasons, the inclusion of diverse people—with different personalities and
backgrounds—in science doesn’t just benefit those individuals and society; it also makes
science itself more successful.
A long-standing ideal for science, the value-free ideal, holds that science should be free
from the influence of our values, such as moral, social,
or political beliefs. Scientific theories and hypotheses should be accepted only when
evidence and reasoning favor them, not just because we want them to be true. This is an
ideal because it’s not something we think actually always happens. Science has a spotty
track record in this regard: it’s too often been used to further racist or sexist views, in
support of one nation’s dominion over others, or to contribute to corporate profits. But
the value-free ideal suggests that this shouldn’t happen, that any science influenced by
moral or political views is bad science. Ideally, science will be governed by evidence and
reasoning and not by the values of scientists, their funding sources, or societal trends.
Because the desirability of scientific theories—whether we want them to be true—
makes no difference as to whether they are true, the desirability of a theory also shouldn’t
make a difference as to whether the theory is accepted as true. Questions about the
reality of climate change, about evolutionary changes in species, about whether GMO foods
are safe, or about how male and female brains differ may evoke
emotional reactions from people, but those emotions aren't part of answering these
questions. These questions can only be answered by conducting experiments or observational
studies, constructing models, evaluating evidence, and applying statistical reasoning—by
applying the recipes for science.
Still, there are some challenges to the value-free ideal for science. First, science’s spotty
track record in achieving freedom from the influence of political and economic values
might inspire skepticism of the value-free ideal. Careful historical studies of episodes in
science reveal the influence of values. We mentioned how Darwin’s evolutionary theory
encodes Victorian morality and how science has sometimes directly abused marginalized
people. In Chapter 5, we mentioned Galton’s work on eugenics, which was just the tip
of the iceberg of racist and sexist uses of science aimed to affirm white male superiority
over others. If the ideal science is value-free, it’s unclear that science has come close to
that ideal.
Second, even if we just think about what we want science to be like—the ideal—it’s
unclear that values should be entirely absent. Is discovering a vaccine for the Zika virus,
which is easily transmitted to humans by mosquito bites and leads to serious birth defects
when pregnant women are infected, more important than discovering new facts about
quasars, pulsars, supernovas, or other astral phenomena? If you think so, this reflects
a value you hold. If you think Zika research should be prioritized over astronomical
research, then you think this value should influence science. You probably agree that the
US Public Health Service should not have withheld syphilis treatment from 399 impov-
erished African-American men without their knowledge. This, too, is a way of wanting
people's shared values to influence science.
The value-free ideal suggests that science is simply a source of objective facts about
the world, immune to influence by human values. On the other extreme, some believe
science only serves pre-existing values. The right view of how values should influence
science is somewhere between these extremes.
Scientists are human beings with different moral, political, and religious values, and
we suggested just a moment ago that the social context for science and who partici-
pates in science can both influence scientific findings. And yet, the recipes for science
are designed to limit the kinds of influence social and individual values have on science.
Science is not, and should not be, value-free. Nonetheless, there are ways that our values
should influence science, and there are ways in which science should limit or eliminate
the influence of values.
None of these roles for values seems to interfere with science’s objectivity. We might
think of these, then, as legitimate influences of values on science. Some other ways in
which values have sometimes influenced science seem to be illegitimate. This includes,
for example, endorsing a scientific theory not because evidence and reasoning support it,
but just because you want it to be true. It also includes doctoring experimental results
so the data sets support a hypothesis that you want to believe is true. Reflecting on these
legitimate and illegitimate influences of values suggests a dividing line. When scientists’
values lead them to violate the recipes for science—acceptable approaches to data col-
lection and modeling, to hypothesis-testing and abductive reasoning, to statistical analysis,
and to causal reasoning, and so on—this is illegitimate. When scientists use values to
supplement or guide the use of the recipes for science, this can be legitimate.
In his book A Tapestry of Values, philosopher Kevin Elliott (2017) divides the legitimate
roles values can play in science into five categories, as helping to answer five different
questions. These questions are summarized in Table 8.2. Answers to these questions are
needed in order to know which scientific methods to employ, on which phenomena, and
to which end. These roles for values thus align with our suggestion that legitimate uses
of values supplement or guide the methods of science but do not violate those methods.
To begin, scientists’ individual values and societies’ collectively held values help answer
the first question about what to study. Individually, a researcher’s interests and values
surely shape what field of science she pursues, which lab she works in, and what problems
she tackles. Collectively, we choose what kinds of scientific research to support when
funding agencies, including tax-supported governmental agencies, designate the areas of
research they fund and which specific research projects in those areas to fund.
Beyond what to study, values also inform decisions about how phenomena should be
studied. Scientists can bring different questions, methods, and auxiliary assumptions to
bear on any given topic, and these choices in how research is pursued reflect research-
ers’ and society’s values. One instance of this influence is how the initial hypothesis and
assumptions about the causal background both guide experimental design, including the
nature of the intervention and which extraneous variables to control. As with the initial
choice of what to study, funding agencies can also influence how phenomena are studied.
For example, research into depression can focus on the efficacy of cognitive therapy; the
role of sleep, diet, and exercise; or the efficacy of pharmaceutical intervention. The choice
to strongly prioritize pharmaceutical intervention to the exclusion of other focuses reflects
the outsized influence drug companies have had on medical science (Elliott, 2017).

TABLE 8.2 Five questions that arise when doing science that our values help us answer
(Elliott, 2017)
1. What should be studied?
2. How should it be studied?
3. What is the research trying to accomplish?
4. How should uncertainty be handled?
5. How should the findings be communicated?
The third question that a focus on values helps to answer is what, exactly, scientists
are trying to accomplish in studying some phenomenon. This is an even more fine-
grained decision than just what to study and how to study it. In climate research, for
example, scientists might prioritize getting information about climate trends available
quickly so that it can guide policy, or they might prioritize getting as accurate informa-
tion as possible, no matter how long it takes. This decision about the aim of research
is influenced by values, including views about the social role the scientific research is
expected to have.
Fourth, values influence how scientists proceed in the face of uncertainty. Scientific
results are never free from uncertainty. There’s the basic problem of measurement error.
We’ve also seen that, if observations don’t match expectations, it could be the fault of the
hypothesis, or it could be the fault of some auxiliary hypothesis. In an experiment, no mat-
ter how perfectly controlled, there’s always the chance that an unexpected confounding
variable has interfered with the finding. In statistical hypothesis-testing, scientists choose
whether to reject the null hypothesis, and either choice could be wrong. These are just a
few examples of the unavoidable uncertainty in science and the need to choose how to
proceed in the face of uncertainty.
These kinds of uncertainty are all forms of underdetermination. Recall from Chapter 2
that underdetermination is when the evidence available to scientists is insufficient to
determine which of multiple theories or hypotheses is true. Some believe that there is
even permanent underdetermination in science: that there will never be enough evidence
to conclusively decide in favor of one hypothesis or theory and against all possible alter-
natives. When scientists face underdetermination, they must choose what to believe or
whether to suspend judgment.
Because of this unavoidable uncertainty, scientists must decide how much evidence to
require before endorsing a theory or hypothesis (or before rejecting a theory or hypoth-
esis). Safety is very important to us, whether for drinking water or medications, so toxi-
cology tests must have a high bar for success. There is a lower bar for deciding whether
a new drug is more effective than an already available drug. Scientists also must decide
how to represent scientific uncertainty to the public. In 1988, climate scientist James
Hansen declared that global warming was occurring. He described that
as a decision based on weighing ‘the costs of being wrong to the costs of not talking’
(Weart, 2014, referenced in Elliott, 2017). There was already enough evidence for Hansen
to be relatively confident in his choice to speak up. Decades later, of course, there is now
incontrovertible evidence of climate change.
This introduces a fifth category of values’ legitimate influence on science regarding how
scientists—and journalists and others who communicate scientific findings to the broader
public—should talk about those findings. As Elliott stresses, this isn’t just a decision to be
accurate. Scientific findings also can be discussed in their relationship to previous findings,
their potential social effects, or—picking up on the previous point—their level of certainty.
This framing influences whether and how the public will engage with research, and this
is a choice not dictated by scientific methods but by scientists’ and society’s priorities.
So, what scientists should study, how they should study it, what they aim to accomplish,
how much evidential support should be required, and how scientists should communicate
their results all depend on moral considerations—on values. These are legitimate influ-
ences of values on science. Recognizing these roles for values in science is crucial. This
enables us, as a society, to critically assess what values are employed at each of these
junctures. The influence of values on science can be problematic or even nefarious if the
wrong values are employed at any one of these stages. Figuring out the right and wrong
values is tricky, and it is not a matter for scientists alone to decide. Instead, this is an issue
that needs to be engaged with broadly in our society.
Examples of problematic values influencing science are, unfortunately, very easy to
come by. Here’s one. In 2017, US President Donald Trump proposed that NASA resources
should be dedicated to exploring the solar system instead of to climate change research.
This research priority—a decision about what to study—is a reflection of values endorsed
by a small but vocal contingent of the Republican Party. Choosing not to fund climate
change research amounts to deciding that knowledge about the rate and impact of climate
change is relatively unimportant. But because climate change is already having disastrous
effects on populations, the environment, and economies across the world, and because
the long-term costs of ignoring it will be disastrous, this decision was arguably the wrong
decision on moral grounds. Pulling funding from NASA’s Earth science division in order
to avoid investigating climate change and its effects upheld the wrong values. (Notice this
doesn’t mean that space exploration is unimportant! It too should be funded.)
Other examples of problematic values influencing science through the proper chan-
nels include the outsized influence the pharmaceutical industry has on medical research,
the continuing exploitation of at-risk communities due to the approaches used to study
them, and powerful corporations controlling what messages the public gets about climate
change and the risks of fossil fuel extraction. We’ll engage with some of these problems
in the next section.
We have suggested that science doesn’t have to be free from values to be trustworthy
and objective. What matters is that values influence science in the right ways and that
science effectively resists the problematic influence of values. Values, even good values,
shouldn’t play the wrong roles in science; we should never decide a theory is true simply
because we wish it were true. Further, the wrong values shouldn’t influence science, even
through the proper channels. To better understand how science earns its trust and
objectivity, it's important to acknowledge the many roles of sociopolitical and moral values
in scientific reasoning and to critically examine the values that influence our science. By
doing these things, we can clarify what values should influence science and in what ways.
Science's objectivity depends on a scientific community that is open to criticism and
welcoming of diverse perspectives. Objectivity in science can occur when scientists' judg-
ments are critically and openly assessed in light of other data and investigations, as well
as competing interpretations and alternative possibilities.
However, this intersubjective process, and thus science’s capacity for self-correction,
faces significant challenges. Some of the most significant challenges relate to the incentive
structure in science and how it shapes scientific findings in ways that undermine trust
and objectivity. Facing up to these challenges requires us to think carefully about the
scientific process, the role of incentives in shaping it, and what values are thus finding
their way into science.
Let’s back up. What is science’s incentive structure, and how does it create challenges
for trust and objectivity? As we have seen, science is a social practice that occurs in insti-
tutions like universities and national research centers. Scientists are professionals who get
paid for teaching and for doing research. But university salaries are, in most cases, not
enough to fund scientific research. Scientists need extra money to pay for scientific instru-
ments and lab equipment, for experimental participants, and for their assistants. This extra
money is generally awarded by public agencies like the ERC (European Research Council)
in Europe and the NSF (National Science Foundation) and NIH (National Institutes of
Health) in the US. The competition is fierce; every year, the number of applicants for
funding grows, while, partly due to budget cuts, the number of available awards shrinks.
Scientists’ ability to secure grants determines their career prospects. And their chance
of securing grants depends on their quantity and quality of publications, frequency of
citations, previously awarded grants, and the public attention they are able to attract.
‘Publish or perish’ is the phrase coined to capture the increasing pressure in science to
rapidly and continually publish work in order to sustain one’s career. The competition for
space in prestigious journals is also fierce; many have rejection rates of greater than 90%.
Because scientific production has increased dramatically over the years, journal editors
usually prefer to publish novel results that support an exciting hypothesis rather than
very robust and well-documented negative results.
Consequently, scientists have a strong incentive to produce surprising, positive results.
Other types of scientific research are harder to place in top journals. These include the
negative result that a hypothesis wasn’t supported by the evidence, studies that replicate
or assess previously published results, and preliminary, exploratory investigations that
are not decisive. The tendency to reward only one form of scientific finding is called
publication bias. This is common across all scientific journals but especially strong in
the most prestigious journals. Publication bias, coupled with the scarcity of resources,
can distort whole bodies of published research.
The literature on associations between specific foods and cancer risk, for example,
may be seriously distorted. Statistically significant associations with cancer risk have been
claimed for most food ingredients, from beef to tea. Careful analysis of this literature
highlights that many published studies report implausibly large effects, even when the
actual evidence is weak and effect sizes small (Schoenfeld & Ioannidis, 2013). Dissent
and work toward replicability would improve the reliability and validity of claims about
the role of food in cancer risk.
The incentive structure of current science also negatively impacts scientists’ communi-
cation of their results. The emphasis on producing exciting, publishable results may lead
scientists to cut corners in how experiments are designed, and how data are analyzed and
presented. Whether or not these are conscious decisions, scientists may fail to randomize
their experiments or to control for some known confounding factors. Another common
shortcut is data dredging, where data mining techniques are used to uncover patterns
in data sets that support a hypothesis not under investigation. This makes it more likely
that a claimed pattern is actually a type I error (see Chapter 6) and the supported
hypothesis is false. Relatively few studies report effect sizes and measures of uncertainty
in a transparent way, so it’s often hard for others to assess the quality of a study and the
soundness of the methods.
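The mechanism behind data dredging can be seen in a quick simulation. The sketch below is a hypothetical illustration (not from the text): none of the simulated variables has any real effect, yet searching across many of them at the conventional 0.05 significance threshold still turns up 'significant' patterns by chance alone, i.e., type I errors.

```python
import random

# Data dredging, simulated: with no real effects anywhere, a well-calibrated
# p-value is uniformly distributed on [0, 1]. Scanning many such "effects"
# at the 0.05 threshold still flags some as significant purely by chance.

random.seed(42)

def fake_p_value():
    # No true effect, so the p-value is just a uniform random draw.
    return random.random()

num_hypotheses = 100
alpha = 0.05
false_positives = sum(1 for _ in range(num_hypotheses) if fake_p_value() < alpha)

print(f"Tested {num_hypotheses} null effects; {false_positives} looked 'significant'.")
# On average, about alpha * num_hypotheses = 5 type I errors are expected.
```

Testing one pre-registered hypothesis at the 0.05 level keeps the false-positive risk at 5%; testing a hundred and reporting only the hits all but guarantees spurious findings.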
Fierce competition in science also leads more and more scientists to abandon academia
for jobs in industry. The IT, AI, pharmaceutical, chemical, and agricultural industries
have been hiring increasing numbers of scientists. This raises another worry about sincere and
transparent communication of scientific results. Being funded by a private company to
carry out research may pose conflicts of interest, which introduces funding bias: scientific
studies are more likely to have findings supporting the interests of the study's financial
sponsor. This can happen because of how values influence science—what to study and
how, what the aim is, how to handle uncertainty, and how to present the findings. It
can also happen via intentional or unintentional improper influence on data or methods.
Regardless, this leads to corporations having outsized influences on the nature of our
scientific knowledge and, in some cases, unknowingly—or even knowingly—misleading
the public with bad information.
Another challenge concerns communication: too much science is inaccessible to the
general public and even to many other scientists. Scientific studies get published by for-
profit journals, and these journals typically put articles behind pricey paywalls. Academic
institutions can pay for their faculty and students to have journal access, but not all
academic institutions can afford subscriptions to these journals. By the time science is
by competing, and perhaps better, theories and methods. Demographic diversity and
diversity in political views are also important for science’s capacity for self-correction
(Duarte et al., 2015), and yet science has historically been, and remains, limited in both
of these kinds of diversity.
In this book, we have painted a picture of science as fallible but with the tools to
reliably generate knowledge. Some scientific knowledge has had dramatic practical
importance—just think about the outstanding progress of the medical sciences and of
computer science and AI. Other knowledge concerns fascinating aspects of the faraway
universe and the strange behavior of microparticles. The value of science in producing
knowledge requires openness to criticism and dissent, the pursuit of meaningful questions,
and the communication of results in a sincere and transparent way. Only then can science
live up to its self-correcting ideal, generate objective knowledge, and thus deserve our
trust. It’s looking like some features of science, including its current incentive structure,
need to be changed to promote these ends.
EXERCISES
8.18 Define diversity in your own words. Choose three characteristics of people (for
example, gender, nationality, and political views), and, for each, describe how sci-
entists’ diversity in that characteristic might contribute positively to science. You can
think about any field(s) of science that will help in answering this question.
8.19 Describe two or three steps that you think could be taken to increase diversity in
science. Mention also any concerns or downsides you can think of for each of these
steps.
8.20 Describe in detail an example of when values have influenced science in an illegiti-
mate way. Then diagnose what went wrong. What was wrong about the values or
the nature of their influence, and what was the detrimental effect to science?
8.21 State the value-free ideal of science. Then, summarize the view of how values can
legitimately factor into science outlined by Kevin Elliott. In your view, does that view
of values’ influence violate the value-free ideal, or is it consistent with that ideal?
Give an argument in support of your answer.
8.22 Suppose you are working for an NGO (non-governmental organization) on the task
of measuring poverty levels across countries. For each of the following decisions,
describe at least two ways to proceed, and say how values are relevant to making
the decision.
a. Which countries will you include in the study?
b. How will you define and measure poverty?
c. What extraneous variables will you take into account?
d. How will you make comparisons across countries?
e. How will your results be publicized?
8.23 List several potential ethical problems arising from scientific research funded by the
pharmaceutical industry. For at least three of these problems, describe a concrete
action to address that ethical problem that could be taken by governments, pharma-
ceutical companies, or some other party.
8.24 Describe a real example of when scientists need to act in the face of uncertainty.
Describe the nature of the uncertainty and explain how social, economic, and moral
considerations might factor into the decision of how to proceed.
8.25 Choose three of the main contemporary challenges to science’s objectivity described
in this section, and rank them in importance from 1 to 3, where 1 is the most important.
For each, describe why it is a problem, including some considerations not provided
in the text; then suggest one step that you think could help address the challenge. You
should also assess how practical it is to implement each of your suggestions.
8.26 In recent years, several retractions of published scientific articles have captured
the world's attention. One, in 2015, was the retraction of a paper about gay
marriage that had been published in the prominent scientific journal Science.
Read the description of this case on Retraction Watch (https://retractionwatch.com), and then answer the following questions.
a. What risks do people who report misconduct in science (whistleblowers) face?
b. Were human subjects ‘harmed’ in this case, and if so, how?
c. Describe how data management issues influenced this case.
d. Describe how authorship issues influenced this case.
e. Does this case raise any conflict of interest?
f. What issues does the case raise about collaborating with others?
g. Describe how replication issues influenced this case.
8.27 Look back at the case of climate change discussed in Chapter 1. Identify at least five
ways in which values are likely to have affected that research, and describe how
those values have impeded or promoted scientific knowledge of climate change.
FURTHER READING
For an exploration of how social conditions influence science, see Merton, R. K. (1973).
The sociology of science: Theoretical and empirical investigations. Chicago: University of
Chicago Press.
For an account of values in science focused especially on underdetermination, see Douglas,
H. (2000). Inductive risk and values in science. Philosophy of Science, 67, 559–579.
For an overview of objectivity in science, see Reiss, J., & Sprenger, J. (2014). Scientific
objectivity. In Stanford encyclopedia of philosophy. Retrieved from https://plato.stanford.
edu/archives/win2017/entries/scientific-objectivity/.
Glossary
abductive inference: a commonly used type of ‘backward’ scientific inference that attributes special
status to explanatory considerations; also called inference to the best explanation
abstraction: leaving out or ignoring known features of a system from a representation or account of it
accuracy: the extent to which a model correctly represents the true value of a target system
addition rule: the probability that one of a number of mutually exclusive outcomes will occur is the sum
of their individual probabilities
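The addition rule can be checked against direct enumeration; this Python sketch is an added illustration (the fair-die example is an assumption chosen for concreteness):

```python
from fractions import Fraction

# Addition rule: for mutually exclusive outcomes,
# Pr(A or B) = Pr(A) + Pr(B).
# Assumed example: rolling a 1 or a 3 on a fair six-sided die.
pr_one = Fraction(1, 6)
pr_three = Fraction(1, 6)
pr_one_or_three = pr_one + pr_three

# Cross-check by counting favorable outcomes directly.
favorable = sum(1 for roll in range(1, 7) if roll in (1, 3))
direct = Fraction(favorable, 6)

print(pr_one_or_three, direct)  # 1/3 1/3
```

The rule requires mutual exclusivity; for overlapping outcomes the probability of the overlap must be subtracted.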
affirming the antecedent: using the truth of a conditional statement and its antecedent as grounds for
concluding the consequent is also true; a deductively valid form of inference
affirming the consequent: using the truth of a conditional statement and its consequent as grounds for
concluding the antecedent is also true; a deductively invalid form of inference
algorithm: step-by-step procedure for obtaining some outcome
alternative hypothesis: in statistical hypothesis-testing, a bold and risky conjecture that, contrary
to the null hypothesis, the variables in question are statistically dependent
ampliative inferences: when conclusions express content that, in some sense, goes beyond what is
present in the premises
analogical models: physical or abstract objects with features analogous to focal features of a target
phenomenon used to model the phenomenon
anomaly: a deviation from expectations that resists explanation by the reigning scientific theory; (Kuhnian)
motivation for scientific revolution and paradigm shifts
antecedent: the left side of a conditional claim; a condition that guarantees some consequence; logically
prior
appeal to ignorance: an informal fallacy; concluding that a certain statement is true because there is
no evidence proving that it is not true
appeal to irrelevant authority: an informal fallacy; appealing to the views of an individual who has
no expertise in a field as evidence for some claim
applied research: scientific research that uses knowledge to develop some product, like techniques, software, patents, pharmaceutical drugs, or new materials; often, a central motivation is to generate products for profit
argument: a set of statements in which some of the statements, the premises, are intended to provide
rational support or empirical evidence in favor of another statement, the conclusion
assumption: a specification that a target system must satisfy for a given model to be similar to it in the
expected way
auxiliary assumptions: a set of assumptions about how the world works that often go unnoticed but
are needed for a hypothesis or theory to have the expected implications; also called background
assumptions
average: see mean
axioms: statements that are accepted as self-evident truths about some domain, used as a basis for deduc-
tively inferring other truths (theorems) about the domain
bar chart: visual representation of statistical outcomes in which bars of different heights are used to show
the frequency of different values for some discrete variable
basic research: scientific research that aims at knowledge for its own sake; also called pure research
Bayes factor: a compact, numerical way of measuring the statistical evidence for a hypothesis H0 with respect to an alternative H1. It is defined by the formula: B01(E) = [Pr(H0|E) x Pr(H1)] / [Pr(H1|E) x Pr(H0)] = Pr(E|H0) / Pr(E|H1)
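To make the Bayes factor concrete, here is a minimal Python sketch (an illustration added here, not from the text; the coin-flip scenario and its likelihoods are assumptions chosen for the example) computing B01 as Pr(E|H0) / Pr(E|H1):

```python
from math import comb

def binomial_likelihood(k, n, p):
    """Pr(exactly k successes in n independent trials), success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed example: evidence E = 6 heads in 8 coin flips.
# H0: fair coin (p = 0.5); H1: biased coin (p = 0.75).
pr_e_h0 = binomial_likelihood(6, 8, 0.5)
pr_e_h1 = binomial_likelihood(6, 8, 0.75)

bayes_factor = pr_e_h0 / pr_e_h1  # B01 < 1 here, so E favors H1 over H0
print(round(bayes_factor, 3))  # 0.351
```

A Bayes factor above 1 would instead indicate that the evidence is more expected under H0 than under H1.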
Bayes nets: causal Bayes networks, or nets; a kind of probabilistic model that provides a compact, visual
representation of the causal relationships in a system and the strength of those relationships by using
joint probability distributions
Bayes’s theorem: a mathematical formula used for calculating conditional probabilities. It is defined by the formula: Pr(H|O) = [Pr(O|H) x Pr(H)] / Pr(O). Another form of Bayes’s theorem that is generally encountered when comparing two competing hypotheses H and not-H is: Pr(H|O) = [Pr(O|H) x Pr(H)] / [Pr(O|H) x Pr(H) + Pr(O|not-H) x Pr(not-H)]; the heart of Bayesian statistics
Bayesian conditionalization: a probabilistic rule of inference. It says that, upon observing new evi-
dence O, the new degree of belief in a hypothesis H ought to be equal to the posterior probability
of H: Prnew(H) = Pr(H|O)
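The second form of Bayes’s theorem and the rule of conditionalization can be illustrated together in a short Python sketch (added here as an illustration; the diagnostic-test numbers are assumptions, not from the text):

```python
# Bayes's theorem with the total-probability expansion in the denominator.
# Assumed illustration: a diagnostic test with 90% sensitivity, a 5%
# false-positive rate, and a 1% base rate for the condition H.
pr_h = 0.01              # prior Pr(H)
pr_o_given_h = 0.90      # Pr(O|H)
pr_o_given_not_h = 0.05  # Pr(O|not-H)

pr_o = pr_o_given_h * pr_h + pr_o_given_not_h * (1 - pr_h)
posterior = pr_o_given_h * pr_h / pr_o  # Pr(H|O)

# Bayesian conditionalization: the new degree of belief in H after
# observing O is set equal to this posterior.
pr_new_h = posterior
print(round(pr_new_h, 3))  # 0.154
```

Note that even a positive result from a fairly reliable test leaves the posterior well below 0.5 when the base rate is low, which is the usual lesson drawn from examples of this shape.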
bell curve: see normal distribution
biased variable: a random variable that is not fair, that is, for which some outcomes are more likely
than others
big data: very large data sets that cannot be easily stored, processed, analyzed, and visualized with
standard statistical methods
bimodal distribution: two values in a range are the most common; in a histogram, there are two peaks
blind experiment: an experiment or study designed so that the scientists recording or taking measure-
ments don’t know which subjects are in the control group and which are in the experimental group
calibration: comparing the measurements of one instrument with those of another to check the instru-
ment’s accuracy so it can be adjusted if needed
case study: a detailed examination of a single individual or system in a real-life context
causal background: the other factors that in fact do or in principle might causally influence two events,
thereby also potentially affecting the causal relationship between the two events
causal conception of explanation: the view that explanation involves appealing to causes that
brought about the phenomenon to be explained
central limit theorem: a statistical theorem stating that sufficiently large samples will have a mean approximating the mean of the population
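The central limit theorem’s claim about sample means can be checked with a quick simulation; this Python sketch is an added illustration (the die-roll population, sample sizes, and seed are all assumptions chosen for the demonstration):

```python
import random
import statistics

# Means of large samples cluster around the population mean.
# Assumed population: fair die rolls, whose population mean is 3.5.
random.seed(0)  # fixed seed so the run is reproducible

def sample_mean(size):
    return statistics.mean(random.randint(1, 6) for _ in range(size))

# Draw 1000 samples of size 100 and average their means.
sample_means = [sample_mean(100) for _ in range(1000)]
grand_mean = statistics.mean(sample_means)

print(round(grand_mean, 2))  # close to 3.5
```

Plotting a histogram of sample_means would also show the approximately normal shape the theorem predicts, though that is omitted here.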
central tendency: a distribution with one peak at the center, corresponding to the most common group
of values of a variable
cluster indicators: identify several markers of some trait in order to more precisely define the trait while
not oversimplifying it
cohort study: a study in which researchers select a group of subjects and track them over time, at set
intervals, to observe the effects of some condition they experience
collecting data: gathering and measuring information about variables
collectively exhaustive outcomes: when at least one outcome of a set of outcomes must occur at
any given time
common cause: when neither of two covarying types of events causes the other but a third event causes both
computer simulation: a program run on a computer using algorithms to explore the dynamic behavior
of a target system; also called computer model
conclusion: in an argument, a statement that is supported by the premises
conditional probability: the probability of an event’s occurrence given that some other event has
occurred; expressed Pr(X|Y) where X is conditional on Y
conditional statements: statements in which one circumstance, the antecedent, is given as a condition
for another circumstance, the consequent; the antecedent guarantees the occurrence of the consequent
confederate: in an experiment, an actor who pretends to be a subject
confirmation: the observation matches the expectation based on the hypothesis, providing evidence in
favor of the hypothesis
confirmation bias: the tendency we all have to look for, interpret, and recall evidence in ways that
confirm and do not challenge our existing beliefs
conflicts of interest: financial or personal gains that may inappropriately influence scientific research,
results, or publications; scientists are obligated to disclose any potential conflicts of interests
confounding variables: extraneous variables that have varied in an uncontrolled way and influenced
the dependent variable under investigation
consequent: the right side of a conditional claim; the condition that arises from, or is guaranteed by,
the antecedent
contributing cause: a cause that is neither necessary nor sufficient to bring about an effect; also called
a partial cause
control group: a group that is similar to the experimental group but experiences other value(s) of the
independent variable, i.e., does not receive the intervention
correlated variables: the value of one variable raises or lowers the probability of the other variable
taking on some value
correlation coefficient: describes the direction and strength of correlation; a positive or negative sign
indicates positive or negative correlation, and a number between 0 and 1 indicates the strength of
the correlation
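The correlation coefficient can be computed directly from its standard definition (covariance divided by the product of the standard deviations); this Python sketch is an added illustration, and the data points are assumed for the example:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient: covariance over the product of spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Assumed data, roughly y = 2x, so r should be close to +1
# (a strong positive, or direct, correlation).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson_r(xs, ys)
print(round(r, 3))  # 0.999
```

Reversing the sign of the ys would flip r to near -1, an indirect (negative) correlation of the same strength.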
correlation strength: how predictable the values of one variable are based on the values of the other
variable
counterexamples: situations you can describe, whether real or imagined, in which the premises of an argument are true but the conclusion is false; a counterexample shows that a deductive argument is invalid
crisis: a period of widespread failure of confidence in the ability of a (Kuhnian) paradigm to fulfill its scientific function
cross-sectional study: a study in which different individuals are measured for some property or con-
dition at a single, given time; helpful in investigating relationships among a number of different
variables
crucial experiment: an experiment that decisively adjudicates between two hypotheses, settling once
and for all which is true
curve fitting: extrapolating from a data set to the expected data for measurements that weren’t actually taken by fitting a continuous line through a data plot; there are always multiple different lines consistent with the data
deductive inference: an inference in which the relationship between premises and conclusion is pur-
ported to be one of necessitation; in a valid deductive argument, the truth of the premises necessitate
the conclusion; in an invalid deductive argument, they do not
denying the antecedent: using the falsity of an antecedent and the truth of a conditional as grounds
for concluding the consequent is false; a deductively invalid form of inference
denying the consequent: using the falsity of a consequent and the truth of a conditional as grounds for
concluding the antecedent is false; a deductively valid form of inference
dependent variable: a variable that is expected to depend on, or be the effect of, the independent
variable
descriptive claim: a statement about how things are without making any value judgments
descriptive statistics: tools for summarizing, describing, and displaying data in a meaningful way
difference-making: the idea that if the occurrence of one event makes a difference to the occurrence of
a second event, the first is a cause of the second
direct correlation: greater values for one variable increase the probability of greater values for a sec-
ond variable; also referred to as positively correlated
direct variable control: when all extraneous variables are held at constant values during an intervention
directed acyclic graphs: graphs in which all the causal relationships are one-directional (none of a
cause’s effects are also among its causes) and do not move in a circle (following a series of cause-
effect relationships will not lead you back to an earlier cause as a later effect)
distal causes: causes that occurred further back in time from the effect and perhaps further away as well
double-blind experiment: an experiment or study in which both scientists and subjects are unaware of
which subjects are in which group (control or experimental) because of randomization
Duhem-Quine problem: the idea that scientific hypotheses can never be tested in isolation; instead, scientific hypotheses are tested only against the background of auxiliary assumptions
ecological validity: the degree to which experiment circumstances are representative of real-world
circumstances
effect size: a quantitative, scale-free measure of the strength of a phenomenon
empirical evidence: information gathered through the senses, including with the use of technology to
extend the reach of the senses, that weighs in favor of or against some hypothesis
estimation: predicting properties of a population on the basis of a sample
eugenics: the idea that a human population can be improved by controlling breeding; historically linked
to racist and classist science that threatened human liberties and human dignity
evidence: fact or information that makes a difference to what one is justified in believing
evidentialism: the idea that a belief’s justification is determined by how well the belief is supported by
evidence
exemplar: a model that is one of the target systems it is used to represent
expectations: conjectural claims about observable phenomena based on some hypothesis; expectations
should be true if the hypothesis is true, false if the hypothesis is false
experiment: a method of testing hypotheses that involves intervening on one or more variables of interest
and observing what effects this has
experimental group: a group that receives the intervention to the independent variable or otherwise
experiences the intended value of the independent variable
explanatory knowledge: generating answers to questions about how things work and why things are
the way they are
exploratory experiment: an experiment that does not rely on existing theory and may not be aimed to
test a specific hypothesis; used to suggest novel hypotheses or to assess whether a poorly understood
phenomenon actually exists
external experimental validity: the extent to which experimental results generalize from the experi-
mental conditions to other conditions—especially to the phenomena the experiment is supposed to
yield knowledge about
extraneous variables: other variables besides the independent variable that influence the value of the
dependent variable; if uncontrolled, these may become confounding variables
fair variable: a random variable that has independent outcomes and is unbiased, that is, its outcomes
are all equally likely
faithfulness: the requirement that probabilistically independent variables are not directly causally
related; an assumption of causal Bayes nets
falsifiable: evidence can be described that, if found, would show the claim to be false; a key feature of
scientific claims
falsificationism: the idea, due to Karl Popper, that scientific reasoning proceeds by attempting to dis-
prove ideas rather than to prove them right
field experiment: an experiment conducted outside of a laboratory, in the experimental subjects’ every-
day environment
frequency distribution: how often a variable takes on each range of values in a data set
frequentist interpretation: the idea that the probability of an outcome is the limit of its relative fre-
quency; an element of classical statistics
full control: creating the conditions such that no variables other than the target independent variable and
the dependent variable change as a result of an intervention
funding bias: when a scientific study is more likely to support the interests of its financial sponsor(s)
gambler’s fallacy: fallaciously inferring from a past deviation from the expected frequency of outcomes that a future deviation in the opposite direction is due; errantly supposing statistical dependence of outcomes
Gaussian distribution: see normal distribution
generality: a desirable feature of models; a model’s ability to apply to a greater number of target systems
Hawthorne effect: a confounding variable in experiments involving human participants, where experimen-
tal participants change their behavior, perhaps unconsciously, in response merely to being observed;
see also observer bias
histogram: visual representation of statistical outcomes in which bars of different heights are used to
represent the frequency of different values of a continuous variable
hypothesis: a conjectural statement based on limited data; a guess about what the world is like, which
is not (yet) backed by sufficient, or perhaps any, evidence
hypothetico-deductive method: a method of hypothesis-testing; an expectation is deductively inferred
from a hypothesis and compared with an observation; violation of the expectation deductively refutes
the hypothesis, while a match with the expectation non-deductively boosts support for the hypothesis
idealization: assumption made without regard for whether it is true, often with full knowledge that it is false
illusion of explanatory depth: believing that one understands the world more clearly and in greater
detail than actually is the case
illusion of understanding: a lack of genuine understanding of some topic linked to a lack of apprecia-
tion for the depth of one’s ignorance about the topic
independent outcomes: the probability of the outcome of one trial is not conditional on the outcomes
of any other trials; e.g., numbers rolled on two different dice rolls are independent from one another
independent variable: a variable that is changed or observed at different values in order to investigate
the effect of the change
indirect correlation: greater values for one variable increase the probability of smaller values for a
second variable; negatively correlated
indirect variable control: causing the influence of extraneous variables to vary in a way that is inde-
pendent from the value of the independent variable
inductive: an inferential relationship from premises to conclusion that is one of probability not necessity
inductive generalization: inference to a general conclusion about the properties of a class of objects
based on the observation of some number of objects in the same class
inductive projection: inference to a conclusion about the feature of some object that has not been
observed based on the observation that some objects of the same kind have that feature
inductive strength: the probabilistic extent to which the conclusion of an inductive inference is true
given that its premises are all true.
inference: a logical transition from one thought to another that can be characterized in terms of abstract
rules
inference to the best explanation: see abductive inference
inferential statistics: using statistical reasoning to draw broader conclusions on the basis of limited data
informal fallacies: inference patterns that involve a problem with the content of an inference; a deduc-
tive argument that commits an informal fallacy may be valid, but it will not be sound
instruments: technological tools or other kinds of apparatus used in experiments
intelligent design: the idea that life forms are so complex that they couldn’t possibly have come about
without the help of an intelligent designer (such as the Judeo-Christian God)
internal experimental validity: the degree to which scientists can draw accurate conclusions about
the relationship between the independent and dependent variables
intervention: a direct manipulation of the value of the independent variable
isomorphism: one idea of the relationship a model bears to its target system(s); a one-to-one correspon-
dence between each part or feature of the model and of the target
joint method of agreement and difference: one of Mill’s methods; considering cases where the
suspected effect occurs to see what they have in common (method of agreement), as well as consider-
ing cases where the suspected effect does not occur to see what those have in common (method of
difference)
joint probability distribution: the probability distribution for each of a set of variables, taking into
account the probability of the other variables in the set
justification: reasons for belief; one requirement for a belief to qualify as knowledge
knowledge: traditionally, a belief that is at least both true and sufficiently justified
laboratory experiments: experiments conducted in a laboratory, giving scientists control over inter-
ventions performed and direct and indirect control of many extraneous variables
likelihood: often used as a synonym for ‘probability’, or to refer to the probability of observed data given
the truth of a specific hypothesis. More precisely, a likelihood is a function of the parameters of a
statistical model given observed data
logic: the study of the rules and patterns of good and bad inference
longitudinal study: a study in which the same subjects are measured (for some property or condition)
repeatedly over a period of time, sometimes many years, allowing the researchers to track a subject’s
change
Markov condition: the requirement that causal variables, conditional on their parent causes, are probabilistically independent of all their other ancestors; an assumption of causal Bayes nets
Matilda effect: the bias against recognizing the achievements of women scientists, whose work is often
uncredited or else attributed to their male colleagues instead
material conditional: a conditional statement (with an antecedent and consequent) that is false only if the antecedent is true while the consequent is false
mathematical models: mathematical formulas that relate variables, parameters, and constants to one
another to represent one or more target systems
mean: a measure of the central tendency of a data set; the sum of all values in the data set divided by the
number of instances; also called average
measurement error: the difference between the measured value of a quantity and its true value
mechanisms: complex hierarchical systems consisting of component parts and operations that are orga-
nized so as to causally produce a phenomenon
mechanistic conception of explanation: the view that phenomena are explained by showing how they are produced by mechanisms, that is, by the organized operations of the mechanisms’ component parts
median: a measure of the central tendency of a data set; the middle value in a distribution when the
values are arranged from the lowest to the highest
method of agreement: one of Mill’s methods; considering cases where the suspected effect occurs to
see what they have in common
method of concomitant variations: one of Mill’s methods; using the observation that the value of
one variable changes in tandem with changes to the value of a second variable to infer that the two
are causally related
method of difference: one of Mill’s methods; considering cases where the suspected effect does not
occur to see what those have in common
method of residues: one of Mill’s methods; comparing cases in which a set of causes brings about a
set of effects to cases in which some of those causes bring about some of those effects and inferring,
on that basis, that the absent cause(s) are responsible for the absent effect(s)
methodological naturalism: the idea that scientific theories shouldn’t postulate supernatural or other
spooky kinds of entities
mode: a measure of the central tendency of a data set; the most frequent value in the data set
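The three measures of central tendency defined in the mean, median, and mode entries can all be computed with Python’s standard library; this sketch is an added illustration on an assumed data set:

```python
import statistics

# An assumed illustrative data set (six values, one repeated).
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum / count: 30 / 6 = 5
median = statistics.median(data)  # average of middle values: (3 + 5) / 2 = 4
mode = statistics.mode(data)      # most frequent value: 3

print(mean, median, mode)
```

When a distribution is skewed by outliers (replace 10 with 100 above), the mean shifts markedly while the median and mode do not, which is why all three measures are reported.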
modularity: the assumption that interventions on some causal relationship will not change other causal
relationships in the system
modus ponens: see affirming the antecedent
modus tollens: see denying the consequent
monotonic: the addition of new information never invalidates the inference
multiplication rule: the probability that two independent events both occur is the result of multiplying
their individual probabilities
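The multiplication rule can be checked against direct enumeration; this Python sketch is an added illustration (the two-dice example is an assumption chosen for concreteness):

```python
from fractions import Fraction
from itertools import product

# Multiplication rule: for independent events,
# Pr(A and B) = Pr(A) x Pr(B).
# Assumed example: rolling a 1 on each of two fair dice.
pr_one = Fraction(1, 6)
rule_answer = pr_one * pr_one  # 1/36

# Cross-check by enumerating all 36 equally likely pairs of rolls.
pairs = list(product(range(1, 7), repeat=2))
favorable = sum(1 for pair in pairs if pair == (1, 1))
direct = Fraction(favorable, len(pairs))

print(rule_answer, direct)  # 1/36 1/36
```

The rule applies only to independent events; for dependent events the second factor must be a conditional probability instead.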
mutually exclusive outcomes: a set of outcomes, only one of which can occur in a given trial; e.g.,
rolling a one and a three are mutually exclusive outcomes
natural experiments: interventions on independent variables occur naturally without experimenters
influencing the system
natural explanations: explanations that invoke features of the world to account for the phenomena
under investigation
natural phenomena: objects, events, regularities, and processes that are sufficiently uniform to make them susceptible to systematic study
normal distribution: a symmetric, unimodal distribution with the most common values at the middle
and decreasingly common outcomes as the values get higher and lower; also called a bell curve or
Gaussian distribution
normal science: the most common (Kuhnian) phase of science, within which scientific research is stable
and based on widespread agreement about basic assumptions; this follows either pre-paradigm sci-
ence or scientific revolution
normative claim: a statement about how things ought to be, which might or might not correspond to
how they in fact are
null hypothesis: a reasonable default assumption about how the world is, which is not a bold and risky
conjecture; in statistical hypothesis-testing, the null hypothesis generally states that the variables in
question are statistically independent
observable: capable of being perceived or detected with the use of one’s senses under appropriate
circumstances; observability is relative to specific epistemic communities, their scientific theories, and
technical apparatus
observation: any information gained from your senses—not only what you see but also what you hear,
smell, touch, and sense in any other way you can experience the world
observational study: collecting and analyzing data without performing interventions or, often, aiming
to control extraneous variables
observer bias: see Hawthorne effect
observer-expectancy effect: when a scientist’s expectations lead her to unconsciously influence the
behavior of experimental subjects
ontological naturalism: the idea that no supernatural entities exist
openness to falsification: the willingness to abandon any claim or theory when the preponderance of
evidence suggests it’s wrong; a key feature of science
operational definition: a specification of the conditions when some concept applies, enabling mea-
surement or other kinds of precision
outcome space: the set of all values a random variable can take on, also called sample space
outliers: measured values for a variable that are notably different from the other values in the data set
p-value: the probability of the observed data assuming the null hypothesis is true
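A p-value can be computed exactly for a simple binomial null hypothesis; this Python sketch is an added illustration (the coin-flip counts are assumptions, and the test is one-sided for simplicity):

```python
from math import comb

# One-sided p-value under a binomial null hypothesis.
# Null hypothesis: a fair coin (p = 0.5). Assumed observation:
# 58 heads in 100 flips. The p-value is the probability, given the
# null, of data at least this extreme: 58 or more heads.
n, k = 100, 58

p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(round(p_value, 4))  # roughly 0.067, above the usual 0.05 cutoff
```

On the conventional reading, a p-value this size would not license rejecting the null hypothesis at the 0.05 significance level.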
paradigm: according to Kuhn, a way of practicing science; provides scientists with a stock of assump-
tions about the world, concepts and symbols for effective communication, methods for gathering and
analyzing data, and other habits of research and reasoning
parameter: a quantity whose value can change in different applications of a mathematical equation but
that only has a single value in any one application of the equation
pattern conception of explanation: the idea that a phenomenon is explained by fitting it into a more
general framework of laws and principles
perfectly controlled experiment: an experiment in which all variables are controlled except for the independent variable, an intervention is performed on the independent variable, and the effects on the dependent variable are measured; no confounding variables are possible
Persian Golden Age: period of rapid intellectual achievements in science, philosophy, literature, and
art spanning from Central Asia to the Arabian Peninsula between the 8th and 13th centuries, which
was the core part of the so-called Islamic Golden Age more generally; arguably the most important
period in the development of science prior to the Scientific Revolution
phenomena: things or processes as we experience them; appearances of objects, events, regularities,
or processes that exist or occur
philosophy of science: the investigation of science, focused especially on questions of what science
should be like in order to be a trustworthy route to knowledge and to achieve the other ends we want
it to have, such as usefulness to society
318 Glossary
physical constants: quantities that are universal and unchanging over time
physical process: an account of causation in which causation consists in some continuous physical
process, such as energy transfer
pie chart: visual representation of statistical outcomes in which a circle is divided into slices that show the relative frequency of the different values for some variable
placebo effect: when an experimental subject’s expectations lead to the outcome the subject expects;
this can be an extraneous variable
plagiarism: stealing somebody else’s ideas, data, or words by presenting them as one’s own work and
failing to give appropriate credit
population: a collection of entities that are grouped together, often in virtue of exhibiting common
features
population validity: the degree to which experimental entities are representative of the broader class of
entities of interest; for experiments with human subjects, this is the broader population
positively correlated: greater values for one variable increase the probability of greater values for a
second variable; also known as direct correlation
post hoc, ergo propter hoc: the mistaken conclusion that one event causes another simply because
the events occur in succession close to each other; translated from Latin, ‘after this, therefore because
of this’
posterior probability: the probability of a hypothesis conditional on an observation that has been
made; Bayes’s theorem can be used to calculate this
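A minimal numerical sketch of how Bayes's theorem yields a posterior probability (all probability values below are invented for illustration):

```python
# Bayes's theorem: Pr(H|E) = Pr(E|H) * Pr(H) / Pr(E).
# All numbers below are invented for illustration.
prior = 0.1               # Pr(H): rational degree of belief before the observation
likelihood = 0.8          # Pr(E|H)
likelihood_not_h = 0.2    # Pr(E|~H)

# Total probability of the evidence: Pr(E) = Pr(E|H)Pr(H) + Pr(E|~H)Pr(~H)
evidence = likelihood * prior + likelihood_not_h * (1 - prior)

posterior = likelihood * prior / evidence
print(round(posterior, 3))  # 0.308
```

The observation raises the rational degree of belief in the hypothesis from 0.1 to about 0.31.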
power: the probability that the test will reject a false null hypothesis
precision: the extent to which a model finely specifies features of a target system
premises: statements that provide support for some conclusion; the starting points for an inference
pre-paradigmatic: the earliest phase of science according to Kuhn; characterized by the existence of
different schools of thought that debate very basic assumptions, including research methods and the
nature and significance of data
prior probability: the rational degree of belief in a hypothesis before making a given observation
probability distribution: how often a variable is expected to take on each of a range of values
probability theory: a mathematical theory developed to deal with random variables, or outcomes that
are individually unpredictable but that behave in predictable ways over many occurrences
problem of induction: the idea that inductive inference cannot be logically justified, since any possible
justification would need to employ inductive reasoning and would thus be circular
prospective study: a study in which researchers identify a group of subjects with some property or
condition and track their development forward in time
proximate causes: causes that occur closely in time and perhaps in space to their effect
pseudoscience: a non-scientific activity that masquerades as science; often designed to deceive people into believing it has scientific legitimacy
publication bias: the tendency to publish surprising, new results more often than negative results, repli-
cation studies, and exploratory work
qualitative data: information that is non-numerical and without some other standard that makes it easily
comparable, such as diary accounts, unstructured interviews, and observations of animal behavior
qualitative variables: variables with values that are not numerical but descriptive, such as the variable
sport, with the values basketball, hockey, and so on.
quantitative analysis: the use of mathematical techniques to measure or investigate phenomena
quantitative data: data that is easily comparable, often in numerical form, such as numbers, vectors,
or indices
quantitative variables: variables with numerical values, such as height or percent correct on an exam
random sampling: the individuals composing the sample are selected randomly from the population
random variables: variables that take on different values that are individually unpredictable but predict-
able in the aggregate
randomization: randomly assigning experimental entities to experimental and control groups
rational degree of belief: the interpretation of posterior probability in Bayesian statistics; believing a
hypothesis to the same degree as the probability it is true given the observations that have been made
range: a measure of variability; the difference between the smallest and largest values in a data set
reasoning: psychological processes leading to beliefs; could be inferential or not
refutation: one possible outcome of the H-D method; the observation contradicts the expectation deductively inferred from the hypothesis, so the hypothesis is deductively proven to be false
regression analysis: finding the best-fitting line through the points on a scatterplot
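One standard way to find that best-fitting line is ordinary least squares; a self-contained sketch (the data points are made up for illustration):

```python
def fit_line(xs, ys):
    """Ordinary least squares: intercept and slope of the best-fitting
    line y = intercept + slope * x through the scatterplot points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data lying exactly on the line y = 1 + 2x
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
```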
regression to the mean: the tendency for outlier values to be accompanied by less extreme values in the future or past
relative frequency distributions: frequency distributions that record proportions of occurrences of
each value of a variable rather than absolute numbers of occurrences
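The distinction between absolute and relative frequencies can be sketched in a few lines (the sample values are invented):

```python
from collections import Counter

# Invented qualitative data for the variable "sport"
values = ["basketball", "hockey", "basketball", "soccer",
          "basketball", "hockey"]

counts = Counter(values)                          # frequency distribution (absolute counts)
n = len(values)
relative = {v: c / n for v, c in counts.items()}  # relative frequency distribution (proportions)

print(counts["basketball"])    # 3
print(relative["basketball"])  # 0.5
```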
replication: performing an experiment again—often with some modification to its design—in order to
check whether the result remains the same
representative: the experimental entities studied do not vary in any systematic way from the general
population
retrospective study: a study in which researchers first identify a group of subjects who have the
target property or condition, and then investigate their past in an attempt to isolate the cause of
the condition
robustness: a desirable feature of models; the insensitivity of a model's results to features of the model that differ from the target
robustness analysis: analyzing multiple models or different versions of a model to determine whether
and to what extent their results are consistent
sample: a subset of a population about which data are gathered
sample data: data about individuals in a sample
sample mean: the average value of the trait of interest in a sample; the best estimate of the average value in the population
sample size: the number of individual sources of data in a study, often the number of experimental enti-
ties or subjects
sample space: see outcome space
sample standard deviation: an estimate of the spread of the probability distribution for the random variable; s = √[∑(value − mean)² / (n − 1)]
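The formula translates directly into code; a sketch with an invented data set:

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation: s = sqrt(sum((value - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

# Invented data set
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))  # roughly 2.138
```

Dividing by n − 1 rather than n corrects for the fact that a sample tends to underestimate the spread of the population it was drawn from.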
sampling error: incorrect conclusion due to a non-representative sample
scale model: a concrete physical object that serves as a representation of one or more target
systems
scatterplot: visual representation of statistical outcomes in which the values of one variable are plotted against the values of a second variable
scientific theory: a large-scale system of ideas about a natural phenomenon supported by a variety of
evidence
self-explanation effect: the observation that generating explanations to oneself or to others can facili-
tate the integration of new information into existing bodies of knowledge and can lead to deeper
understanding
set: a grouping of objects (called elements)
significance level: how improbable, given the null hypothesis, an experimental result must be to warrant
rejecting the null hypothesis
Simpson’s paradox: a correlation between two events that disappears, or is reversed, when data are
grouped in a different way
68–95–99.7 rule: the percentages of values that lie within one, two, and three standard deviations around
the mean of a normal distribution
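The rule can be checked by simulation; a quick sketch drawing from a normal distribution (the sample size and seed are arbitrary choices for the example):

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility
mu, sigma = 0.0, 1.0
draws = [random.gauss(mu, sigma) for _ in range(100_000)]

def within(k):
    """Proportion of draws within k standard deviations of the mean."""
    return sum(abs(x - mu) <= k * sigma for x in draws) / len(draws)

print(within(1), within(2), within(3))  # roughly 0.68, 0.95, 0.997
```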
soundness: a property that deductive arguments have when they are valid and all of their premises are true
spurious correlations: two events are correlated but aren’t causally related in any obvious way
standard deviation: the square root of the variance; for a population, s = √[∑(value − mean)² / n]
standard error: the standard deviation of the sampling distribution of the mean; SE = s/√(sample size)
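In code (the sample values are invented): a sample standard deviation of 2.0 from a sample of 25 gives

```python
from math import sqrt

def standard_error(s, sample_size):
    """SE = s / sqrt(sample_size): the standard deviation of the
    sampling distribution of the mean."""
    return s / sqrt(sample_size)

print(standard_error(2.0, 25))  # 0.4
```

Quadrupling the sample size halves the standard error, which is why larger samples give more precise estimates of the mean.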
statistical description: summarizing, describing, and displaying data in a meaningful way
statistically independent: two events for which the occurrence of one does not increase or decrease
the probability of the other; that is, when Pr(Y|X) = Pr(Y) and Pr(X|Y) = Pr(X)
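The condition Pr(Y|X) = Pr(Y) is easy to verify numerically; a sketch with invented probabilities:

```python
# Invented probabilities for two events X and Y
pr_x = 0.5
pr_y = 0.4
pr_x_and_y = 0.2  # equals pr_x * pr_y, so X and Y are independent

pr_y_given_x = pr_x_and_y / pr_x  # Pr(Y|X) = Pr(X and Y) / Pr(X)
pr_x_given_y = pr_x_and_y / pr_y  # Pr(X|Y) = Pr(X and Y) / Pr(Y)

independent = (abs(pr_y_given_x - pr_y) < 1e-12 and
               abs(pr_x_given_y - pr_x) < 1e-12)
print(independent)  # True
```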
statistically significant: data with a p-value below the chosen significance level; grounds for rejecting
the null hypothesis
strawman fallacy: an informal fallacy; caricaturing an argument in order to criticize the caricature
rather than the actual view
subjects: humans, non-human animals, or inanimate objects in an experiment or non-experimental study;
also called experimental entities
subtraction rule: the probability that some outcome doesn’t occur is the result of subtracting the prob-
ability of that outcome from the total probability (Pr = 1)
sufficient causes: causes that always bring about the effect
sufficient condition: a condition that, if met, guarantees a specified outcome will occur
super-observational: enhancement of our powers of observation far beyond what they ordinarily
include through the use of tools or other implements
target system: a selected part of the structure of the world, about which scientists aim to gain knowledge;
the phenomenon intended to be represented by a model
theorems: statements deductively inferred from a set of axioms
theoretical claims: claims made about entities, properties, or occurrences that are not directly observable
thought experiments: devices of the imagination that scientists can use to learn about possible effects
valid: a property of deductive inference in which the truth of the premises logically guarantees or neces-
sitates the truth of the conclusion
value of a variable: the particular state or quantity that a variable has taken on in some instance
value-free ideal: the idea that good science should not rely on moral and political beliefs in assessing
the evidence for scientific models, theories, or hypotheses
variability: the distribution of values in a data set; measures of variability like standard deviation and
variance indicate how spread out the data set is; also called spread
variable: anything that can vary, change, or occur in different states and that can be measured
variance: a measure of how far a set of data is spread out from the average value of the data set; the
average of the squared differences of the values of a random variable from its mean
References
Ahmed, M., Anchukaitis, K., Asrat, A., Borgaonkar, H., Braida, M., Buckley, B., …, & Curran, M. (2013).
Continental-scale temperature variability during the past two millennia. Nature Geoscience, 6, 339–346.
Al-Khalili, J. (2015). In retrospect: Book of optics. Nature, 518(7538), 164–165.
American Association for the Advancement of Science. (2001). Designs for science literacy. New York:
Oxford University Press.
Anderegg, W. R. L., Prall, J. W., Harold, J., & Schneider, S. H. (2010). Expert credibility in climate change.
Proceedings of the National Academy of Sciences, 107, 12107–12110.
Arrhenius, S. (1908). Worlds in the making: The evolution of the universe. London: Harper & Brothers.
Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books.
Bao, X., & Eaton, D. W. (2016). Fault activation by hydraulic fracturing in western Canada. Science, 354, 1406–1409.
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., . . . &
Cesarini, D. (2018). Redefine statistical significance. Nature Human Behaviour, 2, 6–10.
Blackawton, P. S., Airzee, S., Allen, A., Baker, S., Berrow, A., Blair, C., . . . & Hackford, C. (2011).
Blackawton bees. Biology Letters, 7, 168–172.
Broca, P. (1861). Remarques sur le siège de la faculté du langage articulé, suivies d’une observation
d’aphémie (perte de la parole). Bulletins de la Société d’anatomie, 2e serie, 6, 330–357.
Callaway, E. (2017). Oldest Homo sapiens fossil claim rewrites our species’ history. Nature News, 8 June
2017.
Callendar, G. S. (1939). The composition of the atmosphere through the ages. Meteorological Magazine,
74(878), 33–39.
Camerer, C. F. (1997). Taxi drivers and beauty contests. Engineering and Science, 60(1), 10–19.
Camerer, C. F., Babcock, L., Loewenstein, G., & Thaler, R. (1997). Labor supply of New York City cabdriv-
ers: One day at a time. Quarterly Journal of Economics, 112, 407–441.
Capra, F. (1975). The Tao of physics. Boston: Shambhala Publications.
Cartwright, N. (1989). Nature’s capacities and their measurement. Oxford: Oxford University Press.
Chatrchyan, S., Khachatryan, V., Sirunyan, A. M., Tumasyan, A., Adam, W., Aguilo, E., . . . & Friedl,
M. (2012). Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC.
Physics Letters B, 716(1), 30–61.
Chattopadhyay, R., & Duflo, E. (2004). Women as policy makers: Evidence from a randomized policy
experiment in India. Econometrica, 72(5), 1409–1443.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cumming, G. (2013). Understanding the new statistics: Effect sizes, confidence intervals, and meta-
analysis. New York: Routledge.
Darley, J. M., & Latane, B. (1968). Bystander intervention in emergencies: Diffusion of responsibility.
Journal of Personality and Social Psychology, 8, 377–383.
Darwin, C. (1872). On the origin of species by means of natural selection, or the preservation of favoured
races in the struggle for life, 6th edition. London: John Murray.
Dockery, D. W., Pope, C. A., Xu, X., Spengler, J. D., Ware, J. H., Fay, M. E., . . . & Speizer, F. E. (1993).
An association between air pollution and mortality in six US cities. New England Journal of Medicine,
329(24), 1753–1759.
Donovan, A. (1993). Antoine Lavoisier: Science, administration, and revolution. Oxford: Blackwell.
Duarte, J. L., Crawford, J. T., Stern, C., Haidt, J., Jussim, L., & Tetlock, P. E. (2015). Political diversity will
improve social psychological science. Behavioral and Brain Sciences, 38, 1–13.
Dyson, F. W., Eddington, A. S., & Davidson, C. R. (1920). A determination of the deflection of light by the
sun’s gravitational field, from observations made at the solar eclipse of May 29, 1919. Philosophical
Transactions of the Royal Society A, 220, 571–581.
Eberhardt, F. (2009). Introduction to the epistemology of causation. The Philosophy Compass, 4(6),
913–925.
Eddington, Sir Arthur. (1935/2012). New pathways in science: messenger lectures (1934). Cambridge:
Cambridge University Press.
Elliott, K. C. (2017). A tapestry of values: An introduction to values in science. Oxford: Oxford University
Press.
Enten, H. (2017). What Harry got wrong in 2016. FiveThirtyEight. Retrieved from http://fivethirtyeight.com/features/what-harry-got-wrong-in-2016/
Fisher, R. A. (1956). Mathematics of a lady tasting tea. In J. R. Newman (Ed.), The world of mathematics
(pp. 1512–1521). New York: Simon & Schuster. (Original work published in Fisher, R. A. (1935). The
design of experiments. Edinburgh: Oliver & Boyd).
Fizeau, H. (1849). Sur une expérience relative à la vitesse de propagation de la lumière. Comptes rendus,
29, 90–92.
Floridi, L. (2012). Big data and their epistemological challenge. Philosophy and Technology, 25, 435–437.
Galton, F. (1889). Natural inheritance. London: Macmillan.
Gelman, A., & Hennig, C. (2017). Beyond subjective and objective in statistics (with discussion). Journal
of the Royal Statistical Society, 180(4), 967–1033.
Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself
statistically significant. The American Statistician, 60(4), 328–331.
Gillham, N. W. (2001). Sir Francis Galton and the birth of eugenics. Annual Review of Genetics, 35,
83–101.
Glymour, C. (2007). When is a brain like the planet? Philosophy of Science, 74(3), 330–347.
Gopnik, A. (1998). Explanation as orgasm. Minds and Machines, 8(1), 101–118.
Guéguen, N., Jacob, C., Le Guellec, H., Morineau, T., & Lourel, M. (2008). Sound level of environ-
mental music and drinking behavior: A field experiment with beer drinkers. Alcoholism: Clinical and
Experimental Research, 32(10), 1795–1798.
Güth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum bargaining.
Journal of Economic Behavior and Organization, 3, 367–388.
Haddad, D., Seifert, F., Chao, L. S., Possolo, A., Newell, D. B., Pratt, J. R., . . . & Schlamminger, S. (2017).
Measurement of the Planck constant at the National Institute of Standards and Technology from 2015
to 2017. Metrologia, 54, 633–641 (arXiv: 1708.02473).
Harlow, J. M. (1848). Passage of an iron rod through the head. Boston Medical and Surgical Journal, 39,
389–393.
Harlow, J. M. (1868). Recovery from the passage of an iron bar through the head. Publications of the
Massachusetts Medical Society, 2, 327–347.
Hempel, C. G. (1966). Philosophy of natural science. Englewood Cliffs: Prentice-Hall.
Herschel, W. (1801). Observations tending to investigate the nature of the sun, in order to find the causes
or symptoms of its variable emission of light and heat: With remarks on the use that may possibly be
drawn from solar observations. Philosophical Transactions of the Royal Society of London, 91, 265–318.
Hesslow, G. (1976). Two notes on the probabilistic approach to causality. Philosophy of Science, 43(2),
290–292.
Hodges, J., & Tizard, B. (1989). Social and family relationships of ex-institutional adolescents. Journal of
Child Psychology and Psychiatry, 30, 77–97.
Hubble, E. (1929). A relation between distance and radial velocity among extra-galactic nebulae.
Proceedings of the National Academy of Sciences, 15(3), 168–173.
Hublin, J. J., Ben-Ncer, A., Bailey, S. E., Freidline, S. E., Neubauer, S., Skinner, M. M., . . . & Gunz, P.
(2017). New fossils from Jebel Irhoud, Morocco and the pan-African origin of Homo sapiens. Nature,
546(7657), 289–292.
Hume, D. (1738/2007). A treatise of human nature (D. F. Norton & M. J. Norton, eds.). Oxford: Clarendon
Press.
Hume, D. (1748/1999). An enquiry concerning human understanding (T. L. Beauchamp, ed.). Oxford and
New York, NY: Oxford University Press.
Huygens, C. (1690/1962). Treatise on light (S. P. Thompson, trans.). New York: Dover Publications.
Intergovernmental Panel on Climate Change (IPCC). (2014). Climate change 2014: Synthesis report.
Retrieved from www.ipcc.ch/news_and_events/docs/ar5/ar5_syr_headlines_en.pdf
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Med, 2(8), e124.
Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus, & Giroux.
Keller, E. F. (1983). A feeling for the organism: The life and work of Barbara McClintock. San Francisco:
W.H. Freeman and Co.
Khang, Y.-H. (2013). Two Koreas, war and health. International Journal of Epidemiology, 42, 925–929.
Knight, J. (2002). Sexual stereotypes. Nature, 415, 254–256.
Korb, K., & Nicholson, A. (2010). Bayesian artificial intelligence (2nd ed.). Boca Raton: Chapman & Hall/
CRC Press.
Kragh, H., & Smith, R. W. (2003). Who discovered the expanding universe? History of Science, 41(2), 141–162.
Kuhn, T. (1962/1970). The structure of scientific revolutions. Chicago: University of Chicago Press (1970,
2nd ed., with postscript).
Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., . . . Zwaan, R. A. (2018).
Justify your alpha. Nature Human Behavior, 2, 168–171.
Lawson, R. (2006). The science of cycology: Failures to understand how everyday objects work. Memory
& Cognition, 34(8), 1667–1675.
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: Traps in big data
analysis. Science, 343(6176), 1203–1205.
Le Cam, L. (1986). The central limit theorem around 1935. Statistical Science, 78–91.
Lee, T. M., Markowitz, E. M., Howe, P. D., Ko, C. Y., & Leiserowitz, A. A. (2015). Predictors of public
climate change awareness and risk perception around the world. Nature Climate Change, 5(11),
1014–1020.
Levins, R. (1966). The strategy of model building in population biology. American Scientist, 54, 421–431.
Levitt, S., & Dubner, S. J. (2005). Freakonomics: A rogue economist explores the hidden side of everything.
New York: William Morrow.
Lindley, D. V. (1993). The analysis of experimental data: The appreciation of tea and wine. Teaching
Statistics, 15, 22–25.
Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: The effects
of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology,
37(11), 2098–2109.
Manson, M. (1893). Geological and solar climates: Their causes and variations. San Francisco: G.
Spaulding & Co.
McMullin, E. (1985). Galilean idealization. Studies in the History and Philosophy of Science, 16, 247–273.
Mendel, G. (1865/1996). Experiments in plant hybridization (W. Bateson, Trans.). Electronic scholarly
publishing project. (Original work published as Versuche über Plflanzenhybriden. Verhandlungen des
naturforschenden Vereines in Brünn, Bd. IV für das Jahr 1865, Abhandlungen, 3–47). Retrieved from
www.esp.org/foundations/genetics/classical/gm-65.pdf
Michotte, A. (1962). The perception of causality. Andover, MA: Methuen.
Milgram, S. (1963). Behavioral study of obedience. Journal of Abnormal and Social Psychology, 67(4),
371–378.
Mill, J. S. (1893). A system of logic, ratiocinative and inductive: Being a connected view of the principles
of evidence and the methods of scientific investigation. New York: Harper & Brothers.
Morgan, M., & Boumans, M. J. (2004). Secrets hidden by two-dimensionality: The economy as a hydraulic
machine. In S. de Chadarevian & N. Hopwood (eds.), Model: The third dimension of science
(pp. 369–401). Stanford: Stanford University Press.
National Research Council. (1979). Carbon dioxide and climate: A scientific assessment. Washington DC:
National Academies Press.
Newton, I. (1671/1672). A letter of Mr. Isaac Newton, Professor of the Mathematicks in the University
of Cambridge; containing his new theory about light and colors: Sent by the author to the publisher
from Cambridge, Febr. 6. 1671/72; In order to be communicated to the R. Society. Philosophical
Transactions, 6, 3075–3087.
Newton, I. (1704/1998). Opticks: Or, a treatise of the reflexions, refractions, inflexions and colours of
light: Also two treatises of the species and magnitude of curvilinear figures. Commentary by Nicholas
Humez (Octavo ed.). Palo Alto: Octavo.
Oreskes, N. (2004). The scientific consensus on climate change. Science, 306(5702), 1686.
Oreskes, N., & Conway, E. (2010). Merchants of doubt. New York: Bloomsbury.
Parsons, H. M. (1974). What happened at Hawthorne? Science, 183(4128), 922–932.
Pashler, H., & Wagenmakers, E. J. (2012). Editors’ introduction to the special section on replicability in
psychological science: A crisis of confidence? Perspectives on Psychological Science, 7(6), 528–530.
Peirce, C. S. (1903/1904) (1931–1936). The collected papers (Vols. 1–6, C. Hartshorne & P. Weiss,
eds.). Cambridge: Harvard University Press.
Pfungst, O. (1911). Clever Hans (The horse of Mr. von Osten): A contribution to experimental animal and
human psychology (C. L. Rahn, Trans.). New York: Henry Holt (Originally published in German, 1907).
Popper, K. (1963). Conjectures and refutations: The growth of scientific knowledge. London: Routledge and Kegan Paul.
Pukelsheim, F. (1994). The three sigma rule. The American Statistician, 48(2), 88–91.
Rapoport, A., Seale, D. A., & Colman, A. M. (2015). Is tit-for-tat the answer? On the conclusions drawn
from Axelrod’s tournaments. PLoS One, 10(7), e0134128.
Schoenfeld, J. D., & Ioannidis, J. P. (2013). Is everything we eat associated with cancer? A systematic
cookbook review. The American Journal of Clinical Nutrition, 97(1), 127–134.
Semmelweis, I. (1861/1983). The etiology, the concept and the prophylaxis of childbed fever (K. C.
Carter, Trans.). Madison: University of Wisconsin Press.
Simon, V. (2005). Wanted: Women in clinical trials. Science, 308(5728), 1517–1517.
Snow, J. (1855). On the mode of communication of cholera. London: John Churchill.
Squire, P. (1988). Why the 1936 Literary Digest poll failed. Public Opinion Quarterly, 52(1), 125–133.
Stanford, P. K. (2015 online first). Unconceived alternatives and conservatism in science: The impact of
professionalization, peer-review, and big science. Synthese, 1–18.
Stanziani, A. (2008). Defining natural product between public health and business, 17th to 21st centuries.
Appetite, 51, 15–17.
Teigen, K. H. (2002). One hundred years of laws in psychology. The American Journal of Psychology,
115, 103–118.
Thorgeirsson, T. E., Gudbjartsson, D. F., Surakka, I., Vink, J. M., Amin, N., Geller, F., . . . & Gieger,
C. (2010). Sequence variants at CHRNB3–CHRNA6 and CYP2A6 affect smoking behavior. Nature
Genetics, 42(5), 448–453.
Ullman, A. (2007). Pasteur-Koch. Distinctive ways of thinking about infectious diseases. Microbe, 2,
383–387.
United States Environmental Protection Agency. (2015). High lead levels in Flint, Michigan. Retrieved from www.epa.gov/sites/production/files/2015-11/documents/transmittal_of_final_redacted_report_to_mdeq.pdf
Volterra, V. (1928). Variations and fluctuations of the number of individuals in animal species living
together. Journal du Conseil. Conseil Permanent International pour l’Exploration de la Mer, 3, 3–51.
Walton, D. (1989/2008). Informal logic: A pragmatic approach. Cambridge: Cambridge University Press.
Watson, J. D. (1968). The double helix. New York: Atheneum Press.
Weart, S. (2014). The public and climate change (since 1980). Retrieved from https://history.aip.org/climate/public2.htm
Wegener, A. (1929/1966). The origin of continents and oceans. New York: Dover Publications.
Weisberg, D. S., Keil, F. C., Goodstein, J., Rawson, E., & Gray, J. R. (2008). The seductive allure of neu-
roscience explanations. Journal of Cognitive Neuroscience, 20(3), 470–477.
Weisberg, M. (2013). Simulation and similarity: Using models to understand the world. Oxford: Oxford
University Press.
Woodruff, G., & Premack, D. (1979). Intentional communication in the chimpanzee: The development of
deception. Cognition, 7(4), 333–362.
Woodward, J. (2016). The problem of variable choice. Synthese, 193(4), 1047–1072.
Index
Page numbers in italics indicate figures and in bold indicate tables on the corresponding pages.
Darwin, Charles 137, 197, 289, 294, 297, 301
data 41; big 84–85, 104; collection and analysis of 56–58; curve-fitting 104, 105; models of 103–105, 104; overfitting 104, 106; qualitative 57; quantitative 57, 183–184; questionnaire 57–58; sample 169; visualization of 84
data cleansing 104
data dredging 306
deception 69–70
deductive arguments 129
deductive reasoning: on age of the universe 125–128, 126; in case of puerperal fever 142–146, 143–144, 145; conditional statements in 130, 130–132; Flint, Michigan, water crisis and 150–151, 151; in hypothesis-testing 141–148; hypothetico-deductive (H-D) method 141–142; inference, argument and 128–129
defining science: by its history 18–21, 19; by its methods 23–26, 31–32; by its subject matter 21–23; tricky work of 16–17
denying the antecedent 135
denying the consequent 133
dependent variables 49–50, 66
Descartes, René 24
descriptive statistics 169–170; correlation in 195–201, 196, 197, 198–199; generalizing from 207–217; measures of central tendency in 187–191, 188–189, 190, 192; measures of variability in 191–195, 192, 193, 195; variables and their values in 182–184; visual representation of values of variables in 184–187, 185–187
de Vlamingh, Willem 154
Dianetics 136
difference-making 249–250; intervention and
DNA (deoxyribonucleic acid) 11, 21, 94, 102, 295, 298–299, 299; analogical models of 108; scale model of 107, 107
Doppler, Christian 127
Doppler effect 127
double-blind experiments 69
drinking water 8, 76, 91, 150, 303
Dubner, Stephen 83
Du Châtelet, Émilie 53, 55
Duflo, Esther 76
Duhem, Pierre 147
Duhem-Quine problem 147, 156
DuPont 36
dyspnoea 268–269, 269
Early Childhood Longitudinal Study 83
earthquakes 158, 242, 244–246, 251, 287; and fracking 242, 244–247, 251, 260
ecological validity 75
economics 17, 20, 30, 71, 76, 266, 275, 280–281
Eddington, Arthur 64, 65, 146
Edwards, Marc 151, 162
Edwards v. Aguillard 151, 162
effect size 230
Einstein, Albert 64, 65, 145–146, 289, 290
electromagnetic radiation 61
Elements of Geometry 147–148
Elliott, Kevin 302–304
empirical evidence 23–25
empiricism 24
Environmental Protection Agency (EPA) 151
errors, sampling 216–217
estimating from samples 212–215, 213, 214
Ethyl Corporation 35–36, 298
Euclid 147–148, 290, 295
eugenics 201, 301
Copyright © 2018. Taylor & Francis Group. All rights reserved.
Potochnik, Angela, et al. Recipes for Science : An Introduction to Scientific Methods and Reasoning, Taylor & Francis Group, 2018. ProQuest Ebook
Central, http://ebookcentral.proquest.com/lib/purdue/detail.action?docID=5584122.
Created from purdue on 2021-08-29 21:54:38.
Freon 36
frequency distributions 208–212, 209, 211
frequentist interpretation 233
Freud, Sigmund 63
fruit flies (Drosophila melanogaster) 95–96
functional magnetic resonance imaging (fMRI) 60
funding bias 306
Gage, Phineas 80–82, 81
Galilei, Galileo 20, 86–87
18–21, 19
Homo sapiens 161, 162
Hubbard, L. Ron 136
Hubble, Edwin 126, 126–128, 131, 159
human reasoning, flaws in 33–34, 34
Hume, David 24, 155, 246, 249, 255
Huygens, Christiaan 159
hypotheses 39–40; alternative 223; deductive reasoning in testing 141–148; null 223–224, 226, 228, 229, 260; testing causal 255–260;
thought experiments 85–86
time, studies extending over 83–84
Tit-for-Tat 114, 116
Tolman, Edward 93
total probability 173
total trihalomethane (TTHM) 150
tractability of models 119, 121–122
Trinity College, Cambridge 56
Trump, Donald 207, 215–216, 304
trust 11–12, 15–16, 37–38, 38, 42, 60, 100, 117, 162, 290, 297, 304–305, 307
truth 28, 40–41, 58, 132–135, 137, 141–142, 150, 152–154, 158, 161, 163, 233, 249, 295, 297–298
Turing, Alan 298–299
Tuskegee Syphilis Experiment 298, 301
Tversky, Amos 32
type I error 229, 233
type II error 229, 233
underdetermination 58, 59
understanding 276–279; definition of 277; illusion of 12–13; illusion of explanatory depth and 279–280
unification conception of explanation 282
uniform distribution 187, 188–189
uniformity of nature 155–156
UC Berkeley 253
US Army Corps of Engineers 91, 93
US Department of Agriculture (USDA) 248
US Public Health Service 298, 301
vaccinations 1–2, 28; causal modeling of immunity and 263–266, 264
validity: deductive reasoning and 132; ecological 75; population 75
value-free ideal 300–301
values: shaping science 301–304, 302; trust and objectivity 304–307; value-free ideal and 300–301
variability 188; measures of 191–195, 192, 193, 195
variables 48–51, 51, 66–67; choices in 76–78; controlling 67–68; correlated 196; definition of 183; in descriptive statistics 182–202; qualitative 183; quantitative 183; random 172–174; value of 49, 182–187, 185–187
variance 192–194
variation 167
virus: cowpox 265; ebola 83, 229; human immunodeficiency virus (HIV) 11; human papilloma virus (HPV) 1–2; influenza 273; smallpox (variola) 153–154, 265; Zika 301
visualization, data 84
visual representation of values of variables 184–187, 185–187
Vitruvius 93
Volterra, Vito 98
von Osten, Wilhelm 33–34, 34
Wallace, Alfred Russel 32, 294
water crisis, Flint, Michigan 150–151, 151, 153, 162, 163
Watson, James 102, 107, 107, 108, 295, 299
Wegener, Alfred 156–158
Wells, Herbert George 167
Western Electric Hawthorne Factory 50, 51, 75
‘Women as policy makers’ 76
women in science 298–300, 299
Woo-suk, Hwang 35
World Health Organization (WHO) 153
World War 8, 113, 298