Reading Scientific Papers

And other useful mathematical tools for journalists
A Mathematician’s Perspective

Rebecca Goldin
December 10, 2012
National Press Foundation

Statistical Assessment Service (www.stats.org)
Jon Entine, Senior Fellow
Cynthia Merrick, Intern
Rebecca Goldin, Director of Research
Trevor Butterworth, Editor

Statistical Concepts in Scientific Journal Articles
Mean, median, mode Standard deviation Confidence intervals

Orders of magnitude
Confounding factors Percentages Absolute vs. relative risk Scientific methods Causation versus correlation

In the beginning: a press release
Press Releases Don’t Tell the Whole Story
• They are designed to get the attention of the press.
• They present the results in the rosiest terms possible.
• They don’t put the results in context.

Abstracts Don’t Tell the Whole Story
They don’t answer:
• How were the subjects recruited?
• What was the design of the experiment?
• What methods were used to analyze the data?
• What are the weaknesses of the conclusions?

Basic Advice for a Journalist with Limited Time and Ideas
• Read as a skeptic at all times.
• Avoid most conclusions of causality.
• A lot can be understood from even a cursory read (<10 minutes) of the summary, the abstract, and the conclusion.
• Avoid the press release.
• The summary and abstract will tell you the results, but hardly ever hint at what the limitations are. The conclusion will often tell you some caveats.
• Search PubMed.gov for key words and see if other literature has been published on the topic. Give other research equal time!
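For readers who want to automate the PubMed check, here is a minimal sketch that builds a search URL for NCBI's E-utilities `esearch` endpoint. The endpoint and its `db`, `term`, and `retmode` parameters belong to NCBI's public interface, not to this talk; the helper name and the search terms are hypothetical, and the code only builds the URL rather than fetching it.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public API; not part of the original talk)
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(keywords: list[str]) -> str:
    """Build an esearch URL asking PubMed for papers matching ALL keywords."""
    term = " AND ".join(keywords)
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical search terms, chosen only for illustration
url = pubmed_search_url(["red meat", "cancer", "cohort"])
print(url)
```

Opening the printed URL in a browser returns JSON whose `esearchresult` section includes a `count` of matching papers; a large body of related work is a cue to look beyond the single study behind the press release.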

(Easy) Questions to Ask While Reading a Journal Article

• How are the people in the study recruited? In particular, would the recruitment method itself bias the results by involving people who might not be “typical” with regard to the thing measured?
• Recent example: the “Triple P” (Positive Parenting Program) seemed to have a clear positive impact for families with children with “conduct disorder.” However, most participants in most studies were self-selected: they were more likely to be motivated, more likely to be literate, and more confident about volunteering their family (possibly leading to higher levels of compliance with treatment).
• For example, are women who have a family history of breast cancer more likely to get a mammogram? If so, then rates of cancer detection among women getting

(Easy) Questions to Ask While Reading a Journal Article
• How is data collected? Is there room for bias? (This is especially important in survey, opinion, and food studies.)
• Is the data biologically relevant?
• Are the numbers “significant”? (More on significance in a bit.)
• Usually scientific articles are trying to implicate something or some behavior: “red meat eaters have more cancer” translates as “red meat causes more cancer.” Are the authors considering several possible explanations for the observed data (such as the different

The Moral and the Statistical Collide
• Medical Abortion
• Obesity
• Nursing vs. Bottle Feeding
• Smoking
• Homosexuality
• Daycare
• Food/Alcohol
• “Natural” versus “Chemical”
• Pollution
• Crime

Causation or Correlation

It’s easy to be fooled
• Height correlates with reading skills in children under 10.
• Income correlates with success in college.
• Ratio of finger lengths correlates with aggression.
• Facebook correlates with poor grades.
• Facebook correlates with good grades.
• Doing heroin correlates with doing marijuana.
• Higher taxes correlate with high annual growth, and are inversely correlated with poverty rates.
• Alcoholism correlates with less gray matter in the prefrontal cortex.

fMRI studies… a case study
• fMRI machines are large magnets that measure oxygen levels in the blood.
• People can engage in activities inside the machine.
• Patterns of blood flow are thought to reflect patterns of brain activity (more on that in a bit).
• Typical studies assume that observed patterns occur only when the tested behavior occurs.
• Typical studies assume that observed patterns are caused by the tested behavior.

fMRI studies… a case study
• Lying can be determined by patterns of fMRI scans. But perhaps stress or anxiety can lead to the same patterns.
• Violent video gaming leads to violent brain patterns. But perhaps any competitive play, including non-violent non-video games, produces similar brain patterns. Plus, there is no indication of actual violence.
• Math anxiety triggers activity in the pain center of the brain. But no pain is experienced by subjects with math anxiety. Perhaps anxiety, not mathematics, correlated with

Jumping from Correlation to Cause
• You don’t always have to know why a correlation may not be causal. Be wary of any claims of causality.
• Some common reasons that a correlation could look causal when it’s not: not adjusting for confounders, misunderstanding the mechanism, or having an unknown confounder.
• A causal relationship might be reasonable to suspect when the statistics are overwhelming, the effect is observed in many different contexts, repeated tests on large numbers of people show the same effect, and double-blind randomized controlled trials support it.

Causation vs. correlation is not the only thing to worry about in medical research

The role of randomness
• Given a huge urn of balls: 30% of the balls are white, and the rest are other colors.
• Each of 100 people picks 10 balls, writes down their colors, then returns the balls to the urn.
• Some people will have 3 white balls, but others will have more or fewer.
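The urn experiment is easy to simulate. This sketch assumes the urn is so large that drawing 10 balls barely changes its 30% white share, so each draw is treated as an independent 30% chance; the seed and the text histogram are arbitrary choices.

```python
import random
from collections import Counter

random.seed(2012)

P_WHITE = 0.30      # share of white balls in the urn
BALLS_EACH = 10     # balls drawn by each person
PEOPLE = 100        # number of people drawing


def draw_whites() -> int:
    """Count the white balls in one person's draw of BALLS_EACH balls."""
    return sum(random.random() < P_WHITE for _ in range(BALLS_EACH))


counts = Counter(draw_whites() for _ in range(PEOPLE))

# One row per possible outcome, one '#' per person who got it
for whites in range(BALLS_EACH + 1):
    print(f"{whites:2d} white: {'#' * counts[whites]}")

print(f"{counts[3]} of {PEOPLE} people drew exactly 3 white balls")
```

For any single person, exactly 3 white balls is the most likely outcome, yet in a typical run most of the 100 people draw some other number.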

Number of Whites is Not Determined
[Figure: “Randomness in Choosing 10 Balls: How many are white?” Probability (0 to 0.3) against number of white balls chosen (0 to 10).]
• About 27% chance you will get 3 white balls; it’s much more likely you’ll get some other number.
• About 1% chance of getting 7 white balls.
• Suppose that “white” represents something random and bad, like producing a defective product at your factory.
• If one factory produces 70% defects while the others only have 30% defects, wouldn’t you think there’s a reason? Our statistics suggest that maybe not. But we look for reasons, convinced of
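The percentages on this slide come from the binomial distribution. A short sketch, assuming each of the 10 draws is independent with a 30% chance of white, computes them directly with Python's `math.comb`; the helper name is our own.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.30
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

p_three = pmf[3]
p_seven = pmf[7]
p_seven_or_more = sum(pmf[7:])

print(f"P(exactly 3 white) = {p_three:.3f}")   # about 27%
print(f"P(exactly 7 white) = {p_seven:.4f}")   # about 1%
print(f"P(7 or more white) = {p_seven_or_more:.4f}")
```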

On p-values
• Suppose you are flipping a coin many times, and you think this coin is biased, because you aren’t getting close to ½ heads and ½ tails. How can you quantify your suspicion?
• The p-value is a measurement based on the data you have seen. It answers the question: “If the coin were fair, how likely would I be to see the data I am seeing (or data even more extreme)?” In other words, if you had a fair coin, is it reasonable to see this proportion of heads and tails, or would it be very unlikely?
• If you flip 1000 times and get 520 heads, there is roughly an 11% chance of getting this many heads (or more) with a fair coin. In contrast, if you had 550 heads in 1000 flips, the chance of this happening randomly is only about 0.1%, i.e., very unlikely if the coin were fair.
• The biomedical community generally accepts p=.05 (5%) as
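The coin-flip p-values above can be checked with an exact one-sided binomial tail, assuming a fair coin and independent flips. This is a sketch rather than a full testing procedure, and the function name is hypothetical.

```python
from math import comb


def one_sided_p_value(heads: int, flips: int) -> float:
    """P(at least `heads` heads in `flips` tosses of a fair coin)."""
    tail = sum(comb(flips, k) for k in range(heads, flips + 1))
    return tail / 2**flips  # exact integer arithmetic until this final division


p_520 = one_sided_p_value(520, 1000)
p_550 = one_sided_p_value(550, 1000)

print(f"520+ heads in 1000 flips: p = {p_520:.3f}")
print(f"550+ heads in 1000 flips: p = {p_550:.4f}")
print("below .05?", p_520 < 0.05, p_550 < 0.05)
```

The 520-heads result does not clear the usual .05 bar; the 550-heads result clears it easily.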

Confidence Intervals, Odds Ratios, Confounders

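For ratio measures such as odds ratios and relative risks, a confidence interval is “significant” when it does not contain 1.0, the value that means “no difference between the groups.”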
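A minimal sketch of that check for an odds ratio, using Woolf's log-scale approximation for the 95% interval. The method is one common choice swapped in for illustration (the talk does not say how any interval was computed), and both 2×2 tables are invented.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate 95% CI (Woolf's method) from a 2x2 table.

    a = exposed cases,   b = exposed non-cases
    c = unexposed cases, d = unexposed non-cases
    """
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    low = exp(log(odds_ratio) - z * se_log)
    high = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Hypothetical counts, for illustration only
tables = {"small study": (12, 88, 8, 92), "large study": (120, 880, 80, 920)}
for label, table in tables.items():
    or_, low, high = odds_ratio_ci(*table)
    verdict = "excludes 1.0" if low > 1 or high < 1 else "contains 1.0"
    print(f"{label}: OR = {or_:.2f}, 95% CI {low:.2f} to {high:.2f} ({verdict})")
```

Both tables have the same odds ratio; only the amount of data differs, which is why the interval, not the point estimate alone, decides whether a result is “significant.”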

Multiple Testing, or How to Guarantee Results
• Once you have a standard, like p<.05, you have ways of gaming the system.
• There is always a small chance, up to 5%, that you will see something that looks suspicious when it really isn’t. Sometimes your coin will favor heads by a suspicious amount even though the coin really is fair.
• The more hypotheses you check, the more likely this is to happen.
• And once you find something suspicious, you can write a scientific article about it.
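A quick sketch of why checking many hypotheses “guarantees” results. If every null hypothesis is true (every coin really is fair) and each test uses p < .05, the chance of at least one false alarm among k independent tests is 1 - 0.95^k. The simulation assumes p-values from true null hypotheses are uniform between 0 and 1, which holds for continuous test statistics; the Bonferroni threshold (.05/k) is one standard correction added for comparison, not something from the talk.

```python
import random

random.seed(7)
ALPHA = 0.05


def chance_of_false_alarm(k: int) -> float:
    """Exact P(at least one p < ALPHA) among k independent true-null tests."""
    return 1 - (1 - ALPHA) ** k


def simulated_false_alarm_rate(k: int, threshold: float, trials: int = 10_000) -> float:
    """Fraction of simulated studies (k tests each) with at least one 'hit'."""
    hits = 0
    for _ in range(trials):
        # Under a true null hypothesis, a p-value is uniform on [0, 1].
        p_values = [random.random() for _ in range(k)]
        hits += any(p < threshold for p in p_values)
    return hits / trials


for k in (1, 5, 20, 100):
    print(f"{k:3d} tests: exact {chance_of_false_alarm(k):.2f}, "
          f"simulated {simulated_false_alarm_rate(k, ALPHA):.2f}, "
          f"with Bonferroni {simulated_false_alarm_rate(k, ALPHA / k):.2f}")
```

With 20 tests of fair coins, roughly two studies in three will turn up at least one “significant” finding unless the threshold is adjusted.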

What happens in the lab: Experiments Galore...

What the rest of the world sees

Metaphors for Bad Statistical Methods
• Drunk looking for his keys under the lamppost…

Metaphors for Bad Statistical Methods
• Texas Sharpshooter Fallacy…

In real life? Value Added Models
• Value-added models evaluate teachers based on the progress of the kids in their classes, as measured by standardized tests, during the year they teach them.
• The idea is to use these test scores to evaluate whether the teachers are effective or not.
• Never mind the difficulties of the measurement, or the question of what “effective” means.

In real life? Value Added Models
• If kids were distributed randomly to the teachers, some teachers would get unlucky and have some bad learners.
• Although it’s unlikely to get 7 white balls out of 10 when there is only a 30% chance of getting a white ball, when you do it on a large scale, it’s bound to happen.
• Bad teachers may fare well, and good teachers may fare poorly, purely due to chance.
• If you correct for known problems, such as family income, you introduce more variance.
• Lots of people know about these issues, but no one is reporting on them.
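The “bound to happen at scale” point can be made precise. Suppose, purely hypothetically, that each teacher's class is a random sample of 10 students and each student has a 30% chance of being a struggling learner, the same odds as the urn. The chance that one particular teacher gets 7 or more such students is only about 1%, but across many teachers someone almost surely does.

```python
from math import comb

P_STRUGGLING = 0.30  # hypothetical share of struggling learners
CLASS_SIZE = 10      # hypothetical class size, matching the urn example

# Probability that one class has 7 or more struggling learners (about 1%)
p_unlucky = sum(
    comb(CLASS_SIZE, k) * P_STRUGGLING**k * (1 - P_STRUGGLING) ** (CLASS_SIZE - k)
    for k in range(7, CLASS_SIZE + 1)
)


def chance_someone_is_unlucky(teachers: int) -> float:
    """Probability that at least one of `teachers` independent classes is that unlucky."""
    return 1 - (1 - p_unlucky) ** teachers


for teachers in (1, 10, 100, 1000):
    print(f"{teachers:5d} teachers: {chance_someone_is_unlucky(teachers):.1%}")
```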

Absolute versus relative risk
• Absolute risk is the risk you actually undergo. Women who take the birth control pill have an absolute risk of venous thrombosis (blood clot) of about 1 in 10,000 per year. The absolute risk for women who do not take the pill is about 1 in 15,000 per year.
• Relative risk is a risk compared to another group. Women who take the birth control pill have a 50% higher risk of venous thrombosis (a relative risk of 1.5) compared to women who don’t take the pill.
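A small sketch of the arithmetic, assuming (as the 50% figure implies) a yearly risk of 1 in 10,000 for pill users and 1 in 15,000 for non-users. The helper is hypothetical; it reports the same comparison both as a relative and as an absolute difference, using exact fractions to avoid rounding.

```python
from fractions import Fraction


def compare_risks(exposed: Fraction, unexposed: Fraction, population: int = 100_000):
    """Return relative risk, percent increase, and extra cases per `population`."""
    relative_risk = exposed / unexposed
    percent_increase = (relative_risk - 1) * 100
    extra_cases = (exposed - unexposed) * population
    return relative_risk, percent_increase, extra_cases


pill = Fraction(1, 10_000)     # yearly absolute risk, pill users
no_pill = Fraction(1, 15_000)  # yearly absolute risk, non-users

rr, pct, extra = compare_risks(pill, no_pill)
print(f"relative risk: {float(rr):.2f}")                  # 1.50
print(f"{float(pct):.0f}% higher risk")                   # 50% higher risk
print(f"extra cases per 100,000 women per year: {float(extra):.1f}")
```

The same data sound alarming as “50% higher risk” and modest as “about 3 extra cases per 100,000 women per year.”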

Relative risk representations have consequences
• In 1995, the Committee on Safety of Medicines in the UK concluded that 3rd-generation birth control pills were riskier than previous versions.
• Some press reported a 100 percent increase in the risk of deep vein thrombosis (blood clots); others reported “twice the risk.”

• The absolute risk of DVT was 15 per 100,000 for 2nd-generation birth control pills.
• The absolute risk of DVT was 30 per 100,000 for 3rd-generation birth control pills.
• The media blitz led many women to stop taking their pills (rather than immediately switching to another one), resulting in an increase in unwanted pregnancies. The abortion rate went up 9% from 1995 to 1996.
• The absolute rate of DVT for pregnant women is 80 per 100,000.
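Applying the same two views to the 1995 pill scare, using the per-100,000 figures on this slide. The group of one million women is a hypothetical size chosen only to turn rates into counts.

```python
# Per-100,000 DVT rates quoted on the slide
SECOND_GEN = 15
THIRD_GEN = 30
PREGNANCY = 80

relative_risk = THIRD_GEN / SECOND_GEN        # 2.0, i.e. "twice the risk"
percent_increase = (relative_risk - 1) * 100  # the "100 percent increase"
extra_per_100k = THIRD_GEN - SECOND_GEN       # the absolute difference

# Hypothetical group size, for illustration only
WOMEN = 1_000_000
extra_cases = extra_per_100k * WOMEN // 100_000

print(f"relative risk: {relative_risk:.1f} ({percent_increase:.0f}% increase)")
print(f"absolute difference: {extra_per_100k} per 100,000")
print(f"extra DVT cases if {WOMEN:,} women use 3rd- rather than 2nd-gen pills: {extra_cases}")
print(f"pregnancy vs 3rd-gen pill: {PREGNANCY / THIRD_GEN:.1f}x the DVT rate")
```

“Twice the risk” is accurate, but the absolute difference is small next to the DVT rate for the pregnancies that followed when women stopped their pills.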

Media Impact is Great

• In 1998, Andrew Wakefield published a study on 12 children that became the basis for the belief that autism is a result of vaccinations.
• The press repeatedly reported these results, even though the scientific community was unable to reproduce them.
• The existence of this study gave greater voice to other studies

• A journalist was responsible for an investigation of the scientific integrity of Wakefield’s work.
• Even after his autism study was discredited, most media coverage about vaccinations reports “both sides of the story” about whether vaccines are safe or not.
• However, the medical community almost universally endorses vaccination and believes that vaccines are safe.
• Pockets of measles and croup due to vaccination refusal or lack of herd effect have been found in the U.S. and in the UK.

Scientific Culture
• Scientists hear of a surprising result and wonder, “What went wrong?”
• Titles of papers are meant to be informative about the paper, not conclusive of the results.
• Abstracts tell only part of the story. Results cannot be divorced from methods.
• The timeline of a scientific result is months to years.
• Great ideas should be kept secret until they can be proven.
• The media and “lay people” don’t matter as much as scientific journals.
• Any result should be contextualized.
• Vested interests include funding sources, prestige, and belief systems.

Journalist Culture
• Journalists hear of a surprising result and think it makes for a great story.
• Headlines are meant to sell the story. Scientific abstracts tell the whole story about the science.
• Results are results, unless there is a money trail telling a story.
• Journalists are on a tight timeline (often a couple of days).
• Great ideas should be shared right away, before they get “scooped.”
• One scientific result is enough to write a story.
• A recent study found that “spin” in news coverage correlated highly with “spin” in the conclusions of article abstracts.

Research is routinely plagued by:
• Low levels of significance
• Multiple testing
• No acknowledgement of randomness in research design
• Lack of context/repeated experiments
• Scientists don’t know how to talk to journalists.
• But if you are looking to find one scientist willing

What can a journalist do?
• Write about the levels of significance, bias, and caveats.
• Ask the researchers about multiple testing. Did they adjust for it?
• Write about absolute risks.
• Look for a body of research rather than one specific paper.
• Cite your sources!
• DON’T INDICATE

Doting on Data
What cannot be found in the data?
• Answers to our moral questions (though they can be informed by data)
• Answers to extremely complicated questions
• 100% certainty
• Lots of data is unavailable.
• Answers that take care of all possibly

Lessons to be Learned
• Don’t over-estimate the ability of poor data to give answers.
• The world is complicated; many things interact with each other.
• The public voice is at least as loud as the scientific voice.
• Consensus is extremely important.

To Life!

Thank you!