OezbekC07-How to Lie With Statistics


Published by: WebMedia Scit on Jan 08, 2012
Course "Empirical Evaluation in Informatics"


How to lie with statistics
Christopher Oezbek, Prof. Dr. Lutz Prechelt
Freie Universität Berlin, Institut für Informatik
http://www.inf.fu-berlin.de/inst/ag-se/

• What do they mean?
• Biased measures
• Biased samples
• What is the real reason?
• Misleading averages
• Misleading visualizations
• Pseudo-precision
• Plain false statements
• What is not being said?
• "Just try again"
• Incomparable measures
• Invalid measures

Christopher Oezbek oezbek@inf.fu-berlin.de, Lutz Prechelt, prechelt@inf.fu-berlin.de

German title: "Wie man mit Statistik lügt" (course "Empirische Bewertung in der Informatik")

Source
• This slide set was created roughly along the lines of Darrell Huff: "How to Lie With Statistics" (Victor Gollancz 1954, Pelican Books 1973, Penguin Books 1991)
• I urge everyone to read this book in full
  • It is short (120 p.), entertaining, and insightful
  • Many different editions available
  • Other, similar books exist as well


Example: Human Growth Hormone (HGH)

Remark
• We use this real spam email as an arbitrary example
  • and will make unwarranted assumptions about what is behind it
  • for illustrative purposes
• I do not claim that HGH treatment is useful, useless, or harmful

Note:
• HGH is on the IOC doping list
• http://www.dshs-koeln.de/biochemie/rubriken/01_doping/06.html
  • "For the therapeutic use of HGH, only two major clinical conditions currently come into question: dwarfism in children and HGH deficiency in adults"
  • "The effectiveness of HGH in athletes, however, must so far be strongly questioned, since no scientific study has yet been able to show that additional HGH administration can improve performance in people with normal HGH production."

Problem 1: What do they mean?
• "Body fat loss: up to 82%"
  • OK, can be measured
• "Wrinkle reduction: up to 61%"
  • Maybe they count the wrinkles and measure their depth?
• "Energy level: up to 84%"
  • What is this?
• Also note they use language loosely:
  • Loss in percent: OK, reduction in percent: OK
  • Level in percent??? (should be 'increase')

Lesson: Dare ask what
• Always question the definition of the measures for which somebody gives you statistics
• Surprisingly often, there is no stringent definition at all
• Or multiple different definitions are used
  • and incomparable data get mixed
• Or the definition has dubious value
  • e.g. "Energy level" may be a subjective estimate of patients who knew they were treated with a "wonder drug"

Problem 2: A maximum does not say much
• Wrinkle reduction: up to 61%
  • So that was the best value. What about the rest?
• Maybe the distribution was like this:
[Figure: dot plot of individual values; x-axis "reduction", 0 to 60]

Lesson: Dare ask for unbiased measures
• Always ask for neutral, informative measures
  • in particular when talking to a party with vested interest
• Extremes are rarely useful to show that something is generally large (or small)
• Averages are better
• But even averages can be very misleading
  • see the example later in this presentation
• If the shape of the distribution is unknown, we need summary information about variability at the very least
  • e.g. the data from the plot in the previous slide has arithmetic mean 10 and standard deviation 8
• Note: In different situations, rather different kinds of information might be required for judging something
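To make the gap between "up to" and "typically" concrete, here is a minimal sketch. The sample is synthetic and made up for this illustration (it is not the data behind the slide's plot); it was only chosen so that most values cluster around 10 while a single value reaches 61.

```python
import random
import statistics

# Synthetic, made-up "wrinkle reduction" values in percent.
# Assumption: roughly normal around 10 with spread 8, clipped at 0.
random.seed(1)
reductions = [max(0.0, random.gauss(10, 8)) for _ in range(99)]
reductions.append(61.0)  # one extreme value, mirroring the "up to 61%" claim

best = max(reductions)
mean = statistics.mean(reductions)
median = statistics.median(reductions)
stdev = statistics.stdev(reductions)

print(f"maximum ('up to'): {best:.0f}%")
print(f"mean:              {mean:.1f}%")
print(f"median:            {median:.1f}%")
print(f"std. deviation:    {stdev:.1f}%")
```

The maximum describes one participant; the mean, median, and standard deviation describe the group, which is usually what the reader wants to know.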

Problem 3: Underlying population
• Wrinkle reduction: up to 61%
• Maybe they measured a very special set of people?
[Figure: dot plots for the groups "heartAttack" and "healthy"; x-axis "reduction", -20 to 60. Note: This data is pure fantasy!]

Lesson: Insist on unbiased samples
• How and where from the data was collected can have a tremendous impact on the results
• It is important to understand whether there is a certain (possibly intended) tendency in this
• A fair statistic talks about possible bias it contains
• If it does not, ask.

Notes:
• A biased sample may be the best one can get
• Sometimes we can suspect that there is a bias, but cannot be sure

Problem 4: Is HGH even part of the cause?
• Wrinkle reduction: up to 61%
• Maybe that could happen even without HGH?
[Figure: dot plots for the groups "noHGH", "heartAttack", and "healthy"; x-axis "reduction", -20 to 60. Note: This data is pure fantasy!]

Lesson: Question causality
• Sometimes the data is not just biased, it contains hardly anything else than bias
• If somebody presents you with a presumably causal relationship ("A causes B"), ask yourself
  • What other influences besides A may be important?
  • What is the relative weight of A compared to these?

Example 2: Tungu and Bulugu
• We look at the yearly per-capita income in two small hypothetical island states: Tungu and Bulugu
• Statement: "The average yearly income in Tungu is 94.3% higher than in Bulugu"

Problem 1: Misleading averages
• The island states are rather small:
  • 81 people in Tungu and 80 in Bulugu
• And the income distribution is not as even in Tungu:
[Figure: dot plots for "Tungu" and "Bulugu"; x-axis "income", 0 to 5000. Note: This data is pure fantasy!]

Misleading averages and outliers
• The only reason is Dr. Waldner, owner of a small software company in Berlin, who since last year is enjoying his retirement in Tungu
[Figure: dot plots for "Tungu" and "Bulugu"; logarithmic x-axis "income", 10^3.0 to 10^5.0]

Lesson: Question appropriateness
• A certain statistic (very often the arithmetic average) may be inappropriate for characterizing a sample
• If there is any doubt, ask that additional information be provided
  • such as standard deviation
  • or some quantiles, e.g. 0, 0.25, 0.5, 0.75, 1
    Note: the 0.25 quantile is equivalent to the 25th percentile etc.
[Figure: dot plot for "Tungu"]
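The effect of a single outlier on the mean, and the robustness of quantiles, can be sketched in a few lines. The incomes below are hypothetical and chosen only for illustration (80 evenly spread ordinary incomes plus one income of 160,000); they are not the data behind the Tungu plots.

```python
import statistics

# Hypothetical incomes: 80 ordinary residents (1000 .. 2975 in steps of 25)
# plus one very rich retiree. These values are assumptions for illustration.
ordinary = [1000 + 25 * i for i in range(80)]
incomes = ordinary + [160_000]

mean = statistics.mean(incomes)
median = statistics.median(incomes)
q1, q2, q3 = statistics.quantiles(incomes, n=4)  # 0.25, 0.5, 0.75 quantiles

print(f"mean:   {mean:.0f}")
print(f"median: {median:.0f}")
print(f"quartiles: {q1:.0f}, {q2:.0f}, {q3:.0f}")
print(f"min / max: {min(incomes)} / {max(incomes)}")
```

In this sketch the mean lands near 3900 although no ordinary resident earns more than 2975, while the median and quartiles stay inside the range of the ordinary incomes.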

Logarithmic axes
• Waldner earns 160,000 per year. How much more that is than what the other Tunguans have is impossible to see on the logarithmic axis we just used
[Figure: dot plots for "Tungu" and "Bulugu"; linear x-axis "income", 0 to 150000]

Lesson: Beware of inappropriate visualizations
• Logarithmic axes are useful for reading hugely different values from a graph with some precision
• But they totally defeat the imagination
• There are many more kinds of inappropriate visualizations
  • see later in this presentation

Problem 3: Misleading precision
• "The average yearly income in Tungu is 94.3% higher than in Bulugu"
• Assume that tomorrow Mrs. Alulu Nirudu from Tungu gives birth to her twins
• There are now 83 rather than 81 people on Tungu
• The average income drops from 3922 to 3827
• The difference to Bulugu drops from 94.3% to 89.7%
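The arithmetic on this slide can be retraced as follows. This is a sketch that uses only the rounded figures quoted on the slide; Bulugu's mean is not given there and is inferred here from the 94.3% statement, so the result matches the slide only up to rounding (about 89.6% instead of 89.7%).

```python
# Figures quoted on the slide (rounded).
old_population = 81
new_population = 83          # two newborns with zero income
old_mean = 3922.0
old_excess_pct = 94.3        # Tungu's mean relative to Bulugu's

# Total income does not change when the twins are born.
total_income = old_mean * old_population

# Assumption: Bulugu's mean is inferred from the 94.3% statement.
bulugu_mean = old_mean / (1 + old_excess_pct / 100)

new_mean = total_income / new_population
new_excess_pct = (new_mean / bulugu_mean - 1) * 100

print(f"new mean income in Tungu: {new_mean:.0f}")
print(f"new excess over Bulugu:   {new_excess_pct:.1f}%")
```

Two newborns shift the "94.3%" by more than four percentage points, so the digit after the decimal point carried no information in the first place.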

Lesson: Do not be easily impressed
• The usual reason for presenting very precise numbers is the wish to impress people
  • "Round numbers are always false"
• But round numbers are much easier to remember and compare
• Clearly tell people you will not be impressed by precision
  • in particular if the precision is purely imaginary

Example 3: Phantasmo Corporation stock price
• We look at the recent development of the price of shares for Phantasmo Corporation
• "Phantasmo shows a remarkably strong and consistent value growth and continues to be a top recommendation"
[Figure: line plot; y-axis "stock price", 180 to 192; x-axis "day", 0 to 400. Note: Phantasmo and this data are purely imaginary]

Problem: Looks can be misleading
• The following two plots show exactly the same data!
  • and the same as the plot on the previous slide!
[Figure: two line plots of the same data drawn with different proportions; y-axis "stock price", 180 to 192; x-axis "day", 0 to 400]

Problem: Scales can be misleading
• What really happened is shown here
• We intuitively interpret a trend plot on a ratio scale
[Figure: two line plots of the same data; left y-axis "stock price" from 0 to 200, right y-axis "stock price" from 180 to 192; x-axis "day", 0 to 400]
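How much a truncated axis exaggerates a trend can be estimated by comparing the real relative change with the share of the plot height it occupies. The sketch below assumes hypothetical start and end prices of 181 and 191 (the slide gives no exact values) and uses the two axis ranges shown on the slide, 180 to 192 and 0 to 200; `visual_fraction` is a helper introduced here for illustration.

```python
def visual_fraction(start, end, axis_min, axis_max):
    """Share of the plot height covered by a change from start to end."""
    return abs(end - start) / (axis_max - axis_min)

# Hypothetical prices, assumed for illustration only.
start_price, end_price = 181.0, 191.0

relative_growth = (end_price - start_price) / start_price
truncated = visual_fraction(start_price, end_price, 180, 192)
zero_based = visual_fraction(start_price, end_price, 0, 200)

print(f"actual relative growth:        {relative_growth:.1%}")
print(f"plot height used (180 to 192): {truncated:.0%}")
print(f"plot height used (0 to 200):   {zero_based:.0%}")
```

On the zero-based axis the visual impression is close to the actual relative growth, which is what a reader expects intuitively; on the truncated axis the same change fills most of the plot.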

So look carefully!
• Found on focus.msn.de on 2004-03-04: [figure]

Problem: Scales can be missing
• The most insolent persuaders may even leave the scale out altogether
[Figure: line plot without a y-axis scale; x-axis "day", 0 to 400]

Problem: Scales can be abused
• Observe the global impression first

Problem: People may invent unexpected things
• Source: advertisement of Donau-Universität Krems
• DIE ZEIT, 07.10.2004

Lesson: Seeing is believing
• but often it shouldn't be
• Always consider what it really is that you are seeing
• Do not believe anything purely intuitively
• Do not believe anything that does not have a well-defined meaning

Example 4: blend-a-med Night Effects
• What do they not say?
  • What exactly does "sichtbar" ("visible") mean?
  • What were the results of the clinical trials?
  • What other effects does Night Effects have?

Example 6: economic growth (D vs. USA)
• On 2003-10-30, the US Bureau of Economic Analysis (BEA) announced
  • USA economic growth in 3rd quarter: 7.2%
• Assume that same day the German Statistisches Bundesamt had announced
  • D economic growth in 3rd quarter: 2%
  • (Note: This value is fictitious)
• Note: Both values refer to gross domestic product (GDP; German "Brutto-Inlandsprodukt", BIP)
• Which economy was growing faster?

Problem: Different definitions
• The US BEA extrapolates the growth for each quarter to a full year
  • Statistisches Bundesamt does not
• Thus, the actual US growth factor during (from start to end of) this quarter was only x, where x^4 = 1.072
  • x = 1.0175
• US growth was only 1.75% in this quarter
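The conversion between an annualized rate and the growth within a single quarter is a fourth root, as the slide states. A minimal sketch, using the 7.2% from the slide and the slide's fictitious 2% for Germany:

```python
# Annualized quarterly growth as announced by the BEA (from the slide).
us_annualized = 0.072
quarters_per_year = 4

# Growth factor x within the quarter, where x ** 4 == 1 + us_annualized.
us_quarterly_factor = (1 + us_annualized) ** (1 / quarters_per_year)
us_quarterly_pct = (us_quarterly_factor - 1) * 100

# The slide's fictitious German figure is already a per-quarter value.
de_quarterly = 0.02
de_annualized_pct = ((1 + de_quarterly) ** quarters_per_year - 1) * 100

print(f"US growth within the quarter: {us_quarterly_pct:.2f}%")
print(f"German 2% annualized:         {de_annualized_pct:.2f}%")
```

Either conversion makes the two numbers comparable; under the slide's fictitious assumption, the German economy would have been the one growing faster.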

Real-world example: 25-fold reliability
• "Warum billigere Tintenpatronen verwenden, wenn Original HP Tinten bis zu 25-mal zuverlässiger sind?"
• "Why use cheaper ink cartridges when genuine HP ink is up to 25 times more reliable?"

25-fold reliability explanation
[Figure: reliability results for color cartridges]
• DOA: dead on arrival (<10 pages usable capacity)
• PF: premature failure (<75% of avg. non-DOA yield)
• HU: high unusable (>10% pages with low quality)

25-fold reliability explanation (2)
• Percentage of PF cartridges (less than 75% of the avg. capacity of all cartridges) per brand
[Figure: y-axis "percent", 0 to 50; x-axis "size", 20 to 120]

25-fold reliability explanation (3)
More problems with this data:
• 52/120 = 43% is what they used
• 52/103 = 50% is right if PF excludes DOA (as claimed)
• (52-17)/103 = 34% is right if PF includes DOA
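The three percentages differ only in which numerator and denominator are used. The sketch below recomputes them from the counts on the slide; reading 17 as the number of DOA cartridges and 103 as 120 minus those 17 is an interpretation made here, not something the slide states explicitly.

```python
# Counts taken from the slide.
total_cartridges = 120
counted_as_pf = 52

# Assumption: 17 is the number of dead-on-arrival (DOA) cartridges,
# so 103 cartridges remain once DOA units are excluded.
dead_on_arrival = 17
non_doa = total_cartridges - dead_on_arrival

as_reported = counted_as_pf / total_cartridges
if_pf_excludes_doa = counted_as_pf / non_doa
if_pf_includes_doa = (counted_as_pf - dead_on_arrival) / non_doa

print(f"as reported (52/120):        {as_reported:.0%}")
print(f"PF excludes DOA (52/103):    {if_pf_excludes_doa:.0%}")
print(f"PF includes DOA (35/103):    {if_pf_includes_doa:.0%}")
```

Depending on the definition, the same raw counts give a failure rate anywhere between roughly a third and a half, which is why the definition has to be stated along with the number.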

Thank you!

