OezbekC07-How to Lie With Statistics

Published by: WebMedia Scit on Jan 08, 2012
Copyright: Attribution Non-commercial







Course "Empirical Evaluation in Informatics"


How to lie with statistics
Christopher Oezbek, Prof. Dr. Lutz Prechelt
Freie Universität Berlin, Institut für Informatik
http://www.inf.fu-berlin.de/inst/ag-se/

• What do they mean?
• Biased measures
• Biased samples
• What is the real reason?
• Misleading averages
• Misleading visualizations
• Pseudo-precision
• Plain false statements
• What is not being said?
• "Just try again"
• Incomparable measures
• Invalid measures

Christopher Oezbek oezbek@inf.fu-berlin.de, Lutz Prechelt, prechelt@inf.fu-berlin.de

"Empirical Evaluation in Informatics"

How to lie with statistics
Christopher Oezbek, Prof. Dr. Lutz Prechelt
Freie Universität Berlin, Institut für Informatik
http://www.inf.fu-berlin.de/inst/ag-se/

• What is actually meant?
• Is the measure used biased?
• Is the sample selection biased?
• Is that really the reason?
• Misleading averages
• Misleading visualizations
• Pseudo-precision
• Plain false statements
• What is not being said?
• "Just try again"
• Incomparable data
• Validity of measures

Source
• This slide set was created roughly along the lines of Darrell Huff: "How to Lie With Statistics" (Victor Gollancz 1954, Pelican Books 1973, Penguin Books 1991)
• I urge everyone to read this book in full
  • It is short (120 p.), entertaining, and insightful
  • Many different editions are available
  • Other, similar books exist as well

Example: Human Growth Hormone (HGH)

Remark
• We use this real spam email as an arbitrary example
  • and will make unwarranted assumptions about what is behind it
  • for illustrative purposes
• I do not claim that HGH treatment is useful, useless, or harmful
Note:
• HGH is on the IOC doping list
• http://www.dshs-koeln.de/biochemie/rubriken/01_doping/06.html
  • "Currently, only two major clinical conditions come into question for the therapeutic use of HGH: dwarfism in children and HGH deficiency in adults"
  • "However, the effectiveness of HGH in athletes must so far be strongly questioned, since no scientific study has yet been able to show that additional HGH administration can lead to performance improvements in persons with normal HGH production."

Problem 1: What do they mean?
• "Body fat loss: up to 82%"
  • OK, can be measured
• "Wrinkle reduction: up to 61%"
  • Maybe they count the wrinkles and measure their depth?
• "Energy level: up to 84%"
  • What is this?
• Also note they use language loosely:
  • Loss in percent: OK, reduction in percent: OK
  • Level in percent??? (should be 'increase')

Lesson: Dare ask what
• Always question the definition of the measures for which somebody gives you statistics
• Surprisingly often,
  • there is no stringent definition at all
  • or multiple different definitions are used
    • and incomparable data get mixed
  • or the definition has dubious value
    • e.g. "Energy level" may be a subjective estimate of patients who knew they were treated with a "wonder drug"

Problem 2: A maximum does not say much
• Wrinkle reduction: up to 61%
  • So that was the best value. What about the rest?
• Maybe the distribution was like this:
[Plot: dot plot of individual wrinkle reductions; x-axis: reduction, 0 to 60]

Lesson: Dare ask for unbiased measures
• Always ask for neutral, informative measures
  • in particular when talking to a party with vested interest
• Extremes are rarely useful to show that something is generally large (or small)
• Averages are better
  • But even averages can be very misleading
  • see the example later in this presentation
• If the shape of the distribution is unknown, we need summary information about variability at the very least
  • e.g. the data from the plot on the previous slide has arithmetic mean 10 and standard deviation 8
• Note: In different situations, rather different kinds of information might be required for judging something
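The following Python sketch illustrates why a maximum alone says little. It computes the extreme value alongside the mean, standard deviation and quartiles of a sample. The reduction values are invented for illustration, because the data behind the slide's plot (mean 10, standard deviation 8) is not given. `summarize` is a hypothetical helper, not part of any library.

```python
import statistics

# Hypothetical wrinkle reductions in percent (invented; NOT the data behind the slide).
reductions = [2, 4, 5, 7, 8, 9, 10, 11, 12, 14, 15, 61]

def summarize(values):
    """Return the extreme an advert would quote next to more informative measures."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }

summary = summarize(reductions)
for name, value in summary.items():
    print(f"{name:>6}: {value:6.1f}")
```

In this sample, a single extreme case puts the maximum (61) far from the median (9.5). That gap is exactly the information an "up to" claim hides.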

Problem 3: Underlying population
• Wrinkle reduction: up to 61%
• Maybe they measured a very special set of people?
[Plot: wrinkle reductions for two groups, "heartAttack" and "healthy"; x-axis: reduction, -20 to 60. Note: This is pure fantasy data!]

Lesson: Insist on unbiased samples
• How and where the data was collected can have a tremendous impact on the results
• It is important to understand whether there is a certain (possibly intended) tendency in this
• A fair statistic talks about any bias it may contain
• If it does not, ask
Notes:
• A biased sample may be the best one can get
• Sometimes we can suspect that there is a bias, but cannot be sure
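Here is a minimal sketch of how sample selection alone can shift a result. The two groups borrow the labels from the plot under Problem 3, but every number is invented. Averaging only the `heartAttack` group yields a much larger "typical" reduction than averaging everyone.

```python
import statistics

# Hypothetical reductions in percent for two subpopulations (invented for illustration).
population = {
    "heartAttack": [25, 30, 35, 40, 45, 50, 55, 61],
    "healthy": [-5, 0, 2, 4, 5, 6, 8, 10, 12, 15],
}

# Pool all groups to get the unbiased picture over the whole population.
everyone = [x for group in population.values() for x in group]

print("mean over everyone:        ", round(statistics.mean(everyone), 1))
print("mean over heartAttack only:", round(statistics.mean(population["heartAttack"]), 1))
```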

Problem 4: Is HGH even part of the cause?
• Wrinkle reduction: up to 61%
• Maybe that could happen even without HGH?
[Plot: wrinkle reductions for the groups "heartAttack", "healthy", and "noHGH"; x-axis: reduction, -20 to 60. Note: This is pure fantasy data!]

Lesson: Question causality
• Sometimes the data is not just biased, it contains hardly anything but bias
• If somebody presents you with a presumably causal relationship ("A causes B"), ask yourself
  • What other influences besides A may be important?
  • What is the relative weight of A compared to these?
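One standard way to ask "what other influences besides A?" is to compare against a control group that did not receive A. The sketch below uses invented numbers for two hypothetical groups, `with_hgh` and `without_hgh`. It only shows the idea; a real analysis would also need randomized assignment and a significance test.

```python
import statistics

# Hypothetical reductions in percent (invented for illustration only).
with_hgh = [5, 8, 10, 12, 15, 20, 61]
without_hgh = [3, 7, 9, 11, 14, 18, 58]

# The advert's style of statistic: the best single case in each group.
print("best case with HGH:   ", max(with_hgh))
print("best case without HGH:", max(without_hgh))

# A more honest first look: compare the group averages.
effect = statistics.mean(with_hgh) - statistics.mean(without_hgh)
print("difference in means:  ", round(effect, 1))
```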

Example 2: Tungu and Bulugu
• We look at the yearly per-capita income in two small hypothetical island states: Tungu and Bulugu
• Statement: "The average yearly income in Tungu is 94.3% higher than in Bulugu"

Problem 1: Misleading averages
• The island states are rather small:
  • 81 people in Tungu and 80 in Bulugu
• And the income distribution is not as even in Tungu:
[Plot: individual yearly incomes in Tungu and Bulugu; x-axis: income, 0 to 5000. Note: This is pure fantasy data!]

Misleading averages and outliers
• The only reason is Dr. Waldner, owner of a small software company in Berlin, who has been enjoying his retirement in Tungu since last year
[Plot: individual incomes in Tungu and Bulugu on a logarithmic axis; x-axis: income, 10^3.0 to 10^5.0]

Lesson: Question appropriateness
• A certain statistic (very often the arithmetic average) may be inappropriate for characterizing a sample
• If there is any doubt, ask that additional information be provided
  • such as standard deviation
  • or some quantiles, e.g. 0, 0.25, 0.5, 0.75, 1
  • Note: the 0.25 quantile is equivalent to the 25th percentile etc.
[Plot: individual incomes in Tungu]
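The quantiles suggested on the slide (0, 0.25, 0.5, 0.75, 1) can be computed with Python's standard `statistics` module. In this sketch the incomes are hypothetical, with one extreme outlier, and `five_number_quantiles` is an illustrative helper name. The `"inclusive"` method is used so that the 0 and 1 quantiles coincide with the minimum and maximum.

```python
import statistics

# Hypothetical incomes with one extreme outlier (invented; not Tungu's actual data).
incomes = [1500, 1800, 1900, 2000, 2100, 2200, 160_000]

def five_number_quantiles(values):
    """Quantiles 0, 0.25, 0.5, 0.75 and 1 of the data."""
    inner = statistics.quantiles(values, n=4, method="inclusive")
    return [min(values), *inner, max(values)]

print("mean:     ", statistics.mean(incomes))
print("quantiles:", five_number_quantiles(incomes))
```

The mean (24500) describes nobody in this group. The quantiles show at a glance that almost everyone earns around 2000 and that one value is extreme.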

Logarithmic axes
• Waldner earns 160,000 per year
• How much more that is than what the other Tunguans earn is impossible to see on the logarithmic axis we just used
[Plot: individual incomes in Tungu and Bulugu on a linear axis; x-axis: income, 0 to 150000]
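The figures on these slides can be combined to check the claim that Waldner is "the only reason": 81 Tunguans, Waldner's 160,000, the Tungu average of 3922 quoted under Problem 3, and the 94.3% gap. The sketch below removes Waldner and backs out Bulugu's average from the stated percentage. Since the slide figures are rounded, the results are approximate.

```python
# Figures taken from the slides (rounded values).
tungu_people = 81
tungu_mean = 3922
waldner = 160_000

tungu_total = tungu_people * tungu_mean                      # = 317,682
others_mean = (tungu_total - waldner) / (tungu_people - 1)   # Tungu without Waldner
bulugu_mean = tungu_mean / 1.943                             # "94.3% higher" means a factor of 1.943

print(f"Tungu without Waldner: {others_mean:.0f}")
print(f"Bulugu:                {bulugu_mean:.0f}")
```

Without Waldner, the other 80 Tunguans average about 1971, slightly less than Bulugu's roughly 2019.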

Lesson: Beware of inappropriate visualizations
• Logarithmic axes are useful for reading hugely different values from a graph with some precision
• But they totally defeat the imagination
• There are many more kinds of inappropriate visualizations
  • see later in this presentation
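A short calculation shows why a log axis defeats the imagination. It compares a ratio of incomes with the distance between the same two values on a log10 axis. The "typical" income of 2,000 is an assumption chosen for illustration; the 160,000 is Waldner's income from the slide.

```python
import math

# Assumed typical income (illustrative) versus Waldner's income from the slide.
typical, waldner = 2_000, 160_000

linear_ratio = waldner / typical                           # how many times as much he earns
log_gap = math.log10(waldner) - math.log10(typical)        # distance in decades on a log10 axis

print(f"Waldner earns {linear_ratio:.0f} times as much")
print(f"On a log10 axis the two points are only {log_gap:.2f} decades apart")
```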

Problem 3: Misleading precision
• "The average yearly income in Tungu is 94.3% higher than in Bulugu"
• Assume that tomorrow Mrs. Alulu Nirudu from Tungu gives birth to her twins
  • There are now 83 rather than 81 people on Tungu
  • The average income drops from 3922 to 3827
  • The difference to Bulugu drops from 94.3% to 89.7%
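The twins arithmetic can be reproduced from the slide's own figures. The sketch assumes the two newborns have zero income and that nobody else's income changes. Bulugu's average is backed out from the rounded 94.3%, so the recomputed gap comes out at about 89.6% rather than exactly the slide's 89.7%; the small difference stems from rounding in the inputs.

```python
# Figures from the slide: 81 inhabitants, average income 3922, 94.3% above Bulugu.
tungu_total = 81 * 3922
bulugu_mean = 3922 / 1.943

# Two newborns with zero income join the population; total income is unchanged.
new_mean = tungu_total / 83
new_gap = 100 * (new_mean / bulugu_mean - 1)

print(f"new Tungu average: {new_mean:.0f}")
print(f"new gap to Bulugu: {new_gap:.1f}%")
```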

Lesson: Do not be easily impressed
• The usual reason for presenting very precise numbers is the wish to impress people
  • "Round numbers are always false"
  • But round numbers are much easier to remember and compare
• Clearly tell people you will not be impressed by precision
  • in particular if the precision is purely imaginary

Example 3: Phantasmo Corporation stock price
• We look at the recent development of the price of shares for Phantasmo Corporation
• "Phantasmo shows a remarkably strong and consistent value growth and continues to be a top recommendation"
[Plot: stock price, 180 to 192, over day, 0 to 400 (Phantasmo data is purely imaginary)]

Problem: Looks can be misleading
• The following two plots show exactly the same data!
  • and the same as the plot on the previous slide!
[Two more plots of stock price, 180 to 192, over day, 0 to 400]

Problem: Scales can be misleading
• What really happened is shown here
• We intuitively interpret a trend plot on a ratio scale
[Plots: stock price over day, 0 to 400, once with the y-axis from 0 to 200 and once with the y-axis from 180 to 192]
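The effect of a truncated y-axis can be quantified by the share of the plot height that a price rise occupies. In this sketch, `visual_share` is a hypothetical helper. The start and end prices are assumed values that fit the 180 to 192 axis; they are not the actual Phantasmo data.

```python
def visual_share(rise, axis_min, axis_max):
    """Fraction of the plot's height that a price rise occupies for a given y-axis range."""
    return rise / (axis_max - axis_min)

# Assumed start and end prices, chosen to fit the 180..192 axis; not the actual Phantasmo data.
start, end = 182, 190
rise = end - start

print(f"y-axis 180..192: the rise fills {visual_share(rise, 180, 192):.0%} of the height")
print(f"y-axis   0..200: the rise fills {visual_share(rise, 0, 200):.0%} of the height")
print(f"relative gain of the stock: {rise / start:.1%}")
```

The same rise of 8 fills two thirds of the truncated plot but only 4% of the zero-based one. The latter matches the modest relative gain of about 4.4%.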

So look carefully!
• Found on focus.msn.de on 2004-03-04:

Problem: Scales can be missing
• The most insolent persuaders may even leave the scale out altogether
[Plot: stock price over day, 0 to 400, without a y-axis scale]

Problem: Scales can be abused
• Observe the global impression first

Problem: People may invent unexpected things
• Source: advertisement by the Donau-Universität Krems
• DIE ZEIT, 2004-10-07

Lesson: Seeing is believing
• but often it shouldn't be
• Always consider what it really is that you are seeing
• Do not believe anything purely intuitively
• Do not believe anything that does not have a well-defined meaning

Example 4: blend-a-med Night Effects
• What do they not say?
  • What exactly does "sichtbar" ("visible") mean?
  • What were the results of the clinical trials?
  • What other effects does Night Effects have?

Example 6: economic growth (D vs. USA)
• On 2003-10-30, the US Bureau of Economic Analysis (BEA) announced
  • USA economic growth in 3rd quarter: 7.2%
• Assume that on the same day the German Statistisches Bundesamt had announced
  • D economic growth in 3rd quarter: 2%
  • (Note: This value is fictitious)
• Note: Both values refer to gross domestic product (GDP, "Brutto-Inlandsprodukt", BIP)
• Which economy was growing faster?

Problem: Different definitions
• The US BEA extrapolates the growth for each quarter to a full year
• Statistisches Bundesamt does not
• Thus, the actual US growth factor during (from start to end of) this quarter was only x, where x^4 = 1.072
  • x = 1.0175
• US growth was only 1.75% in this quarter
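The conversion on this slide can be written out directly. An annualized rate compounds the quarterly growth factor four times, so the quarterly factor is the fourth root of the annual one. The helper names are illustrative. Under this definition, the fictitious German 2% corresponds to about 8.2% annualized, so Germany would have been growing faster.

```python
# Annualized quarter-on-quarter growth: (1 + annual) = (1 + quarterly) ** 4.
def deannualize(annual_rate):
    """Growth within one quarter, given an annualized rate."""
    return (1 + annual_rate) ** 0.25 - 1

def annualize(quarterly_rate):
    """Annualized rate, given growth within one quarter."""
    return (1 + quarterly_rate) ** 4 - 1

us_quarterly = deannualize(0.072)   # US figure as announced (annualized)
de_quarterly = 0.02                 # fictitious German figure (not annualized)

print(f"US growth within the quarter: {us_quarterly:.2%}")
print(f"German 2% annualized:         {annualize(de_quarterly):.2%}")
```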

Real-world example: 25-fold reliability
• "Warum billigere Tintenpatronen verwenden, wenn Original HP Tinten bis zu 25-mal zuverlässiger sind?"
• "Why use cheaper ink cartridges when genuine HP ink is up to 25 times more reliable?"

25-fold reliability explanation
Color cartridges:
• DOA: dead-on-arrival (<10 pages usable capacity)
• PF: premature failure (<75% of avg. non-DOA yield)
• HU: high unusable (>10% pages with low quality)

25-fold reliability explanation (2)
• Percentage of PF cartridges (less than 75% of the average capacity of all cartridges) per brand
[Plot: percent, 0 to 50, over size, 20 to 120]

25-fold reliability explanation (3)
More problems with this data:
• 52/120 = 43% is what they used
• 52/103 = 50% is right if PF excludes DOA (as claimed)
• (52 - 17)/103 = 34% is right if PF includes DOA
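The three candidate percentages can be recomputed from the counts on the slide. The sketch assumes that the 17 subtracted on the slide is the number of dead-on-arrival cartridges, which is consistent with 120 - 17 = 103 non-DOA cartridges.

```python
total = 120      # all cartridges (the denominator the advert used)
non_doa = 103    # cartridges that were not dead on arrival
pf_count = 52    # cartridges counted as premature failure
doa = 17         # assumed: number of dead-on-arrival cartridges

variants = {
    "what they used": pf_count / total,
    "PF excludes DOA": pf_count / non_doa,
    "PF includes DOA": (pf_count - doa) / non_doa,
}
for label, share in variants.items():
    print(f"{label:<16} {share:.0%}")
```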

Thank you!
