IEMS 303 Statistics

Introduction

Instructor: Jing Dong

Statistics examples:
Coffee by the numbers from the National Coffee Association:
1. 54% of Americans over the age of 18 drink coffee every day.
2. Those who drink coffee drink an average of 3.1 cups a day,
and the average size of a cup is 9 oz.
3. The average price of an espresso-based drink is $2.45. The
average price of a brewed cup of coffee is $1.38.
4. The U.S. spends $40 billion on coffee each year.
For more fun statistics about coffee, check
http://www.hsph.harvard.edu/news/topic/coffee/

Statistics examples:
New product introduction: A brand group wishes to investigate
consumer reaction to an improved package design for its line of
cookies.
Manufacturing quality control: A manufacturer wishes to
monitor the quality of springs it produces in order to make sure
its product meets customer requirements.
Sales forecasting: A sales manager wishes to predict the sales
performance of each of the company's sales representatives on
the basis of historical performance of each representative.

What is Statistics?
The goal of statistics is to make inferences based on data. It
involves a range of techniques and procedures for collecting data,
describing data and making decisions based on data.
How numbers are collected → Experimental design / sampling
How statistics are calculated → Descriptive statistics
How results are interpreted → Inferential statistics
We make hypotheses about what is true, collect data in experiments,
describe the results, and then infer from the results the strength of the
evidence concerning our hypotheses.

Example 1. Suppose we have a diagnostic HIV test that is 99%
accurate. A person is picked at random and tested. The test gives a
positive result. What is the probability that the person actually has
the disease?
It depends on how common or rare the disease is.
Suppose the disease affects 0.01% of the population (1 person in
10,000).
Bayes' rule:
\[
P(D \mid P) = \frac{P(P \mid D)\,P(D)}{P(P \mid D)\,P(D) + P(P \mid ND)\,P(ND)}
= \frac{0.99 \times 0.0001}{0.99 \times 0.0001 + 0.01 \times 0.9999}
\approx 0.0098
\]
(source: TED talk by Peter Donnelly: How stats fool juries)
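
The Bayes' rule calculation in Example 1 can be checked numerically. The sketch below is only an illustration: it assumes that "99% accurate" means both a 99% chance of a positive result for a person with the disease and a 1% chance of a positive result for a person without it. The function name and the 10% prevalence used for comparison are hypothetical, not part of the original example.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = false_positive_rate * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Assumption: "99% accurate" = 99% sensitivity and a 1% false-positive rate.
rare = posterior_given_positive(
    prevalence=0.0001, sensitivity=0.99, false_positive_rate=0.01
)

# Hypothetical comparison: the same test when 10% of the population is affected.
common = posterior_given_positive(
    prevalence=0.10, sensitivity=0.99, false_positive_rate=0.01
)

print(f"Rare disease (1 in 10,000): {rare:.4f}")
print(f"Common disease (1 in 10):   {common:.4f}")
```

With the rare disease, fewer than 1 in 100 positive results belong to a person who actually has it, because the few true positives are swamped by false positives from the much larger healthy group.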

Example 2. A study shows that the more churches there are in a city, the
more crimes there are. Can we make the statement that churches
lead to crimes?
No! In fact, both the larger number of churches and the larger number
of crimes can be explained by a larger population.

Correlation does not imply causation.


(Source: OnlineStatBook Project)
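
A small simulation can make the confounding in Example 2 concrete. The sketch below uses entirely made-up numbers: city populations, churches per resident, and crimes per resident are all hypothetical, and churches and crimes are generated independently of each other given the population. It is an illustration under those assumptions, not an analysis of real data.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical cities: population is the common cause of both counts.
populations = [rng.uniform(10_000, 1_000_000) for _ in range(500)]
churches = [p / 2000 * rng.uniform(0.8, 1.2) for p in populations]
crimes = [p / 100 * rng.uniform(0.8, 1.2) for p in populations]

# Raw counts are strongly correlated because both scale with population.
raw_corr = pearson(churches, crimes)

# Per-capita rates remove the effect of population size.
per_capita_corr = pearson(
    [c / p for c, p in zip(churches, populations)],
    [k / p for k, p in zip(crimes, populations)],
)

print(f"Correlation of raw counts:       {raw_corr:.2f}")
print(f"Correlation of per-capita rates: {per_capita_corr:.2f}")
```

The raw counts move together even though neither was generated from the other, and the association largely disappears once population is accounted for.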

Example 3. In the 1970s, many parts of the US began to allow drivers to turn right at a red light. For
many years prior, road designers and civil engineers argued that allowing right turns on a red light
would be a safety hazard, causing many additional crashes and pedestrian deaths. But the 1973 oil
crisis spurred politicians to consider allowing right turn on red to save fuel wasted by commuters
waiting at red lights. Several studies were conducted to evaluate the safety impact of the change.
A consultant for the Virginia Department of Highways and Transportation conducted a before-and-after study of twenty intersections which began to allow right turns on red. Before the change there
were 308 accidents at the intersections; after, there were 337 in a similar length of time. However, this
difference was not statistically significant, and so the consultant concluded there was no safety
impact. Several subsequent studies had similar findings: small increases in the number of crashes, but
not enough data to conclude these increases were significant. As one report concluded,
"There is no reason to suspect that pedestrian accidents involving RT operations (right turns) have
increased after the adoption of [right turn on red]..."
Based on this data, more cities and states began to allow right turns at red lights. The problem is that
these studies were underpowered. More pedestrians were being run over and more cars were involved
in collisions, but nobody collected enough data to show this conclusively until several years later,
when studies arrived clearly showing the results: significant increases in collisions and pedestrian
accidents (sometimes up to 100% increases). The misinterpretation of underpowered studies cost
lives.
(source: "The wrong turn on red" from Statistics Done Wrong)
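
To see why 308 versus 337 accidents is not statistically significant, and what "underpowered" means here, one possible back-of-the-envelope check is sketched below. It is not the consultant's method. It assumes accident counts are Poisson and the two periods are equally long, so that under "no change" each of the 645 accidents is equally likely to fall before or after the change. It uses a normal approximation to the binomial, and the power calculation assumes, hypothetically, that the true increase equals the observed one.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


before, after = 308, 337
total = before + after

# Under "no change", the after-count is approximately Binomial(total, 0.5).
null_sd = math.sqrt(total * 0.25)
z = (after - total / 2) / null_sd
p_value = 2 * normal_upper_tail(abs(z))  # two-sided

# Power at the 5% level, assuming the true share of "after" accidents
# equals the observed share (a hypothetical effect size).
z_crit = 1.959964  # two-sided 5% critical value of the standard normal
p_alt = after / total
shift = total * (p_alt - 0.5)
alt_sd = math.sqrt(total * p_alt * (1 - p_alt))
power = (normal_upper_tail((z_crit * null_sd - shift) / alt_sd)
         + normal_upper_tail((z_crit * null_sd + shift) / alt_sd))

print(f"z = {z:.2f}, two-sided p-value = {p_value:.2f}")
print(f"Approximate power to detect an increase of this size: {power:.2f}")
```

Under these assumptions the p-value is well above 0.05, and a study of this size would detect a real increase of this magnitude only a minority of the time. Failing to find a significant difference is therefore weak evidence that no difference exists.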

Hypothesis tests

True or false?
1. Reject the hypothesis when it is true
2. Fail to reject the hypothesis when it is false
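
These two outcomes are commonly called a Type I error and a Type II error. The sketch below computes both error rates exactly for a made-up test: the hypothesis is that a coin is fair, the coin is tossed 100 times, and the hypothesis is rejected when the number of heads is at least 60 or at most 40. The decision rule and the alternative heads probability of 0.6 are hypothetical choices for illustration.

```python
from math import comb

N_TOSSES = 100


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def reject_fairness(heads):
    """Hypothetical decision rule: reject 'fair' on 60+ or 40- heads."""
    return heads >= 60 or heads <= 40


# Error 1: reject the hypothesis when it is true (the coin really is fair).
type_1_rate = sum(
    binom_pmf(k, N_TOSSES, 0.5) for k in range(N_TOSSES + 1) if reject_fairness(k)
)

# Error 2: fail to reject when it is false (true heads probability is 0.6).
type_2_rate = sum(
    binom_pmf(k, N_TOSSES, 0.6) for k in range(N_TOSSES + 1) if not reject_fairness(k)
)

print(f"P(reject | coin is fair)           = {type_1_rate:.4f}")
print(f"P(fail to reject | P(heads) = 0.6) = {type_2_rate:.4f}")
```

With this rule the first error is rare, while the second is common for a coin that is only mildly biased.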

Statistics vs. Probability


All statistical statements are at bottom about probability.
Probability: You have a fair coin and you will toss it 100 times.
What is the probability of 60 or more heads?
Statistics: You have a coin of unknown provenance. To
investigate whether it is fair, you toss it 100 times and count the
number of heads. Say you count 60 heads. What conclusion
(inference) can you draw from the data?
Probability reasons from the population to the sample (deductive),
whereas inferential statistics reasons from the sample to the population
(inductive).
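
Both coin questions can be explored with the same binomial calculation. The sketch below answers the probability question exactly. For the statistics question it reports a two-sided p-value, which is only one possible way to summarize the evidence against fairness. The variable names are illustrative.

```python
from math import comb

n = 100

# Probability question: for a fair coin, P(60 or more heads in 100 tosses).
prob_60_or_more = sum(comb(n, k) for k in range(60, n + 1)) / 2**n

# Statistics question: having observed 60 heads, how surprising would a
# result at least this far from 50 be if the coin were fair?
# By symmetry of the fair-coin distribution, P(X <= 40) = P(X >= 60).
two_sided_p_value = 2 * prob_60_or_more

print(f"P(60 or more heads | fair coin) = {prob_60_or_more:.4f}")
print(f"Two-sided p-value for 60 heads  = {two_sided_p_value:.4f}")
```

A fair coin gives 60 or more heads in roughly 3 out of 100 such experiments, so 60 heads is some evidence against fairness, though the two-sided p-value sits just above the conventional 0.05 threshold.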

Frequentist vs. Bayesian Schools


Different interpretations of the meaning of probability:
Frequentist: Probability measures the frequency of various
outcomes of an experiment. This view has long been dominant in fields like
biology, medicine, public health and social science.
Bayesian: Probability measures a state of knowledge or a degree of
belief in a given proposition. This view has enjoyed a resurgence in the era
of powerful computers and big data.
Today statisticians are creating powerful tools by using both
approaches in complementary ways. In this course we will focus on the
frequentist school of statistics.