Statistical Significance


[...]ing at least as extreme results given that the null hypothesis is true. It is an integral part of statistical hypothesis testing, where it helps investigators decide whether a null hypothesis can be rejected. In any experiment or observation that involves drawing a sample from a population, there is always the possibility that an observed effect would have occurred due to sampling error alone. But if the probability of obtaining at least as extreme a result (a large difference between two or more sample means), given that the null hypothesis is true, is less than a pre-determined threshold (e.g. 5% chance), then an investigator can conclude that the observed effect actually reflects the characteristics of the population rather than just sampling error.

The present-day concept of statistical significance originated with Ronald Fisher when he developed statistical hypothesis testing in the early 20th century. These tests are used to determine whether the outcome of a study would lead to a rejection of the null hypothesis based on a pre-specified low probability threshold. The p-value computed from the data is compared with that threshold, which can help an investigator decide whether a result contains sufficient information to cast doubt on the null hypothesis.

P-values are often coupled to a significance or alpha (α) level, which is also set ahead of time, usually at 0.05 (5%). Thus, if a p-value was found to be less than 0.05, then the result would be considered statistically significant and the null hypothesis would be rejected. Other significance levels, such as 0.1 or 0.01, are also used, depending on the field of study.
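The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a two-sided z-test with a known population standard deviation, and the function names (`two_sided_p_value`, `is_significant`) and all sample numbers are hypothetical rather than taken from any study.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

def is_significant(p_value, alpha=0.05):
    """Decision rule: the result is significant when p is below alpha."""
    return p_value < alpha

# Hypothetical data: sample mean 103 from n = 50 observations,
# null-hypothesis mean 100, known population standard deviation 10.
sample_mean, null_mean, sigma, n = 103.0, 100.0, 10.0, 50
z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
p = two_sided_p_value(z)

# Compare the same p-value against the three levels mentioned in the text.
for alpha in (0.1, 0.05, 0.01):
    print(f"alpha={alpha}: p={p:.4f}, significant={is_significant(p, alpha)}")
```

With these made-up numbers, z is about 2.12 and p is about 0.034, so the result counts as significant at the 0.1 and 0.05 levels but not at 0.01. The same data can therefore lead to different conclusions depending on the level chosen ahead of time.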

In statistics, statistical significance is not the same as research, theoretical, or practical significance.

1 History


The concept of statistical significance was originated by Ronald Fisher when he developed statistical hypothesis testing, which he described as "tests of significance", in his 1925 publication, Statistical Methods for Research Workers. Fisher suggested a probability of one in twenty (0.05) as a convenient cutoff level to reject the null hypothesis. In their 1933 paper, Jerzy Neyman and Egon Pearson recommended that the significance level (e.g. 0.05), which they called α, be set ahead of time, prior to any data collection.

Despite his initial suggestion of 0.05 as a significance level, Fisher did not intend this cutoff value to be fixed, and in his 1956 publication Statistical Methods and Scientific Inference he recommended that significance levels be set according to specific circumstances.

2 Role in statistical hypothesis testing

Statistical significance plays a pivotal role in statistical hypothesis testing, where it is used to determine if a null hypothesis should be rejected or retained. A null hypothesis is the general or default statement that nothing happened or changed. For a null hypothesis to be rejected as false, the result has to be identified as being statistically significant, i.e. unlikely to have occurred by chance alone.

Figure: In a two-tailed test, the rejection region or α level is partitioned to both ends of the sampling distribution and makes up only 5% of the area under the curve.

To determine if a result is statistically significant, a researcher would have to calculate a p-value, which is the probability of observing an effect at least as extreme as the one actually observed, given that the null hypothesis is true. The null hypothesis is rejected if the p-value is less than the significance or α level. The α level is the probability of rejecting the null hypothesis given that it is true (type I error) and is most often set at 0.05 (5%). If the α level is 0.05, then the conditional probability of a type I error, given that the null hypothesis is true, is 5%. Then a statistically significant result is one in which the observed p-value is less than 5%, which is formally written as p < 0.05.
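The link between the α level and the type I error rate can be checked by simulation. The following sketch assumes normally distributed data with a known standard deviation of 1 and a two-sided z-test. The function names, sample size, trial count, and seed are arbitrary choices made for illustration.

```python
import math
import random

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))

def type_i_error_rate(alpha=0.05, n=30, trials=20000, seed=0):
    """Fraction of tests that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from N(0, 1), so the null hypothesis (mean = 0) holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z statistic for the sample mean, with known standard deviation 1.
        z = (sum(sample) / n) / (1.0 / math.sqrt(n))
        if two_sided_p_value(z) < alpha:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"observed type I error rate: {rate:.3f}")
```

Because every simulated sample is drawn with the null hypothesis true, each rejection is a type I error, and the observed rejection rate should land close to the chosen α of 0.05.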

If an observed p-value is not lower than the significance level, then rather than simply accepting the null hypothesis, where feasible it would often be appropriate to increase the sample size of the study and see if the significance level is reached.

If the α level is set at 0.05, it means that the rejection region comprises 5% of the sampling distribution. This 5% can be allocated to one side of the sampling distribution, as in a one-tailed test, or partitioned to both sides of the distribution, as in a two-tailed test, with each tail (or rejection region) containing 2.5% of the distribution. One-tailed tests are more powerful than two-tailed tests, as a null hypothesis can be rejected with a less extreme result.
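The difference between the two allocations can be made concrete with the standard normal distribution. This sketch assumes a z-test and uses Python's standard-library `statistics.NormalDist`. The observed statistic `z = 1.8` is a hypothetical value chosen to fall between the two critical values.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
alpha = 0.05

# Critical values: a z statistic beyond these falls in the rejection region.
one_tailed_critical = std_normal.inv_cdf(1 - alpha)      # all 5% in the upper tail
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)  # 2.5% in each tail

def one_tailed_p(z):
    """Upper-tail p-value: P(Z >= z)."""
    return 1 - std_normal.cdf(z)

def two_tailed_p(z):
    """Two-tailed p-value: P(|Z| >= |z|)."""
    return 2 * (1 - std_normal.cdf(abs(z)))

z = 1.8  # hypothetical observed z statistic
print(f"critical z: one-tailed {one_tailed_critical:.3f}, "
      f"two-tailed {two_tailed_critical:.3f}")
print(f"z={z}: one-tailed p={one_tailed_p(z):.4f}, "
      f"two-tailed p={two_tailed_p(z):.4f}")
```

The critical values come out at about 1.645 (one-tailed) and 1.960 (two-tailed). For z = 1.8, the one-tailed p-value is about 0.036 and the two-tailed p-value is about 0.072, so only the one-tailed test rejects the null hypothesis at α = 0.05. That advantage holds only when the effect lies in the tail chosen in advance.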

3 Defining significance in terms of sigma (σ)


In specific fields such as particle physics and manufacturing, statistical significance is often expressed in multiples of the standard deviation or sigma (σ) of a normal distribution, with significance thresholds set at a much stricter level (e.g. 5σ). For instance, the certainty of the Higgs boson particle's existence was based on the 5σ criterion, which corresponds to a p-value of about 1 in 3.5 million.
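The correspondence between a sigma threshold and a p-value can be reproduced from the tail area of a standard normal distribution. In this sketch the function name `sigma_to_p_value` is hypothetical, and the default assumes a one-sided tail, which is the convention that yields the "1 in 3.5 million" figure for 5σ.

```python
import math

def sigma_to_p_value(n_sigma, two_sided=False):
    """Tail probability of a standard normal beyond n_sigma standard deviations."""
    # One-sided tail: P(Z >= n_sigma) = 0.5 * erfc(n_sigma / sqrt(2))
    one_sided = 0.5 * math.erfc(n_sigma / math.sqrt(2))
    return 2 * one_sided if two_sided else one_sided

for n_sigma in (2, 3, 5):
    p = sigma_to_p_value(n_sigma)
    print(f"{n_sigma} sigma: p = {p:.3g} (about 1 in {1 / p:,.0f})")
```

For 5σ the one-sided tail probability is roughly 2.87 × 10⁻⁷, or about 1 in 3.5 million. Passing `two_sided=True` doubles the probability and so halves the "1 in N" figure.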

4 Effect size

Researchers focusing solely on whether their results are statistically significant might report findings that are not necessarily substantive. To gauge the research significance of their result, researchers are also encouraged to report the effect size along with p-values (in cases where the effect being tested for is defined in terms of an effect size): the effect size quantifies the strength of an effect, such as the distance between two means or the correlation between two variables.
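Both kinds of effect size mentioned above can be computed directly. The sketch below uses Cohen's d, one common standardized measure of the distance between two means (the choice of this particular measure is an assumption, since the text does not name one), and the Pearson correlation coefficient. The function names and the data are made up for illustration.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores for two groups.
treated = [5.1, 5.8, 6.2, 6.0, 5.5, 6.4]
control = [4.9, 5.2, 5.6, 5.0, 5.3, 5.7]
print(f"Cohen's d = {cohens_d(treated, control):.2f}")

# Hypothetical paired measurements.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 60, 68]
print(f"Pearson r = {pearson_r(hours, score):.2f}")
```

Reporting such a value next to the p-value shows how large the effect is, which the p-value alone does not convey: with a large enough sample, even a very small effect can be statistically significant.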


