Descriptive Statistics
Describing data
Moment-based measures: mean (center), variance / standard deviation (spread), skewness (skew), kurtosis (peakedness).
Non-mean-based measures: mode and median (center); range (max − min) and interquartile range (1st to 3rd quartile) (spread).
Quartile
In descriptive statistics, a quartile is any of the three values which divide the sorted data set into four equal parts, so that each part represents 1/4th of the sample or population.
- first quartile (designated Q1) = lower quartile: cuts off the lowest 25% of the data (25th percentile)
- second quartile (designated Q2) = median: cuts the data set in half (50th percentile)
- third quartile (designated Q3) = upper quartile: cuts off the highest 25% of the data, or the lowest 75% (75th percentile)
The difference between the upper and lower quartiles is called the interquartile range.
Variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

Standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$

The divisor n − 1 is the degrees of freedom.
In statistics, the term degrees of freedom (df) is a measure of the number of independent pieces of information on which the precision of a parameter estimate is based.
http://www.jerrydallal.com/LHSP/dof.htm
Jack Good (1973), "What Are Degrees of Freedom?", The American Statistician 27, 227-228.
Skewness
[Figure: skewed frequency distributions; x-axis: Value, y-axis: Frequency]
Box-whisker plots
[Figure: box-whisker plots; random error produces scatter (no precision)]
Distributions
Normal, binomial, Poisson, hypergeometric, t-distribution, chi-square. Which parameters describe their shapes? How can these distributions be useful?
Normal distribution
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}$
A Normal Distribution . . .
For a mean of 5 and a standard deviation of 1:
You won't get the same answer every time, but if you make a lot of measurements, a histogram of your measurements will approach the appearance of a normal distribution. This applies to any situation in which the exact value of a continuous variable is altered randomly from trial to trial: the random uncertainty, or random error.
Use the area UNDER the normal distribution. For example, the area under the curve between x = a and x = b is the probability that your next measurement of x will fall between a and b.
[Figure] A normal distribution with a mean of 75 and a standard deviation of 10. The shaded area contains 95% of the area and extends from 55.4 to 94.6. For all normal distributions, 95% of the area is within 1.96 standard deviations of the mean.
Integration
$P(a \le x \le b) = \int_a^b f(x)\,dx$, with $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}$
If you made an infinite number of measurements, their mean would be μ and their standard deviation would be σ.
In practice, you have a finite number of measurements with mean x̄ and standard deviation s.
The z-score indicates the number of standard deviations that the value x is away from the mean μ.
z-transform
$z = \dfrac{x - \mu}{\sigma}$

[Figure: original distribution (mean = 5, std = 3) next to the standard normal (mean = 0, std = 1); z = 1.96 cuts off the upper 2.5%]

Transform of the mean: z = (5 − 5)/3 = 0. Transform of another value, x = 10: z = (10 − 5)/3 ≈ 1.67, so P(x ≥ 10) ≈ 4.8%.
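As a minimal sketch (not part of the original slides), the same numbers can be checked with scipy.stats.norm; norm.sf gives the upper-tail area:

```python
# Sketch: reproduce the z-transform example with scipy.stats.norm.
from scipy.stats import norm

mu, sigma = 5, 3
x = 10
z = (x - mu) / sigma           # (10 - 5) / 3 = 1.67
p_upper = norm.sf(z)           # P(Z >= 1.67) ~ 0.048, i.e. ~4.8%
print(f"z = {z:.2f}, P(x >= {x}) = {p_upper:.3f}")

# The often-quoted 1.96 cutoff leaves 2.5% in the upper tail:
print(norm.sf(1.96))           # ~0.025
```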
Exercises 1
* If scores are normally distributed with a mean of 30 and a standard deviation of 5, what percent of the scores is: (a) greater than 30? (b) greater than 37? (c) between 28 and 34?
* What proportion of a normal distribution is within one standard deviation of the mean?
* What proportion is more than 1.8 standard deviations from the mean?
* A test is normally distributed with a mean of 40 and a standard deviation of 7. What value would be needed to be in the 85th percentile?
Binomial distribution
Yes/no experiments (two possible outcomes), e.g.:
- the probability of getting all tails if you throw a coin three times
- the probability of getting all male puppies in a litter of 8
- the probability of getting two defective batteries in a package of six
Exercise 2
What is the probability of getting one 2 when you roll six dice?
$P(k) = \binom{n}{k} p^k (1-p)^{n-k}$, where $\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$ is the binomial coefficient.

This is the probability of getting the result of interest k times out of n, if the overall probability of the result is p. Note that here k is a discrete variable (integer values only).
Binomial Distribution
n = 6 (number of dice rolled); p = 1/6 (probability of rolling a 2); k = 0, 1, ..., 6 (number of 2s out of 6). P(k = 1) = 0.402.
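A quick check of the 0.402 value, as a sketch using scipy.stats.binom:

```python
# Sketch: verify the dice example with scipy.stats.binom.
from scipy.stats import binom

n, p = 6, 1/6                            # six dice, probability of a 2 on each
print(binom.pmf(1, n, p))                # P(exactly one 2) ~ 0.402
print(binom.pmf(range(7), n, p).sum())   # pmf over k = 0..6 sums to 1
```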
Binomial Distribution
n = 8 (number of puppies in the litter); p = 1/2 (probability of any pup being male); k = 0, 1, ..., 8 (number of males out of 8).
Exercise 3
While you are in the bathroom, your little brother claims to have rolled a Yahtzee in 6s (five dice all 6s) in one roll of the five dice. How justified would you be in beating him up for cheating?
Poisson distribution
$P_n(\mu) = \dfrac{e^{-\mu}\,\mu^n}{n!}$

For the Poisson distribution, the variance equals the mean.
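As a sketch, scipy.stats.poisson reproduces both the pmf formula and the variance-equals-mean property (the choice n = 2 here is illustrative):

```python
# Sketch: the Poisson pmf from scipy matches e^(-mu) * mu^n / n!.
from math import exp, factorial
from scipy.stats import poisson

mu, n = 1.0, 2
by_hand = exp(-mu) * mu**n / factorial(n)
print(by_hand, poisson.pmf(n, mu))         # both ~0.1839
print(poisson.mean(mu), poisson.var(mu))   # variance == mean == 1.0
```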
Poisson distribution
[Figure: randomly placed dots over 50 scale divisions, on average μ = 1 dot per interval, with the corresponding Poisson probabilities $P_n(\mu) = e^{-\mu}\mu^n / n!$]
Exercise 4
$P_n(\mu) = \dfrac{e^{-\mu}\,\mu^n}{n!}$

The average number of phone calls in 1 hour is 2.1. What is the probability of getting 4 calls?
Exercise 5
$P_n(\mu) = \dfrac{e^{-\mu}\,\mu^n}{n!}$

The average number of phone calls in 1 hour is 2.1. What is the probability of getting 0 calls? Does this simplify the formula?
Hypergeometric distribution
Suppose that we have an urn with N balls in it; of these, m are yellow and the others are blue. Then k balls are drawn from the urn without replacement, and of these X are observed to be yellow. X is a random variable following the hypergeometric distribution. What is the probability of observing X = 6 yellow balls?
Hypergeometric Distribution
[Figure: hypergeometric probabilities P(X) for N = 20, m = n = k = 10; the probabilities range from 0.00 to about 0.15]

$P(X = x) = \dfrac{\binom{m}{x}\binom{n}{k-x}}{\binom{N}{k}}$, where n = N − m.
We often want to ask whether there are more white balls in the sample than expected by chance.
              Drawn      Remained in urn    Total
White           x            m − x            m
Black         k − x      n − (k − x)          n
Total           k            N − k            N

$P(X \ge x) = \sum_{x'=x}^{\min(m,k)} \dfrac{\binom{m}{x'}\binom{n}{k-x'}}{\binom{N}{k}}$
If the probability is small, it is less likely that we get the result by chance.
Hypergeometric example
How many ALL and AML samples do you expect when you randomly select samples from the dataset?
Answer: 23.5 ALL and 12.5 AML (ratio = 23.5/12.5 = 1.88, matching the original ratio 47/25 = 1.88).
Hypergeometric example
         Selected   Not selected   Total
ALL         29           18          47
AML          7           18          25
Total       36           36          72

P(X ≥ 29) = 0.006

Conclusion: this sample is significantly enriched with ALL samples, indicating possible bias in sample selection.
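A sketch of this computation with scipy.stats.hypergeom; the call reproduces the slide's p-value, and the parameter names follow scipy's M/n/N convention:

```python
# Sketch of the ALL/AML enrichment computation with scipy.stats.hypergeom.
# Parametrization: hypergeom(M=population size, n=# successes in population,
# N=sample size); sf(k-1) gives the tail probability P(X >= k).
from scipy.stats import hypergeom

M, n_all, N = 72, 47, 36        # 72 samples total, 47 ALL, 36 selected
k = 29                          # 29 ALL samples observed in the selection
p = hypergeom.sf(k - 1, M, n_all, N)
print(f"P(X >= {k}) = {p:.3f}")  # ~0.006, matching the slide
```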
Sampling Distribution
Every time we take a random sample and calculate a statistic, the value of the statistic changes (remember, a statistic is a random variable). If we continue to take random samples and calculate a given statistic over time, we will build up a distribution of values for the statistic. This distribution is referred to as a sampling distribution. A sampling distribution is a distribution that describes the chance fluctuations of a statistic calculated from a random sample.
The distribution of X̄, for a given sample size n, describes the variability of sample averages around the population mean μ.
Further, $Z = \dfrac{\bar{X} - \mu_{\bar{x}}}{\sigma/\sqrt{n}}$ follows a standard normal distribution.
[Figure: sampling distributions of the mean for samples from N(100, 5). For n = 2, X̄ ~ N(100, 5/√2 = 3.54); for n = 10, X̄ ~ N(100, 1.58); for n = 25, X̄ ~ N(100, 1). All panels span 80 to 120.]
[Figure: sampling distributions of X̄ for n = 10 and n = 100 around a mean of 30; the spread shrinks as n grows]
$E(\bar{X}) = \mu$

The mean has variance $\sigma^2/n$, so the standard error is

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

As n increases, $\sigma_{\bar{x}}$ decreases.
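A small simulation sketch (not from the slides) showing the empirical standard error tracking σ/√n:

```python
# Sketch: simulate sample means to watch the standard error sigma/sqrt(n) shrink.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 100, 5
for n in (2, 10, 25):
    means = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)
    print(n, means.std(), sigma / np.sqrt(n))  # empirical vs theoretical SE
```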
When the population is normal, the sampling distribution is also normal.

Central tendency: $\mu_{\bar{x}} = \mu$. Variation: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

[Figure: population N(μ = 50, σ = 10) and the sampling distributions of X̄: n = 4 gives $\sigma_{\bar{x}}$ = 5; n = 16 gives $\sigma_{\bar{x}}$ = 2.5; n = 30 gives $\sigma_{\bar{x}}$ ≈ 1.8]
t-Distribution
So far, we have been assuming that we knew the value of σ. This may be true if one has a large amount of experience with a certain process. However, it is often true that one is estimating σ, along with μ, from the same set of data:
$\sigma \approx s = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n-1}}$
t-Distribution
To allow for such a situation, we will consider the t statistic:
$T = \dfrac{\bar{X} - \mu_{\bar{x}}}{S/\sqrt{n}}$

which follows a t-distribution.
t-Distribution
[Figure: t-distributions for n = 3 and n = 6 compared with the standard normal; t(n = ∞) = Z. The x-axis shows t from −4 to 4.]
t-Distribution
If $\bar{X}$ is the mean of a random sample of size n taken from a normal population having mean μ and variance σ², and

$S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n-1}$

then

$t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$

is a random variable following the t-distribution with parameter ν = n − 1, where ν is the degrees of freedom.
t-Distribution
The t-distribution has been tabularized. $t_\alpha$ represents the t-value that has an area of α to the right of it. Note that, due to symmetry, $t_{1-\alpha} = -t_\alpha$.

[Figure: t(n = 3) with the critical values $t_{.95}$, $t_{.80}$, $t_{.20}$, and $t_{.05}$ marked along the axis from −4 to 4]
Example: t-Distribution
The resistivity of batches of electrolyte follow a normal distribution. We sample 5 batches and get the following readings: 1400, 1450, 1375, 1500, 1550.
$\bar{X} = 1455, \quad S = 72$
Example: t-Distribution
[Figure: t-distribution with n = 5; two-sided "refute" (rejection) regions beyond ±2.78 (p = 0.025 in each tail) and a "support" region between them. The computed statistic t = 1.71 falls in the support region.]
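A sketch of this test with scipy; note the hypothesized mean of 1400 is an assumption chosen to match the computed t = 1.71, since the slide does not state it explicitly:

```python
# Sketch of the resistivity example. The popmean of 1400 is assumed,
# not stated on the slide; it reproduces t = 1.71.
from scipy import stats

readings = [1400, 1450, 1375, 1500, 1550]
t, p = stats.ttest_1samp(readings, popmean=1400)
print(f"t = {t:.2f}, two-sided p = {p:.3f}")   # t ~ 1.71
print(stats.t.ppf(0.975, df=4))                 # critical value ~ 2.78
```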
Nonstatistical Hypothesis Testing A criminal trial is an example of hypothesis testing without the statistics. In a trial a jury must decide between two hypotheses. The null hypothesis is H0: The defendant is innocent The alternative hypothesis or research hypothesis is H1: The defendant is guilty The jury does not know which hypothesis is true. They must make a decision on the basis of evidence presented.
Nonstatistical Hypothesis Testing In the language of statistics convicting the defendant is called rejecting the null hypothesis in favor of the alternative hypothesis.
That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e., there is enough evidence to support the alternative hypothesis).
If the jury acquits it is stating that there is not enough evidence to support the alternative hypothesis.
Notice that the jury is not saying that the defendant is innocent, only that there is not enough evidence to support the alternative hypothesis. That is why we never say that we accept the null hypothesis.
Nonstatistical Hypothesis Testing There are two possible errors. A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person. A Type II error occurs when we don't reject a false null hypothesis. That occurs when a guilty defendant is acquitted.
Nonstatistical Hypothesis Testing The probability of a Type I error is denoted as α. The probability of a Type II error is β. The two probabilities are inversely related: decreasing one increases the other.
[Diagram: a statistic computed from a sample is used to draw inferences about a population parameter]
Hypothesis testing allows us to determine whether enough statistical evidence exists to conclude that a belief (i.e. hypothesis) about a parameter is supported by the data.
There are probabilities associated with each type of error: P(Type I error) = α and P(Type II error) = β. α is called the significance level.
Types of Errors
A Type I error occurs when we reject a true null hypothesis (i.e. Reject H0 when it is TRUE)
A Type II error occurs when we don't reject a false null hypothesis (i.e. do NOT reject H0 when it is FALSE).
Example
A department store manager determines that a new billing system will be cost-effective only if the mean monthly account is more than 170. A random sample of 400 monthly accounts is drawn, for which the sample mean is 178. The accounts are approximately normally distributed with a standard deviation of 65. Can we conclude that the new system will be cost-effective?
Example
The system will be cost-effective if the mean account balance for all customers is greater than 170. We express this belief as our research hypothesis, that is:

H1: μ > 170 (this is what we want to determine)

Thus, our null hypothesis becomes:

H0: μ = 170 (this specifies a single value for the parameter of interest)
Example
What we want to show: H1: μ > 170; H0: μ = 170 (we'll assume this is true). We know: n = 400, x̄ = 178, σ = 65. Hmm. What to do next?!
Example
To test our hypotheses, we can use two different approaches: The rejection region approach (typically used when computing statistics manually), and The p-value approach (which is generally used with a computer and statistical software). We will explore both in turn
The rejection region is a range of values such that, if the test statistic falls into that range, we decide to reject H0.
Example
It seems reasonable to reject the null hypothesis in favor of the alternative if the value of the sample mean is large relative to 170, that is, if $\bar{x} > \bar{x}_{critical}$.
Example
All that's left to do is calculate $\bar{x}_{critical}$ and compare it to 178.
Example
At a 5% significance level (i.e. α = 0.05), we set $P(\bar{X} > \bar{x}_{critical}) = 0.05$, i.e. $z_{.05} = 1.645 = \dfrac{\bar{x}_{critical} - 170}{65/\sqrt{400}}$.

Solving, we compute $\bar{x}_{critical} = 175.34$. Since our sample mean (178) is greater than the critical value we calculated (175.34), we reject the null hypothesis in favor of H1, i.e. that μ > 170, and conclude that it is cost-effective to install the new billing system.
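A sketch of the critical-value calculation in Python:

```python
# Sketch of the rejection-region computation for the billing example.
from scipy.stats import norm
import math

mu0, sigma, n, xbar, alpha = 170, 65, 400, 178, 0.05
z_alpha = norm.ppf(1 - alpha)                  # 1.645
x_crit = mu0 + z_alpha * sigma / math.sqrt(n)  # 175.34
print(x_crit, xbar > x_crit)                   # reject H0 since 178 > 175.34
```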
p-Value
The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true. In the case of our department store example: what is the probability of observing a sample mean at least as extreme as the one already observed (i.e. x̄ = 178), given that the null hypothesis (H0: μ = 170) is true? Here z = (178 − 170)/(65/√400) = 2.46, so the p-value is P(Z > 2.46) ≈ 0.0069.
p-value
Another example. The objective of the study is to draw a conclusion about the mean payment period. Thus, the parameter to be tested is the population mean. We want to know whether there is enough statistical evidence to show that the population mean is less than 22 days. Thus, the alternative hypothesis is H1: μ < 22, and the null hypothesis is H0: μ = 22.
Another example
The test statistic is

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

We wish to reject the null hypothesis in favor of the alternative only if the sample mean, and hence the value of the test statistic, is small enough. As a result, we locate the rejection region in the left tail of the sampling distribution. We set the significance level at 10%.

Rejection region: $z < -z_\alpha = -z_{.10} = -1.28$

From the data (assuming σ = 6), $\bar{x} = \dfrac{4{,}759}{220} = 21.63$, and

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{21.63 - 22}{6/\sqrt{220}} = -0.91$

Since −0.91 is not less than −1.28, we cannot reject H0.
Conclusion: There is not enough evidence to infer that the mean is less than 22.
One- and Two-Tail Testing: The payment period example is a left-tail test because the rejection region was located in the left tail of the sampling distribution.
Right-Tail Testing
Calculate the critical value of the mean ($\bar{x}_{critical}$) and compare it against the observed value of the sample mean ($\bar{x}$).
Left-Tail Testing
Calculate the critical value of the mean ($\bar{x}_{critical}$) and compare it against the observed value of the sample mean ($\bar{x}$).
TwoTail Testing
Two-tail testing is used when we want to test a research hypothesis that a parameter is not equal (≠) to some value.
Example
KPN argues that its rates are such that customers won't see a difference in their phone bills between them and their competitors. They calculate the mean and standard deviation for all their customers at 17.09 and 3.87 (respectively). They then sample 100 customers at random and recalculate a monthly phone bill based on competitors' rates. What we want to show is whether or not H1: μ ≠ 17.09. We do this by assuming that H0: μ = 17.09.
Example
The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small. That is, we set up a two-tail rejection region. The total area in the rejection region must sum to α, so we divide this probability by 2.
Example
At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus, z.025 = 1.96 and our rejection region is: z < −1.96 or z > 1.96.
Example
From the data, we calculate x̄ = 17.55 and $z = \dfrac{17.55 - 17.09}{3.87/\sqrt{100}} = 1.19$.

Since z = 1.19 is not greater than 1.96, nor less than −1.96, we cannot reject the null hypothesis in favor of H1. There is insufficient evidence to infer that there is a difference between the bills of KPN and the competitor.
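A sketch of the two-tail computation in Python:

```python
# Sketch of the two-tail z-test for the KPN example.
from scipy.stats import norm
import math

mu0, sigma, n, xbar, alpha = 17.09, 3.87, 100, 17.55, 0.05
z = (xbar - mu0) / (sigma / math.sqrt(n))      # ~1.19
z_crit = norm.ppf(1 - alpha / 2)               # 1.96
p = 2 * norm.sf(abs(z))                        # two-sided p-value ~0.23
print(z, z_crit, p, abs(z) > z_crit)           # cannot reject H0
```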
Type II Error

Recall the previous example: H0: μ = 170, H1: μ > 170. At a significance level of 5% we rejected H0 in favor of H1, since our sample mean (178) was greater than the critical value of x̄ (175.34).
Example
β = P(x̄ < 175.34, given that the null hypothesis is false)

We need to compute β for some specific alternative value of μ. For example, suppose the true mean account balance is μ = 180. Then

β = P(x̄ < 175.34 | μ = 180) = P(Z < (175.34 − 180)/(65/√400)) = P(Z < −1.43) = 0.0764
Example

[Figure: the original hypothesized distribution (μ = 170) alongside the true distribution (μ = 180); the area α lies beyond the critical value under H0, β lies below it under the alternative, and 100% − β is the POWER]

Effects on β of changing α: decreasing α increases the value of β. Consider this diagram again: shifting the critical value line to the right (to decrease α) will mean a larger area under the lower curve for β (and vice versa). As the true mean moves away from the hypothesized one, β goes to a negligible level.
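A sketch of the β and power calculation for this alternative:

```python
# Sketch: beta and power for the billing example when the true mean is 180.
from scipy.stats import norm
import math

sigma, n, x_crit, mu_true = 65, 400, 175.34, 180
se = sigma / math.sqrt(n)
beta = norm.cdf((x_crit - mu_true) / se)   # P(Xbar < 175.34 | mu=180) ~ 0.076
print(beta, 1 - beta)                      # power ~ 0.924
```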
Exercises
Exercises z-test (see word document)
The t-test
Recall t distribution.
Take a random sample of size n from a N(μ, σ²) population. Then

$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

is standard normal. Consider instead

$\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$

This is approximately normal if n is large. If n is small, S is not expected to be close to σ: S introduces additional variability. Thus this statistic will be more variable than a standard normal random variable. This statistic follows a t distribution with n − 1 degrees of freedom.
Confidence Intervals.
Suppose that the population is normally distributed with mean μ and variance σ². If σ is known, a 100(1 − α)% confidence interval for μ is

$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

If σ is unknown, a 100(1 − α)% confidence interval for μ is

$\bar{x} \pm t_{\alpha/2}(n-1)\,\dfrac{s}{\sqrt{n}}$
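As a sketch, both interval formulas in Python; the data values and σ = 0.3 are made-up illustrations, not from the slides:

```python
# Sketch: z-based and t-based confidence intervals on illustrative data.
import numpy as np
from scipy import stats

x = np.array([9.8, 10.2, 10.1, 9.7, 10.4, 10.0])   # hypothetical measurements
xbar, s, n = x.mean(), x.std(ddof=1), len(x)
alpha = 0.05

# sigma known (assume sigma = 0.3): xbar +/- z_{alpha/2} * sigma / sqrt(n)
sigma = 0.3
z = stats.norm.ppf(1 - alpha / 2)
print(xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n))

# sigma unknown: xbar +/- t_{alpha/2}(n-1) * s / sqrt(n)
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
print(xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))
```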
One-sample t-test
We can use a confidence interval to test or decide whether a population mean has a given value. For example, suppose we want to test whether the mean height of women at the University of South Florida (USF) is less than 68 inches. We randomly sample 50 women students at USF. We find that their mean height is 63.05 inches. The SD of height in the sample is 5.75 inches. Then we find the standard error of the mean by dividing SD by sqrt(N): 5.75/sqrt(50) = .81. The critical value of t with (50 − 1) df is 2.01 (find this in a t-table; alpha = 0.025). Our confidence interval is, therefore, 63.05 ± 2.01 × .81 = 63.05 ± 1.63.
[Figure: histogram of heights in inches (40 to 80); N = 50, M = 63.05, SD = 5.75, $S_{\bar{X}}$ = .81, t = 2.01, CI = $\bar{X}$ ± 1.63; population mean = 68]
Take a sample, set a confidence interval around the sample mean. Does the interval contain the hypothesized value?
[Figure: histogram of heights in inches with the t-distribution centered at μ = 68]

$\bar{X} = 63.05, \quad \mu = 68, \quad \bar{X} - \mu = -4.95, \quad S_{\bar{X}} = .81$

$t = \dfrac{\bar{X} - \mu}{S_{\bar{X}}} = \dfrac{-4.95}{.81} = -6.11$
The sample mean is roughly six standard errors from the hypothesized population mean. If the population mean were really 68 inches, it would be very, very unlikely to find a sample with a mean as small as 63.05 inches.
Two-sample t-test
Used when we have two groups, e.g.,
Experimental vs. control group Males vs. females New training vs. old training method
The standard error of the difference is the square root of the sum of the squared standard errors of the means:
$\sigma_{\bar{X}}^2 = \dfrac{SD^2}{N} = \dfrac{36}{100}, \qquad \sigma_{\bar{X}} = \dfrac{SD}{\sqrt{N}} = \dfrac{6}{10} = .6$

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}$

The standard error for each group mean is .6; for the difference in means, it is $\sqrt{.36 + .36} = .85$.
We generally don't have population values. We usually estimate population values with sample data, thus:

$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{S_{\bar{X}_1}^2 + S_{\bar{X}_2}^2}, \quad \text{where } S_{\bar{X}}^2 = \dfrac{S^2}{N}$

All this says is that we replace the population variance of error with the appropriate sample estimators.
We can use this formula when the sample sizes for the two groups are equal.
When the sample sizes are not equal across groups, we find the pooled standard error. The pooled standard error is a weighted average, where the weights are the groups' degrees of freedom:

$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$

This says we find the value of t by taking the difference in the two sample means and dividing by the standard error of the difference in means.
Empathy Scores

Person:      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Psychology: 10  12  13  10   8  15  13  14  10  12  10  12  13  10   8
Physics:     8  14  12   8  12   9  10  11  12  13   8  14  12   8  12

From the data:

                        Psychology      Physics
N                           15             15
Mean                       11.33          10.87
SD                          2.09           2.20
SD²                         4.38           4.84
S²(X̄) = SD²/N         4.38/15 = .292  4.84/15 = .323

Term                       Calculation          Result
X̄1 − X̄2                 11.33 − 10.87           .46
S(X̄1 − X̄2)            sqrt(.292 + .323)        .78
t                          .46/.78                .59
df                         15 + 15 − 2            28
t(.05, 28 df), 2-tail                            2.05

Since 2.05 > .59, the difference is not significant (n.s.).
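A sketch reproducing this comparison with scipy's pooled two-sample t-test:

```python
# Sketch: the empathy comparison with scipy (equal-variance pooled t-test).
from scipy import stats

psychology = [10, 12, 13, 10, 8, 15, 13, 14, 10, 12, 10, 12, 13, 10, 8]
physics    = [8, 14, 12, 8, 12, 9, 10, 11, 12, 13, 8, 14, 12, 8, 12]
t, p = stats.ttest_ind(psychology, physics, equal_var=True)
print(f"t = {t:.2f}, p = {p:.2f}")          # t ~ 0.59, clearly not significant
print(stats.t.ppf(0.975, df=28))            # two-tail critical value ~ 2.05
```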
Exercise
Exercises t-test, see word document
Chi-square
1. The experiment consists of n identical, independent trials.
2. Each observation falls into a cell (or class).
3. Observed frequencies in each cell: O1, O2, O3, ..., Ok. The sum of the observed frequencies is n:

O1 + O2 + O3 + ... + Ok = n

4. Expected, or theoretical, frequencies: E1, E2, E3, ..., Ek, with

E1 + E2 + E3 + ... + Ek = n
Goal:
1. Compare the observed frequencies with the expected frequencies.
2. Decide whether the observed frequencies seem to agree or to disagree with the expected frequencies.

Methodology: use a chi-square statistic:

$\chi^{2*} = \sum_{\text{all cells}} \dfrac{(O - E)^2}{E}$

Small values of χ²*: observed frequencies close to expected frequencies. Large values of χ²*: observed frequencies do not agree with expected frequencies.
Sampling Distribution of χ²*: When n is large and all expected frequencies are greater than or equal to 5, then χ²* has approximately a chi-square distribution. Recall the properties of the chi-square distribution: 1. χ² is nonnegative in value; it is zero or positively valued. 2. χ² is not symmetrical; it is skewed to the right. 3. χ² is distributed so as to form a family of distributions, a separate distribution for each different number of degrees of freedom.
Critical values for chi-square: 1. See the table. 2. Identified by degrees of freedom (df) and the area under the curve to the right of the critical value. 3. χ²(df, α): critical value of a chi-square distribution with df degrees of freedom and area α to the right. 4. The chi-square distribution is not symmetrical: critical values associated with right and left tails are given separately.
[Figure: chi-square density with area α = 0.05 to the right of the critical value χ²(df, α)]

Portion of the table: for df = 16 and area 0.05 to the right, χ²(16, 0.05) = 26.3.
Testing Procedure: 1. H0: The probabilities p1, p2, . . . , pk are correct. Ha: At least two probabilities are incorrect.
Example: A market research firm conducted a consumer-preference experiment to determine which of 5 new breakfast cereals was the most appealing to adults. A sample of 100 consumers tried each cereal and indicated the cereal he or she preferred. The results are given in the following table:
Cereal:      A   B   C   D   E   Total
Frequency:  25  17  15  22  21    100
Is there any evidence to suggest the consumers had a preference for one cereal, or did they indicate each cereal was equally likely to be selected? Use α = 0.05.
Solution: If no preference was shown, we expect the 100 consumers to be equally distributed among the 5 cereals. Thus, if no preference is given, we expect (100)(0.2) = 20 consumers in each class.
1. The Set-up:
a. Population parameter of concern: preference for each cereal, the probability that a particular cereal is selected.
b. The null and alternative hypotheses: H0: There was no preference shown (equally distributed). Ha: There was a preference shown (not equally distributed).
2. The Hypothesis Test Criteria:
a. Assumptions: The 100 consumers represent a random sample.
b. Test statistic: χ²* with df = k − 1 = 5 − 1 = 4.
c. Level of significance: α = 0.05.
3. The Sample Evidence: a. Sample information: Table given in the statement of the problem. b. Calculate the value of the test statistic:
O:      25  17  15  22  21   (total 100)
E:      20  20  20  20  20   (total 100)
O − E:   5  −3  −5   2   1   (total 0)

$\chi^{2*} = \sum \dfrac{(O-E)^2}{E} = \dfrac{25 + 9 + 25 + 4 + 1}{20} = 3.2$
4. The Probability Distribution (Classical Approach): a. Critical value: χ²(k − 1, 0.05) = χ²(4, 0.05) = 9.49. b. χ²* is not in the critical region. 5. The Probability Distribution (p-Value Approach): a. The p-value: P = P(χ²* > 3.2 | df = 4). Using a computer: P = 0.5249. b. The p-value is larger than the level of significance, α. 6. The Results: a. Decision: Fail to reject H0. b. Conclusion: At the 0.05 level of significance, there is no evidence to suggest the consumers showed a preference for any one cereal.
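A sketch of the goodness-of-fit test with scipy (equal expected frequencies are scipy's default):

```python
# Sketch: the cereal preference test with scipy.stats.chisquare.
from scipy.stats import chisquare

observed = [25, 17, 15, 22, 21]
chi2, p = chisquare(observed)
print(f"chi2 = {chi2:.1f}, p = {p:.4f}")   # chi2 = 3.2, p ~ 0.52: fail to reject
```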
r × c Contingency Table:
1. r: number of rows; c: number of columns.
2. Used to test the independence of the row factor and the column factor.
3. Degrees of freedom: df = (r − 1)(c − 1).
4. n = grand total.
5. Expected frequency in the ith row and the jth column: $E_{i,j} = \dfrac{R_i \times C_j}{n}$. Each $E_{i,j}$ should be at least 5.
6. R1, R2, ..., Rr and C1, C2, ..., Cc: marginal totals.
Political Party (expected frequencies in parentheses)

             Democrat      Republican    Independent    Total
Opinion 1   34 (23.98)    11 (15.33)    12 (17.69)       57
Opinion 2   17 (19.77)    12 (12.64)    18 (14.59)       47
Opinion 3   10 (17.25)    16 (11.03)    15 (12.72)       41
Total           61            39            45           145
4. The Probability Distribution (Classical Approach): a. Critical value: χ²(4, 0.01) = 13.3. b. χ²* is in the critical region. 5. The Probability Distribution (p-Value Approach): a. The p-value: P = P(χ²* > 14.16 | df = 4). By computer: P = 0.0068. b. The p-value is smaller than the level of significance, α. 6. The Results: a. Decision: Reject H0. b. Conclusion: There is evidence to suggest that opinion on tax reform and political party are not independent.
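A sketch with scipy.stats.chi2_contingency; the row order follows the table above (the opinion-category labels are not given in the slides):

```python
# Sketch: the tax-reform-by-party table with scipy.stats.chi2_contingency.
from scipy.stats import chi2_contingency

observed = [[34, 11, 12],
            [17, 12, 18],
            [10, 16, 15]]
chi2, p, df, expected = chi2_contingency(observed)
print(chi2, p, df)   # chi2 ~ 14.16, p ~ 0.0068, df = 4
print(expected)      # matches the expected counts shown in the table
```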
ANOVA
Analysis of Variance
From t to F
In the independent samples t test, you learned how to use the t distribution to test the hypothesis of no difference between two population means. Suppose, however, that we wish to know about the relative effects of three or more different treatments.
From t to F
We could use the t test to make comparisons among each possible combination of two means. However, this method is inadequate in several ways. It is tedious to compare all possible combinations of groups. Any statistic that is based on only part of the evidence (as is the case when any two groups are compared) is less stable than one based on all of the evidence. There are so many comparisons that some will be significant by chance.
From t to F
What we need is some kind of survey test that will tell us whether there is any significant difference anywhere in an array of categories. If it tells us no, there will be no point in searching further. Such an overall test of significance is the F test, or the analysis of variance, or ANOVA.
ANOVA
Total Variation Among Scores =
Between-Groups Variation (variation due to chance and treatment effect, if any exists) +
Within-Groups Variation (variation due to chance).
There is a lot of variability from one mean to the next. Large differences between means probably are not due to chance. It is difficult to imagine that all six groups are random samples taken from the same population. The null hypothesis is rejected, indicating a treatment effect in at least one of the groups.
Same amount of variability between group means. However, there is more variability within each group. The larger the variability within each group, the less confident we can be that we are dealing with samples drawn from different populations.
The F Ratio
ANOVA (F)

Total Variation Among Scores = Within-Groups Variation (variation due to chance) + Between-Groups Variation (variation due to chance and treatment effect, if any exists).

If there is no treatment effect, F = 1.
The F Ratio
ANOVA (F)

Total Variation Among Scores = Within-Groups Variation (variation due to chance) + Between-Groups Variation (variation due to chance and treatment effect, if any exists).

$F = \dfrac{MS_{between}}{MS_{within}}$
The F Ratio
$F = \dfrac{MS_{between}}{MS_{within}}$

$MS_{within} = \dfrac{SS_{within}}{df_{within}}, \qquad MS_{between} = \dfrac{SS_{between}}{df_{between}}$

A mean square (MS) is a sum of squares divided by its degrees of freedom, just as the sample variance is $s^2 = \dfrac{\sum(X - \bar{X})^2}{n-1}$.
The F Ratio

$F = \dfrac{MS_{between}}{MS_{within}}$, with $MS_{within} = \dfrac{SS_{within}}{df_{within}}$ and $MS_{between} = \dfrac{SS_{between}}{df_{between}}$.

The sum of squares total decomposes as $SS_{total} = SS_{between} + SS_{within}$.
$SS_{between} = \sum \dfrac{T^2}{n} - \dfrac{G^2}{N}$

Find each group total T, square it, and divide by the number of subjects n in the group; G is the grand total (add all of the scores together, then square the total); N is the total number of subjects.

$SS_{within} = \sum (X - \bar{X}_{group})^2 = \sum X^2 - \sum \dfrac{T^2}{n}$

Square each individual score and then add up all of the squared scores; n is the number of subjects in each group.

$SS_{total} = \sum X^2 - \dfrac{G^2}{N}$

Square each score, then add all of the squared scores together; G is the grand total.
An Example: ANOVA
A study compared the intensity of pain among three groups of treatment. Determine the significance of the difference among groups, using the .05 level of significance.

Treatment 1:  7   6   5   6
Treatment 2: 12   8   9  11
Treatment 3:  8  10  12  10
An Example: ANOVA
State the research hypothesis.
Do ratings of the intensity of pain differ for the three treatments?
H0: μ1 = μ2 = μ3
HA: H0 is false.
Nondirectional Test
In testing the hypothesis of no difference between two means, a distinction was made between directional and nondirectional alternative hypotheses. Such a distinction no longer makes sense when the number of means exceeds two. A directional test is possible only in situations where there are only two ways (directions) that the null hypothesis could be false. H0 may be false in any number of ways. Two or more group means may be alike and the remainder differ, all may be different, and so on.
Degrees of Freedom
Between: df_between = number of groups − 1
Within: df_within = (n1 − 1) + (n2 − 1) + (n3 − 1) + ... = total number of subjects − total number of groups
An Example: ANOVA
Set the decision rule:

α = .05, df_between = 2, df_within = 9, F_crit = 4.26
An Example: ANOVA
Calculate the test statistic.

Group totals: T1 = 24, T2 = 40, T3 = 40; grand total G = 104; N = 12; n = 4 per group; ΣX² = 964.

$SS_{within} = \sum X^2 - \sum \dfrac{T^2}{n} = 964 - \dfrac{24^2 + 40^2 + 40^2}{4} = 964 - 944 = 20$

$SS_{between} = \sum \dfrac{T^2}{n} - \dfrac{G^2}{N} = 944 - \dfrac{104^2}{12} = 944 - 901.33 = 42.67$
An Example: ANOVA
$MS_{between} = \dfrac{SS_{between}}{df_{between}} = \dfrac{42.67}{2} = 21.34$

$MS_{within} = \dfrac{SS_{within}}{df_{within}} = \dfrac{20}{9} = 2.22$

$F = \dfrac{MS_{between}}{MS_{within}} = \dfrac{21.34}{2.22} = 9.61$
An Example: ANOVA
Determine whether your result is significant: reject H0, since 9.61 > 4.26. Interpret your results: there is a significant difference between the treatments. ANOVA Summary Table: in the literature, ANOVA results are often summarized in a table.
Source     df    SS      MS      F
Between     2   42.67   21.34   9.61
Within      9   20       2.22
Total      11   62.67
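A sketch of the same ANOVA with scipy:

```python
# Sketch: the pain-intensity ANOVA with scipy.stats.f_oneway.
from scipy.stats import f_oneway

t1 = [7, 6, 5, 6]
t2 = [12, 8, 9, 11]
t3 = [8, 10, 12, 10]
F, p = f_oneway(t1, t2, t3)
print(f"F = {F:.2f}, p = {p:.4f}")   # F ~ 9.61; p < .05, reject H0
```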
Exercise 6: ANOVA
A psychologist interested in artistic preference randomly assigns a group of 15 subjects to one of three conditions in which they view a series of unfamiliar abstract paintings.
The 5 participants in the famous condition are led to believe that these are each famous paintings. The 5 participants in the critically acclaimed condition are led to believe that these are paintings that are not famous but are highly thought of by a group of professional art critics. The 5 in the control condition are given no special information about the paintings.
Does what people are told about paintings make a difference in how well they are liked? Use the .01 level of significance.
Famous:               10   7   5  10   8
Critically Acclaimed:  5   1   3   7   4
No Information:        4   6   9   3   3
Linear models
Find values for a and b such that sum of squared error is minimized
minimize $R(a, b) = \sum_{i=1}^{n} (y_i^* - y_i)^2 = \sum_{i=1}^{n} \big((a x_i + b) - y_i\big)^2$
A minimum of a function (R) is characterized by a zero first derivative with respect to the parameters
$\dfrac{\partial R}{\partial a} = 0, \qquad \dfrac{\partial R}{\partial b} = 0$

minimize $R(a, b) = \sum_{i=1}^{n} \big((a x_i + b) - y_i\big)^2$
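A sketch of the closed-form least-squares fit via the normal equations; the x and y values are made up for illustration:

```python
# Sketch: minimizing R(a, b) in closed form; np.polyfit does the same fit.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Setting dR/da = dR/db = 0 gives a linear system in (a, b):
A = np.vstack([x, np.ones_like(x)]).T
a, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(a, b)
print(np.polyfit(x, y, 1))   # same result
```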
Example:

$y = a \log(x)$

$y = \beta_0 + \dfrac{\beta_1}{x_1} + \beta_2 x_2^2$

- y is not a linear combination of the x's
- but the model is linear in the parameters
- we can use MLR (multiple linear regression) if the variables are transformed: $x_1' = 1/x_1$, $x_2' = x_2^2$, giving $y = \beta_0 + \beta_1 x_1' + \beta_2 x_2' + \varepsilon$
Nonlinear model:

$y = e^{\beta_0 + \beta_1 x}$

$\dfrac{\partial y}{\partial \beta_1} = x\, e^{\beta_0 + \beta_1 x}$

The derivative with respect to the parameter still depends on the parameters, so this model is nonlinear in the parameters.
Multiple testing
Say that you perform a statistical test with a 0.05 threshold, but you repeat the test on twenty different observations. Assume that all of the observations are explainable by the null hypothesis. What is the chance that at least one of the observations will receive a p-value less than 0.05?
Multiple testing
Say that you perform a statistical test with a 0.05 threshold, but you repeat the test on twenty different observations. Assuming that all of the observations are explainable by the null hypothesis, what is the chance that at least one of the observations will receive a p-value less than 0.05? Pr(making a mistake) = 0.05 Pr(not making a mistake) = 0.95 Pr(not making any mistake) = 0.9520 = 0.358 Pr(making at least one mistake) = 1 - 0.358 = 0.642 There is a 64.2% chance of making at least one mistake.
[Diagram: the same "no difference" comparison is tested on Day 1, Day 2, ..., Day 20, each with a statistical test at alpha = 0.05; the chance of finding at least one significant difference is 64.2%, i.e. the overall Type I error is 64.2%]
Bonferroni correction
Assume that individual tests are independent. Divide the desired p-value threshold by the number of tests performed. For the previous example, 0.05 / 20 = 0.0025. Pr(making a mistake) = 0.0025 Pr(not making a mistake) = 0.9975 Pr(not making any mistake) = 0.997520 = 0.9512 Pr(making at least one mistake) = 1 - 0.9512 = 0.0488
This means that the probability of at least one of the total number of tests being wrongly declared significant is of magnitude alpha (here 0.0488). This is also known as controlling the family-wise error (FWE). It is clear, though, that this greatly increases the beta error (false negatives): many tests that should show an effect no longer pass the corrected threshold.
[Diagram: the same "no difference" comparison is tested on Day 1, Day 2, ..., Day 20, each with a statistical test at alpha = 0.0025; the chance of finding at least one significant difference is 4.88%, i.e. the overall Type I error is 4.88%]
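The two family-wise error computations above, as a few lines of Python:

```python
# Sketch: family-wise error with and without the Bonferroni correction.
m, alpha = 20, 0.05
print(1 - (1 - alpha) ** m)        # ~0.642: chance of >= 1 false positive
alpha_bonf = alpha / m             # Bonferroni-corrected threshold 0.0025
print(1 - (1 - alpha_bonf) ** m)   # ~0.0488: corrected family-wise error
```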
Multiple comparisons
                  # non-rejected          # rejected            Total
H0 = true               U             V (false positives)        m0
H0 = false       T (false negatives)        S                    m1
Total                 m − R           R (discoveries)             m
R = # rejected hypotheses = # discoveries; V of these may be in error = # false discoveries. The error (Type I) in the entire study is measured by

Q = V/R if R > 0, and Q = 0 if R = 0

i.e. the proportion of false discoveries among the discoveries (0 if none found). FDR = E(Q). Does it make sense?
Benjamini-Hochberg procedure: order the p-values $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$ and let k be the largest j for which $p_{(j)} \le \dfrac{j}{m}\,q$. Reject $H_{0(1)}, H_{0(2)}, \dots, H_{0(k)}$.
FDR example
q = 0.05, m = 1000

Rank (j)    (j·q)/m     p-value
1           0.00005     0.0000008
2           0.00010     0.0000012
3           0.00015     0.0000013
4           0.00020     0.0000056
5           0.00025     0.0000078
6           0.00030     0.0000235
7           0.00035     0.0000945
8           0.00040     0.0002450
9           0.00045     0.0004700
10          0.00050     0.0008900
...
1000        0.05000     1.0000000
Choose the threshold so that, for all the genes above it, (j·q)/m is larger than the corresponding p-value. Approximately 5% of the examples above the line are expected to be false positives.
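A sketch of the step-up rule on the table's top p-values (the helper bh_threshold is illustrative, not a library function):

```python
# Sketch of the Benjamini-Hochberg step-up rule on the slide's top p-values.
import numpy as np

def bh_threshold(pvals, q, m):
    """Return the largest rank k with p_(k) <= (k*q)/m (0 if none)."""
    p_sorted = np.sort(pvals)
    ranks = np.arange(1, len(p_sorted) + 1)
    passing = p_sorted <= ranks * q / m
    return int(np.max(ranks[passing])) if passing.any() else 0

pvals = [8e-7, 1.2e-6, 1.3e-6, 5.6e-6, 7.8e-6,
         2.35e-5, 9.45e-5, 2.45e-4, 4.7e-4, 8.9e-4]
print(bh_threshold(pvals, q=0.05, m=1000))   # 8: reject the top 8 hypotheses
```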
False discovery rate: when to use FWER and when to use FDR?

FWER = Pr(V > 0) = Pr(at least one false positive)
FDR = E(Q) = E(V/R)
1. Choose FWER if high confidence in ALL selected genes is desired (for example, selecting candidate genes for RT-PCR validation). This comes with a loss of power due to strong control of the Type I error.
2. Use the more flexible FDR procedures if a certain proportion of false positives is tolerable (e.g. gene discovery).