
Stats 95

t-Tests

William Sealy Gosset

• Single Sample
• Paired Samples
• Independent Samples
t Distributions
• t distributions are used when we know the mean of the population but not the SD of the population from which our sample is drawn.
• t distributions are useful when we have small samples.
• The t distribution is flatter and has fatter tails than the z (normal) distribution.
• As sample size approaches 30, the t distribution looks like the z (normal) distribution.
• Same Three Assumptions
– Dependent Variable is scale
– Random selection
– Normal Distribution
Fat Tails Lose Weight With Larger
Sample Size
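A minimal numerical sketch of this point, using only Python's standard library: it integrates the t density to find how much probability lies beyond ±2 for several degrees of freedom, then compares the result with the normal curve. The cut-off of 2, the chosen df values, and the helper names `t_pdf` and `t_two_tail` are illustrative assumptions, not part of the slides.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tail(cut, df, steps=20000):
    """P(|T| > cut): one minus the central area, integrated with Simpson's rule."""
    h = 2 * cut / steps  # steps must be even for Simpson's rule
    total = t_pdf(-cut, df) + t_pdf(cut, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-cut + i * h, df)
    return 1 - total * h / 3

z_tail = 2 * (1 - NormalDist().cdf(2.0))          # area beyond +/-2 under the normal curve
tails = {df: t_two_tail(2.0, df) for df in (2, 5, 10, 30, 100)}

for df, area in tails.items():
    print(f"df = {df:>3}: area beyond +/-2 = {area:.4f}")
print(f"normal  : area beyond +/-2 = {z_tail:.4f}")
```

The tail area shrinks as df grows and approaches the normal value from above, which is the sense in which the fat tails lose weight.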
The Robust Nature of the t
Statistics
Unfortunately, we very seldom know if the population is
normal because usually all the information we have about a
population is in our study, a sample of 10-20.
Fortunately,
1) distributions in social sciences often approximate a normal
curve, and
2) according to the Central Limit Theorem, the sample mean you
have gathered is part of an approximately normal distribution of
sample means, and
3) in practice, statisticians have found that the t test is accurate
even with populations far from normal.
The Robust Nature of the t
Statistics
• The only situation in which using a t test is
likely to give a seriously distorted result is
when you are using a one-tailed test and the
population is highly skewed.
When To Use Single Sample
t-Statistic
• 1 Nominal Independent Variable
• 1 Scale Dependent Variable
• Population Mean, No Population Standard
Deviation
Check Assumptions
• Random Selection
• Normal Shape Distribution
z Statistic Versus t Statistic
z Statistic
• When you know the Mean and Standard Deviation of a population.
– E.g., a farmer picks 200,000 apples, the mean weight is 112 grams, the SD is 12 grams.
• Calculate the Standard Error of the sample mean
t Statistic
• When you do not know the Standard Deviation of the population and must estimate it from a sample.
– E.g., a farmer picks 30 out of his 200,000 apples, and finds the sample has a Mean of 112 grams.
• Calculate the Estimate of the Standard Error of the sample mean
Scenarios When you would use a
Single Sample t test
• A newspaper article reported that the typical American family spent an
average of $81 for Halloween candy and costumes last year. A sample of
N = 16 families this year reported spending a mean of M = $85, with s =
$20. What statistical test would we use to determine whether these data
indicate a significant change in holiday spending?

• Many companies that manufacture lightbulbs advertise their 60-watt bulbs
as having an average life of 1000 hours. A cynical consumer bought 30
bulbs and burned them until they failed. He found that they burned for an
average of M = 1233, with a standard deviation of s = 232.06. What
statistical test would this consumer use to determine whether the average
burn time of lightbulbs differs significantly from that advertised?
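Both scenarios give a population mean, a sample mean, a sample standard deviation, and N, which is everything a single-sample t statistic needs. The sketch below computes t and df from those summary numbers; the function name `t_from_summary` is a hypothetical helper introduced here for illustration.

```python
import math

def t_from_summary(sample_mean, pop_mean, sample_sd, n):
    """Return (t, df) for a single-sample t test from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)   # s_M = s / sqrt(N)
    t = (sample_mean - pop_mean) / standard_error
    return t, n - 1

# Halloween spending: population mean $81, sample M = $85, s = $20, N = 16
halloween_t, halloween_df = t_from_summary(85, 81, 20, 16)

# Lightbulbs: advertised mean 1000 hours, sample M = 1233, s = 232.06, N = 30
bulb_t, bulb_df = t_from_summary(1233, 1000, 232.06, 30)

print(f"Halloween:  t({halloween_df}) = {halloween_t:.2f}")
print(f"Lightbulbs: t({bulb_df}) = {bulb_t:.2f}")
```

Each t value would then be compared with the critical value for its df in a t table.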
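Difference Between Calculating
z Statistic and t Statistic
z Statistic
• Standard Deviation of the Population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
• Standard Error: $\sigma_M = \frac{\sigma}{\sqrt{N}}$
• z Statistic: $z = \frac{M - \mu_M}{\sigma_M}$
t Statistic
• Standard Deviation of a Sample (estimates the Population Standard Deviation): $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
• Standard Error of a Sample (estimates the Standard Error of the Population): $s_M = \frac{s}{\sqrt{N}}$
• t Statistic for Single-Sample t Test: $t = \frac{M - \mu_M}{s_M}$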
Estimating Population from a
Sample
• Main difference between t Tests and z score:
– use the standard deviation of the sample to estimate the
standard deviation of the population.
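• How? Divide by the sample size minus 1 (N - 1, called the degrees of freedom) instead of by N.
– Standard Deviation of the Population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
– Standard Deviation of a Sample (estimates the Population Standard Deviation): $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$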

• Use degrees of freedom (df) in the t distribution chart
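To see what subtracting 1 does in practice, the sketch below computes both versions of the standard deviation for the same scores. The `scores` list is a hypothetical sample invented for illustration; `statistics.pstdev` divides by N and `statistics.stdev` divides by N - 1.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, for illustration only
n = len(scores)
mean = statistics.fmean(scores)
ss = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations

sd_divide_by_n = math.sqrt(ss / n)          # treats the scores as the whole population
sd_divide_by_df = math.sqrt(ss / (n - 1))   # sample estimate of the population SD
df = n - 1                                  # degrees of freedom used to read the t table

print(f"divide by N:     {sd_divide_by_n:.3f}  (statistics.pstdev: {statistics.pstdev(scores):.3f})")
print(f"divide by N - 1: {sd_divide_by_df:.3f}  (statistics.stdev:  {statistics.stdev(scores):.3f})")
```

Dividing by N - 1 always gives the slightly larger value, and the gap matters most when N is small.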


t Distribution
Table
Example of Single Sample t Test
• The mean emission of all engines of a new design needs to be below
20ppm if the design is to meet new emission requirements. Ten engines are
manufactured for testing purposes, and the emission level of each is
determined. Data:
• 15.6, 16.2, 22.5, 20.5, 16.4, 19.4, 16.6, 17.9, 12.7, 13.9
• Do the data supply sufficient evidence to conclude that this type of engine
meets the new standard, assuming we are willing to risk a Type I error
(false alarm, reject the Null when it is true) with a probability = 0.01?
• Step 1: Assumptions: dependent variable is scale, Randomization, Normal
Distribution
• Step 2: State H0 and H1:
– H0 Mean emissions are equal to (or greater than) 20ppm;
– H1 Mean emissions are less than 20ppm (One-Tailed Test)
Example of Single Sample t Test
• Data: 15.6, 16.2, 22.5, 20.5, 16.4, 19.4, 16.6, 17.9, 12.7, 13.9
• Step 3: Determine Characteristics of Sample
– Mean: M = 17.17
– Standard Deviation of Sample: $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}} = 2.98$
– Standard Error of Sample: $s_M = \frac{s}{\sqrt{N}} = 0.942$
• Step 4: Determine Cutoff
– df = N-1 = 10-1 = 9
– t statistic cut-off = -2.822
• Step 5: Calculate t Statistic
– $t = \frac{M - \mu_M}{s_M}$
Example of Single Sample t Test
• Mean M = 17.17; Standard Deviation of Sample s = 2.98; Standard Error of Sample $s_M$ = 0.942
• Step 5: Calculate t Statistic
– $t = \frac{M - \mu_M}{s_M} = \frac{17.17 - 20}{0.942} = -3.00$
• Step 6: Decide (Draw It)
– t statistic cut-off = -2.822
– t statistic = -3.00
– Because -3.00 falls beyond the cut-off of -2.822, decide to reject the Null Hypothesis
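The engine example can be reproduced in a few lines. This is a sketch using Python's standard library; the cut-off of -2.822 is taken from the slides rather than computed, and the variable names are my own.

```python
import math
import statistics

emissions = [15.6, 16.2, 22.5, 20.5, 16.4, 19.4, 16.6, 17.9, 12.7, 13.9]
mu = 20.0         # emission limit in ppm, used as the comparison mean
cutoff = -2.822   # one-tailed critical t for df = 9 at p = 0.01, as given in the slides

n = len(emissions)
m = statistics.fmean(emissions)     # sample mean M
s = statistics.stdev(emissions)     # sample SD, divides by N - 1
s_m = s / math.sqrt(n)              # estimated standard error
t = (m - mu) / s_m                  # single-sample t statistic
reject_null = t < cutoff            # lower-tail test

print(f"M = {m:.2f}, s = {s:.2f}, s_M = {s_m:.3f}, df = {n - 1}")
print(f"t = {t:.2f}, reject H0: {reject_null}")
```

Unrounded, the standard error is closer to 0.943 than 0.942, which does not change the decision.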
How to Write Results
• t(7) = -.79, p < .265, d = -.29
• t Indicates that we are using a t-Test
• (7) Indicates the degrees of freedom associated with this t-Test
• -.79 Indicates the obtained t statistic value
• p < .265 Indicates the probability of obtaining a t value at least this extreme by chance alone (that is, if the null hypothesis is true)
• d = -.29 Indicates the effect size (the magnitude of the effect is measured in standard deviation units)
Paired Sample t Test
• The paired-samples t test is used with a repeated-measures
(aka within-subjects) design, commonly used in
before-after designs.
• Comparing a mean of difference scores to a distribution of
means of difference scores
• Checklist for Paired-Samples:
– 1 Nominal DICHOTOMOUS (with two levels) Independent Variable
– 1 Scale Dependent Variable
– Paired Observations are Dependent
– Assumptions
• Random selection & Shape
Paired-Samples t-Test
Paired Sample t Test
• Comparing a mean of difference scores to a distribution of
means of difference scores
– Population of measures at Time 1 and Time 2
– Population of difference between measures at Time 1 and Time 2
– Population of mean difference between measures at Time 1 and Time 2
– (Whew!)
Paired Sample t Test
Single-Sample
• Single observation from each participant
• The observation is independent from that of the other participants
• Comparing a mean score to a distribution of mean scores.
Paired-Sample
• Two observations from each participant
• The second observation is dependent upon the first since they come from the same person.
• Comparing a mean of difference scores to a distribution of means of difference scores
• (I don’t make this stuff up)
Paired Sample t Test

• Two distributions of scores.
• A distribution of differences between scores.
• Central Limit Theorem Revisited. If you plot the mean of
randomly sampled observations, the plot will approach a
normal distribution. This is true for scores and for
differences between matched scores.
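A small simulation can make the Central Limit Theorem claim concrete. The sketch below draws many samples from a deliberately skewed population (exponential, mean 1, SD 1) and looks at the distribution of their means. The population, the sample size of 30, the number of samples, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # observations per sample (assumed)
num_samples = 5000       # number of sample means to collect (assumed)

# Skewed population: exponential with mean 1 and SD 1.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = 1.0 / n ** 0.5   # population SD / sqrt(N)

# Share of sample means within 1.96 standard errors of the population mean;
# a normal distribution of means would put about 95% there.
share_within = sum(abs(m - 1.0) <= 1.96 * expected_se for m in sample_means) / num_samples

print(f"mean of sample means = {mean_of_means:.3f} (population mean 1.0)")
print(f"SD of sample means   = {sd_of_means:.3f} (expected {expected_se:.3f})")
print(f"share within 1.96 SE = {share_within:.3f}")
```

The same experiment could be run on differences between matched scores, with each difference treated as a single observation.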
Difference Between Calculating
Single-Sample t and Paired-Sample t Statistic
Single Sample t Statistic
• Standard Deviation of a Sample: $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}} = \sqrt{\frac{SS}{N - 1}}$
• Standard Error of a Sample: $s_M = \frac{s}{\sqrt{N}}$
• t Statistic for Single-Sample t Test: $t = \frac{M - \mu_M}{s_M}$
Paired Sample t Statistic
• Standard Deviation of Sample Differences: $s = \sqrt{\frac{\sum (D_{x-y} - M_{x-y})^2}{N - 1}} = \sqrt{\frac{SS}{N - 1}}$
• Standard Error of Sample Differences: $s_M = \frac{s}{\sqrt{N}}$
• t Statistic for Paired-Sample t Test: $t = \frac{M_{x-y} - \mu_{x-y}}{s_M}$
Paired Sample t Test Example
• We need to know if there is a difference in the salary for the same job in
Boise, ID, and LA, CA. The salary of 6 employees in the 25th percentile in
the two cities is given.
Profession            Boise     Los Angeles
Executive Chef        53,047    62,490
Genetics Counselor    49,958    58,850
Grants Writer         41,974    49,445
Librarian             44,366    52,263
School teacher        40,470    47,674
Social Worker         36,963    43,542

• Six Steps of Hypothesis testing for Paired Sample Test


Paired Sample t Test Example
• We need to know if there is a difference in the salary for the
same job in Boise, ID, and LA, CA.
• Step 1: Define Populations, Comparison Distribution, and Assumptions
– Pop. 1. Jobs in Boise
– Pop. 2. Jobs in LA
– The comparison distribution will be a distribution of mean differences. It
will be a paired-samples test because every job sampled contributes
two scores, one in each condition.
– Assumptions: the dependent variable is scale; we do not know if the
distribution is normal, so we must proceed with caution; the jobs are not
randomly selected, so we must proceed with caution.
Paired Sample t Test Example
• We need to know if there is a difference in the salary for the same job in Boise, ID,
and LA, CA.
• Step 3: Determine the Characteristics of Comparison Distribution (mean,
standard deviation, standard error)
• M = -7914.33; Sum of Squares (SS) = 5,777,187.33
• $s = \sqrt{\frac{SS}{N - 1}} = \sqrt{\frac{5,777,187.33}{5}} = 1074.91$
• $s_M = \frac{s}{\sqrt{N}} = \frac{1074.91}{\sqrt{6}} = 438.83$

Profession           Boise (X)   Los Angeles (Y)   X-Y       D = (X-Y)-M   D^2
Executive Chef       53,047      62,490            -9,443    -1,528.67     2,336,821.78
Genetic Counselor    49,958      58,850            -8,892    -977.67       955,832.11
Grants Writer        41,974      49,445            -7,471    443.33        196,544.44
Librarian            44,366      52,263            -7,897    17.33         300.44
School teacher       40,470      47,674            -7,204    710.33        504,573.44
Social Worker        36,963      43,542            -6,579    1,335.33      1,783,115.11
Mean of X-Y: M = -7914.33
Paired Sample t Test Example
• We need to know if there is a difference in the salary for the same job in Boise, ID, and LA, CA.
• Step 4: Determine Critical Cutoff
• df = N-1 = 6-1 = 5
• Critical t values for 5 df, p < .05, two-tailed, are -2.571 and 2.571
• Step 5: Calculate t Statistic
• $t = \frac{M_{x-y} - \mu_{x-y}}{s_M} = \frac{-7914.33 - 0}{438.83} = -18.04$
• Step 6: Decide
• Because -18.04 falls beyond the cutoff of -2.571, reject the Null Hypothesis
How to Write Results
• t(5) = -18.04, p < .00001, d = xxxx.
• t Indicates that we are using a t-Test
• (5) Indicates the degrees of freedom associated with this t-Test
• -18.04 Indicates the obtained t statistic value
• p < .00001 Indicates the probability of obtaining a t value at least this extreme by chance alone (that is, if the null hypothesis is true)
• d = xxxx Indicates the effect size (the magnitude of the effect is measured in units of standard deviations)
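The salary example can be checked with a short script. This sketch takes the differences as Boise minus Los Angeles (X - Y), so the mean difference and t come out negative. The critical value 2.571 is the one quoted in the slides, and the variable names are my own.

```python
import math
import statistics

boise       = [53047, 49958, 41974, 44366, 40470, 36963]
los_angeles = [62490, 58850, 49445, 52263, 47674, 43542]

diffs = [x - y for x, y in zip(boise, los_angeles)]   # X - Y for each profession
n = len(diffs)
m_diff = statistics.fmean(diffs)                      # mean of the difference scores
ss = sum((d - m_diff) ** 2 for d in diffs)            # sum of squares
s = math.sqrt(ss / (n - 1))                           # SD of the difference scores
s_m = s / math.sqrt(n)                                # standard error of the mean difference
t = (m_diff - 0) / s_m                                # null hypothesis: mean difference is 0

cutoff = 2.571   # two-tailed critical t for df = 5 at p < .05, from the slides
reject_null = abs(t) > cutoff

print(f"M = {m_diff:.2f}, SS = {ss:.2f}, s = {s:.2f}, s_M = {s_m:.2f}")
print(f"t({n - 1}) = {t:.2f}, reject H0: {reject_null}")
```

Reversing the subtraction (Los Angeles minus Boise) flips the sign of t but not the decision, because the test is two-tailed.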
Independent t-Test
Independent t Test
• Compares the difference between two means of
two independent groups.
• You are comparing a difference between means to
a distribution of differences between means.
– Sample means from Group 1 and Group 2 compared to
a Population of differences between means of Group 1
and Group 2
Independent t Test
Paired-Sample
• Two observations from each participant
• The second observation is dependent upon the first since they come from the same person.
• Comparing a mean difference to a distribution of mean difference scores
Independent t Test
• Single observation from each participant from two independent groups
• The observation from the second group is independent from the first since they come from different subjects.
• Comparing the difference between two means to a distribution of differences between mean scores.
Independent t Test: Steps
• Step 1: Calculate the corrected variance for each sample
$s_x^2 = \frac{\sum (X - M_X)^2}{N_x - 1}$, $s_y^2 = \frac{\sum (Y - M_Y)^2}{N_y - 1}$
• Step 2: Calculate the degrees of freedom
$df_{total} = df_x + df_y$
• Step 3: Calculate the pooled variance (in a z test you find the SD; for the independent t, you take a weighted average of SD^2)
$s^2_{pooled} = \left(\frac{df_x}{df_{total}}\right) s_x^2 + \left(\frac{df_y}{df_{total}}\right) s_y^2$
• Step 4: Calculate the squared estimate of the standard error for each sample (like the standard error in a z test)
$s^2_{M_x} = \frac{s^2_{pooled}}{N_x}$, $s^2_{M_y} = \frac{s^2_{pooled}}{N_y}$
• Step 5: Calculate the estimated variance of the distribution of mean differences
$s^2_{Difference} = s^2_{M_x} + s^2_{M_y}$
• Step 6: Calculate the estimated standard deviation of the distribution of mean differences
$s_{Difference} = \sqrt{s^2_{Difference}}$
• Step 7: Calculate the t Statistic for an Independent-Samples t Test
$t = \frac{M_X - M_Y}{s_{Difference}}$
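The seven steps translate almost line for line into code. The sketch below assumes raw scores are available for both groups. The function name `independent_t` and the two small data sets are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math
import statistics

def independent_t(x, y):
    """Independent-samples t via the pooled-variance steps; returns (t, df_total)."""
    var_x, var_y = statistics.variance(x), statistics.variance(y)    # Step 1 (N - 1 denominators)
    df_x, df_y = len(x) - 1, len(y) - 1
    df_total = df_x + df_y                                           # Step 2
    pooled = (df_x / df_total) * var_x + (df_y / df_total) * var_y   # Step 3
    var_mx, var_my = pooled / len(x), pooled / len(y)                # Step 4
    var_diff = var_mx + var_my                                       # Step 5
    s_diff = math.sqrt(var_diff)                                     # Step 6
    t = (statistics.fmean(x) - statistics.fmean(y)) / s_diff         # Step 7
    return t, df_total

group_x = [1, 2, 3, 4, 5]   # hypothetical scores, group X
group_y = [4, 6, 8]         # hypothetical scores, group Y
t, df_total = independent_t(group_x, group_y)
print(f"t({df_total}) = {t:.3f}")
```

With these made-up scores the variances are 2.5 and 4, the pooled variance is 3.0, and the variance of the difference is 0.6 + 1.0 = 1.6, so each intermediate value can be checked against the corresponding step.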
Confidence Interval for Independent Samples t-test
• $(M_X - M_Y)_{Lower} = -t(s_{Difference}) + (M_X - M_Y)_{Sample}$
• $(M_X - M_Y)_{Upper} = t(s_{Difference}) + (M_X - M_Y)_{Sample}$
Confidence Interval for Paired Sample t-test with example from Stroop question.
• $M_{Lower} = -t(s_M) + M_{Sample} = -2.365(2.052) + 13.556 = 8.70302$
• $M_{Upper} = t(s_M) + M_{Sample} = 2.365(2.052) + 13.556 = 18.4089$
Effect Size for Independent Samples t-test
• $d = \frac{M_X - M_Y}{s_{pooled}}$
Effect Size for Dependent Samples t-test
• $d = \frac{M - \mu}{s}$
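Both formulas are one-liners in code. The sketch below reproduces the Stroop confidence interval from the numbers on the slide (M = 13.556, standard error 2.052, critical t 2.365). As a second illustration it applies the effect-size formula to the engine example's summary values (M = 17.17, comparison mean 20, s = 2.98), which is my own choice of example. The function names are hypothetical helpers.

```python
def confidence_interval(sample_mean, standard_error, t_critical):
    """Return (lower, upper) = sample_mean -/+ t_critical * standard_error."""
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin

def cohens_d(mean_difference, standard_deviation):
    """Effect size: a mean difference expressed in standard deviation units."""
    return mean_difference / standard_deviation

# Stroop example from the slide
lower, upper = confidence_interval(13.556, 2.052, 2.365)
print(f"CI: [{lower:.5f}, {upper:.5f}]")

# Effect size for the engine example: (M - mu) / s
d = cohens_d(17.17 - 20, 2.98)
print(f"d = {d:.2f}")
```

The same `confidence_interval` function covers the independent-samples case if it is given the difference between sample means and the standard deviation of the distribution of differences in place of the standard error.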
• Standard Error of a Sample (estimates the Standard Error of the Population): $s_M = \frac{s}{\sqrt{N}}$
• t Statistic for Single-Sample t Test: $t = \frac{M - \mu_M}{s_M}$
• Standard Deviation of a Sample (Estimates the Population Standard Deviation): $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
• t Statistic for Independent t Test: $t = \frac{M_x - M_y}{s_{Difference}}$
• Degrees of Freedom for Single Sample t Test, and Paired Sample: $df = N - 1$
• Variance for a sample: $s^2 = \frac{\sum (X - M)^2}{N - 1}$
• t Statistic for Paired-Sample t Test: $t = \frac{M_{x-y} - \mu_{x-y}}{s_M}$
• Degrees of Freedom for Independent Samples t Test: $df_{total} = df_x + df_y$
• Pooled Variance (like adding together the weighted average of the variance from Variable X and Variable Y): $s^2_{pooled} = \left(\frac{df_x}{df_{total}}\right) s_x^2 + \left(\frac{df_y}{df_{total}}\right) s_y^2$
• Variance for a Distribution of means for Indep.-Samples t Test: $s^2_{M_x} = \frac{s^2_{pooled}}{N_x}$, $s^2_{M_y} = \frac{s^2_{pooled}}{N_y}$
• Variance for a Distribution of Differences between Means: $s^2_{Difference} = s^2_{M_x} + s^2_{M_y}$
• SD of the distribution of Differences Between Means: $s_{Difference} = \sqrt{s^2_{Difference}}$
Worked values for the independent samples t test:
• $s_x^2 = \frac{\sum (X - M)^2}{N - 1} = \frac{SS_x}{df_x} = \frac{760}{5 - 1} = 190$
• $s_y^2 = \frac{\sum (Y - M)^2}{N - 1} = \frac{SS_y}{df_y} = \frac{272.667}{3 - 1} = 136.333$
• $df_{total} = df_x + df_y = 4 + 2 = 6$
• $s^2_{pooled} = \left(\frac{df_x}{df_{total}}\right) s_x^2 + \left(\frac{df_y}{df_{total}}\right) s_y^2 = \left(\frac{4}{6}\right) 190 + \left(\frac{2}{6}\right) 136.333 = 172.11$
• $s^2_{M_x} = \frac{s^2_{pooled}}{N_x} = \frac{172.11}{5} = 34.422$
• $s^2_{M_y} = \frac{s^2_{pooled}}{N_y} = \frac{172.11}{3} = 57.37$
• $s^2_{Difference} = s^2_{M_x} + s^2_{M_y} = 34.422 + 57.37 = 91.792$
• $s_{Difference} = \sqrt{s^2_{Difference}} = \sqrt{91.7926} = 9.580845$
• $t = \frac{M_X - M_Y}{s_{Difference}} = \frac{47 - 58.333}{9.580845} = \frac{-11.333}{9.580845} = -1.18292$
Other worked values:
• $s_M = \frac{s}{\sqrt{N}} = \frac{5.804862}{\sqrt{8}} = \frac{5.804862}{2.82843} = 2.052329$
• $t = \frac{M - \mu_M}{s_M} = \frac{29.86 - 30}{.645} = -0.2171$
• $s_M = \frac{s}{\sqrt{N}} = \frac{13.49543}{\sqrt{6}}$
• $s = \sqrt{\frac{SS}{N - 1}} = 12.218$
• $M - \mu = 17.5 - 20.4 = -2.9$
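The independent-samples numbers above (SS of 760 with N = 5 and M = 47 for one group, SS of 272.667 with N = 3 and M = 58.333 for the other) can be run through the pooled-variance steps from summary statistics alone. This is a sketch. The function name `independent_t_from_summary` and the dictionary keys are hypothetical, and the raw scores behind these sums of squares are not available here.

```python
import math

def independent_t_from_summary(ss_x, n_x, m_x, ss_y, n_y, m_y):
    """Pooled-variance independent t from sums of squares, sample sizes, and means."""
    df_x, df_y = n_x - 1, n_y - 1
    var_x, var_y = ss_x / df_x, ss_y / df_y                          # corrected variances
    df_total = df_x + df_y
    pooled = (df_x / df_total) * var_x + (df_y / df_total) * var_y   # pooled variance
    var_diff = pooled / n_x + pooled / n_y                           # variance of the difference
    s_diff = math.sqrt(var_diff)
    return {
        "var_x": var_x, "var_y": var_y, "df_total": df_total,
        "pooled": pooled, "var_diff": var_diff, "s_diff": s_diff,
        "t": (m_x - m_y) / s_diff,
    }

result = independent_t_from_summary(760, 5, 47, 272.667, 3, 58.333)
for name, value in result.items():
    print(f"{name:>8} = {value:.4f}")
```

Dividing the mean difference by the variance of the difference (91.79) instead of its square root (9.58) is an easy slip; it would give about -0.12 rather than -1.18.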
Central Limit Theorem in Single Independent T-tests: Samples

Two samples (moms/non-moms; drunk/sober students, etc.) independent of each other (subjects participate in
only one condition), which you draw from a universe of
respective possible samples.
Central Limit Theorem in Single Independent t-tests:
Distribution of Samples

Two samples (moms/non-moms; drunk/sober students, etc.) independent of each other (subjects participate in
only one condition) are each drawn from a hypothetical
distribution of samples of the same size, with a mean of
samples and a standard error.
Central Limit Theorem in Single Independent t-tests:
Distribution of Samples
• Difference between the means: Δ Mx-My
• Difference between the means: would it occur with a probability of 5% or less?

We want to know if there is a statistically significant
difference between the means.
Central Limit Theorem in Single Independent t-tests:
Distribution of Differences Between Means
The difference between the means, Δ Mx-My, comes from a distribution of Δ Mx-My
that has a standard deviation derived
from the pooled variance of the sample
distributions.

We want to know if there is a statistically significant
difference between the means.
Central Limit Theorem in Single
Independent T-tests
