
EXAM 3 REVIEW

HCD 300 – BIOSTATISTICS
WHAT WILL BE COVERED?
• UNIT 6:
– CHAPTER 11 & 12: TWO SAMPLE T-TEST (6.1 & 6.2)
– CHAPTER 13 & 14: ANOVA (6.3 & 6.4)

• UNIT 7:
– CHAPTER 15: CORRELATION AND SHARED VARIANCE (7.1)
– CHAPTER 16: LINEAR REGRESSION (7.2)
– CHAPTER 17: CHI-SQUARE (7.3 & 7.4)
T-TEST OR STUDENT’S T-TEST
• One of the most commonly used techniques for testing a hypothesis
• Determines a probability that two groups are the same with respect to the variable tested
• Can be used to compare the mean of one sample and a population, or to compare the means of two samples

Z-TEST
* Sample size over 30
* σ is known
* Not used often in practice, because σ is rarely known, and when sample size is >30 the t-distribution mirrors the z-distribution

T-TEST
* Sample size under 30
* σ is unknown
* Used most often in practice
TWO SAMPLE T-TEST
• A two-sample t-test is used to examine differences between two samples
• Two-sample t-tests have the following assumptions:
  – Random sampling or selection
  – Interval or ratio scale of measurement
  – Normality
• Sample type: independent or dependent samples
• Variance: equal or unequal
• Independent samples: values in one sample reveal no information about, and are unrelated to, those of the other sample. For independent samples, you have to specify the variance of the two groups.
• Dependent samples: the values in one sample affect or are related to the values in the other sample.
• As a general rule, you can assume equal variance if the ratio of the sample variances (s1²/s2²) is between 0.5 and 2.
TWO-SAMPLE TEST FOR
INDEPENDENT SAMPLES
TWO-SAMPLE T-TEST: INDEPENDENT SAMPLES
• Compare the difference in means between two groups
• Groups do not need to be equal sizes
• Can be calculated by hand, in Excel, or with online calculators

EQUAL VARIANCE:

  t = (x̄1 − x̄2) / √{ [((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)] × [(n1 + n2) / (n1·n2)] }

UNEQUAL VARIANCE:

  t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

• x̄1: mean for Group 1
• x̄2: mean for Group 2
• n1: number of participants in Group 1
• n2: number of participants in Group 2
• s1²: variance for Group 1
• s2²: variance for Group 2
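The slides compute this in Excel; as a cross-check, here is a minimal Python sketch of the equal-variance (pooled) formula. The two groups of scores are invented for illustration only:

```python
import math

def pooled_t(x1, x2):
    """Equal-variance two-sample t statistic, following the pooled formula."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)   # sample variance, group 1
    s2 = sum((v - m2) ** 2 for v in x2) / (n2 - 1)   # sample variance, group 2
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    # (n1 + n2) / (n1 * n2) is the same as 1/n1 + 1/n2
    se = math.sqrt(pooled * (n1 + n2) / (n1 * n2))
    return (m1 - m2) / se

# Hypothetical scores for two independent groups:
g1 = [4, 5, 6, 7, 8]
g2 = [6, 7, 8, 9, 10]
print(pooled_t(g1, g2))   # -2.0
```

A negative t simply means the first group's mean is below the second's; compare |t| against the critical value.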
CALCULATED BY EXCEL FUNCTION

• =T.TEST(array1, array2, tails, type)


array1 = cell addresses for the first set of data
array2 = cell addresses for the second set of data
tails = 1 or 2 depending on whether the test is a one-tailed or two-tailed test
type =
• 1 if a paired t test
• 2 if a two-sample test (independent with equal variances)
• 3 if it is a two-sample test with unequal variances
• Note that the T.TEST result is not the t test statistic, it is the p-value
CALCULATED BY DATA ANALYSIS TOOLPAK
DESCRIPTION OF RESULTS
Statistic Description
Mean average score for each variable
Variance Variance for each variable
Observations Number of observations in each group
Pooled Variance Variance for both groups
Hypothesized Mean What you may have indicated to be the difference you expect
Difference
df Degrees of freedom
t Stat Value of the t statistic
P(T<=t) one-tail Probability of t occurring by chance for a one-tailed test
t Critical one-tail Critical value one needs to exceed for a one-tailed test
P(T<=t) two-tail Probability of t occurring by chance for a two-tailed test
t Critical two-tail Critical value one needs to exceed for a two-tailed test
DEGREES OF FREEDOM FOR THE INDEPENDENT-SAMPLES T-TEST
• For the two-sample t-test with independent samples (equal variance ONLY):
  df = (n1 − 1) + (n2 − 1)
  where n is the number of observations in each group
• Degrees of freedom (unequal variance) — observe the output from Data Analysis in Excel:
  df = (s1²/n1 + s2²/n2)² / [ s1⁴/(n1²·v1) + s2⁴/(n2²·v2) ]
  where v = n − 1, s = standard deviation, and n = observations per sample
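This unequal-variance df (the Welch–Satterthwaite formula) is what Excel reports in its output; a short Python sketch, included here only as a cross-check of the formula:

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom (s1, s2 are sample VARIANCES)."""
    a, b = s1 / n1, s2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# With equal variances and equal n, this reduces to n1 + n2 - 2:
print(welch_df(2.5, 5, 2.5, 5))   # 8.0
# With very unequal variances, df shrinks below n1 + n2 - 2:
print(round(welch_df(4, 10, 16, 10), 2))
```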
INDEPENDENT SAMPLES T-TEST
• Interpretation
  t(58) = −0.14, p > .05
  – t represents the test statistic used
  – 58 is the number of degrees of freedom
  – −0.14 is the obtained value (from the formula)
  – p > .05 indicates the p-value
• Limitations:
  – Only evaluates means
  – Affected by sample size
  – Results from different studies cannot be compared
• The Steps Restated
  1. State the null and alternative hypotheses
  2. State the test statistic formula
  3. State the level of significance
  4. Compute the test statistic/obtained value
  5. Determine the critical value or p-value
  6. Determine the statistical conclusion
  7. State the experimental conclusion
COHEN’S D FOR THE TWO-SAMPLE TEST
• Measure of effect
• Equal variance:
  ES = |x̄1 − x̄2| / √(sp²)
  where the pooled variance is
  sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
• Unequal variance:
  ES = |x̄1 − x̄2| / √[(s1² + s2²) / 2]
• x̄1: mean for Group 1
• x̄2: mean for Group 2
• sp²: pooled variance
• s1²: variance for Group 1
• s2²: variance for Group 2

Effect Size for Cohen’s d
  small   .20
  medium  .50
  large   .80
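Not part of the slides — a small Python sketch of the equal-variance Cohen's d, using invented example scores:

```python
import math

def cohens_d(x1, x2):
    """Cohen's d using the pooled variance (equal-variance formula)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)
    s2 = sum((v - m2) ** 2 for v in x2) / (n2 - 1)
    sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)  # pooled variance
    return abs(m1 - m2) / math.sqrt(sp2)

# Hypothetical groups: d well above .80, so a large effect on the table above
print(round(cohens_d([4, 5, 6, 7, 8], [6, 7, 8, 9, 10]), 2))   # 1.26
```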
TWO-SAMPLE TEST FOR DEPENDENT SAMPLES
TWO-SAMPLE TEST: DEPENDENT SAMPLES
• Compare the difference in means between two groups
• Also called a matched-sample or paired t-test
• Groups are matched, so sample sizes will be equal
• Degrees of freedom approximate the sample size
• Degrees of freedom can vary based on the test statistic selected
• For this two-sample t-test with dependent samples:
  df = n − 1 (where n is the number of pairs)

Formula for calculation by hand:

  t = ΣD / √{ [nΣD² − (ΣD)²] / (n − 1) }

  – ΣD = sum of all the differences between groups
  – ΣD² = sum of the squared differences between groups
  – n = number of paired observations
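The sum-of-differences formula above is straightforward to sketch in Python; the before/after scores here are made up for illustration:

```python
import math

def paired_t(before, after):
    """Dependent-samples t statistic from the sum-of-differences formula."""
    d = [a - b for a, b in zip(after, before)]   # difference for each pair
    n = len(d)
    sum_d = sum(d)                               # sigma-D
    sum_d2 = sum(v * v for v in d)               # sigma-D-squared
    return sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))

# Hypothetical before/after scores for four matched pairs:
print(paired_t([10, 12, 9, 11], [12, 14, 10, 13]))   # 7.0
```

Compare the result against the critical t with n − 1 = 3 degrees of freedom.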
CALCULATE TWO-SAMPLE T-TEST: DEPENDENT
SAMPLES BY EXCEL
• Functions
– T.TEST
• Analysis toolpak
DESCRIPTION OF RESULT
Statistic Description
Mean average score for each variable
Variance Variance for each variable
Observations Number of observations in each group
Pearson correlation Correlation between the two variables
Hypothesized Mean What you may have indicated to be the difference you expect
Difference
df Degrees of freedom
t Stat Value of the t statistic
P(T<=t) one-tail Probability of t occurring by chance for a one-tailed test
t Critical one-tail Critical value one needs to exceed for a one-tailed test
P(T<=t) two-tail Probability of t occurring by chance for a two-tailed test
t Critical two-tail Critical value one needs to exceed for a two-tailed test
LIMITATIONS:
• Assumption of normal distribution
• Assumption of equal variances
• Only evaluates means
  – Cannot make conclusions about individual scores
• Affected by sample size
  – Results of analyses from different studies cannot be compared
• Can be used for independent and dependent samples: make sure you select the proper test
• Because the samples are related to each other, there are more complicated methods for calculating a measure of effect for dependent samples. For simplicity we will be using Cohen’s d for both independent and dependent samples:

  ES = |x̄1 − x̄2| / √(sp²)
ANOVA – ANALYSIS OF VARIANCE

ANOVA
• ANOVA examines the variance within and between groups
• Similar to the t-test, but with more than two groups
• Test statistic = F ratio
• The F ratio will be 0 or positive, never negative
• The F test is an “omnibus test” and tells you only that a difference exists
• Must conduct follow-up t-tests to find out where the difference is
• ANOVA has the following assumptions:
  – Random sampling or selection
  – Quantitative data
  – Normality
  – Samples are independent
  – Equal variance
ONE-WAY ANOVA
• One-way ANOVA: also called simple or single-factor ANOVA
  – One treatment or factor is being examined

STEPS FOR TESTING WITH ONE-WAY ANOVA
• Develop hypotheses
  H0: μ1 = μ2 = μ3 = …
  H1: At least one mean is different
• Run the one-way ANOVA test
• Interpret results
CALCULATING BY HAND
• Calculated by dividing the variance between the groups by the variance within the groups:

  F = Mean Squares Between / Mean Squares Within

• You want the within-group variance to be small and the between-group variance to be large to find significance
DEGREES OF FREEDOM
• Between-groups estimate: number of groups minus one
  df = k − 1
• Within-groups estimate: total sample size minus the number of groups
  df = n − k
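The F ratio and its degrees of freedom can be sketched directly from the definitions above. This is an illustrative Python version, not course material, with invented groups:

```python
def one_way_f(*groups):
    """One-way ANOVA F ratio: mean square between / mean square within."""
    k = len(groups)                                   # number of groups
    n = sum(len(g) for g in groups)                   # total sample size
    grand = sum(sum(g) for g in groups) / n           # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    ms_between = ss_between / (k - 1)                 # df = k - 1
    ms_within = ss_within / (n - k)                   # df = n - k
    return ms_between / ms_within

# Three hypothetical groups of three observations each:
print(one_way_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))   # 3.0
```

Here df between = 2 and df within = 6, so F would be compared against the critical F(2, 6).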
CALCULATING BY EXCEL
TWO-WAY ANOVA
• Compares the means of different groups that have
been split on independent variables (called factors)
• ANOVA with two or more independent variables
• Primary purpose is to understand if there is an
interaction between the two independent variables
• Main effects: effect of each of the independent
variables
• Interaction: the effect of one variable depends on the
level of the other
• Cannot use ANOVA without replication in Excel to
determine if there is interaction between variables
STEPS TO RUN TWO-WAY ANOVA
• Develop hypotheses
• H0: μA1 = μA2
• H0: μB1 = μB2
• H0: μA1*B1 = μA1*B2 = μA2*B1 = μA2*B2
• Alternative hypotheses
– At least one group is different on the main effects
– There is an interaction effect (the results are different
depending on the combination of independent variables)
• Test for interaction
• Run two-way ANOVA test
• Interpret results
CALCULATING TWO-WAY ANOVA
• With replication
  – Comparable to dependent samples
  – Also called repeated-measures ANOVA
  – Use replication if you have more than one measure per level
• Without replication
  – Comparable to independent samples
  – Also called within-measures ANOVA
INTERACTION
• The effect of one variable may depend on the level of another variable
• An interaction term is included in the output for a two-way ANOVA with replication

  F(2,27) = 8.80, p < .05

• F = test statistic
• 2,27 = df between groups and df within groups
• 8.80 = obtained value
• p < .05 = the p-value is less than 0.05
CORRELATION

CORRELATION
• Correlation is a measure of effect
• A way to express the relationship between two variables
• Take data, make a scatterplot, and fit a line over the data
• Positive correlation
  – Direct correlation
  – When variables change in the same direction
• Negative correlation
  – Indirect correlation
  – When variables change in opposite directions
• Pearson’s r correlation
  – Interval/ratio (continuous) data
  – Assumptions:
    • Random selection
    • Data are normally distributed
    • Linearity
    • Homoscedasticity
• Phi coefficient
  – Nominal or categorical data
  – Assumptions:
    • Random selection
    • Data are independent
CALCULATING CORRELATIONS: PEARSON’S R: BY HAND

  rxy = [nΣXY − (ΣX)(ΣY)] / √{ [nΣX² − (ΣX)²] [nΣY² − (ΣY)²] }

• What do these symbols represent?
  – rxy: correlation coefficient between X and Y
  – n: size of the sample
  – X: individual’s score on the X variable
  – Y: individual’s score on the Y variable
  – XY: product of each X score times its Y score
  – X²: individual X score, squared
  – Y²: individual Y score, squared
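As an illustration (not from the slides), the standard computational formula r = [nΣXY − ΣXΣY] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]} translates directly into Python:

```python
import math

def pearson_r(x, y):
    """Pearson correlation via the computational (by-hand) formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))    # sigma of X*Y products
    sx2 = sum(a * a for a in x)               # sigma of squared X scores
    sy2 = sum(b * b for b in y)               # sigma of squared Y scores
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# A perfectly linear relationship gives r = 1 (or -1 if it slopes down):
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0
```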
CALCULATING CORRELATIONS: PEARSON’S R: EXCEL

• Three Methods:
– Scatterplot
– Functions
• CORREL
– Data Analysis Toolpak
THE CORRELATION COEFFICIENT
• rXY = correlation between X and Y
• Gives you a number between −1 and 1
• The number tells you about the strength of the association
  – The closer to 1 (in absolute value), the stronger the relationship
• The sign tells you about the direction
  – + = a positive relationship
  – − = a negative relationship

Strength of the Pearson r relationship:
  Strength       Negative         Positive
  weak           −.00 to −.39     .00 to .39
  moderate       −.40 to −.59     .40 to .59
  strong         −.60 to −.79     .60 to .79
  very strong    −.80 to −.99     .80 to .99
  perfect        −1               1
HOW TO DETERMINE IF R IS SIGNIFICANT
Use the Table B.4 in Appendix in the text
• One-tail or two? Does the alternative or research hypothesis state
whether the relationship will be positive or negative? If so, it is a one-
tailed test.
• Non-directional tests are two tailed.
Degrees of freedom = n - 2
• n = number of pairs used to compute the correlation coefficient
– The table provides the critical value for r. If the test statistic (r) is more
extreme than the critical value, the finding is significant.
• Correlation: the relationship between two variables
• Causation: one variable causes the effect in another variable
• Correlation is not causation
LINEAR REGRESSION
REGRESSION
• Correlation: relationship between two variables
• Regression: using the relationship between two or more correlated variables to predict
values of one variable based on values of the other; it is useful for making predictions
You can find a predicted y-value by substituting the x-value into the regression
equation
Equation: y = mx + b, where m is the slope and b is the intercept
What is linear regression?
• Fitting a line to the data
• Allows you to describe strength of relationship
• Allows you to make predictions about y for given value of x
REGRESSION

REGRESSION TERMS
• Explanatory variable: also called the independent variable; this is the x axis
• Response variable: also called the dependent variable; this is the y axis
• Actual value: observed value
• Predicted value: value that we estimate or are predicting, denoted by ŷ
• Intercept (b0): where the regression line intercepts the y axis
• Slope (b1): the amount of change in y for every one-unit change in x
• Residual (standard error of estimate): error that is not explained by the regression equation

  ŷ = b0 + b1x

REGRESSION STEPS
• Create a scatterplot of the data
• Fit a line to the data
• The resulting formula tells us about the relationship between the two variables
LINEAR REGRESSION - ASSUMPTIONS
• Used for continuous variables
• Assumptions:
– Linear relationship
– Multivariate normality
• Data is normally distributed
– No or little multicollinearity
• Two independent variables are not highly correlated
– No auto-correlation
• Correlation at different time points (usually from multiple observations of the same subject)
– Homoscedasticity
• Similar variance among observations
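The least-squares slope and intercept can also be computed by hand; this Python sketch (with invented data) shows one common form, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄:

```python
def fit_line(x, y):
    """Least-squares intercept (b0) and slope (b1) for y-hat = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x:
b0, b1 = fit_line([1, 2, 3], [3, 5, 7])
print(b0, b1)   # 1.0 2.0
print(b0 + b1 * 10)   # predicted y at x = 10 -> 21.0
```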
CALCULATING LINEAR REGRESSION: EXCEL -
SCATTERPLOTS

Step 1 Step 2
Step 3
CALCULATING LINEAR REGRESSION: EXCEL FUNCTIONS: SLOPE
• Computes the slope of the line
• This is b1 in the regression equation
CALCULATING LINEAR REGRESSION: EXCEL FUNCTIONS: INTERCEPT
• Computes the location where the regression line crosses the y-axis
• This is b0 in the linear regression equation
CALCULATING LINEAR REGRESSION: EXCEL: DATA ANALYSIS TOOLPAK
LINEAR REGRESSION INTERPRETATION:
• For every one unit increase in x, there is a corresponding
_____ increase/reduction in y
PREDICTION ERRORS AND MULTIPLE REGRESSION

PREDICTION ERRORS
• Standard error of estimate
• The measure of how much each data point (on average) differs from the predicted data point; a standard deviation of all the error scores
• Tells you how wrong the regression model is on average, using the units of the response variable (y, the dependent variable)
• The higher the correlation between the two variables (and the better the prediction), the lower the error will be

MULTIPLE REGRESSION
• Multiple regression formula:
  Y′ = b1X1 + b2X2 + a
• Y′: the value of the predicted score
• X1: the value of the first independent variable
• X2: the value of the second independent variable
• b: the regression weight for each variable
• a: the constant (intercept)
• When using multiple predictors, your independent variables (X) should be related to the dependent variable (Y)
• The independent variables should not be related to each other
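The standard error of estimate is usually written as √[Σ(y − ŷ)² / (n − 2)]; here is an illustrative Python sketch (not from the slides) that fits the line and then measures the average prediction error:

```python
import math

def std_error_of_estimate(x, y):
    """Average distance of the observed y values from the regression line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    # Sum of squared residuals (actual minus predicted), then divide by n - 2
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return math.sqrt(sse / (n - 2))

# Perfectly linear data predicts with zero error:
print(std_error_of_estimate([1, 2, 3, 4], [2, 4, 6, 8]))   # 0.0
```

The better the correlation, the smaller this value, which matches the bullet above.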
CHI SQUARE
PARAMETRIC VS. NONPARAMETRIC TESTS
Parametric Tests
• Make assumptions:
  – Tests are being used to estimate a population parameter
  – Populations are based on normal distributions
  – Samples are also normally distributed
• Used for interval/ratio or ordinal data

Nonparametric Tests
• Do not make the same assumptions:
  – Not attempting to estimate parameters
  – No assumption of normality (“distribution free”)
  – Can be used for nominal data
SO FAR WE HAVE LEARNED
Parametric Test                    Nonparametric Test
t-test for independent samples     Mann-Whitney test
t-test for dependent samples       Wilcoxon matched-pairs signed-rank test
ANOVA                              Kruskal-Wallis test
Pearson correlation                Spearman rank-order correlation
Linear regression                  Nonparametric regression
Two-sample proportions test        Chi-square

CHI SQUARE BACKGROUND
• Chi-square allows you to determine if what you observe in a distribution
of frequencies is what you would expect to occur by chance
• Chi-square answers the following question:
– Are the frequencies of the categories we observed different from the frequencies we
expected?
• Expected frequencies may come from
– Previous theory or research
– Population distribution
– Equal distributions among the categories
• Used for nominal or categorical variables
• Used when sample data are displayed in a contingency table
CHI SQUARE…
• Assumptions:
  – Nominal data
  – The sampling method is random
  – Independence of observations
    • Each person is counted only one time
  – Categories are mutually exclusive
  – Minimum expected frequencies
    • None of the expected frequencies should be equal to zero
    • 80% of the expected frequencies should be greater than 5

Steps to run a chi-square test:
• Develop hypotheses
  – Null: the variables are independent
  – Research: the variables are not independent
• Set the level of significance
• Run the chi-square test
• Interpret the results
  – Statistical conclusion
  – Experimental conclusion
ONE-WAY CHI SQUARE
• One-sample chi-square (goodness-of-fit test) has only one dimension

  χ² = Σ (O − E)² / E

• χ²: chi-square value
• Σ: summation sign
• O: observed frequency
• E: expected frequency

Degrees of freedom:
• Degrees of freedom approximate the number of categories in which the data have been organized
• For chi-square: df = c − 1, where c is the number of categories

Expected values (two methods):
• Evenly divide the total among the categories
• Base them on the population distribution
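The goodness-of-fit statistic is a one-liner by hand; this Python sketch uses invented counts and the "evenly divide the total" method for expected values:

```python
def chi_square(observed, expected):
    """One-way chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts across three categories, 60 observations total,
# expected frequencies split evenly (60 / 3 = 20 per category):
obs = [10, 20, 30]
exp = [20, 20, 20]
print(chi_square(obs, exp))   # 10.0
```

With c − 1 = 2 degrees of freedom, 10.0 would then be compared against the critical χ² value.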
CALCULATING ONE-WAY CHI-SQUARE: EXCEL FUNCTIONS

*note that the CHISQ.TEST result is not the chi-square statistic, it is the
p-value*
TWO-WAY CHI SQUARE – TWO DIMENSIONS

  χ² = Σ (O − E)² / E

• χ²: chi-square value
• Σ: summation sign
• O: observed frequency
• E: expected frequency

• Degrees of freedom:
  – Degrees of freedom approximate the number of categories in which the data have been organized
  – For chi-square: df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns

• Expected values (one method):
  – Determine the number that should be in each cell based on the distribution of the total sample
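For a contingency table, each expected cell count comes from (row total × column total) / grand total. A Python sketch of the whole calculation, with a made-up 2×2 table:

```python
def chi_square_2way(table):
    """Two-way chi-square: expected counts from row/column totals,
    then the sum of (O - E)^2 / E over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected cell count
            stat += (o - e) ** 2 / e
    return stat

# Hypothetical 2x2 table; df = (2-1)(2-1) = 1
print(round(chi_square_2way([[20, 10], [10, 20]]), 2))
```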
CALCULATING TWO-WAY CHI-SQUARE: EXCEL FUNCTIONS

*note that the CHISQ.TEST result is not the chi-square statistic, it is the p-value*
STATISTICAL SIGNIFICANCE AND INTERPRETATION OF CHI-SQUARE

STATISTICAL SIGNIFICANCE
• Compare the p-value to _____.
• Compare the critical value to _____.
• You can look up critical values and p-values in a table or use an online calculator

INTERPRETATION
  χ²(2) = 20.6, p < .05
• χ²(2): represents the test statistic with two degrees of freedom
• 2: number of degrees of freedom
• 20.6: the obtained value
• p < .05: the probability
MEASURE OF EFFECT FOR CHI-SQUARE

PHI
o Measures the relationship between two binary variables
o Applies to 2x2 nominal tables only
o Interpreted like a correlation coefficient
o Can determine the amount of shared variance
o N = total observations

CRAMER’S V
o Used for tables larger than 2x2 rows and columns
o n = total observations
o k = number of columns or rows (whichever is smaller)
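The formulas themselves did not survive extraction; the standard forms are Phi = √(χ²/N) and Cramer's V = √(χ² / [n(k − 1)]). A minimal Python sketch using those standard forms, with invented χ² values:

```python
import math

def phi(chi2, n):
    """Phi coefficient for a 2x2 table (n = total observations)."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, k):
    """Cramer's V; k = smaller of the number of rows and columns."""
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical results: chi-square of 4 from 100 observations
print(round(phi(4, 100), 2))            # 0.2 -> small effect
print(round(cramers_v(12, 300, 3), 2))  # k = 3 (smaller dimension)
```

For a 2×2 table (k = 2), Cramer's V reduces to Phi, which is why the two interpretation tables below agree in that case.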
INTERPRETING PHI AND CRAMER’S V

Effect Size for Phi
  small    .10 to .29
  medium   .30 to .49
  large    ≥ .50

Effect Size for Cramer’s V
  df    Small         Medium        Large
  1     .10 to .29    .30 to .49    ≥ .50
  2     .07 to .20    .21 to .34    ≥ .35
  3     .06 to .16    .17 to .28    ≥ .29
  4     .07 to .20    .21 to .24    ≥ .25
  ≥5    .05 to .12    .13 to .21    ≥ .22
GOOD LUCK
