ADMS 3352 3.0 Sampling Technique and Survey Studies
Correlation and Association between Variables
Are ticket prices for professional basketball games related to attendance at the
games? Is there a statistically significant relationship?
We would like to predict university grade point average of newly admitted
students. Do grade 13 marks or SAT scores predict first year university grades
accurately?
How accurately can we predict gas consumption from temperatures?
Is there a relationship between muscle strength and functional capacity in
arthritis patients?
• A statistical procedure to examine the degree of correlation is required.
If the variables tend to increase or decrease together,
Positive correlation
As one variable increases in value, the other tends to decrease,
Negative correlation
Correlation Between Interval or Ratio Measurements
• Correlation coefficients are used to quantitatively describe the strength and
direction of a relationship between two variables.
• When both variables are at least interval measurements, one may report the
Pearson product-moment correlation coefficient, also known simply as the
correlation coefficient and denoted by ‘r’.
• Pearson correlation coefficient is only appropriate to describe linear
correlation. The appropriateness of using this coefficient could be examined
through scatter plots. The rationale of this statistic to measure linear
correlation is to be discussed in class.
• A statistic that measures the correlation between two ‘rank’ measurements is
Spearman’s ρ, a nonparametric analog of Pearson’s r.
• Spearman’s ρ is appropriate for skewed continuous or ordinal measurements.
• A correlation matrix presents the correlation coefficients for all pairs of
variables in matrix form. The appropriateness of using r will be examined.
• Statistical tests are available to test hypotheses on the population
correlation ρ. H0: There is no correlation between the two variables (H0: ρ = 0).
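The coefficients above can be sketched in a few lines of plain Python. This is a minimal illustration, not a prescribed method: the function names and the data are hypothetical, Spearman’s ρ is computed as Pearson’s r on average ranks, and the t statistic shown is the usual one for testing H0: ρ = 0, which is an assumption because the notes do not name a specific test.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

def t_statistic_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for H0: rho = 0; needs |r| < 1."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical data: a monotone but curved relationship (y = x cubed).
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
t = t_statistic_for_r(r, len(x))
print(round(r, 3), round(rho, 3), round(t, 2))
```

Because the relationship is monotone but not linear, ρ reaches 1 while r stays below 1, which is the kind of discrepancy a scatter plot would reveal.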
Analysis of Two-way Contingency Tables
• Sampling models:
Multinomial
Independent Binomial
Poisson
• Correlation between ordinal or nominal measurements is usually referred to
as association
• Examine the association through a contingency table. (Try a scattergram. The
need for further display of information is very transparent.)
• Chi Square test of independence of the Row and Column Variables
• Testing of independence using the likelihood ratio chi-squared statistic G²
• Fisher’s Exact Test of independence
If one can consider margins to be fixed
→ Assume hypergeometric distribution
→ Use Fisher’s Exact Test
• Odds Ratio (OR) as a measure of association
Let p1=n11/ n1+, p2=n21/ n2+
OR = [ p1 / (1- p1 ) ] / [ p2 / (1- p2 ) ]
Retrospective studies: OR estimates relative risks (RR)
When outcome is a rare event (n11 and n21 are small): OR estimates RR
In prospective studies:
RR=p1/ p2
• For independent groups (say, the Row variable), one may compare the proportion in
Column C_j given Row R_i to that of Row R_i’, and test the difference between the two
proportions, d. Pearson’s Chi Square statistic is proportional to d².
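For a 2 x 2 table, the measures listed above can be sketched as follows. This is an illustration under stated assumptions: the function names and the counts are hypothetical, the cells are laid out as n11, n12 in Row 1 and n21, n22 in Row 2, and the two-sided Fisher p-value uses the common convention of summing the hypergeometric probabilities of all tables that are no more probable than the observed one.

```python
from math import comb

def two_by_two_measures(n11, n12, n21, n22):
    """Row proportions, their difference d, relative risk, and odds ratio."""
    p1 = n11 / (n11 + n12)  # p1 = n11 / n1+
    p2 = n21 / (n21 + n22)  # p2 = n21 / n2+
    return {
        "p1": p1,
        "p2": p2,
        "difference": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_ratio": (p1 / (1 - p1)) / (p2 / (1 - p2)),
    }

def fisher_exact_two_sided(n11, n12, n21, n22):
    """Two-sided Fisher exact p-value with both margins treated as fixed."""
    row1, row2 = n11 + n12, n21 + n22
    col1 = n11 + n21
    n = row1 + row2

    def weight(k):
        # Hypergeometric numerator for a table whose (1,1) cell equals k.
        return comb(row1, k) * comb(row2, col1 - k)

    lowest = max(0, col1 - row2)   # smallest feasible value of n11
    highest = min(row1, col1)      # largest feasible value of n11
    observed = weight(n11)
    # Integer comparison avoids floating-point ties between equal weights.
    extreme = sum(weight(k) for k in range(lowest, highest + 1)
                  if weight(k) <= observed)
    return extreme / comb(n, col1)

# Hypothetical counts with a fairly rare outcome in both rows.
measures = two_by_two_measures(10, 90, 5, 95)
print(measures["relative_risk"], round(measures["odds_ratio"], 3))

# Hypothetical small table where an exact test is the natural choice.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

With the outcome occurring in 10% and 5% of the two rows, the odds ratio comes out close to the relative risk, in line with the rare-event remark above.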
The Chi square Statistic
Assumptions:
1. Frequencies represent individual counts
2. Categories are exhaustive and mutually exclusive
Rationale:
Test of independence between the Row and Column Variables:
Compare the observed to the expected cell counts under the
assumption of independence.
Test Goodness of Fit:
Compare the observed to the expected cell counts under the
theoretical distribution.
Validity:
Expected cell size > 5
Yates’ correction
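The rationale for the test of independence, comparing observed to expected cell counts, can be sketched as below. The function names and the table are hypothetical; the sketch computes Pearson’s statistic and the likelihood ratio statistic G² without any continuity correction, and reports the smallest expected count so the validity condition can be checked by eye.

```python
import math

def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistics(table):
    """Return (Pearson X2, likelihood ratio G2, df, smallest expected count)."""
    expected = expected_counts(table)
    x2 = 0.0
    g2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            x2 += (o - e) ** 2 / e
            if o > 0:  # an empty cell contributes nothing to G2
                g2 += 2 * o * math.log(o / e)
    df = (len(table) - 1) * (len(table[0]) - 1)
    smallest = min(min(row) for row in expected)
    return x2, g2, df, smallest

# Hypothetical 2 x 2 table of counts.
x2, g2, df, smallest = chi_square_statistics([[30, 20], [20, 30]])
print(round(x2, 3), round(g2, 3), df, smallest)
```

The two statistics are compared with the same chi-square reference distribution and are usually close when the expected counts are large.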
General note on Chi Square Statistics
1. Require large samples
2. Chi square statistic is sensitive to increase in sample size. Increase in
sample size increases Chi square even if the association is the same.
3. Ignores ordering information if the variables are ordinal in nature → less powerful
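The sensitivity to sample size in note 2 can be seen in a small sketch. The helper name and the counts are hypothetical: multiplying every cell by 10 leaves the row proportions, and hence the association, unchanged, yet the statistic grows by the same factor.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )

# Hypothetical table, then the same proportions with ten times the sample.
small = [[30, 20], [20, 30]]
large = [[10 * count for count in row] for row in small]

x2_small = pearson_chi_square(small)
x2_large = pearson_chi_square(large)
print(round(x2_small, 3), round(x2_large, 3))
```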
Common Coefficients of Association for Ordinal Variables
Pearson’s product-moment correlation
Spearman’s rho (ρ)
Cochran-Armitage trend test
Kendall’s tau, Gamma, and Somers’ D statistics
1. Based on the classification of all possible pairs of subjects in the table as
concordant or discordant pairs
2. All take on values from –1 to +1
3. Somers’ D: Adjustment for ties is made on the independent variable only
4. Gamma is the least conservative among the three
5. Gamma ignores ties
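The pair-based definitions above can be sketched for a table with ordered rows and columns. The names and counts are hypothetical, rows are taken to be the independent variable X and columns the dependent variable Y, and the tau shown is Kendall’s tau-b, which is one of several tau variants and an assumption on our part.

```python
import math

def pair_counts(table):
    """Count concordant, discordant, and singly tied pairs of subjects."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    # Cells in row-major order, so a later cell never has a smaller row index.
    cells = [(i, j, count)
             for i, row in enumerate(table)
             for j, count in enumerate(row)]
    for a, (i, j, n_a) in enumerate(cells):
        for k, l, n_b in cells[a + 1:]:
            pairs = n_a * n_b
            if i == k:
                tied_x_only += pairs   # same row, different column
            elif j == l:
                tied_y_only += pairs   # different row, same column
            elif j < l:
                concordant += pairs    # higher on X and higher on Y
            else:
                discordant += pairs    # higher on X but lower on Y
    return concordant, discordant, tied_x_only, tied_y_only

def ordinal_measures(table):
    """Gamma, Somers' D (Y given X), and Kendall's tau-b."""
    c, d, tx, ty = pair_counts(table)
    gamma = (c - d) / (c + d)                 # ties ignored entirely
    somers_d = (c - d) / (c + d + ty)         # all pairs not tied on X
    tau_b = (c - d) / math.sqrt((c + d + tx) * (c + d + ty))
    return gamma, somers_d, tau_b

# Hypothetical 2 x 2 table with ordered categories.
gamma, somers_d, tau_b = ordinal_measures([[20, 10], [10, 20]])
print(round(gamma, 3), round(somers_d, 3), round(tau_b, 3))
```

In this example Gamma is the largest of the three in magnitude because it discards every tied pair, which illustrates why it is described as the least conservative.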
Nominal – Ordinal Tables
Mantel-Haenszel correlation statistic
1. Measures association between two variables (ordinal) across strata of the
third variable.
2. The MH statistic is approximately Chi-square distributed
3. Validity: requires the across-strata sum of sample sizes to be at least 40.
4. The Mantel-Haenszel test has little power to detect associations that run in
different directions across strata, because they tend to cancel in the
combined statistic.
Kappa
Cohen’s kappa coefficient assesses raters’ agreement
Measures the extent of agreement beyond that expected by chance.
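A minimal sketch of the kappa calculation, assuming a square table in which rows are the first rater’s categories and columns the second rater’s (the function name and the counts are hypothetical): kappa = (observed agreement − chance agreement) / (1 − chance agreement).

```python
def cohen_kappa(table):
    """Cohen's kappa for a square rater-by-rater table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Observed agreement: proportion of subjects on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: product of the raters' marginal proportions.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of 50 subjects into two categories by two raters.
kappa = cohen_kappa([[20, 5], [10, 15]])
print(round(kappa, 3))
```

Here the raters agree on 70% of subjects, but 50% agreement would be expected by chance alone, so kappa is well below the raw agreement rate.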
Lambda coefficient
Measures how well the knowledge of one categorical variable predicts the
other.
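One way to make this concrete is the Goodman and Kruskal form of lambda, sketched below for predicting the Column variable from the Row variable. The function name and the counts are hypothetical, and the choice of the asymmetric column-given-row version is an assumption: lambda is the proportional reduction in prediction errors when the row category is known.

```python
def lambda_column_given_row(table):
    """Goodman-Kruskal lambda for predicting the column from the row."""
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    # Errors when always guessing the overall modal column.
    errors_without_row = n - max(col_totals)
    # Errors when guessing the modal column within each row.
    errors_with_row = n - sum(max(row) for row in table)
    return (errors_without_row - errors_with_row) / errors_without_row

# Hypothetical 2 x 2 table of counts.
lam = lambda_column_given_row([[30, 10], [10, 50]])
print(lam)
```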
Correlation versus Comparison
Correlation does not provide any information relative to the difference
between the variables, only to the relative order of the scores. Therefore, it is
inappropriate to draw conclusions on the differences or similarities between
distributions of the variables based on a correlation coefficient.
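A small sketch with hypothetical scores illustrates the point: shifting every score by a constant leaves the correlation perfect even though the two sets of scores clearly differ in level.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: the second set is the first plus 10 points.
first = [55, 60, 65, 70, 75]
second = [score + 10 for score in first]

r = pearson_r(first, second)
mean_difference = sum(second) / len(second) - sum(first) / len(first)
print(r, mean_difference)
```

A comparison of the two sets of scores would need a procedure aimed at differences, since r carries no information about the 10-point gap.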
Causation and Correlation
Knowing that two variables, X and Y, correlate does not provide any
information on how they relate. The correlation could be a result of:
1. Common response: Both variables X and Y respond to changes in some
unobserved variable(s).
2. Confounding: X’s effect on Y is hopelessly mixed up with another
unobserved variable’s effect on Y.
3. X causes Y: The order of events has to be clear. Usually, a valid
conclusion can only be based on controlled experiments.