**ADMS 3352 3.0 Sampling Technique and Survey Studies**

Correlation and Association between Variables

• To answer questions like:

Are ticket prices for professional basketball games related to attendance at the games? Is there a statistically significant relationship?

We would like to predict the university grade point average of newly admitted students. Do grade 13 marks or SAT scores predict first-year university grades accurately?

How accurately can we predict gas consumption from temperatures?

Is there a relationship between muscle strength and functional capacity in arthritis patients?

• A statistical procedure to examine the degree of correlation is required.

If the variables tend to increase or decrease together: positive correlation.

As one variable increases in value, the other tends to decrease: negative correlation.

Correlation Between Interval or Ratio Measurements

• Correlation coefficients are used to quantitatively describe the strength and direction of a relationship between two variables.

• When both variables are at least interval measurements, one may report the Pearson product-moment correlation coefficient, also known simply as the correlation coefficient and denoted by ‘r’.

• The Pearson correlation coefficient is only appropriate for describing linear correlation. The appropriateness of using this coefficient can be examined through scatter plots. The rationale for using this statistic to measure linear correlation is to be discussed in class.

• A statistic that measures the correlation between two ‘rank’ measurements is Spearman’s ρ, a nonparametric analog of Pearson’s r.

• Spearman’s ρ is appropriate for skewed continuous or ordinal measurements.

• A correlation matrix presents the correlation coefficients for all pairs of variables in matrix form. The appropriateness of using r will be examined.

• Statistical tests are available to test hypotheses on the population correlation ρ. H0: there is no correlation between the two variables (H0: ρ = 0).
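The coefficients above can be illustrated with a minimal sketch that uses only the Python standard library. The data are hypothetical, the function names are illustrative, and the t statistic shown is the usual one for testing H0: ρ = 0 with Pearson's r (referred to a t distribution with n - 2 degrees of freedom); the notes do not specify which test is used in class.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def t_statistic(r, n):
    """t statistic for H0: rho = 0 (assumed test; n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical data: a monotone but non-linear relationship (y = x squared).
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]

r = pearson_r(x, y)      # high, but below 1 because the relation is not linear
rho = spearman_rho(x, y)  # the ranks agree perfectly
t = t_statistic(r, len(x))
print(round(r, 3), round(rho, 3), round(t, 2))
```

Because the relationship is monotone but curved, Spearman's ρ reaches 1 while Pearson's r does not, which is one reason to inspect a scatter plot before reporting r.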

Analysis of Two-way Contingency Tables

• Sampling models:

Multinomial

Independent Binomial

Poisson

• Correlation between ordinal or nominal measurements is usually referred to as association.

• Examine the association through a contingency table. (Try a scattergram: the need for a further display of information becomes very apparent.)

• Chi Square test of independence of the Row and Column Variables

• Testing of independence using the likelihood ratio chi-squared statistic G²

• Fisher’s Exact Test of independence

If one can consider the margins to be fixed

→ Assume a hypergeometric distribution

→ Use Fisher’s Exact Test

• Odds Ratio (OR) as a measure of association

Let p1 = n11 / n1+, p2 = n21 / n2+

OR = [ p1 / (1 - p1) ] / [ p2 / (1 - p2) ]

In prospective studies, the relative risk (RR) can be estimated directly:

RR = p1 / p2

In retrospective studies, RR cannot be estimated directly; OR is used to estimate RR.

When the outcome is a rare event (n11 and n21 are small relative to the row totals), OR approximates RR.

• For independent groups (say, the Row variable), one may compare the proportion in Column C_j given Row R_i to that of Row R_i’, and test the difference between the two proportions, d. Pearson’s Chi Square statistic is proportional to d².
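As a rough illustration of the odds ratio, the relative risk, and Fisher's Exact Test for a 2 × 2 table, here is a minimal standard-library sketch. The counts are hypothetical, the function names are illustrative, and the two-sided p-value uses one common convention (summing the hypergeometric probabilities of all tables that are no more probable than the observed one); other conventions exist.

```python
from math import comb

def odds_ratio_and_rr(n11, n12, n21, n22):
    """OR and RR using p1 = n11 / n1+ and p2 = n21 / n2+."""
    p1 = n11 / (n11 + n12)
    p2 = n21 / (n21 + n22)
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    relative_risk = p1 / p2
    return odds_ratio, relative_risk

def fisher_exact_two_sided(n11, n12, n21, n22):
    """Two-sided Fisher exact p-value with both margins treated as fixed."""
    r1, r2 = n11 + n12, n21 + n22   # row totals
    c1 = n11 + n21                  # first column total
    n = r1 + r2

    def prob(a):
        # Hypergeometric probability that the (1, 1) cell equals a.
        return comb(r1, a) * comb(r2, c1 - a) / comb(n, c1)

    p_obs = prob(n11)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(a) for a in range(lowest, highest + 1)
               if prob(a) <= p_obs * (1 + 1e-9))

# Hypothetical small table: exact test preferred over chi-square.
or_small, rr_small = odds_ratio_and_rr(3, 1, 1, 3)
p_value = fisher_exact_two_sided(3, 1, 1, 3)

# Hypothetical rare outcome: OR and RR nearly coincide.
or_rare, rr_rare = odds_ratio_and_rr(2, 998, 1, 999)
print(round(or_small, 2), round(rr_small, 2), round(p_value, 4))
print(round(or_rare, 4), round(rr_rare, 4))
```

In the first table the outcome is common, so OR and RR differ considerably; in the second the outcome is rare and the two measures are almost equal.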

The Chi square Statistic

Assumptions:

1. Frequencies represent individual counts

2. Categories are exhaustive and mutually exclusive

Rationale:

Test of independence between the Row and Column Variables:

Compare the observed to the expected cell counts under the assumption of independence.

Test of Goodness of Fit:

Compare the observed to the expected cell counts under the theoretical distribution.

Validity:

Expected cell counts > 5

Yates’ correction (a continuity correction for 2 × 2 tables)
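The rationale above (observed versus expected counts under independence) can be sketched as follows, using only the standard library. The table is hypothetical and the function names are illustrative. The sketch computes Pearson's Chi square, the likelihood ratio statistic G², the smallest expected count for the validity check, and, for the 2 × 2 case, verifies that Chi square equals d² divided by the pooled variance term of the difference in proportions.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Sum over cells of (observed - expected)^2 / expected."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))

def likelihood_ratio_g2(table):
    """G^2 = 2 * sum of observed * ln(observed / expected); empty cells add 0."""
    exp = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, exp)
                   for o, e in zip(obs_row, exp_row) if o > 0)

# Hypothetical 2 x 2 table (rows are independent groups).
table = [[30, 20], [20, 30]]
chi2 = pearson_chi_square(table)
g2 = likelihood_ratio_g2(table)
df = (len(table) - 1) * (len(table[0]) - 1)
min_expected = min(min(row) for row in expected_counts(table))

# 2 x 2 identity: chi-square is proportional to the squared difference d^2.
n1, n2 = sum(table[0]), sum(table[1])
d = table[0][0] / n1 - table[1][0] / n2
pooled = (table[0][0] + table[1][0]) / (n1 + n2)
chi2_from_d = d ** 2 / (pooled * (1 - pooled) * (1 / n1 + 1 / n2))
print(round(chi2, 3), round(g2, 3), df, min_expected, round(chi2_from_d, 3))
```

Both statistics are referred to a Chi square distribution with (rows - 1) × (columns - 1) degrees of freedom; here the two values are close, as is typical when the expected counts are large.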

General note on Chi Square Statistics

1. Requires large samples

2. The Chi square statistic is sensitive to sample size: increasing the sample size increases Chi square even if the strength of association stays the same.

3. Ignores ordering information when the variables are ordinal in nature → less powerful
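Note 2 can be checked numerically. The sketch below, with hypothetical counts, multiplies every cell of a table by 10, so the row proportions (and hence the association) are unchanged, and compares the two Chi square values.

```python
def pearson_chi_square(table):
    """Pearson's Chi square for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical table, and the same table with ten times the sample size.
small = [[30, 20], [20, 30]]
large = [[10 * count for count in row] for row in small]

chi_small = pearson_chi_square(small)
chi_large = pearson_chi_square(large)
print(round(chi_small, 3), round(chi_large, 3))
```

The statistic grows in proportion to the sample size, so a large Chi square by itself says little about the strength of the association.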

Common Coefficients of Association for Ordinal Variables

Pearson’s product-moment correlation

Spearman’s rho (ρ)

Cochran-Armitage trend test

Kendall’s tau, Gamma, and Somers’ D statistics

1. Based on the classification of all possible pairs of subjects in the table as concordant or discordant pairs

2. All take on values from –1 to +1

3. Somers’ D: Adjustment for ties is made on the independent variable only

4. Gamma is the least conservative among the three

5. Gamma ignores ties
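The pair-based coefficients can be sketched as follows. This is a minimal illustration, assuming the rows are the ordered levels of the independent variable and the columns the ordered levels of the dependent variable; the counts are hypothetical, the function names are illustrative, and the tau computed is the tau-b variant.

```python
import math

def pair_counts(table):
    """Return (concordant, discordant, tied on column only, tied on row only)."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = tied_col = tied_row = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pairs within the same row but in different columns.
            for m in range(j + 1, n_cols):
                tied_row += table[i][j] * table[i][m]
            # Pairs where the second subject is in a higher row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    pairs = table[i][j] * table[k][m]
                    if m > j:
                        concordant += pairs
                    elif m < j:
                        discordant += pairs
                    else:
                        tied_col += pairs
    return concordant, discordant, tied_col, tied_row

def ordinal_measures(table):
    c, d, tied_col, tied_row = pair_counts(table)
    gamma = (c - d) / (c + d)                   # ignores all ties
    somers_d = (c - d) / (c + d + tied_col)     # all pairs untied on the row variable
    tau_b = (c - d) / math.sqrt((c + d + tied_row) * (c + d + tied_col))
    return gamma, somers_d, tau_b

# Hypothetical 2 x 2 ordinal table.
table = [[20, 10], [10, 20]]
c, d, tied_col, tied_row = pair_counts(table)
gamma, somers_d, tau_b = ordinal_measures(table)
print(c, d, round(gamma, 3), round(somers_d, 3), round(tau_b, 3))
```

Because Gamma drops tied pairs from its denominator, its magnitude is at least as large as that of the other two, which is the sense in which it is the least conservative.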

Nominal – Ordinal Tables

Mantel-Haenszel correlation statistic

1. Measures association between two ordinal variables across strata of a third variable.

2. The MH statistic is approximately Chi-square distributed.

3. Validity: requires the across-strata sum of sample sizes to be at least 40.

4. The Mantel-Haenszel test has little power to detect association when its direction differs across strata.
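Point 4 can be illustrated with a simplified sketch. It assumes the special case of 2 × 2 tables within each stratum and uses the Cochran-Mantel-Haenszel form of the statistic without a continuity correction, which is an assumption made for illustration rather than the general ordinal version described above. The counts are hypothetical.

```python
def cmh_statistic(strata):
    """Mantel-Haenszel statistic for a list of 2 x 2 tables [[a, b], [c, d]]."""
    diff_sum = 0.0   # sum over strata of (observed - expected) for cell (1, 1)
    var_sum = 0.0    # sum over strata of the hypergeometric variance
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        r1, r2 = a + b, c + d
        c1, c2 = a + c, b + d
        diff_sum += a - r1 * c1 / n
        var_sum += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    return diff_sum ** 2 / var_sum

# Hypothetical strata with the same direction of association.
same_direction = [[[15, 5], [5, 15]], [[15, 5], [5, 15]]]
# Hypothetical strata with equally strong but opposite associations.
opposite_direction = [[[15, 5], [5, 15]], [[5, 15], [15, 5]]]

mh_same = cmh_statistic(same_direction)
mh_opposite = cmh_statistic(opposite_direction)
print(round(mh_same, 2), round(mh_opposite, 2))
```

The stratum-level deviations are summed before squaring, so opposite associations cancel and the statistic can be near zero even when every stratum shows a strong association.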

Kappa

Cohen’s kappa coefficient assesses agreement between raters.

Measures the extent of agreement beyond that expected by chance.
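A minimal sketch of the kappa computation, assuming two raters classify the same subjects into the same categories and the results are cross-tabulated (rows: rater A, columns: rater B). The counts are hypothetical and the function name is illustrative.

```python
def cohens_kappa(table):
    """kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = sum(table[i][i] for i in range(len(table))) / n
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - chance) / (1 - chance)

# Hypothetical agreement table for two raters and two categories.
ratings = [[20, 5], [10, 15]]
kappa = cohens_kappa(ratings)
print(round(kappa, 3))
```

Here the raters agree on 70% of subjects, but 50% agreement would be expected by chance from the margins alone, so kappa is 0.4.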

Lambda coefficient

Measures how well the knowledge of one categorical variable predicts the other.
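A minimal sketch of the lambda computation, assuming the Goodman-Kruskal asymmetric form in which the column variable is predicted from the row variable. The counts are hypothetical and the function name is illustrative.

```python
def lambda_column_given_row(table):
    """Proportional reduction in prediction errors for the column variable."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Errors when always guessing the modal column, ignoring the row.
    errors_without_row = n - max(col_totals)
    # Errors when guessing the modal column within each row.
    errors_with_row = n - sum(max(row) for row in table)
    return (errors_without_row - errors_with_row) / errors_without_row

# Hypothetical nominal table.
table = [[30, 10], [10, 50]]
lam = lambda_column_given_row(table)
print(round(lam, 3))
```

Knowing the row category cuts the prediction errors from 40 to 20, so lambda is 0.5.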

Correlation versus Comparison

Correlation does not provide any information about the difference between the variables, only about the relative order of the scores. Therefore, it is inappropriate to draw conclusions on the differences or similarities between the distributions of the variables based on the correlation coefficient.
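A small sketch with hypothetical scores makes the point concrete: the second set of scores is the first shifted up by 10 points, so the two are perfectly correlated even though their means differ.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: y is x shifted by a constant.
x = [50, 60, 70, 80, 90]
y = [score + 10 for score in x]

r = pearson_r(x, y)
mean_difference = sum(y) / len(y) - sum(x) / len(x)
print(round(r, 3), mean_difference)
```

A comparison of the two distributions requires a different procedure (for example, a test on the paired differences), not the correlation coefficient.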

Causation and Correlation

Knowing that two variables, X and Y, correlate does not provide any information on how they relate. The correlation could be a result of:

1. Common response: Both variables X and Y respond to changes in some unobserved variable(s).

2. Confounding: X’s effect on Y is hopelessly mixed up with another unobserved variable’s effect on Y.

3. X causes Y: The order of events has to be clear. Usually, a valid conclusion can only be based on controlled experiments.

