
Correlation Coefficient According to Data
Classification

Article in SSRN Electronic Journal · January 2014
DOI: 10.2139/ssrn.2417910


Correlation Coefficient According to Data Classification Dr. Paul T.I. Louangrath

Correlation Coefficient According to Data Classification
Dr. Paul LOUANGRATH
Bangkok University - The International College
The Family Enterprise Research Center (FERC)
Rama IV Road, Klongtoey
Bangkok 10110 THAILAND
Phone: +662 3503500 ext.1643 Fax: +662 3502453
Email: Lecturepedia@gmail.com

ABSTRACT
The purpose of this writing is to introduce researchers to correlation coefficient
calculation. The Pearson Product Moment Correlation Coefficient is the most
common type of correlation; however, the Pearson correlation coefficient may not be
applicable in all cases. There are many types of correlation coefficient. The correct
choice of correlation coefficient depends on the classification of the independent
variable (X) and dependent variable (Y). Data are classified into one of three types:
quantitative, nominal and ordinal. This writing explains various types of correlations
on the basis of X-by-Y data type combination.

Key words:
Biserial correlation coefficient, Goodman-Kruskal lambda, Pearson correlation
coefficient, point biserial correlation coefficient, polychoric correlation coefficient,
rank biserial correlation coefficient, Spearman rho, and tetrachoric correlation
coefficient.

JEL Code:
C10, C12, C13 & C19

Citation:
Louangrath, Paul I., 2417910 (March 30, 2014). Available at SSRN: http://ssrn.com/abstract=2417910

1.0 Introduction to Various Types of Correlation Coefficient
Reliability testing uses the correlation coefficient as one means of measuring the
degree of association. In the reliability context, the correlation coefficient is
interpreted as the ability to replicate the data of a prior study: the current experiment
represents one array and the prior study represents the second array. The function of
the correlation coefficient is to give an index of association between two data arrays.
The researcher must be aware of the various types of correlation coefficient and of
which one to use in a given situation. The situation is defined by the type of data
available. The variables may be defined as X and Y. The objective of the correlation
coefficient calculation is to determine the relationship between X and Y by measuring
the association between them. The table below shows the correlation coefficient that
applies to each crossing of data types.

Variable (X, Y)   Quantitative X        Ordinal X                   Nominal X
Quantitative Y    Pearson r             Biserial r_b                Point biserial r_pb
Ordinal Y         Biserial r_b          Spearman rho;               Rank biserial r_rb
                                        tetrachoric & polychoric
Nominal Y         Point biserial r_pb   Rank biserial r_rb          Phi, L, C, Lambda

Nota bene: Determine the type of data for X and Y, then select the appropriate correlation.

Table 1.0: Various types of correlation coefficient.

There are nine common correlation coefficient types. The use of each type depends on the type of the data arrays X and Y. However, one additional type is a variant of the tetrachoric correlation ($2 \times 2$) made to accommodate $K \times L$ contingency data for ordinal-by-ordinal data. This extension of Pearson's $2 \times 2$ contingency is called polychoric correlation. This section of the writing therefore includes ten types of correlation coefficients. Each type of correlation coefficient is used according to the characteristic of the data arrays: quantitative, ordinal, or nominal for the variables X and Y. Before examining each type of correlation, it is important to be familiar with data classification.

1.1 Data Classification
Data can be classified into two main types: (i) qualitative and (ii) quantitative. In statistical analysis, these two types of data are further divided into subcategories. Qualitative data are those that are used for identification. For qualitative data, there are two subcategories: (1) nominal data, and (2) ordinal data. Quantitative data are those that may be subjected to mathematical operations, such as addition, subtraction, multiplication and division. For quantitative data, there are also two further categories: (1) interval scale, and (2) ratio scale.

1.2 Nominal Data and its Central Tendency Measurement
Nominal data is qualitative data; it is used to differentiate between items or subjects based on names or meta-categories, such as gender, nationality, ethnicity and language. No mathematical operations, such as multiplication and division, are allowed. The mode is used as the measurement for central tendency. The mode is defined as the value that appears most frequently in a data set. In discrete probability, the mode is the X at which the probability mass function is maximized.

1.3 Ordinal Data and its Central Tendency Measurement
Ordinal is a second type of qualitative data. This is a ranked order: 1st, 2nd, 3rd, … The data can be sorted in ascending or descending order. The ranking may be dichotomous in form, such as (Yes | No), or non-dichotomous, such as:
- Completely agree
- Most agree
- Most disagree
- Completely disagree
The median is used for the measurement of central tendency. The median is the middle-ranked item. The mean $\bar{x}$ is not allowed because ranked-order data (1st, 2nd, 3rd, …) may not be added or divided. Therefore, $\bar{x}$ is not the measure for central tendency. However, in addition to the median, the mode may also be used as a measure for central tendency. For example, the IQ test is ordinal data.[1] There is no measurement to quantify intelligence; therefore, it is ranked data.

1.4 Interval Data and its Central Tendency Measurement
Interval data is quantitative. Each item may be different; however, no ratios among items are allowed. The Celsius scale is an interval scale; thus, a ratio of Celsius values is not allowed. One cannot say that 20 degrees Celsius is "twice" as hot as 10 degrees Celsius because zero degrees Celsius is an arbitrary number. The point of origin is arbitrarily defined; for instance, 0 degrees Celsius corresponds to 273.15 kelvin, and absolute zero lies at -273.15 degrees Celsius. The interval variable is sometimes referred to as a "scaled variable." A mathematical term for a scaled variable is affine line. In affine space, there is no point of origin. This means that no division may be performed. The central tendency of interval data is measured by the mode, median and arithmetic mean. The measurements of dispersion include the range and the standard deviation. In interval scaled data, multiplication and division are not allowed. Since division is not allowed, the studentized range and the coefficient of variation may not be calculated. However, the central moment may be determined. The coefficient of variation may not be determined since the mean is a moment about the origin.

1.5 Ratio Data and its Central Tendency Measurement
Ratio scaled data is quantitative. Quantitative data that is obtained through a measurement is this type of data. The measurement is the estimation of the ratio between a magnitude of a continuous quantity and a unit magnitude of the same kind.[2] The ratio scale has a unique and meaningful zero. Mass, length, duration, plane angle and energy are measured by ratio scale. Since zero is not an arbitrary value, ratios are allowed. Multiplication and division are allowable mathematical operations. The mode, median and arithmetic mean are the basic measurements of central tendency. In addition, the geometric mean and harmonic mean may also be used. The studentized range and the coefficient of variation are used to measure dispersion. Quantitative data is the numerical measurement produced by the instrument without any intermediary interpretation. The raw data from the response itself may be read as numerical data. This type of data may be accommodated by the Pearson correlation coefficient.

2.0 Pearson Correlation Coefficient r
The Pearson correlation coefficient is the most commonly used form of correlation. The Pearson r is used when both X and Y are quantitative data. The Pearson correlation coefficient is given by:

\[ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{s_X} \right) \left( \frac{Y_i - \bar{Y}}{s_Y} \right) \tag{1} \]

where the terms are defined as follows.

[1] Mussen, Paul Henry (1973). Psychology: An Introduction. Lexington (MA): Heath. p. 363. ISBN 0-669-61382-7. "The I.Q. is essentially a rank; there are no true 'units' of intellectual ability."
[2] Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355–383.

The term

\[ Z_i = \frac{X_i - \bar{X}}{s_X} \]

is the standard score measuring how far the individual data point is located from the mean; $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the mean of $X_i$, and $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$ is the mean of $Y_i$.

Another means to define r is to use the slope of the linear equation $Y = a + bX + c$ [3] as the parameter and multiply the slope by the quotient of the standard deviation of X divided by the standard deviation of Y, thus:

\[ r = b \left( \frac{s_X}{s_Y} \right) \tag{2} \]

where

\[ b = \frac{n \sum XY - \left( \sum X \right) \left( \sum Y \right)}{n \sum X^2 - \left( \sum X \right)^2} \tag{3} \]

with $X : (x_1, x_2, \ldots, x_n)$ and $Y : (y_1, y_2, \ldots, y_n)$, and the standard deviation is generally given by:

\[ s_X = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{and} \quad s_Y = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{4} \]

The value of the correlation coefficient ranges between -1 and +1. A value of zero means that there is no association between the two arrays. A negative coefficient means that there is an inverse association: if there is an increase in X, there is a corresponding decrease in Y and vice versa. A positive coefficient means that there is a direct association, i.e. if there is an increase in X, there is also an increase in Y.

2.1 Biserial Correlation Coefficient rb
Biserial correlation is used when the X array is quantitative and the Y array is ordinal data. For example, X may represent the raw test scores of n students and Y the ranking placement of the students according to their test scores, i.e. 100 = 1st place, 90 = 2nd, 80 = 3rd, and so on. The biserial correlation is given by:

\[ r_b = \frac{\left( \bar{Y}_1 - \bar{Y}_0 \right) \left( \dfrac{pq}{Y} \right)}{\sigma_Y} \tag{5} \]

where $\bar{Y}_1$ and $\bar{Y}_0$ are the Y score means for data pairs with $x = 1$ and $x = 0$ respectively:

\[ \bar{Y}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} y_i \quad \text{and} \quad \bar{Y}_0 = \frac{1}{n_0} \sum_{i=1}^{n_0} y_i \]

p is the proportion of data pairs with score $x = 1$, $q = 1 - p$, $\sigma_Y$ is the population standard deviation for the y data, and Y in the term $pq/Y$ is the height of the standardized normal distribution at the point z where $P(z' < z) = q$ and $P(z' > z) = p$. Note that the probabilities p and q may be given by the Laplace Rule of Succession:[4]

\[ p = \frac{s + 1}{n + 2} \tag{6} \]

where s = number of successes and n = total observations, and

\[ q = 1 - p \tag{7} \]

The population standard deviation ($\sigma$) may not be known; however, it may be determined indirectly through a two-step process: (i) the t-equation and (ii) the Z-equation.

\[ t = \frac{\bar{x} - \mu}{S / \sqrt{n}} \tag{8} \]

From the t-equation, determine the population mean ($\mu$), thus:

\[ \mu = \bar{x} - t \left( \frac{S}{\sqrt{n}} \right) \tag{9} \]

[3] $Y = a + bX + c$ where a is the Y-intercept, b is the slope of the linear regression line, and c is the forecast error. In basic statistics, the linear regression line equation is obtained through the following three statements:
\[ I_{XY} = N \sum XY - \left( \sum X \right) \left( \sum Y \right), \quad II_X = N \sum X^2 - \left( \sum X \right)^2, \quad III_Y = N \sum Y^2 - \left( \sum Y \right)^2 \]
where
\[ b = \frac{I_{XY}}{II_X}, \quad a = \bar{Y} - b \bar{X}, \quad c = \sqrt{ \left[ \frac{1}{N(N-2)} \right] \left[ III_Y - \frac{(I_{XY})^2}{II_X} \right] } \]
[4] Viz. Laplace, Pierre-Simon (1814). Essai philosophique sur les probabilités. Paris: Courcier.

With known $\mu$, the population standard deviation may be determined through the Z-equation. The Z-equation is given by:

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \tag{10} \]

Now solve for the population standard deviation ($\sigma$), thus:

\[ \sigma = \left( \frac{\bar{x} - \mu}{Z} \right) \sqrt{n} \tag{11} \]

Note that p and q are used when discrete probability is involved. Discrete probability is used because Y (the response variable) exists as a ranked or ordinal variable. The response $Y : (y_1, y_2, \ldots, y_n)$ either falls within a rank placement [1st, 2nd, …, ith] or it does not. This "either or" argument dichotomizes the ordinal variable into a {Yes | No} identifier which could be scored as Yes = 1 and No = 0. Therefore, p and q of the discrete binomial probability are used. The test statistic for the binomial probability is given by:

\[ Z_{bin} = \frac{\dfrac{X}{n} - p}{\sqrt{\dfrac{pq}{n}}} \]

See infra.

2.2 Point Biserial rpb
There are two cases where point-biserial correlation is used: (i) X is nominal and Y is quantitative data, and (ii) X is quantitative data and Y is nominal. In addition, if one variable, such as Y in the series of X and Y, is dichotomous, point-biserial correlation is also used. Dichotomous data are categorical data that give a binomial distribution. This type of distribution is produced by a {Yes | No} answer category. The mathematical equivalence is $r_{XY} = r_{pb}$. Although the point biserial correlation is equivalent to the Pearson correlation, the formula is different from the Pearson product moment correlation. The point-biserial correlation ($r_{pb}$) is given by:

\[ r_{pb} = \left( \frac{M_1 - M_0}{s_n} \right) \sqrt{\frac{n_1 n_0}{n^2}} \tag{12} \]

where $s_n$ is the standard deviation of the combined population or pooled standard deviation, thus:

\[ s_n = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2} \tag{13} \]
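The equivalence between the two definitions of Pearson r in equations (1) and (2), and between Pearson r and the point biserial coefficient in equation (12), can be checked numerically. The following is a minimal sketch using only the Python standard library. The data arrays `groups` and `scores` and all function names are hypothetical illustrations and are not taken from the paper; the sketch assumes the dichotomous variable is coded 1 and 0.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values, ddof):
    """Standard deviation: ddof=1 follows equation (4), ddof=0 follows equation (13)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - ddof))

def pearson_r(x, y):
    """Equation (1): sum of products of standard scores, divided by n - 1."""
    mx, my, sx, sy = mean(x), mean(y), sd(x, 1), sd(y, 1)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (len(x) - 1)

def slope(x, y):
    """Equation (3): least-squares slope b of Y on X."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return (n * sxy - sum(x) * sum(y)) / (n * sxx - sum(x) ** 2)

def point_biserial(scores, groups):
    """Equation (12): groups holds the dichotomous variable coded 1 or 0."""
    g1 = [s for s, g in zip(scores, groups) if g == 1]
    g0 = [s for s, g in zip(scores, groups) if g == 0]
    n1, n0, n = len(g1), len(g0), len(scores)
    return (mean(g1) - mean(g0)) / sd(scores, 0) * math.sqrt(n1 * n0 / n ** 2)

# Hypothetical data: a dichotomous variable and a quantitative score.
groups = [0, 0, 0, 1, 1, 1]
scores = [2.0, 3.0, 5.0, 6.0, 8.0, 9.0]

r_eq1 = pearson_r(groups, scores)                               # equation (1)
r_eq2 = slope(groups, scores) * sd(groups, 1) / sd(scores, 1)   # equation (2)
r_pb = point_biserial(scores, groups)                           # equation (12)
print(round(r_eq1, 4), round(r_eq2, 4), round(r_pb, 4))  # 0.8667 0.8667 0.8667
```

For this small data set all three routes give 13/15, which illustrates the stated equivalence $r_{XY} = r_{pb}$ when one variable is coded 0/1.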

This pooled standard deviation is presented as $s_n$. In order to obtain $s_n$, both arrays must be combined, $n_1 + n_0 = n$, and the standard deviation of the combined string of n values is calculated. The term $M_1$ is the mean value of the continuous $X : x_1, x_2, \ldots, x_n$ in group 1, therefore:

\[ M_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} X_i \]

for all data points in group 1 with size $n_1$, and

\[ M_0 = \frac{1}{n_0} \sum_{i=1}^{n_0} X_i \]

for all data points in group 0 with size $n_0$. The combined sample size is given by $n = n_1 + n_0$. If the data come from only one sample of the population, $s_{n-1}$ is used for the standard deviation. Thus $r_{pb}$ is written as:

\[ r_{pb} = \left( \frac{M_1 - M_0}{s_{n-1}} \right) \sqrt{\frac{n_1 n_0}{n(n-1)}} \tag{14} \]

The standard deviation for the "sample only" data set is given by:

\[ s_{n-1} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2} \tag{15} \]

The two equations using $s_n$ and $s_{n-1}$ are equivalent, thus:

\[ r_{pb} = \left( \frac{M_1 - M_0}{s_n} \right) \sqrt{\frac{n_1 n_0}{n^2}} = \left( \frac{M_1 - M_0}{s_{n-1}} \right) \sqrt{\frac{n_1 n_0}{n(n-1)}} \]

The test statistic is the t-test, which is given by:

\[ t_{pb} = r_{pb} \sqrt{\frac{n_1 + n_0 - 2}{1 - r_{pb}^2}} \tag{16} \]

The degrees of freedom are $v = n_1 + n_0 - 2$. If the data array of X is normally distributed, a more accurate biserial coefficient is given by:

\[ r_b = \left( \frac{M_1 - M_0}{s_n} \right) \left( \frac{n_1 n_0}{n^2 u} \right) \tag{17} \]

where u is the ordinate (height) of the standard normal distribution $N(0, 1)$ at the point that divides the distribution into the proportions $n_0 / n$ and $n_1 / n$. Normal distribution may be verified by the Anderson-Darling test.[5]

There are three types of point-biserial correlation, namely (i) the Pearson correlation between item scores and total test scores including the item scores, (ii) the Pearson correlation between item scores and total test scores excluding the item scores, and (iii) the correlation adjusted for the bias resulting from the inclusion of the item scores. The correlation adjusted for the bias resulting from the inclusion of the item scores is given by:

\[ r_{upb} = \frac{M_1 - M_0 - 1}{\sqrt{\dfrac{n^2 s_n^2}{n_1 n_0} - 2 \left( M_1 - M_0 \right) + 1}} \tag{18} \]

Note that for the point-wise or specific probability of an X value, the binomial distribution for the categorical data is given by:

\[ P(X) = \frac{n!}{(n - X)! \, X!} \, p^X q^{n - X} \tag{19} \]

where n is the total number of observations, X is the specified value to be predicted, p is the probability of success of the observed value over the total number of events, and q is $1 - p$ or the probability of failure. The test statistic for the binomial distribution is:

\[ Z_{bin} = \frac{\dfrac{X}{n} - p}{\sqrt{\dfrac{pq}{n}}} \tag{20} \]

Recall that the term a in the $2 \times 2$ contingency table is the frequency of a perfect match of {Yes: observed} and {Yes: forecast}. The frequency a is equal to X in $P(X)$ as illustrated in the table below.

                    Y: YES            Y: NO          Forecast
X: YES              a                 b              P(F) = a + b
X: NO               c                 d              1 - P(F)
Observation         P(O) = a + c      1 - P(O)       a + b + c + d

The term $a + b + c + d$ is the combined joint probability of all events in the set.

2.3 Spearman Correlation Coefficient ρ
The Spearman correlation coefficient is used when both the independent variable (X) and the dependent variable (Y) are ordinal. Ordinal data is defined as a ranked order type of a well-ordered set: {first, second, third, …, nth}.[6] This section focuses on the ordinal data of both dependent and independent variables. However, there is a claim made by Lehman that the Spearman coefficient can be used for both continuous and discrete variables.[7] Assume that there are two arrays of data called the independent variable $X_i$ and the dependent variable $Y_i$. The ordinal data of these variables are $x_i$ and $y_i$ respectively. The correlation coefficient of $x_i$ and $y_i$ is given by:

\[ \rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}} \tag{21} \]

There is an alternative calculation of rho through the use of the difference of the two ranked arrays $x_i$ and $y_i$,[8] thus:

\[ \rho = 1 - \frac{6 \sum d_i^2}{n \left( n^2 - 1 \right)} \tag{22} \]

where $d_i = x_i - y_i$ and n is the number of elements in the paired set. This method is not used if the researcher is looking for top X. Generally, equation (21) is used. The test statistic used for the Spearman rank correlation is given by the Z-test or the t-test. In order to use the Z-score test, it is necessary to find the Fisher's transformation of r. The Fisher's transformation for the correlation is given by:

\[ F(r) = \frac{1}{2} \ln \left( \frac{1 + r}{1 - r} \right) \tag{23} \]

where $r = b \left( \dfrac{S_x}{S_y} \right)$ and ln is the natural log of base e = 2.718.

[5] Stephens, M. A. (1974). "EDF Statistics for Goodness of Fit and Some Comparisons." Journal of the American Statistical Association 69: 730–737; and Stephens, M. A. (1986). "Tests Based on EDF Statistics." In D'Agostino, R.B. and Stephens, M.A., Goodness-of-Fit Techniques. New York: Marcel Dekker. ISBN 0-8247-7487-6.
[6] Dauben, J. W. (1990). Georg Cantor: His Mathematics and Philosophy of the Infinite. Princeton, NJ: Princeton University Press, p. 199; Moore, G. H. (1982). Zermelo's Axiom of Choice: Its Origin, Development, and Influence. New York: Springer-Verlag, p. 52; and Suppes, P. (1972). Axiomatic Set Theory. New York: Dover, p. 129.
[7] Lehman, Ann (2005). Jmp For Basic Univariate And Multivariate Statistics: A Step-by-step Guide. Cary, NC: SAS Press. p. 123. ISBN 1-59047-576-3.
[8] Myers, Jerome L.; Well, Arnold D. (2003). Research Design and Statistical Analysis (2nd ed.). Lawrence Erlbaum. p. 508. ISBN 0-8058-4037-0; and Maritz, J. S. (1981). Distribution-Free Statistical Methods. Chapman & Hall. p. 217. ISBN 0-412-15940-6.
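The product-moment form of Spearman rho in equation (21) and the rank-difference form in equation (22), read here with the squared difference $d_i^2$ as in the standard Spearman formula, can be compared on a small example, together with the Fisher transformation of equation (23). This is a sketch under stated assumptions: the data are hypothetical, the helper names are not from the paper, and the ranking helper assumes there are no tied values (with ties the two forms need not agree).

```python
import math

def ranks(values):
    """Rank 1..n in ascending order; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rho_product_moment(rx, ry):
    """Equation (21) applied to the two rank arrays."""
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den

def rho_rank_difference(rx, ry):
    """Equation (22): 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def fisher_transform(r):
    """Equation (23)."""
    return 0.5 * math.log((1 + r) / (1 - r))

# Hypothetical paired observations without ties.
x = [10, 20, 30, 40, 50]
y = [12, 25, 22, 48, 41]
rx, ry = ranks(x), ranks(y)

rho_21 = rho_product_moment(rx, ry)
rho_22 = rho_rank_difference(rx, ry)
f_rho = fisher_transform(rho_21)
print(round(rho_21, 4), round(rho_22, 4), round(f_rho, 4))  # 0.8 0.8 1.0986
```

Both forms give 0.8 for these ranks, and the transformed value equals ln 3, which is what equation (23) yields for r = 0.8.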

The test statistic under the Z-equation is given by:

\[ z = \sqrt{\frac{n - 3}{1.06}} \, F(r) \tag{24} \]

The null hypothesis is that $r = 0$, which means that there is statistical independence or no dependent association.[9] In the alternative, the test statistic may also be determined by the t-test, thus:

\[ t_r = r \sqrt{\frac{n - 2}{1 - r^2}} \tag{25} \]

The degree of freedom is given by $df = n - 2$.[10] The argument in support of the $t_r$ approach rests on the idea of permutation.[11] The equivalent of the above determination is Kendall's tau.[12] Kendall's tau is beyond the scope of the present topic.

2.4 Tetrachoric Correlation Coefficient rtet
The tetrachoric correlation coefficient is used when both the independent (X) and dependent (Y) variables are dichotomous or binary data and both are ordinal. Generally, there are two correlation tests used for binary data, namely the phi-coefficient and the tetrachoric correlation coefficient. The data is commonly presented in a $2 \times 2$ contingency table. Below is an example of the $2 \times 2$ contingency table and its scoring. A series of definitions for the terms used in the tetrachoric correlation coefficient must be provided in order to gain a clearer understanding. The definitions are based on the $2 \times 2$ contingency table below:

              Y: Yes     Y: No
X: Yes        a          b            p_F
X: No         c          d            1 - p_F
              p_o        1 - p_o

Table 2.0: The 2 × 2 contingency table expressing the frequencies in terms of joint probabilities.

[9] Choi, S. C. (1977). "Tests of Equality of Dependent Correlation Coefficients." Biometrika 64(3): 645–647; and Fieller, E. C., Hartley, H. O., Pearson, E. S. (1957). "Tests for rank correlation coefficients. I." Biometrika 44: 470–481.
[10] Press, Vetterling, Teukolsky, Flannery (1992). Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). p. 640.
[11] Kendall, M. G., Stuart, A. (1973). The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Griffin. ISBN 0-85264-215-6. (Sections 31.19, 31.21).
[12] Viz. Kowalczyk, T.; Pleszczyńska E., Ruland F. (eds.) (2004). Grade Models and Methods for Data Analysis with Applications for the Analysis of Data Populations. Studies in Fuzziness and Soft Computing 151. Berlin Heidelberg New York: Springer Verlag. ISBN 978-3-540-21120-4.

$p_o$ and $p_F$ = marginal frequencies: $p_o$ = probability of the observed and $p_F$ = probability of the forecast.
a = joint frequency of the contingency table.
O = observations, comprised of $O \rightarrow X_O : (x_{O1} + x_{O2} + \ldots + x_{On})$.
F = forecast, comprised of $F \rightarrow X_F : (x_{F1} + x_{F2} + \ldots + x_{Fn})$.

The three frequencies a, $p_o$ and $p_F$ determine the values in the table. The bias of this determination is given by:

\[ Bias = \frac{P_F}{P_O} \quad \text{or} \quad Bias = \frac{a + b}{a + c} \tag{26} \]

Juras and Pasarić (2006) formally explained the tetrachoric correlation coefficient as:

"Let $z_O = \Phi^{-1}(P_O)$ and $z_F = \Phi^{-1}(P_F)$ be the standard normal deviates (SND) corresponding to marginal probabilities $P_O$ and $P_F$, respectively. The tetrachoric correlation coefficient (TCC), introduced by Pearson (1900), is the correlation coefficient r that satisfies

\[ a = \int_{z_O}^{\infty} \int_{z_F}^{\infty} \phi(x_1, x_2, r) \, dx_1 \, dx_2, \qquad (2) \]

Where $\phi(x_1, x_2, r)$ is the bivariate normal p.d.f.

\[ \phi(x_1, x_2, r) = \frac{1}{2\pi \sqrt{1 - r^2}} \exp \left[ - \frac{1}{2 \left( 1 - r^2 \right)} \left( x_1^2 - 2 r x_1 x_2 + x_2^2 \right) \right]. \qquad (3) \]

The line $x_1 = z_O$ and $x_2 = z_F$ divide the bivariate normal into four quadrants whose probabilities correspond to relative frequencies in the $2 \times 2$ table. The double integral in (2) can be expressed as (National Bureau of Standards, 1959):

\[ a = \frac{1}{2\pi} \int_{\arccos r}^{\pi} \exp \left[ - \frac{1}{2} \left( z_O^2 + z_F^2 - 2 z_O z_F \cos \omega \right) \mathrm{cosec}^2 \omega \right] d\omega \qquad (4) \]

Showing that the joint frequency a is a monotone function of r is well defined by (2). Clearly, the SNDs $z_O$ and $z_F$ are uniquely determined by $P_O$ and $P_F$."

See Josip Juras and Zoran Pasarić (2006). "Application of tetrachoric and polychoric correlation coefficients to forecast verification." GEOFIZIKA, Vol. 23, No. 1, p. 64.

In another version, the tetrachoric correlation coefficient is defined as the solution given by $r_{tc}$ to the integral equation:

\[ p_a = \int_{\Phi^{-1}(1 - p_X)}^{\infty} \int_{\Phi^{-1}(1 - p_Y)}^{\infty} \phi_2(x_1, x_2, r_{tc}) \, dx_2 \, dx_1 \tag{27} \]

where $\Phi(x)$ is the standard normal distribution and $\phi_2(x_1, x_2, \rho)$ is the bivariate standard normal density function. The term $p_a$ may be written as:

\[ p_a = \Phi \left( \Phi^{-1}(1 - p_X), \; \Phi^{-1}(1 - p_Y), \; r_{tc} \right) \tag{28} \]

These formal definitions are not helpful for the actual calculation of the tetrachoric correlation coefficient. For practical purposes, assume the $2 \times 2$ contingency table below as the basis for further discussion of the tetrachoric correlation coefficient $r_{tc}$.

              Y: Pos.    Y: Neg.
X: Pos.       p_a        p_b          p_X
X: Neg.       p_c        p_d          1 - p_X
              p_Y        1 - p_Y

Table 3.0: The 2 × 2 contingency table expressing the probability of frequencies in terms of joint probabilities.

Juras and Pasarić gave an extensive treatment of the tetrachoric correlation coefficient when they provided the Peirce measure ($s_P$), the Heidke measure ($s_H$) and the Doolittle measure ($s_D$) as estimates of $r_{tc}$. All these measures are comparable calculations for the tetrachoric correlation coefficient. These measures are provided as:

\[ s_P = \frac{a - P_O P_F}{P_O (1 - P_O)} \tag{29} \]

\[ s_H = \frac{2 \left( a - P_O P_F \right)}{P_O + P_F - 2 P_O P_F} \tag{30} \]

\[ s_D = \frac{a - P_O P_F}{\sqrt{P_O (1 - P_O) P_F (1 - P_F)}} \tag{31} \]

In addition, the Yule's odds ratio skill score is said to also give an approximation of the tetrachoric correlation coefficient:

\[ S_Y = \frac{a - P_O P_F}{a \left[ 1 - 2 \left( P_O + P_F \right) + 2a \right] + P_O P_F} \tag{32} \]

Finally, the actual tetrachoric correlation coefficient is given by:[13]

\[ S_r = \sin \left( \frac{\pi}{2} (4a - 1) \right) \tag{33} \]

Note that S with subscripts is equivalent to $r_{tc}$. This short-hand version is less complicated than Pearson's original version.[14] One alternative for calculating the tetrachoric correlation coefficient is given by the alpha ratio:

\[ r_{tc} = \frac{\alpha - 1}{\alpha + 1} \tag{34} \]

where $\alpha = \left( \dfrac{AD}{BC} \right)^{\pi / 4}$ and the equivalences of ABCD are $A = P_a$, $B = P_b$, $C = P_c$ and $D = P_d$. ABCD are relative frequencies. Yet another shorthand formula for the tetrachoric correlation coefficient is given by:

\[ r_{tc} = \cos \left( \frac{180}{1 + \sqrt{BC / AD}} \right) \tag{35} \]

where the contingency table is given by:

              Y: 0       Y: 1
X: 1          A          B            A + B
X: 0          C          D            C + D
              A + C      B + D

Table 4.0: The 2 × 2 contingency table with score of 0 and 1.

Assume that Table 4.0 has the following data set:

              Y: 0           Y: 1
X: 1          A = 10         B = 5          A + B = 15
X: 0          C = 5          D = 10         C + D = 15
              A + C = 15     B + D = 15     30

[13] Juras and Pasarić, p. 67, citing Sheppard (1998): On the application of the theory of error to cases of normal distribution and normal correlations. Philos. Tr. R. Soc. S.-A., 192, 101-167. See also Johnson, N.L. and Kotz, S. (1972). Distributions in Statistics, 4. Continuous Multivariate Distributions. Wiley, New York. p. 333.
[14] Pearson, Karl (1904). Mathematical Contributions to the Theory of Evolution. XIII. On the Theory of Contingency and Its Relation to Association and Normal Correlation. Draper's Company Research Memoirs, Biometric Series I. Pp. 5-21.
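The shorthand estimates in equations (29) to (35) can be evaluated directly on the Table 4.0 data. The sketch below is illustrative only and rests on two assumptions that should be read as such. First, it converts the counts to relative frequencies (a = A/n, P_O = (A + C)/n, P_F = (A + B)/n), following the statement that ABCD are relative frequencies. Second, for the cosine formula it uses the commonly quoted form cos(π / (1 + sqrt(AD/BC))), that is, the ratio AD/BC with the angle 180 taken in degrees; this differs from equation (35) as printed, and the variable names are hypothetical.

```python
import math

# Table 4.0 counts.
A, B, C, D = 10, 5, 5, 10
n = A + B + C + D

# Assumption: work with relative frequencies rather than raw counts.
a = A / n            # joint relative frequency of the (X = 1, Y = 0) cell
p_o = (A + C) / n    # marginal relative frequency of the observations
p_f = (A + B) / n    # marginal relative frequency of the forecasts

peirce = (a - p_o * p_f) / (p_o * (1 - p_o))                                # eq. (29)
heidke = 2 * (a - p_o * p_f) / (p_o + p_f - 2 * p_o * p_f)                  # eq. (30)
doolittle = (a - p_o * p_f) / math.sqrt(p_o * (1 - p_o) * p_f * (1 - p_f))  # eq. (31)
yule = (a - p_o * p_f) / (a * (1 - 2 * (p_o + p_f) + 2 * a) + p_o * p_f)    # eq. (32)
s_r = math.sin(math.pi / 2 * (4 * a - 1))                                   # eq. (33)

alpha = (A * D / (B * C)) ** (math.pi / 4)
r_alpha = (alpha - 1) / (alpha + 1)                                         # eq. (34)

# Assumed standard cosine form; pi radians equals 180 degrees.
r_cos = math.cos(math.pi / (1 + math.sqrt(A * D / (B * C))))

for name, value in [("Peirce", peirce), ("Heidke", heidke), ("Doolittle", doolittle),
                    ("Yule", yule), ("S_r", s_r), ("alpha", r_alpha), ("cosine", r_cos)]:
    print(f"{name:10s} {value:.4f}")
```

Under these assumptions every estimate stays inside the interval from -1 to +1: the Peirce, Heidke and Doolittle measures come out at one third, the Yule score at 0.6, and the sine, alpha-ratio and cosine formulas at about 0.50. Values outside that interval arise when raw counts are substituted for the relative frequencies.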

The calculation for $r_{tc}$ under equation (35) follows:

\[ r_{tc} = \cos \left( \frac{180}{1 + \sqrt{BC / AD}} \right) = \cos \left( \frac{180}{1 + \sqrt{(5)(5) / (10)(10)}} \right) = \cos \left( \frac{180}{1 + \sqrt{25 / 100}} \right) \]

\[ r_{tc} = \cos \left( \frac{180}{1 + \sqrt{0.25}} \right) = \cos \left( \frac{180}{1 + 0.50} \right) = \cos \left( \frac{180}{1.50} \right) = \cos(120) = 0.81 \]

The value 0.81 is the cosine of 120 with the argument taken in radians. Using equation (34), the calculation for alpha follows:

\[ \alpha = \left( \frac{AD}{BC} \right)^{\pi / 4} = \left( \frac{10(10)}{5(5)} \right)^{3.14 / 4} = \left( \frac{100}{25} \right)^{3.14 / 4} = 4^{3.14 / 4} = 4^{0.79} = 2.97 \]

\[ r_{tc} = \frac{\alpha - 1}{\alpha + 1} = \frac{2.97 - 1}{2.97 + 1} = \frac{1.97}{3.97} = 0.50 \]

Another means of determining the tetrachoric correlation coefficient is given by:

\[ r_{tc} = \sin \left( \frac{\pi}{2} \left( \frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}} \right) \right) \tag{36} \]

According to this method, the calculation follows:

\[ r_{tc} = \sin \left( \frac{\pi}{2} \left( \frac{\sqrt{10(10)} - \sqrt{5(5)}}{\sqrt{10(10)} + \sqrt{5(5)}} \right) \right) = \sin \left( \frac{\pi}{2} \left( \frac{\sqrt{100} - \sqrt{25}}{\sqrt{100} + \sqrt{25}} \right) \right) = \sin \left( \frac{\pi}{2} \left( \frac{10 - 5}{10 + 5} \right) \right) \]

\[ r_{tc} = \sin \left( \frac{\pi}{2} \left( \frac{5}{15} \right) \right) = \sin \left( 1.57 (0.33) \right) = \sin(0.52) = 0.50 \]

The result of the calculation shows that equations (34) and (36) produce the same result and equation (35) produces a higher coefficient. In comparison, equation (33) yields the following computation:

\[ S_r = \sin \left( \frac{\pi}{2} (4a - 1) \right) = \sin \left( \frac{\pi}{2} (4(10) - 1) \right) = \sin \left( \frac{\pi}{2} (40 - 1) \right) = \sin \left( \frac{\pi}{2} (39) \right) \]

\[ S_r = \sin \left( \frac{3.14}{2} (39) \right) = \sin \left( 1.57 (39) \right) = \sin(61.23) = -1 \]

The results of the computation from the various claimed methods are not in agreement. Below are the computations of the Peirce, Heidke, Doolittle and Yule measures. For convenience the following definitions and values are given:

\[ P_O = a + c = 10 + 5 = 15 \]
\[ P_F = a + b = 10 + 5 = 15 \]

The calculation for the Peirce measure follows:

\[ s_P = \frac{a - P_O P_F}{P_O (1 - P_O)} = \frac{10 - (15)(15)}{15(1 - 15)} = \frac{10 - 225}{15(-14)} = \frac{-215}{-210} = 1.02 \]

The result does not seem to be accurate because the range of a correlation coefficient is between -1 and +1. The calculation under the Heidke measure follows:

\[ s_H = \frac{2 \left( a - P_O P_F \right)}{P_O + P_F - 2 P_O P_F} = \frac{2 \left( 10 - 15(15) \right)}{15 + 15 - 2(15(15))} = \frac{2 (10 - 225)}{15 + 15 - 2(225)} = \frac{2(-215)}{30 - 450} = \frac{-430}{-420} = 1.02 \]

Again a different number is obtained. The calculation under the Doolittle measure follows:

\[ s_D = \frac{a - P_O P_F}{\sqrt{P_O (1 - P_O) P_F (1 - P_F)}} = \frac{10 - 15(15)}{\sqrt{15(1 - 15) 15(1 - 15)}} = \frac{215}{\sqrt{15(14) 15(14)}} \]

\[ s_D = \frac{215}{\sqrt{15(14) 210}} = \frac{215}{\sqrt{210(210)}} = \frac{215}{\sqrt{44100}} = \frac{215}{210} = 1.02 \]

This result coincides with the Peirce measure. Lastly, under the Yule's method the calculation follows:

\[ S_Y = \frac{a - P_O P_F}{a \left[ 1 - 2 \left( P_O + P_F \right) + 2a \right] + P_O P_F} = \frac{10 - (15)(15)}{10 \left[ 1 - 2(15 + 15) + 2(10) \right] + 15(15)} = \frac{10 - 225}{10 \left[ 1 - 2(30) + 20 \right] + 225} \]

\[ S_Y = \frac{-215}{10 \left[ 1 - 60 + 20 \right] + 225} = \frac{-215}{10(-39) + 225} = \frac{-215}{-390 + 225} = \frac{-215}{-165} = 1.30 \]

The result under the Yule's measure also exceeds 1.00. The results of the various measures may be summarized thus:

\[ s_P = 1.02, \quad s_H = 1.02, \quad s_D = 1.02, \quad s_Y = 1.30, \quad S_r = -1.00 \]

The values appear to be consistent, except the sign for $S_r$. Bias in each case is determined by:

\[ Bias = P_F / P_O = 15 / 15 = 1.00 \]

X_i       mean X     X_i - mean X     S          Z_i = (X_i - mean X) / S     critical Z
1.02      0.672      0.348            0.9425     0.369                        1.65
1.02      0.672      0.348            0.9425     0.369                        1.65
1.02      0.672      0.348            0.9425     0.369                        1.65
1.30      0.672      0.628            0.9425     0.666                        1.65
-1.00     0.672      -1.672           0.9425     -1.774                       -1.65

Table 5.0: Determining the level of significance of the variance among the estimates of the tetrachoric correlation.

All estimates are consistent except $S_r$, which shows a significant difference: $|-1.774| > 1.65$. With the exception of the negative sign of $S_r$, the interpretation of the correlation coefficient would otherwise be consistent. However, with the negative $S_r$, the $S_r$ value would point to an opposite meaning in interpretation. The above analysis deals with the correlation coefficient called tetrachoric for the $2 \times 2$ contingency table.

contingency table is expanded to K × L. The corresponding correlation coefficient for multiple categorical data is the polychoric correlation coefficient.

2.5 Polychoric Correlation Coefficient
When both X and Y are ordinal data and the number of categories exceeds two, the use of polychoric correlation is suggested.[15] Polychoric correlation is an estimate of the correlation between two unobserved variables X and Y, where both X and Y are continuous, using the observed variables X* and Y* as the basis. The variables X* and Y* are ordinal variables that are assumed to follow a bivariate normal distribution. For instance, an IQ test yields a score obtained from a test battery intended to measure the level of intelligence; that is, IQ is the score from an indirect test used to determine intelligence. In this case, IQ test scores are the observed data X* and Y*; however, intelligence is the unobservable X and Y. These variables cannot be measured by direct observation. The purpose of the illustration here is to give an example of a latent variable.

First, collect the observations of X* and Y*. These observations may be given as:

$$X^* = \begin{bmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_n^* \end{bmatrix} \quad \text{and} \quad Y^* = \begin{bmatrix} y_1^* \\ y_2^* \\ \vdots \\ y_n^* \end{bmatrix}$$

Thus, $X^* = \{x_1^*, x_2^*, \dots, x_n^*\}$ and $Y^* = \{y_1^*, y_2^*, \dots, y_n^*\}$. These are observed data. The values of X* and Y* are known as the underlying or latent variables. Assume that there are discrete random variables X and Y (note that there is no asterisk marking this "unobserved" set) which relate to X* and Y*, where

$$x_i = k \quad \text{if} \quad \xi_{k-1} < x_i^* \le \xi_k \tag{37}$$

and

$$y_i = l \quad \text{if} \quad \eta_{l-1} < y_i^* \le \eta_l \tag{38}$$

For $x_i$, the threshold $\xi_k$ comes from $\xi_1, \xi_2, \dots, \xi_{K-1}$, with $\xi_0 = -\infty$ and $\xi_K = +\infty$. Similarly, the threshold for $y_i$ is given by $\eta_l$, which comes from $\eta_1, \eta_2, \dots, \eta_{L-1}$, with $\eta_0 = -\infty$ and $\eta_L = +\infty$. These are ordinal data, i.e. X = 1st, 2nd, …, nth and Y = 1st, 2nd, …, nth. The corresponding values of the observed X* and Y* and the unobserved X and Y may be represented as:

[15] Holgado-Tello, F.P., Chacón-Moscoso, S., Barbero-García, I., and Vila-Abad, E. (2010). Polychoric versus Pearson correlations in exploratory and confirmatory factor analysis of ordinal variables. Quality and Quantity 44(1), 153-166.

$$X^* \to X = \begin{bmatrix} x_1^* \to x_1 \\ x_2^* \to x_2 \\ \vdots \\ x_n^* \to x_n \end{bmatrix} \quad \text{and} \quad Y^* \to Y = \begin{bmatrix} y_1^* \to y_1 \\ y_2^* \to y_2 \\ \vdots \\ y_n^* \to y_n \end{bmatrix}$$

The marginal distributions of the unobserved X and Y are given by:

$$P[x = k] = p_k \quad \text{for } X \tag{39}$$

and

$$P[y = l] = q_l \quad \text{for } Y \tag{40}$$

The cloud of the unobserved variables X* and Y*, as defined by $x_i^*$ and $y_i^*$, may be projected onto a space, thus:

$$\begin{bmatrix} \pi_{11} & \pi_{12} & \cdots & \pi_{1L} \\ \pi_{21} & \pi_{22} & \cdots & \pi_{2L} \\ \vdots & \vdots & \ddots & \vdots \\ \pi_{K1} & \pi_{K2} & \cdots & \pi_{KL} \end{bmatrix} \tag{41}$$

The term $\pi_{KL}$ is called the discrete cell proportion. The objective of polychoric correlation is to find the value of $\pi_{KL}$. Recall that $\pi$ is the joint probability of the X and Y pair that is not observed. In general, the K × K or K × L contingency table is provided as:

                          OBSERVATIONS
                C1        C2        ...   CK
FORECAST  C1    P11       P12       ...   P1K       PF,1
          C2    P21       P22       ...   P2K       PF,2
          ...   ...       ...       ...   ...       ...
          CK    PK1       PK2       ...   PKK       PF,K
                Pobs,1    Pobs,2    ...   Pobs,K
Polychoric Joint Probability

Common statistics designates the multivariable contingency table as K × K. With the above setup, polychoric correlation can now be discussed. Polychoric correlation was developed as a result of the inadequacy of the tetrachoric correlation to handle a K × L contingency table.[16] Recall that the tetrachoric

[16] Ritchie-Scott used the notation r × s in labeling the contingency matrix. Zoran Pasoric and Josip Juras use K × L.

correlation is confined to a 2 × 2 contingency table.[17] In 1900, Pearson introduced the tetrachoric correlation calculation as an attempt to obtain a quantitative measurement of a continuous variable. However, Pearson's earlier attempt was confined to the 2 × 2 scenario. The work was further expanded by Ritchie-Scott[18] to cover the K × L scenario, which is known today as polychoric correlation. Whereas Pearson's 2 × 2 tetrachoric correlation handles dichotomous data, Ritchie-Scott's K × L handles polytomous tests. The notation above uses $P_{row,column}$. Note that there is a switching of observation and forecast above. This alternation does not change the numbers or the interpretation of the result; thus, the following generalization may be made:

$$\text{Bias} = \left( \frac{P_{F,1}}{P_{obs,1}}, \dots, \frac{P_{F,K-1}}{P_{obs,K-1}} \right), \quad P_F = (P_{F,1}, \dots, P_{F,K-1}) \quad \text{and} \quad P_{obs} = (P_{obs,1}, \dots, P_{obs,K-1})$$

The objective of polychoric correlation is to obtain the correlation of the unobserved arrays X and Y from the product-moment correlation of the observed arrays X* and Y*, where $x_i^*$ and $y_i^*$ are jointly normally distributed. The observed arrays X* and Y* are assumed to be bivariate normal, and the correlation between X* and Y* is given by $\rho$ (rho), given the series of unobserved data $x_i^*$ and $y_i^*$. The probability density function (PDF) of the bivariate normal X* and Y* is given by:

$$\phi(x^*, y^*, \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left[ -\frac{x^{*2} - 2\rho x^* y^* + y^{*2}}{2(1-\rho^2)} \right] \tag{42}$$

The cumulative distribution function (CDF) for the bivariate normal X* and Y* is given by:

$$\Phi_2(\xi_i, \eta_i, \rho) = \int_{-\infty}^{\xi_i} \int_{-\infty}^{\eta_i} \phi(x^*, y^*, \rho) \, dy^* \, dx^* \tag{43}$$

From the cumulative distribution function $\Phi_2$, for $x_i = 1$ and $y_i = 1$, the probability is:

$$\pi_{11} = \Phi_2(\xi_1, \eta_1, \rho) \tag{44}$$

The probability of $x_i = 1$ and $y_i = 1$ is a function of $\rho$. Recall that $\rho$ is the correlation of the observed X* and Y*, where the threshold is $(\xi_1, \eta_1)$. The polychoric correlation is estimated from the discrete cell proportion $\pi_{KL}$; this estimated value is designated as $\hat{\pi}_{KL}$ from the K × L contingency table (5).

[17] Pearson, K. (1900). "Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable." Philosophical Transactions of the Royal Society of London, Series A, 195: 1-47.
[18] Ritchie-Scott, A. (1918). A new measure of rank correlation. Biometrika, 30: 81-93.
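As a rough numerical illustration of equations (42) to (44), the following Python sketch evaluates the bivariate normal CDF and the first cell probability for a given correlation. It is not the author's procedure: the function names (`bvn_cdf`, `pi_11`), the Simpson-rule integration, the use of -8 as a stand-in for minus infinity, and the example thresholds of zero are all assumptions made for illustration. The double integral in (43) is reduced to a single integral of the standard normal density times a conditional normal CDF, which is a standard identity for the standard bivariate normal.

```python
import math

LOWER = -8.0  # assumed stand-in for -infinity; the normal tail beyond it is negligible


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_cdf(xi, eta, rho, steps=2000):
    """Approximate Phi_2(xi, eta, rho) of equation (43) with Simpson's rule.

    Integrates norm_pdf(x) * norm_cdf((eta - rho * x) / sqrt(1 - rho^2))
    over x from LOWER to xi. `steps` must be even.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    upper = min(xi, -LOWER)          # cap +infinity at 8
    if upper <= LOWER:               # xi at or below the lower cut-off
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        return norm_pdf(x) * norm_cdf((eta - rho * x) / scale)

    width = (upper - LOWER) / steps
    total = integrand(LOWER) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(LOWER + i * width)
    return total * width / 3.0


def pi_11(xi_1, eta_1, rho):
    """First cell probability, equation (44)."""
    return bvn_cdf(xi_1, eta_1, rho)


if __name__ == "__main__":
    # Hypothetical thresholds xi_1 = eta_1 = 0; pi_11 grows with rho.
    for rho in (0.0, 0.3, 0.5, 0.8):
        print(rho, round(pi_11(0.0, 0.0, rho), 4))
```

With both thresholds at zero, the result can be checked against the closed form 1/4 + arcsin(ρ)/(2π), which gives 1/4 at ρ = 0 and 1/3 at ρ = 0.5.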

$$h_{kl}(\theta) = \Phi_2(\xi_k, \eta_l) - \Phi_2(\xi_{k-1}, \eta_l) - \Phi_2(\xi_k, \eta_{l-1}) + \Phi_2(\xi_{k-1}, \eta_{l-1}) \tag{45}$$

The term $h_{kl}$ stands for the likelihood of events k and l occurring. This likelihood is a function of theta ($\theta$), and $\theta$ is given by:

$$\theta = [\rho, \theta_1]' = [\rho, \xi_1, \dots, \xi_{K-1}, \eta_1, \dots, \eta_{L-1}]' \tag{46}$$

The likelihood for the discrete cell proportion is written as:

$$\pi_{kl} = h_{kl}(\theta) \tag{47}$$

A general statement may now be made about $\pi$. Let $\pi = [\pi_{11}, \dots, \pi_{KL}]'$; the likelihood of $\theta$ may be generally stated as $h[\theta] = [h_{11}(\theta), \dots, h_{KL}(\theta)]'$. Therefore, the polychoric equation may be written as:

$$\pi = h(\theta) \tag{48}$$

Olsson[19] provides a close estimate of $\pi$ as the maximum log likelihood, which is equivalent to the estimated theta $\hat{\theta}$. Olsson's maximum log likelihood is given by:

$$\ln L = \sum_{k=1}^{K} \sum_{l=1}^{L} \pi_{kl} \log h_{kl}(\theta) \tag{49}$$

Recall that theta ($\theta$) is the parameter vector of the likelihood function. There are three kinds of maximum likelihood functions used according to the type of data distribution: (i) Bernoulli distribution, (ii) normal distribution, and (iii) Poisson distribution. Polychoric correlation deals with ordinal data. Ordinal data is a ranked data set. The appropriate form of the maximum likelihood function is the one used for the normal distribution. The maximum likelihood function for the normal distribution is provided thus:

$$f(x_1, x_2, \dots, x_n \mid \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x_i - \mu)^2}{2\sigma^2} \right] \tag{50}$$

which may be made into a general statement as:

$$f(x_1, x_2, \dots, x_n \mid \mu, \sigma) = \frac{(2\pi)^{-n/2}}{\sigma^n} \exp\left[ -\frac{\sum (x_i - \mu)^2}{2\sigma^2} \right] \tag{51}$$

The maximum likelihood is expressed as the natural log of the function:

[19] Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44: 443-460.

$$\ln f = -\frac{1}{2} n \ln(2\pi) - n \ln \sigma - \frac{\sum (x_i - \mu)^2}{2\sigma^2} \tag{52}$$

To find the expected mean of the function, the derivative of the log likelihood function is taken with respect to $\mu$:

$$\frac{\partial (\ln f)}{\partial \mu} = \frac{\sum (x_i - \mu)}{\sigma^2} = 0 \tag{53}$$

which gives the expected mean as:

$$\hat{\mu} = \frac{\sum x_i}{n} \tag{54}$$

Using the same rationale, the expected standard deviation follows:

$$\frac{\partial (\ln f)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\sum (x_i - \mu)^2}{\sigma^3} = 0 \tag{55}$$

$$\hat{\sigma} = \sqrt{\frac{\sum (x_i - \hat{\mu})^2}{n}} \tag{56}$$

The above steps obtain the maximum likelihood estimates of the mean and standard deviation as the mean and standard deviation of the sample. This may be a biased estimate; nevertheless, for purposes of demonstrating how the maximum likelihood is calculated in the context of the maximum likelihood of $\hat{\pi}_{KL}$, it is an adequate explanation.

2.6 Rank Biserial Correlation Coefficient $r_{rb}$
In a case where Y is dichotomous and X is rank data, the rank biserial correlation coefficient is used.[20] The formula for the rank biserial correlation is given by:

$$r_{rb} = 2\left( \frac{M_1 - M_0}{n_1 + n_0} \right) \tag{57}$$

where the subscripts 1 and 0 refer to the scores of 1 and 0 in the 2 × 2 contingency table, M is the mean of the frequency of the scores, and n is the sample size. The null hypothesis is that $r_{rb} = 0$, meaning there is no correlation. If the null hypothesis is true, the data is distributed as Mann-Whitney U. The objective of the Mann-Whitney U Test is to verify the claim that the standard deviation of population A is the same as the standard deviation of population B; if so, then the two populations are identical, except for their locations, i.e. the

[20] Viz. Gene V. Glass and Kenneth D. Hopkins (1995). Statistical Methods in Education and Psychology (3rd ed.). Allyn & Bacon. ISBN 0-205-14212-5.

populations in two cities have the same income. The case involves two populations located at different places; this is a case that could be termed parallel group. The claim by the alternative hypothesis ($H_A$) is that the two populations are the same and have the same population standard deviation, i.e. $\sigma_1 - \sigma_2 = 0$. The logic follows that "if the two standard deviations are the same, their difference must equal zero." The null hypothesis ($H_0$) states the obverse: "the two populations are different, i.e. their means are different. Therefore, $\sigma_1 - \sigma_2 \ne 0$."

The procedure for conducting the Mann-Whitney U test involves five steps. Each step is explained below:

1. Collect a sample from each population. The sample sizes of the two samples may be equal or unequal.

2. Mark one sample as $n_1$ and the second sample as $n_2$. It does not matter which one is designated as the first or the second sample. However, conventional practice dictates that the larger sample be treated as $n_1$ and the smaller sample as $n_2$.

3. Combine the two samples into one array, i.e. one set as shown below:

$$n = n_1 + n_2 \tag{58}$$

4. Rank the combined sample ($n$) in ascending order, i.e. from low to high, so that the elements of the set are arranged as $n_1 < n_2 < \dots < n_N$. Then calculate the test statistic for the Mann-Whitney U test according to the formula below:

$$Z = \frac{W_1 - \dfrac{n_1 (n_1 + n_2 + 1)}{2} + C}{S_w} \tag{59}$$

where

$$W_1 = \sum_{k=1}^{n_1} \text{Rank}(X_{1k}) \tag{60}$$

This ($W_1$) is called the rank sum. C is the correction factor. This number is fixed at 0.50 if the numerator of Z is negative and -0.50 if the numerator of Z is positive. The standard deviation of the ranked set is given by:

$$S_w = \sqrt{ \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} - \frac{n_1 n_2 \sum_{i} (t_i^3 - t_i)}{12 (n_1 + n_2)(n_1 + n_2 - 1)} } \tag{61}$$

where $t_1$ = number of observations tied at value one, $t_2$ = number of observations tied at value two, and so on.

5. Use the following decision rule to determine whether to accept or reject the null hypothesis:

$H_A: \sigma = 0$, the decision rule is governed by $Z < Z_{\alpha/2}$ or $Z > Z_{1-\alpha/2}$;
$H_A: \sigma > 0$, the decision rule is governed by $Z > Z_{1-\alpha/2}$; and
$H_A: \sigma < 0$, the decision rule is governed by $Z < Z_{\alpha}$.

The Mann-Whitney U test is used in the following cases: (i) the test involves the comparison of two populations; (ii) the values of the data are non-parametric; (iii) the data is classified as ordinal and NOT interval scale. "Ordinal" means that the data score, i.e. the answer choice, has unequal or non-constant spacing between score units. For example, a 1st place, 2nd place, and 3rd place type of answer choice, where the distances between the first, second, and third are not equal, may be appropriate for this test, whereas a scale of 1 (lowest) to 5 (highest) would not be able to use the Mann-Whitney U test; and (iv) it is said that the Mann-Whitney test is more robust and efficient than the conventional t-test. "Robustness" means that the final result is not unduly affected by outliers.[21] Outliers are extreme values. If the system is robust, it will not be affected by extreme values. Generally, extreme values tend to bias the estimate because outliers create greater variance and thus a larger standard deviation. This problem is eliminated by ranking the data, i.e. arranging the combined sets of $n = n_1 + n_2$ into one set ranked from lowest value to highest value. "Efficiency" is the measure of the desirability of the estimator. The estimator is desirable if it yields an optimal result. It yields the optimal result if the observed data meets or comes closest to the expected value.[22]

The Mann-Whitney U Test requires the comparison of the population standard deviations, $\sigma_1 - \sigma_2 = 0$; therefore, it is an alternative test to the conventional t-test. Recall that the conventional t-test is given by:

$$t = \frac{\bar{x} - \mu}{S / \sqrt{n}} \tag{62}$$

Recall further that in order to determine the population standard deviation one must use the Z-equation. The Z equation is given by:

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \tag{63}$$

From equation (63), the population standard deviation may be written as:

[21] Portnoy, S. and He, X. "A Robust Journey in the New Millennium." Journal of the American Statistical Association, Vol. 95, No. 452 (Dec. 2000), pp. 1331-1335.
[22] Everitt, Brian S. (2002). The Cambridge Dictionary of Statistics. Cambridge University Press. p. 128. ISBN 0-521-81099-X.

$$\sigma = \left( \frac{\bar{x} - \mu}{Z} \right) \sqrt{n} \tag{64}$$

Note that the population standard deviation in equation (64) may not be determined unless the conventional t-equation (62) is used to determine the population mean ($\mu$). The value of $\mu$ is derived from equation (62) as:

$$\mu = \bar{x} - t \left( \frac{S}{\sqrt{n}} \right) \tag{65}$$

Therefore, even though the Mann-Whitney U test statistic, equation (59), shows no use of the t-equation and Z-equation, the researcher must understand the underlying functions and steps to illustrate the logic of the Mann-Whitney U Test.

2.7 Phi Correlation Coefficient $\phi$
In a case where the data of X and Y are both nominal, the phi correlation coefficient is used. The phi equation is given by:

$$r_{phi} = \phi = \frac{p_a - p_X p_Y}{\sqrt{p_X p_Y (1 - p_X)(1 - p_Y)}} \tag{66}$$

There is an equivalent of equation (66) using the contingency coding method of blocks ABCD in the table:

$$r_{phi} = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}} \tag{67}$$

The calculation according to equation (67) follows:

$$r_{phi} = \frac{25 - 100}{\sqrt{(15)(15)(15)(15)}} = \frac{-75}{\sqrt{50{,}625}} = \frac{-75}{225} = -0.33$$

Note that equations (66) and (67) are equivalent to:

$$\phi = \sqrt{\frac{\chi^2}{n}} \tag{68}$$

where $\chi^2 = \dfrac{(n - 1) S^2}{\sigma^2}$ or $\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$.
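The arithmetic for equations (67) and (68) can be reproduced with a short Python sketch. The cell counts A = 10, B = 5, C = 5, D = 10 are an assumption: they are simply one set of counts consistent with the worked numbers above (BC = 25, AD = 100, and every marginal total equal to 15). The function names are likewise hypothetical.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi from a 2 x 2 table with blocks A, B, C, D, following equation (67).

    The sign convention (BC - AD) is the one used in the text.
    """
    numerator = b * c - a * d
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the same 2 x 2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Assumed cell counts matching the worked example: BC = 25, AD = 100.
    a, b, c, d = 10, 5, 5, 10
    n = a + b + c + d
    print(round(phi_coefficient(a, b, c, d), 2))               # equation (67)
    print(round(math.sqrt(chi_square_2x2(a, b, c, d) / n), 2))  # equation (68)
```

Equation (68) returns only the magnitude of phi, because the square root discards the sign that equation (67) carries.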

2.8 Pearson's Contingency Coefficient C
In another case where both X and Y are nominal data, the Pearson contingency coefficient is used.[23] This is known as Pearson's C, which is given by:

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \tag{69}$$

where $\chi^2$ is chi square and N is the grand total of observations. Generally, the range for a correlation coefficient is -1 to +1; however, C does not reach this range. For a 2 × 2 table, it can reach 0.707, and 0.866 for a 4 × 4 table. In order to reach the interval maximum, more categories have to be added.[24]

2.9 Goodman and Kruskal Lambda $\lambda$
Goodman and Kruskal's lambda is a measurement of the reduction in error ratio. This type of correlation measurement is used for the measurement of association. To the extent that it is applicable to "reliability," GK's lambda is usable only if reliability is defined in terms of the association of polytomies. The range of lambda is $0 \le \lambda \le 1$. Zero means there is no association between the independent and dependent variables, and one means there is a perfect association between the two variables. The lambda equation is given by:

$$\lambda = \frac{\varepsilon_1 - \varepsilon_2}{\varepsilon_1} \tag{70}$$

where $\varepsilon_1$ is the overall non-modal frequency, and $\varepsilon_2$ is the sum of the non-modal frequencies for each value of the independent variable. Goodman and Kruskal deal with the optimal prediction of two polytomies (multiples) which are asymmetrical, where there is no underlying continua and no ordering of interest,[25] i.e. the answer to the survey question contains more than 2 choices. The Goodman and Kruskal (GK) polytomy is described by the A × B crossing in the table below:

                      B
A        B1     B2     ...    Bβ     Total
A1       ρ11    ρ12    ...    ρ1β    ρ1·
A2       ρ21    ρ22    ...    ρ2β    ρ2·
...      ...    ...    ...    ...    ...
Aα       ρα1    ρα2    ...    ραβ    ρα·
Total    ρ·1    ρ·2    ...    ρ·β    1

Table 6.0: A divides the population into alpha ($\alpha$) classes where $\alpha$: ($A_1, A_2, \dots, A_\alpha$). B divides the population into beta ($\beta$) classes where $\beta$: ($B_1, B_2, \dots, B_\beta$). The proportion that is classified as both $A_\alpha$ and $B_\beta$ is $\rho_{\alpha\beta}$. The marginal proportion $\rho_{\alpha\cdot}$ is the proportion of the population classified as $A_\alpha$, and $\rho_{\cdot\beta}$ is the proportion of the population classified as $B_\beta$. See Goodman & Kruskal (1954), p. 734.

Goodman and Kruskal originally proposed the measure of association as:

$$\lambda_b = \frac{P(e_1) - P(e_2)}{P(e_1)} \tag{71}$$

which can be written as:

$$\lambda_b = \frac{\sum_a \rho_{am} - \rho_{\cdot m}}{1 - \rho_{\cdot m}} \tag{72}$$

The expression above is the relative decrease in the probability of error in guessing $B_b$ as between $A_a$ unknown and $A_a$ known. The value $\lambda_b$ gives the error proportion which can be eliminated when A is known. Similarly, Goodman and Kruskal defined $\lambda_a$ as:

$$\lambda_a = \frac{\sum_b \rho_{mb} - \rho_{m\cdot}}{1 - \rho_{m\cdot}} \tag{73}$$

where $\rho_{m\cdot} = \max_a \rho_{a\cdot}$ and $\rho_{mb} = \max_a \rho_{ab}$.

The interpretation of $\lambda_a$ is the opposite of $\lambda_b$. The meaning of $\lambda_a$ is "the relative decrease in probability of error in guessing $A_a$ as between $B_b$ unknown and known."[26] Goodman and Kruskal stated that the values of $\lambda_a$ and $\lambda_b$ were given by Guttman,[27] from which they derived the following lambda:

[23] Pearson, Karl (1904). P. 16.
[24] Smith, S. C., & Albaum, G. S. (2004). Fundamentals of marketing research. Sage: Thousand Oaks, CA. p. 631.
[25] Goodman, Leo A. & Kruskal, William H. (1954). "Measure of Association for Cross Classifications." Journal of the American Statistical Association, Vol. 49, No. 268 (Dec 1954), pp. 732-764.
[26] Goodman & Kruskal, p. 742.
[27] Guttman, Louis (1941). "An outline of the statistical theory of prediction." Supplementary Study B-1, pp. 253-318 in Horst, Paul et al. (eds.), The Prediction of Personal Adjustment. Bulletin 48, Social Science Research Council, New York (1941).

$$\lambda = \frac{0.5 \left[ \sum_a \rho_{am} + \sum_b \rho_{mb} - \rho_{\cdot m} - \rho_{m\cdot} \right]}{1 - 0.5 \left( \rho_{\cdot m} + \rho_{m\cdot} \right)} \tag{74}$$

The lambda proposed by Goodman and Kruskal lies between the $\lambda_a$ and $\lambda_b$ described by Guttman. The range of GK's lambda is $0 \le \lambda \le 1$. Goodman and Kruskal alternatively defined the terms in lambda by "[l]et v be the total number of individuals in the population, $v_{ab} = v\rho_{ab}$, $v_{am} = v\rho_{am}$, $v_{mb} = v\rho_{mb}$, and so on."[28] Under this general definition, the Guttman $\lambda_a$ and $\lambda_b$ become:

$$\lambda_b = \frac{\sum_a v_{am} - v_{\cdot m}}{v - v_{\cdot m}} \tag{75}$$

and

$$\lambda_a = \frac{\sum_b v_{mb} - v_{m\cdot}}{v - v_{m\cdot}} \tag{76}$$

The general GK's lambda is then given by:

$$\lambda = \frac{\sum_a v_{am} + \sum_b v_{mb} - v_{\cdot m} - v_{m\cdot}}{2v - (v_{\cdot m} + v_{m\cdot})} \tag{77}$$

The following table demonstrates GK's new definition of the population and its components.

                      B
A        B1      B2      B3      B4      va·
A1       1768    807     189     47      2811
A2       946     1387    746     53      3132
A3       115     438     288     16      857
v·b      2829    2632    1223    116     v = 6800

Table 7.0: The numerical examples are taken from Kendall's work and were also reproduced in Goodman and Kruskal's article (p. 744). See Kendall, Maurice G. (1948). The Advanced Theory of Statistics. Charles Griffin and Co., Ltd., London, 1948, p. 300.

Goodman and Kruskal provided the calculation as:

[28] Goodman & Kruskal, p. 743.

$v_{1m} = 1768$, $v_{2m} = 1387$, $v_{3m} = 438$
$v_{m1} = 1768$, $v_{m2} = 1387$, $v_{m3} = 746$, $v_{m4} = 53$
$v_{\cdot m} = 2829$, $v_{m\cdot} = 3132$

The calculation for Guttman's $\lambda_a$ and $\lambda_b$ follows:

$$\lambda_a = \frac{3954 - 3132}{6800 - 3132} = \frac{822}{3668} = 0.2241$$

$$\lambda_b = \frac{3593 - 2829}{6800 - 2829} = \frac{764}{3971} = 0.1924$$

The calculation for GK's lambda follows:

$$\lambda = \frac{822 + 764}{3668 + 3971} = \frac{1586}{7639} = 0.2076$$

It is tempting to treat GK's lambda $\lambda$ as the average of Guttman's $\lambda_a$ and $\lambda_b$; the following calculation shows:

$$\frac{\lambda_a + \lambda_b}{2} = \frac{0.2241 + 0.1924}{2} = \frac{0.4165}{2} = 0.2083$$

There is a minor difference of about 0.0006 between the two values (0.2083 against 0.2076), or roughly 0.06%. It is a good approximation; however, GK's lambda may be preferable to Guttman's $\lambda_a$ and $\lambda_b$, which require a two-step process.

In a reliability test, GK's lambda is a tool to measure the degree of reliability through the interpretation of the reduction of the error ratio. If the error ratio is reduced, it is said that the study is reliable. Is this reliability test relevant to the instrument itself or to the entire score set produced by the survey? It is worth noting that the issue of reliability, when it relates to the survey, fulfills the requirement of replication. On the issue of survey reliability, GK's lambda is not a tool for testing the reliability of the survey. On the issue of instrumentation, reliability attests to the efficacy of the instrument, i.e. does it give predictable results or scores? GK's lambda, as well as Guttman's $\lambda_a$ and $\lambda_b$, measures association between two polytomies: $A_a$ and $B_b$. For that reason, GK's lambda is not a tool for instrument calibration. For instrument assessment, GK's lambda is a tool to measure the association of two random variables. Only if reliability is measured as the degree of association would GK's lambda be a usable tool for reliability analysis.
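The calculations for equations (75) to (77) can be checked with a minimal Python sketch applied to the counts in Table 7.0. The function name `gk_lambdas` and the list-of-rows layout are assumptions made for illustration; rows stand for the A classes and columns for the B classes.

```python
def gk_lambdas(table):
    """Return (lambda_a, lambda_b, lambda) for a table of counts.

    Rows are the A classes and columns are the B classes, following
    equations (75), (76) and (77).
    """
    columns = list(zip(*table))
    v = sum(sum(row) for row in table)              # total count v
    sum_row_max = sum(max(row) for row in table)    # sum over a of v_am
    sum_col_max = sum(max(col) for col in columns)  # sum over b of v_mb
    v_dot_m = max(sum(col) for col in columns)      # largest column total
    v_m_dot = max(sum(row) for row in table)        # largest row total

    lambda_b = (sum_row_max - v_dot_m) / (v - v_dot_m)
    lambda_a = (sum_col_max - v_m_dot) / (v - v_m_dot)
    lambda_sym = (sum_row_max + sum_col_max - v_dot_m - v_m_dot) / (
        2 * v - (v_dot_m + v_m_dot)
    )
    return lambda_a, lambda_b, lambda_sym


# Counts from Table 7.0 (rows A1..A3, columns B1..B4).
TABLE_7 = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]

if __name__ == "__main__":
    lam_a, lam_b, lam = gk_lambdas(TABLE_7)
    print(round(lam_a, 4), round(lam_b, 4), round(lam, 4))
    print(round((lam_a + lam_b) / 2 - lam, 4))  # gap between the average and lambda
```

Working from the counts rather than from rounded intermediate values keeps the comparison between the simple average and GK's lambda free of rounding error.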