
LEO A. GOODMAN

ABSTRACT

Under certain specified conditions, ecological data, e.g., the percentage of non-whites and the percentage in domestic service for different community areas, can be used to estimate the "individual correlation" between two dichotomous classifications, e.g., the non-white-white classification and the domestic service-other than domestic service classification. Quite accurate estimates are obtained for some data on the relation between color and occupation, somewhat less accurate estimates on the relation between color and literacy. For some situations where the specific conditions here described are not met, other conditions are presented that lead to different methods for estimating the individual correlation from the ecological data. These methods are developed for the study of the individual correlation between two dichotomous variables, between two qualitative variables where one of them is dichotomous, and between two quantitative variables.

The present article discusses some of the results reported in Robinson's paper on ecological correlation² and explores further the suggestions in earlier notes by the present author³ and by Duncan and Davis.⁴ The terminology used in these earlier papers will, for the sake of convenience, be used here, although it does have some disadvantages. "In an ecological correlation . . . the variables are . . . descriptive properties of groups. . . . An individual correlation is a correlation in which the . . . variables are descriptive properties of individuals. . . ."⁵ The phrase "behavior of individuals" referred to the variables describing properties of individuals, while "ecological data" referred to the ecological variables describing properties of groups. An "ecological regression" study is a standard regression analysis for ecological variables. The problems of "aggregation," as discussed in some of the economics literature,⁶ are related somewhat to the mathematical problems that have appeared in the discussion of ecological and individual correlations, although the terminology of this literature is quite different from that of the papers referred to earlier.

The variables in an ecological correlation are usually quantitative (e.g., percentages or means for each of the 48 states), while the variables in an individual correlation may be qualitative (e.g., race of each individual) or quantitative (e.g., height of each individual). The ecological correlation coefficient used in the earlier papers was the Pearsonian correlation coefficient for the joint distribution of two quantitative ecological variables. These papers dealt mainly, though not exclusively, with the situation in which both variables in the individual correlation study were qualitative and dichotomous and the individual correlation coefficient used was the fourfold-point ($\phi$) correlation coefficient for the cross-classification table describing the joint distribution of the two dichotomous variables.⁷ The present article will also study this situation, as well as situations in which both variables considered in the individual correlation study are quantitative, or where both are qualitative and one is dichotomous. Situations in which one variable is quantitative and the other qualitative, or where both are qualitative but neither dichotomous, will not be considered here.⁸

It has been shown that ecological correlations cannot be used as substitutes for individual correlations.⁹ However, ecological correlations may be of interest in themselves; the kinds of questions that can be answered by a study of ecological correlations are sometimes of direct concern to social scientists.¹⁰ In some problems, both the ecological and the individual correlations and the relations between them may be of interest. Even if the investigator is concerned only with individual correlations, ecological data may be of service, though ecological correlations are not recommended.¹¹

The author's earlier note¹² showed that, under very special circumstances, the analysis of the regression between ecological variables may be used to make inferences about "individual behavior," i.e., about the unknown data, for a population of individuals, describing the cross-classification of two dichotomous attributes. In the present article, the general approach presented in the note will be developed further, and the inferences about "individual behavior" will be used to estimate individual correlations. (Since the individual correlation coefficient, $\phi$, may not be an appropriate measure of association in many situations,¹³ the author's note did not discuss individual correlations explicitly but rather inferences about "individual behavior." However, since the individual correlation coefficient may sometimes be an appropriate measure, it will be investigated here. The general method developed here can also be applied to situations in which some other measure of association is of interest.) This article will also explore in some detail the method presented in the note by Duncan and Davis¹⁴ and will suggest a few techniques that lead to further insight into it.¹⁵

If individual correlations are of interest, it is best to obtain the directly relevant data on individual behavior rather than ecological data. For example, if the individual correlation between color (Negro-white) and illiteracy (illiterate-literate) is of interest, the appropriate data would be a fourfold table describing the cross-classification of individuals according to Negro-white and illiterate-literate categories.¹⁶ However, in some situations this table may not be available; thus the fourfold-point correlation coefficient cannot be computed from it. However, the marginal totals (i.e., the number of Negroes, whites, illiterates, and literates) for the total Negro-white population and also for the Negro-white populations of various subdivisions of the country may be known. Using these ecological data, methods will be presented for estimating the data, which would have appeared in the table, and the fourfold-point correlation coefficient for it. These methods can also be used to estimate

1 The research was carried out at the Statistical Research Center, University of Chicago, under the sponsorship of the Statistics Branch, Office of Naval Research, and of the Social Science Research Committee, University of Chicago. Reproduction in whole or in part is permitted for any purpose of the United States government. I am indebted to R. Blough, who assisted with the numerical computations; and to O. D. Duncan, P. F. Lazarsfeld, J. S. Coleman, Z. Griliches, Y. Grunfeld, P. M. Hauser, P. H. Rossi, and H. Zeisel for helpful comments.

2 W. S. Robinson, "Ecological Correlations and the Behavior of Individuals," American Sociological Review, XV (1950), 351-57.

3 Leo A. Goodman, "Ecological Regression and Behavior of Individuals," American Sociological Review, XVIII (1953), 663-64.

4 Otis Dudley Duncan and Beverly Davis, "An Alternative to Ecological Correlation," American Sociological Review, XVIII (1953), 665-66.

5 Robinson, op. cit., p. 351.

6 For example, H. Theil, Linear Aggregations of Economic Relations (Amsterdam: North-Holland Publishing Co., 1954).

7 See, e.g., Helen M. Walker and Joseph Lev, Statistical Inference (New York: Henry Holt & Co., 1953), p. 272, and Leo A. Goodman and William H. Kruskal, "Measures of Association for Cross Classifications," Journal of the American Statistical Association, XLIX (1954), 732-64, esp. 739.

8 See Goodman and Kruskal, op. cit., pp. 735-38, for a description of some distinctions between these situations.

9 Robinson, op. cit., p. 357; Goodman, op. cit., p. 663.

10 Herbert Menzel's "Comment" on Robinson's [...] 674; Goodman, op. cit., p. 663.

11 Duncan and Davis, op. cit., p. 665; Goodman, op. cit., p. 664.

12 Op. cit.

13 See Goodman and Kruskal, op. cit., for further discussion of this point.

14 Op. cit.

15 Cf. Hanan C. Selvin, "Durkheim's 'Suicide' [...] Journal of Sociology, LXIII (1958), 615-18. Selvin refers to the results presented in an unpublished version of the present article.

16 See, e.g., Robinson, op. cit., p. 353, Table 1.

All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c).

ALTERNATIVES TO ECOLOGICAL CORRELATION


THE AMERICAN JOURNAL OF SOCIOLOGY

the non-available data and the corresponding correlation coefficient for each subdivision of the country. These methods are simple, but they cannot be expected to lead to as accurate estimates as those obtained from relevant data on individual behavior. On the other hand, if the ecological data are easily available, then the amount of computation involved in using the methods suggested here costs very little in comparison with the cost of obtaining the directly relevant data from a special study.

Ecological regression.-The proportion, $y$, of individuals in the Negro-white population who are illiterate may be written as $y = xp + (1 - x)r$, where $x$ is the proportion in the population who are Negro, $p$ is the proportion of Negroes who are illiterate, $(1 - x)$ is the proportion in the population who are white, and $r$ is the proportion of whites who are illiterate. Thus $y = r + (p - r)x = a + bx$, where $a = r$ and $b = p - r$. Hence, if different populations or areas are considered where the proportion $p$ is the same for each of these populations and also the proportion $r$ is the same for these populations, then there will be an exact linear relationship, $y = a + bx$, between the values of $y$ and $x$ for the different populations (assuming that not all the values of $x$ are equal), where the slope will be $b = p - r$, and the y-intercept will be $a = r$. This straight line could be used to determine $r = a$ and $p = b + a$.

In practice, the actual values of $p$ and $r$ will not be constant, but it may be the case that the average $E(p \mid x)$ of the values of $p$, for populations with the same proportion $x$ of Negroes, is constant [i.e., $E(p \mid x)$ is the same for different values of $x$] and the average $E(r \mid x)$ of the values of $r$, for populations with the same $x$ value, is also constant. In this situation, the main assumption of linear regression analysis, $E(y \mid x) = A + Bx$, holds true, where $A = E(r \mid x)$ and $B = E(p \mid x) - E(r \mid x)$. Thus standard methods of linear regression¹⁷ can be used to estimate $A$ and $B$. The variance $\sigma^2(y \mid x)$ of the observed $y$ values from the straight line will depend on the variance $\sigma^2(p \mid x)$ computed for the probability distribution of the proportion $p$ illiterate among Negroes for populations with the same proportion $x$ of Negroes; the variance $\sigma^2(r \mid x)$ of the $r$ for populations with a given $x$ value; the covariance $\mathrm{Cov}(p, r \mid x)$ of these two proportions; and the distribution of the $x$ values. If $\sigma^2(y \mid x)$ is not approximately constant for the different $x$ values, it will sometimes be worthwhile to modify the standard regression methods by the use of a "weighted regression."¹⁸ (Another kind of modification, which will sometimes be appropriate, can be based on methods developed for the situation of "linear regression where both variates are subject to error"¹⁹ rather than for the standard linear regression.) For a given $x$,

$$\sigma^2(y \mid x) = \sigma^2(p \mid x)\,x^2 + \sigma^2(r \mid x)\,(1 - x)^2 + 2\,\mathrm{Cov}(p, r \mid x)\,x(1 - x).$$

Thus, under the present assumptions, $\sigma^2(y \mid x) = 0$ only when all the $p$ and $r$ values equal $B + A$ and $A$, respectively, or when there is a specific negative linear relationship between $p$ and $r$, for each $x$; viz., $px + r(1 - x) = A + Bx$, or $p = A + B - (r - A)(1 - x)/x$. [In the final section herein, different assumptions are made, which lead to quite different kinds of situations in which it is possible that $\sigma^2(y \mid x) = 0$.]

The expected values will be constant, and the variances will be small, when the probability of illiteracy, say, is much more a function of color (i.e., it depends on whether a person is white or Negro) rather than a function of the ecological area being considered. Where the phenomenon under investigation is more a function of the area (i.e., the

17 See, e.g., Wilfrid J. Dixon and Frank J. Massey, Jr., Introduction to Statistical Analysis (New York: McGraw-Hill Book Co., 1951), chap. xi.

18 See, e.g., R. L. Anderson and T. A. Bancroft, Statistical Theory in Research (New York: McGraw-Hill Book Co., Inc., 1952), pp. 182-86.

19 See, e.g., M. G. Kendall, "Regression, Structure, and Functional Relationship," Parts I and II, Biometrika, XXXVIII (1951), 11-25, and XXXIX (1952), 96-108; D. V. Lindley, "Regression Lines and Linear Functional Relationship," Journal of the Royal Statistical Society, Suppl., IX (1947), 219-44; J. W. Tukey, "Components in Regression," Biometrics, VII (1951), 33-70.


$p$ and $r$ values differ widely in the different areas) than a function of color, the methods presented here are not recommended; however, in some situations the variance of $p$ and $r$ may be sufficiently small for them to be applicable, while in others the variance of $p$ and $r$ for a particular subset of ecological areas (e.g., for the states in a given geographic division of the United States) or for a set of combined ecological areas (e.g., for the nine geographic divisions of the United States) may be sufficiently small for the present method to be applied to the subset of ecological areas or to the set of combined areas. If the variance of $p$ and $r$ for the states in each division of the United States is small, then the methods may be applied to obtain separate estimates for each division, thus obtaining estimates where, in a certain sense, geographic divisions have been held constant.

If the scatter diagram of $y$ and $x$ does not suggest a linear relation between $y$ and $x$, then the present strategy is not recommended (unless the scatter diagram of $y$ and $x$ for a subset of the areas or for a set of combined areas suggests linearity). If the scatter diagram does suggest a linear relation, then it may be applicable, but it is still possible that the variances of $p$ and $r$ are large. In this case, the present method leads to estimates of $E(p \mid x)$ and $E(r \mid x)$ when these average (expected) values are constant, but the estimate of the individual correlation for the total population based on these estimates of expected values may be quite poor if the variance of $p$ and/or $r$ is large.

From the scatter diagram of the per cent Negro and per cent illiterate in different areas, the slope $B$ and the y-intercept $A$ can be estimated by the usual methods of linear regression, obtaining the estimates $b$ and $a$, respectively. Then $E(r \mid x)$ and $E(p \mid x)$ can be estimated by $\hat{r} = a$ and $\hat{p} = b + \hat{r} = b + a$, respectively. The numbers of illiterate Negroes, illiterate whites, literate Negroes, and literate whites (i.e., the four entries in the fourfold "individual behavior" table) are estimated by $\hat{p}NX$, $\hat{r}N(1 - X)$, $(1 - \hat{p})NX$, and $(1 - \hat{r})N(1 - X)$, respectively, where $NX$ is the total number of Negroes.

Since $p$ and $r$ lie between 0 and 1, it is desirable that the estimates $\hat{p}$ and $\hat{r}$ also lie between 0 and 1. When this is not the case, the underlying assumptions should be re-examined, although it is possible to obtain such estimates even if these assumptions are satisfied. A method for dealing with this situation was suggested in the author's earlier note.

The estimated proportion $\hat{Y} = a + bX = \hat{p}X + \hat{r}(1 - X)$ of illiterates in the Negro-white population should be close to the known proportion, $Y$, of illiterates. If this is not the case for a given set of data, this method is not recommended. In the special case where $Y$ is, in fact, equal to the average, $\bar{y}$, of the illiteracy proportions in the various ecological areas and $X$ is equal to the average, $\bar{x}$, of the proportions Negro in the ecological areas, then this check on the underlying assumptions of the method does not apply, since in this case $\hat{Y} = a + b\bar{x} = \bar{y} = Y$, even if the assumptions are not met. A method for determining roughly whether or not $\hat{Y}$ is sufficiently close to $Y$ will be mentioned later in this section.

Rather than compute the correlation coefficient $c$ directly for the estimated fourfold table by the usual formula, a simplified formula is $b\sqrt{X(1 - X)/\hat{Y}(1 - \hat{Y})}$. Since $\hat{Y}$ will be close to $Y$ when this general approach is applicable, it will not matter much whether or not $\hat{Y}$ is replaced by the known proportion $Y$. Thus an estimate of the fourfold-point correlation is

$$c = b\sqrt{\frac{X(1 - X)}{Y(1 - Y)}}.$$

Following standard correlation theory, the ecological correlation can be computed by multiplying $b$ by the ratio of the standard deviation of the proportions of Negroes in the ecological areas and the standard deviation of the proportions of illiterates there. Since this ratio will usually be very different from $\sqrt{X(1 - X)/Y(1 - Y)}$ when ecological data are used, the ecological and individual correlations will usually be very different. However, the ecological correlation might also serve as a rough measure of whether the underlying assumptions are not satisfied for a particular set of data;²⁰ the present method is not to be recommended if it is rather small in absolute value.

The estimates $b$ and $a$ will be unbiased estimates of $B$ and $A$, respectively, if $E(y \mid x) = A + Bx$. If $\sigma^2(y \mid x)$ does not depend on $x$ (i.e., the special case of "homoscedasticity"), then the estimates $b$ and $a$ are the "best" unbiased estimates. When $\sigma^2(y \mid x)$ is not constant, which will usually be the case, the estimates will still be unbiased, but they may not be "best." When homoscedasticity can be assumed and each $y$ value, given $x$, is a statistically independent observation, the variances of the estimates $b$ and $a$ are $\sigma^2(b) = \sigma^2(y \mid x)/n\hat{\sigma}^2(x)$ and

$$\sigma^2(a) = \frac{\sigma^2(y \mid x)}{n} + \frac{\bar{x}^2\,\sigma^2(y \mid x)}{n\,\hat{\sigma}^2(x)},$$

where $\hat{\sigma}^2(x)$ is the variance of the observed $x$ values and $n$ is the number of observations.²¹ The variances of $\hat{p}$, $\hat{r}$, $\hat{Y}$, and $c$ can be written as follows: $\sigma^2(\hat{p}) = \sigma^2(\bar{y}) + (1 - \bar{x})^2\sigma^2(b)$, $\sigma^2(\hat{r}) = \sigma^2(a) = \sigma^2(\bar{y}) + \bar{x}^2\sigma^2(b)$, $\sigma^2(\hat{Y}) = \sigma^2(\bar{y}) + (X - \bar{x})^2\sigma^2(b)$, $\sigma^2(c) = \sigma^2(b)X(1 - X)/Y(1 - Y)$, where $\sigma^2(\bar{y}) = \sigma^2(y \mid x)/n$. These variances are all proportional to $\sigma^2(y \mid x)$, which depends on $\sigma^2(p \mid x)$, $\sigma^2(r \mid x)$, $\mathrm{Cov}(r, p \mid x)$, and the distribution of $x$. When homoscedasticity can be assumed, the variances, $\sigma^2(\hat{p})$, $\sigma^2(\hat{r})$, $\sigma^2(\hat{Y})$, $\sigma^2(c)$, can be estimated, in an unbiased manner, by replacing $\sigma^2(y \mid x)$ in the formulas given above by the mean-square deviation of the observed $y$ values from the least-squares regression line, $y = a + bx$.²² The estimated variance of $\hat{Y}$ can be used along with the observed difference between $\hat{Y}$ and $Y$ to determine roughly whether or not $\hat{Y}$ is sufficiently close to $Y$, which is another partial check on the underlying assumptions.

The formulas presented above must be interpreted with caution, since they compute only the variance of the estimate from its expected value, whereas the difference between the estimate and the actual (rather than the expected) population value would be of greater interest. Furthermore, if homoscedasticity cannot be assumed, the numerical values obtained by the formulas may be in error. In developing the formulas, it was not necessary to make any assumptions about the distribution of $y$ for given $x$ except that of homoscedasticity and linearity of regression. If, in addition, the distribution of $y$ for given $x$ is a normal distribution, then it is also possible to obtain confidence intervals based on $b$, $\hat{p}$, $\hat{r}$, $\hat{Y}$, and $c$, using the variance formulas that are given above.

Our approach must begin, in each case, with a careful examination of the underlying assumptions;²³ however, the only necessary assumption for the justification of the use of the point estimates $b$, $\hat{p}$, $\hat{r}$, $\hat{Y}$, and $c$ is that $p$ and $r$ must be more or less constant for the different ecological areas in such a way that the standard linear regression model can be applied.

If the proportion $z$ of Negroes among the illiterates is approximately constant and if the proportion $v$ of Negroes among the literates is also approximately constant, then an analogous approach to the one presented here could be used with the same ecological data to obtain estimates of the proportions $z$ and $v$ and the individual correlation $c$. Thus this approach may lead to two quite different estimates of $c$; the choice between them should depend upon whether $p$ and $r$ are more constant than $z$ and $v$ (see comments in the final section herein).

20 A statistical test for linearity of regression for certain kinds of data is described in Walker and Lev, op. cit., pp. 245-46.

21 A somewhat different, but equivalent, set of formulas is given, for example, in Alexander McFarland Mood, Introduction to the Theory of Statistics (New York: McGraw-Hill Book Co., Inc., 1950), p. 294.

22 See, e.g., the discussion of regression analysis in Dixon and Massey, op. cit.

23 Relevant are two earlier papers: Frederick F. Stephan, "Sampling Errors and the Interpretation of Social Data Ordered in Time and Space," Journal of the American Statistical Association, XXIX (March 1934 Suppl.), 165-66, and Frank Alexander Ross, "Ecology and the Statistical Method," American Journal of Sociology, XXXVIII (1933), 507-22.
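The estimation steps of the ecological regression method described above can be sketched in Python. This is only an illustrative sketch, not the author's own computation: the function names, the dictionary keys, and the four-area data set in the usage note are invented here, and the residual variance is computed with an n - 2 divisor, which is an assumption about what "mean-square deviation" denotes. The second function computes the fourfold-point correlation directly from the 2 x 2 table, so that the simplified formula can be compared with it.

```python
from math import sqrt


def ecological_regression(x, y, X, Y):
    """Least-squares ecological regression (illustrative sketch).

    x, y : per-area proportions (e.g. proportion Negro, proportion illiterate)
    X, Y : known proportions for the total population
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = sxy / sxx            # slope, estimates B = E(p|x) - E(r|x)
    a = y_bar - b * x_bar    # y-intercept, estimates A = E(r|x)
    r_hat = a
    p_hat = a + b
    Y_hat = a + b * X        # should be close to the known Y
    c = b * sqrt(X * (1 - X) / (Y * (1 - Y)))

    # Assumption: sigma^2(y|x) is estimated by the residual mean square
    # with n - 2 degrees of freedom (homoscedasticity taken for granted).
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    var_x = sxx / n          # variance of the observed x values
    var_b = s2 / (n * var_x)
    var_ybar = s2 / n
    variances = {
        "b": var_b,
        "p_hat": var_ybar + (1 - x_bar) ** 2 * var_b,
        "r_hat": var_ybar + x_bar ** 2 * var_b,
        "Y_hat": var_ybar + (X - x_bar) ** 2 * var_b,
        "c": var_b * X * (1 - X) / (Y * (1 - Y)),
    }
    return {"a": a, "b": b, "p_hat": p_hat, "r_hat": r_hat,
            "Y_hat": Y_hat, "c": c, "variances": variances}


def phi_from_proportions(p, r, X):
    """Fourfold-point correlation computed directly from the 2 x 2 table."""
    n11 = p * X                # first row, first column (e.g. Negro, illiterate)
    n12 = (1 - p) * X          # first row, second column
    n21 = r * (1 - X)          # second row, first column
    n22 = (1 - r) * (1 - X)    # second row, second column
    col1 = n11 + n21
    col2 = n12 + n22
    return (n11 * n22 - n12 * n21) / sqrt(X * (1 - X) * col1 * col2)


if __name__ == "__main__":
    # Hypothetical areas in which p = .4 and r = .1 hold exactly.
    x = [0.1, 0.3, 0.5, 0.7]
    y = [0.1 + 0.3 * xi for xi in x]
    est = ecological_regression(x, y, X=0.2, Y=0.16)
    print(est["p_hat"], est["r_hat"], est["c"])
    print(phi_from_proportions(est["p_hat"], est["r_hat"], 0.2))
```

With exactly linear data the regression recovers p and r, and the simplified formula agrees with the fourfold-point correlation of the estimated table; with real ecological data the two checks described in the text (comparing the estimated and known Y, and inspecting the scatter diagram) would still be needed.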


[...] consider the example discussed by Robinson,²⁴ where, for the Census Bureau's nine geographic divisions of the United States in 1930, the ecological correlation between the per cent illiterate and the per cent Negro for the divisional Negro-white populations, ten years old and over, was .95, while the individual fourfold-point correlation for the 2 × 2 table giving the cross-classified color-illiteracy data for the corresponding total Negro-white population was .20.²⁵ Using the present approach, we see that the graph (Robinson's Fig. 1) for the nine geographic divisions looks more or less linear and that the slope $b = .25$ and the y-intercept $a = .02$. Also, the estimated proportion, $\hat{Y}$, of illiterates in the total population is .04, which is, in fact, equal to the known proportion $Y = .04$. This does not serve as a second partial check on the underlying assumptions, since, in this example, $Y$ does not differ very much from the average, $\bar{y}$, of the proportions illiterate in the nine divisional populations and $X$ does not differ from the average, $\bar{x} = .10$, of the proportions Negro in the nine divisional populations. (We shall see, in our second illustration, how the comparison of $\hat{Y}$ and $Y$ can serve as a partial check on the assumptions. Differences between $Y$ and $\bar{y}$, $X$ and $\bar{x}$, are due to the fact that the percentages appearing in the graph are weighted by the relative population size of the corresponding area in the computation of $Y$ and $X$, but not so in the computation of $\bar{y}$ and $\bar{x}$.) The fourfold-point correlation for the estimated table, based on the ecological data for the nine geographic divisions, is $c = .38$. The bounds for the individual [...] suggested by Duncan and Davis, are -.07 and +.60.²⁶ Thus the estimate $c = .38$, while not very close, is closer to the known value of the individual correlation than are the ecological correlation (.95) and the bounds.

We shall now consider the numerical example on the relation between color (non-white-white) and occupation (domestic service-other than domestic service) for employed females in Chicago in 1940, which is discussed in the note by Duncan and Davis. The individual correlation was .29, while their method, when applied to the available ecological data for community areas, led to the bounds .126 and .355. The scatter diagram for the proportion in domestic service and the proportion non-white among the employed females, computed from the available ecological data for each of fifteen community areas and the "balance of city"²⁷ indicates that the relation is more or less linear, the ecological correlation is .93, $b = .27$, and $a = .07$. Also, the estimated proportion, $\hat{Y}$, of persons in domestic service in the total employed female population in Chicago is .08, which is, in fact, equal to the known proportion $Y = .08$. (Since $Y = .08$ differed from the average, $\bar{y} = .13$, of the proportions of employed females in domestic service in the various ecological areas, and $X = .07$, the proportion non-white among the total employed female population in Chicago, differed from the average, $\bar{x} = .24$, of the proportions non-white among the employed females in the ecological areas, the fact that $\hat{Y}$ was very close to $Y$ gives us some further confidence in the application of this method to the present data.) The fourfold-point correlation for the estimated table is $c = .25$. Thus the ecological data show that the individual correlation

24 Op. cit.

25 Further discussion and application of some of the methods developed here will be presented in Otis Dudley Duncan, Ray P. Cuzzort, and Beverly Duncan, Statistical Geography: Problems in Analyzing Areal Data (to be published by the Free Press, Glencoe, Ill.); see also Donald J. Bogue and Margaret J. Hagood, Subregional Migration in the United States, 1935-40, Vol. II: Differential Migration in the Corn and Cotton Belts (Oxford, Ohio: Scripps Foundation, 1953), Appendix A, for a somewhat different approach to the problem of ecological correlation.

26 See Duncan and Davis, op. cit.; comments on this method will be presented in the following section.

27 Sixteenth Census of the United States: 1940: Population and Housing Statistics for Census Tracts and Community Areas, Chicago, Illinois (Washington: Government Printing Office, 1943), Tables A-3 and A-3a, pp. 25-39, and Tables 3 and 3a, pp. 176-341.


must lie in the interval between .13 and .35, and the estimate $c = .25$ is quite close to the known value of .29. The computation of the bounds may be used as a third partial check on the estimate $c$, as well as for determining the possible range of the individual correlations; we would not have recommended that $c$ be used if it had not been within the interval determined by the bounds.

Duncan and Davis also determine that the possible range of the percentage of non-whites in domestic service, based on the community area data, is between 21.1 and 44.5 per cent. It can be seen that the ecological regression method leads to the quite accurate estimate $\hat{p} = .34$ in this particular case; the known value of this proportion is .38.

TABLE 1

Parameters    True Values    Estimates    Estimated Standard Deviations
B             .32            .27          .03
E(p|x)        .38            .34          .02
E(r|x)        .06            .07          .01
Y             .08            .08          .01
c             .29            .25          .03

In this particular illustration, since $Y$ and $X$ differed from $\bar{y}$ and $\bar{x}$, respectively, there were three partial checks on the underlying assumptions, while in the preceding example, there were, in effect, only two. For the present illustration, the results may be summarized in Table 1, where the estimates are shown to compare favorably with their respective known true values and where the estimated standard deviations, computed from the formulas given earlier, are also presented. Since the true value of $Y$ is known from the ecological data, a rough comparison between $\hat{Y}$ and $Y$ can be made by using the information about the estimate of $\sigma(\hat{Y})$, another partial check on the assumptions underlying the present method.

The method of obtaining bounds.-Robinson and also Duncan and Davis point out that different systems of areal subdivision give different results. Duncan and Davis mention that, for their illustrative material, substantially closer bounds to the individual correlation can be derived from the marginal frequencies for the census tracts than from the marginals for the community areas (which are combinations of tracts) and that the criterion for choice between the results of different systems of areal subdivisions is clear: "The individual correlation is approximated most closely by the least maximum and the greatest minimum among the results for several systems of areal subdivisions." Where one areal subdivision (e.g., community areas) represents a combination of another areal subdivision (census tracts), the least maximum and greatest minimum are obtained from the finer areal subdivision.²⁸ Thus, if the best bounds are desired, it is necessary only to compute the bounds for the finer subdivision. However, it is possible to combine the areas of the finer subdivision into not more than four combined areas, where all the areas in a given combined area are similar (in a sense to be defined in the following paragraph), so that the bounds computed by using only the data for the less fine subdivision will be equal to the best bounds determined by the finer subdivision.

28 See reference to this finding in Selvin, op. cit., pp. 616-18.

Duncan and Davis indicate that substantially closer bounds are obtained for their data when the finer areal subdivision is used. It may sometimes happen that there is little or no difference between the bounds for the finer subdivision and the less fine subdivision. All tracts in which the number of non-white employed females was not more than the number of females in domestic service and the number of females in domestic service was not more than the number of white employed females can be combined into a single area without affecting the bounds (except for rounding errors); tracts in which the number of non-whites was not less than the number in domestic service and the number in domestic service was not more than the number of whites can be combined without affecting the bounds; etc. Tracts that can be combined in this way will be called "similar" (this definition of "similarity" is

convenient in this particular problem but not the points, or the ecological areas that they

necessarily so in other quite different prob- represent, that appear in the same part of

lems). Thus the fact that substantially the graph can be combined to form a single

closer bounds were obtained when census combined area, thus obtaining four areas:

tracts rather than community areas were A, B, C, D. Bounds for the individual cor-

used indicates that some of the tracts that relation computed by using the ecological

form a given community area were not data for the combined areas will yield the

"similar." same result as the bounds computed by us-

The color-illiteracy data for the forty- ing the data for each of the points on the

eight states and Washington, D.C., indicate graph; i.e., the bounds computed by using

that the nine geographicdivisions were com- each of the points on the graph could ac-

binations of areas that were, in fact, quite tually be computed from, at most, four

"similar." More specifically, the number of points, A, B, C, D, placed on the graph,

Negroes was not more than the number of where point A is a weighted average

illiterates, and the number of illiterates was (weighted by relative population size) of the

not more than the number of whites (or the points in part A of the graph, etc.

number of Negroes was not less than the The bounds for the individual correlation

number of illiterates, and the number of il- are determinedby first calculating the mini-

literates was not more than the number of mum number of non-white females that are

whites) in almost all the areas that were in domestic service; based on the available

combined to form a given division. Only ecological data, this can be seen to be 0 for

seven states had been combined with areas area A (or for any ecological area repre-

that differed in this respect. Thus the sented by a point in part A of the graph) and

bounds (-.07 and +.60) for the individual also for area B (i.e., the areas where y <

color-illiteracycorrelationbased on the data 1 - x); it is equal to the differencebetween

for the nine geographic divisions differ only the number of females in domestic service

slightly from the bounds (-.07 and +.58) and the number of white employed females

based on the data for the forty-eight states for area C and also for area D (i.e., the areas

and Washington, D.C. The method of com- where y > 1 - x). The maximum number

biningareas so that the same bounds are ob- of non-white employed females in domestic

tained for the combinedareas as for the finer service can be seen to be equal to the num-

subdivision is as follows: Draw two lines ber of non-white employed females for area

y x and y = 1 - x on the graph of the A and also for area D (i.e., the areas where

scatter diagram of the observed ecological y > x); it is equal to the number of females

variables y and x. This divides the graph in domestic service for area B and also for

into four parts: A, B, C, D, where A don- C (i.e., the areas where y < x). Thus, at a

tains all those points representing areas minimum, the number of non-white females

where the number of non-white employed in domestic service for the total population

females was not more than the number of under consideration(i.e., the combinationof

females in domestic service and the number areas A, B, C, and D) will be equal to the

of females in domestic service was not more differencebetween the number of females in

than the number of white employed females domestic service and the number of white

(x < y < 1 - x); B contains those points employed females for the combined popula-

where x > y < 1 - x; C contains those tion in areas C and D. At a maximum, the

points where x > y > 1 - x; and D con- total number of non-white females in do-

tains those points where x < y > 1 - x. mestic service will be equal to the sum of the

(If a point falls exactly on a diagonal line number of non-white employed females for

dividing the parts of the graph, it may be the combined population in areas A and D

put in either one of the adjacent parts, but, and the number of females in domestic serv-

of course, cannot be put in both parts.) All ice for the combined population in areas B

All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c).
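The bounding procedure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six areas and their counts are hypothetical (they are not the Chicago data), the function names `part`, `combine`, `bounds`, and `phi` are invented for the sketch, and points falling exactly on a diagonal are arbitrarily assigned to the part above it. The sketch computes the minimum and maximum counts area by area, repeats the computation on the four pooled areas A, B, C, D, and evaluates the check sum S and the range R of the fourfold-point correlation.

```python
from math import sqrt, isclose

def part(n, nonwhite, domestic):
    """Return the part of the graph (A, B, C or D) in which an area falls."""
    above_diagonal = domestic >= nonwhite            # y >= x
    above_antidiagonal = domestic + nonwhite >= n    # y >= 1 - x
    labels = {(True, False): "A", (False, False): "B",
              (False, True): "C", (True, True): "D"}
    return labels[(above_diagonal, above_antidiagonal)]

def combine(areas):
    """Pool the areas that fall in the same part into one combined area."""
    pooled = {}
    for n, nonwhite, domestic in areas:
        key = part(n, nonwhite, domestic)
        pn, pnw, pd = pooled.get(key, (0, 0, 0))
        pooled[key] = (pn + n, pnw + nonwhite, pd + domestic)
    return pooled

def bounds(areas):
    """Minimum and maximum number of non-white females in domestic service."""
    areas = list(areas)
    lowest = sum(max(0, d - (n - nw)) for n, nw, d in areas)  # domestic minus white
    highest = sum(min(nw, d) for n, nw, d in areas)
    return lowest, highest

def phi(cell, nonwhite, domestic, total):
    """Fourfold-point correlation from one cell count and the marginal totals."""
    denom = sqrt(nonwhite * (total - nonwhite) * domestic * (total - domestic))
    return (total * cell - nonwhite * domestic) / denom

# Hypothetical areas: (employed females, non-white, in domestic service).
areas = [(1000, 100, 300), (1000, 400, 200), (1000, 800, 500),
         (1000, 300, 900), (500, 50, 100), (800, 700, 600)]

pooled = combine(areas)
low, high = bounds(areas)                          # (1000, 1750) for these data
pooled_low, pooled_high = bounds(pooled.values())  # same bounds from four areas

total = sum(n for n, _, _ in areas)
nonwhite = sum(nw for _, nw, _ in areas)
domestic = sum(d for _, _, d in areas)

# T is the gap between the bounds; S is the independent check sum.
T = high - low
_, nonwhite_A, _ = pooled["A"]
_, _, domestic_B = pooled["B"]
n_C, nonwhite_C, _ = pooled["C"]
n_D, _, domestic_D = pooled["D"]
S = nonwhite_A + (n_C - nonwhite_C) + domestic_B + (n_D - domestic_D)

# R is the range of the fourfold-point correlation implied by the bounds.
R = phi(high, nonwhite, domestic, total) - phi(low, nonwhite, domestic, total)
margins = nonwhite * (total - nonwhite) * domestic * (total - domestic)
range_identity_holds = isclose(T * total, R * sqrt(margins))
```

With these made-up counts the per-area bounds and the pooled bounds coincide, S equals T, and T multiplied by the total population equals R times the square root of the product of the four marginal totals.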


and C. From the available ecological data for community areas in Chicago, we find that these minimum and maximum numbers are 5,826 and 12,271, respectively. Accordingly, the fourfold-point correlation coefficient is between .13 and .35. The difference, $T$, between the maximum and minimum numbers, 12,271 - 5,826 = 6,445, can be shown to be equal to the sum, $S$, of the number of non-white employed females who reside in area A, the number white in area C, the number in domestic service in B, and the number not in domestic service who reside in area D. By obtaining $S$ separately and comparing it with $T$, we have a partial check on our computations of the minimum and maximum numbers.

It can be seen that $T$ multiplied by the total population is equal to the product of the possible range, $R$, of the fourfold-point correlation (.35 - .13 = .22) and the square root of the product of the four marginal totals in the fourfold cross-classification table for the population. Thus $T = S$ is directly proportional to $R$ (the constant of proportionality depends on the population marginal totals). Hence the "accuracy" $R$ of the bounds, for a given set of population marginal totals, depends on the magnitude of $S$, which can be determined very quickly in a rough fashion by an examination of the data described by the graph. These bounds will be quite accurate if all the points in part A of the graph are very close to the vertical line determined by $x = 0$, the points in part B are close to the horizontal line $y = 0$, the points in part C are close to the vertical line $x = 1$, and the points in part D are close to the horizontal line $y = 1$. The exact value of the individual correlation can be determined ($R = 0$) from the ecological data if $S = 0$; i.e., if all the ecological areas can be represented by points on some of the lines that form the sides (boundaries) of the graph. In other words, if in each ecological area employed females were either all white, all non-white, all in domestic service, or all not in domestic service, we should, of course, be able to determine the exact value of the individual correlation from the ecological data alone. However, if each of the areas in, say, part B of the graph becomes more completely white (i.e., the percentage non-white decreases) or more completely non-white, but the areas still remain in part B (i.e., the percentage in domestic service still remains less than the percentage non-white and the percentage white), then the accuracy of the bounds need not be improved unless the percentage in domestic service in these areas also decreased. Using this general approach, an examination of the respective graphs describing the ecological data and a glance at the respective marginal proportions for the total populations would reveal that the bounds for the color-occupation data for the community areas or census tracts would be more accurate than the bounds for the color-illiteracy data obtained on either a divisional or a state basis.

Since it is possible, in computing bounds, to reduce the original ecological data to, at most, four areas (this simplifies somewhat the amount of computation; in any case, very little computation is required), the bounds are based essentially only on the information available for these four combined areas or their respective four points on the graph. The actual distribution of points on the graph is not used except insofar as it supplies data for the combined areas; the accuracy of the bounds depends on how closely these four points "hug" the four sides of the graph.

Further comments on ecological regression.-The regression approach made use of the graph of the ecological data, and the results depended on these data. If some other areal subdivision of the population is of interest, quite different estimates of the slope and y-intercept may be obtained, unless the underlying assumptions of this approach also hold true for this second areal subdivision and the values of $E(p \mid x)$ and $E(r \mid x)$ remain unchanged. Sometimes it is possible to combine areas or to use some other method of defining areas or classes of individuals in order to obtain a new subdivision of the population for which the underlying assumptions of the regression approach are more reasonable than for the original area data. It is not necessary that this new subdivision actually divide the population into mutually exclusive classes or that the entire population be included in the subdivision. For the areas in the new subdivision and for the entire population, the underlying assumptions concerning $p$ and $r$ should be more reasonable. With additional information about the population, it may be possible to determine such a subdivision of the population. If this is the case, the regression methods should be applied to this new subdivision rather than to the original data.

None of the methods discussed here makes much use of the information about the spatial distribution of the areas under consideration. This information and the information concerning the relative population sizes of the areas are not contained in the graph. The information may be of interest in itself and also it would probably be worthwhile to make some use of it in dealing with the present problem. For example, the method mentioned herein for "holding constant the geographic divisions" does make some use of the spatial distribution of the states. The spatial distribution, the population sizes, and any other relevant information should enter into the discussion of whether or not, in a particular case, the underlying assumptions are met. The information concerning relative population sizes can also be utilized (to a certain extent) as "weights" in the weighted linear regression analysis referred to earlier in this article.

Relation between two qualitative variables when one of them is dichotomous.-We shall now consider the situation in which the "individual behavior" described by a $2 \times \beta$ cross-classification table for a population is of direct interest but where the only available data are the marginal totals in the table for the population and the marginal totals for some subdivision of the population. Here both variables in the individual correlation study are qualitative; one of them has two categories (e.g., literate-illiterate) and the other has $\beta$ categories (e.g., Negro-white-"other races"; $\beta = 3$). Using an approach similar to the regression approach described here for the case where $\beta = 2$, we see that the proportion $y$ of individuals in the Negro-white-"other races" population who are illiterate may be written as $y = x_1 p_1 + x_2 p_2 + x_3 p_3$, where $x_1$ is the proportion in the population who are Negro, $p_1$ is the proportion of Negroes who are illiterate, $x_2$ is the proportion in the population who are white, $p_2$ is the proportion of whites who are illiterate, $x_3$ is the proportion in the population who belong to "other races," and $p_3$ is the proportion of people in "other races" who are illiterate. Since $x_1 + x_2 + x_3 = 1$, we have $y = x_1 p_1 + x_2 p_2 + (1 - x_1 - x_2) p_3 = p_3 + (p_1 - p_3) x_1 + (p_2 - p_3) x_2 = a + b_1 x_1 + b_2 x_2$, where $a = p_3$, $b_1 = p_1 - p_3$, and $b_2 = p_2 - p_3$. Hence, if different areas are considered where the proportion $p_1$ is the same for each area, the proportion $p_2$ is the same for each area, and the proportion $p_3$ is the same for each area, then there will be an exact multilinear relationship, $y = a + b_1 x_1 + b_2 x_2$, between the values of $y$ and $x_1$, $x_2$ for the different areas, where the slopes will be $b_1 = p_1 - p_3$ and $b_2 = p_2 - p_3$, and the y-intercept will be $a = p_3$. This multilinear relationship could be used to determine $p_3 = a$, $p_2 = b_2 + a$, and $p_1 = b_1 + a$.

In practice, the actual values of $p_1$, $p_2$, and $p_3$ will not be constant, but it may be the case that the average $E(p_1 \mid x_1, x_2)$ of the values of $p_1$, the average $E(p_2 \mid x_1, x_2)$ of the values of $p_2$, and the average $E(p_3 \mid x_1, x_2)$ of the values of $p_3$ for populations with the same proportions $x_1$ and $x_2$ of Negroes and whites, respectively, are constant. Then the main assumption of multiple linear regression analysis, $E(y \mid x_1, x_2) = A + B_1 x_1 + B_2 x_2$, holds true, where $A = E(p_3 \mid x_1, x_2)$, $B_1 = E(p_1 \mid x_1, x_2) - A$, and $B_2 = E(p_2 \mid x_1, x_2) - A$. Thus standard methods of multiple regression can be used to obtain estimates $a$, $b_1$, $b_2$, of $A$, $B_1$, $B_2$, respectively.29 If the variances of $p_1$, $p_2$, and $p_3$ are not large, then these estimates can be used to obtain the estimates $\hat{p}_3 = a$, $\hat{p}_2 = b_2 + a$, and $\hat{p}_1 = b_1 + a$ of the expected values of $p_3$, $p_2$, and

29 See, e.g., Walker and Lev, op. cit., chap. xiii.

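The three-category regression described in this section can be illustrated with a short sketch. Everything numeric below is an assumption made for illustration: the illiteracy proportions $p_1$, $p_2$, $p_3$, the five areas, and the population proportions $X_1$, $X_2$ are hypothetical, and the helpers `solve` and `least_squares` are a plain normal-equations fit written for this sketch, not a method taken from the article. The data are generated with constant $p_1$, $p_2$, $p_3$, so the multilinear relationship is exact and the fit recovers them.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def least_squares(regressors, y):
    """Ordinary least squares with an intercept; returns [a, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in regressors]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical constant illiteracy proportions for the three categories.
p1, p2, p3 = 0.20, 0.05, 0.10

# Hypothetical areas: (x1, x2) = proportions Negro and white in each area.
areas = [(0.1, 0.8), (0.3, 0.6), (0.5, 0.4), (0.2, 0.5), (0.4, 0.55)]
y = [x1 * p1 + x2 * p2 + (1 - x1 - x2) * p3 for x1, x2 in areas]

# Fit y = a + b1*x1 + b2*x2 and recover the three proportions.
a, b1, b2 = least_squares(areas, y)
p3_hat, p2_hat, p1_hat = a, b2 + a, b1 + a

# The two checks suggested in the text: estimates inside [0, 1], and the
# fitted population proportion compared with the known proportion Y.
estimates_plausible = all(0.0 <= p <= 1.0 for p in (p1_hat, p2_hat, p3_hat))
X1, X2 = 0.3, 0.6                       # hypothetical population proportions
Y_hat = a + b1 * X1 + b2 * X2
```

With real data the relationship is not exact, so the recovered proportions are only estimates and the two checks become informative rather than automatic.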

$p_1$, respectively. The six entries in the $2 \times 3$ cross-classification table describing the relation between literacy and color can then be estimated by a method analogous to that described earlier for the case in which $\beta = 2$. The estimates $\hat{p}_1$, $\hat{p}_2$, $\hat{p}_3$ should be examined to see whether they all lie between 0 and 1. Also the estimated proportion $\hat{Y} = a + b_1 X_1 + b_2 X_2$ of illiterates in the total population (where $X_1$ is the proportion of Negroes in the total population and $X_2$ is the proportion of whites in this population) should be close to the known proportion $Y$ of illiterates in this population, if the method suggested here is to be applied. Many of the comments discussed earlier for the case $\beta = 2$ can be generalized to the situation described in this section. This is left as an exercise for the interested reader.

Relation between two quantitative variables.-Many of the ideas presented above can be applied also in the case in which we are dealing with quantitative variables rather than categories (e.g., income rather than race). Let us consider the situation in which the individual Pearsonian correlation between income and size of family is of interest and the relevant cross-classified data for the entire population are not available. That is, for each individual in the population, information about both his income, $x$, and size of his family, $y$, cannot be obtained, but it is possible to determine or to estimate the income distribution and the distribution of size of family for the population (i.e., the marginal totals). If the average income and the average size of family for, say, each of the forty-eight states is known and if there is a linear relationship between income $x$ and average size of family $E(y \mid x)$ when $x$ is given, and it is more or less constant in these states [i.e., $E(y \mid x) = A + Bx$ is true for the individuals in each state and $A$ and $B$ are constant for all states], then it is possible to use these ecological data for each of the states to estimate the individual correlation for the population. The appropriate estimate of this Pearsonian correlation is obtained by multiplying $b$, the estimate of the slope of the regression line of $y$ and $x$ obtained from the average income and average family-size data, by the ratio $V$ of the standard deviation of the population income distribution and the standard deviation of the population family-size distribution. [This corresponds to the earlier multiplication of $b$ by $\sqrt{X(1 - X)}/\sqrt{Y(1 - Y)}$.] The usual ecological correlation, which cannot be used in general to estimate the individual correlation, is obtained by multiplying $b$ by the ratio of the observed standard deviation of the distribution of the forty-eight average incomes for the states and the observed standard deviation of the distribution of the forty-eight average family sizes for the states; if Washington, D.C., is included, there will be forty-nine averages. If, as is often the case, this latter ratio is much larger than the ratio $V$, the usual ecological correlation will overestimate the individual correlation.30 The ecological correlation may serve as a rough measure and partial check on whether the underlying assumption is not satisfied, i.e., whether $E(y \mid x) = A + Bx$, where $A$ and $B$ are constant for all states.

30 See G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics (New York: Hafner Publishing Co., 1950), pp. 313-14, for some related comments.

We have seen that it is not necessary to know the entire distribution of income and the distribution of size of family for the total population, but only the standard deviations of the two, in order to use the ecological data to estimate the individual correlation, since only these standard deviations enter into the computation of the ratio $V$ and of the individual correlation estimate. These standard deviations can be determined from the standard deviations (or variances) and the averages of the respective distributions for the states, since the variance of the population income (family size) distribution is the weighted sum $\sum w(i) M(i)$ of the average squared deviations $M(i)$ of the income (family size) of a person in the $i$th state from the average income (family size) $X$ for the total population, where $w(i)$ is the relative population size of the $i$th state, $M(i) = \sigma^2(i) + [\bar{x}(i) - X]^2$, $\bar{x}(i)$ is the average income (family size) in the $i$th state, and $\sigma^2(i)$ is the variance of the income (family-size) distribution in the $i$th state. These data, together with the estimated slope $b$ of the regression line obtained from the data on average income and average family size for the states, lead to the estimate $bV$ of the individual correlation coefficient for the entire population, where there is a constant linear relationship between income $x$ and average family size $E(y \mid x)$ when $x$ is given.

We have seen that the variance $\sigma^2(X)$ of the population income (family size) distribution is $\sum w(i) M(i) = \sum w(i) \sigma^2(i) + \sum w(i) [\bar{x}(i) - X]^2$; i.e., the population variance is the sum of two terms: (a) the weighted sum of the variances $\sigma^2(i)$ within the states [the "within states" variance, $WS(X)$] and (b) the weighted sum of the squared deviations of the averages $\bar{x}(i)$ for the states from $X$ [the "between states weighted" variance, $BSW(x)$], where the weights are the relative population sizes. Thus $\sigma^2(X) = WS(X) + BSW(x)$, and the variance of the family-size distribution is $\sigma^2(Y) = WS(Y) + BSW(y)$. The usual ecological correlation is

$$b \sqrt{BS(x)/BS(y)},$$

where $BS(x)$ and $BS(y)$ are the observed (unweighted) variances of the average incomes $\bar{x}(i)$ and the average family sizes $\bar{y}(i)$, respectively, for the states. Since the estimate of the individual correlation is $bV = b \sqrt{\sigma^2(X)/\sigma^2(Y)}$, the ecological correlation will be larger than $bV$ whenever $\sigma^2(X)/\sigma^2(Y) < BS(x)/BS(y)$, i.e., whenever $WS(X)/WS(Y) < [BS(x)/BS(y)] + E$, where $E = [BSW(y) BS(x) - BS(y) BSW(x)] / [WS(Y) BS(y)]$. If the usual ecological correlation is modified to obtain the "weighted" ecological correlation, $b \sqrt{BSW(x)/BSW(y)}$, then this "weighted" ecological correlation will be larger than the estimate of the individual correlation whenever $\sigma^2(X)/\sigma^2(Y) < BSW(x)/BSW(y)$; thus, whenever the ratio $WS(X)/WS(Y)$ of the "within states" variances is less than the ratio $BSW(x)/BSW(y)$ of the "between states weighted" variances.

The ratio of the usual ecological correlation and the estimate of the individual correlation is $\sqrt{BS(x)/BS(y)} \,/\, \sqrt{\sigma^2(X)/\sigma^2(Y)}$. If the observed variances $BS(y)$ and $\sigma^2(Y)$ are replaced in the above ratio by their expected values computed under the usual independence assumptions of the linear regression model, where $E(y \mid x) = A + Bx$ for each state and $\sigma^2[y \mid x(i)]$ is the variance around the regression line in the $i$th state, then the so-called "expected" ratio obtained will be larger than 1 whenever

$$\frac{\sum \{ w(i)\, \sigma^2[y \mid x(i)] \}}{\sigma^2(X)} > \frac{\sum \{ \sigma^2[y \mid x(i)] / w(i) \}}{TN\,[BS(x)]},$$

where there are $T$ states and a total population size of $N$ individuals, i.e., whenever

$$BS(x) > \frac{\sigma^2(X) \sum \{ \sigma^2[y \mid x(i)] / w(i) \}}{TN \sum \{ w(i)\, \sigma^2[y \mid x(i)] \}}.$$

In the special situation where $w(i) = 1/T$, this "expected" ratio will be larger than 1 whenever $BS(x) > \sigma^2(X) T/N$, i.e., whenever $BS(x)$ is greater than $\sigma^2(X)$ divided by the average population size $N/T$ of the states, which will often be the case. [If the $x$ values observed in each state were a random sample of size $N/T$ from the same population of $x$ values with variance $\sigma^2(X)$, then the expected value of $BS(x)$ would be approximately $\sigma^2(X)/(N/T)$.] By a similar approach, the "expected" ratio of the "weighted" ecological correlation and the estimate of the individual correlation will be larger than 1 whenever $BSW(x) > \sigma^2(X) [\sum \{ \sigma^2[y \mid x(i)] \}] / N \sum \{ w(i)\, \sigma^2[y \mid x(i)] \}$. Where $\sigma^2[y \mid x(i)]$ is the same for each state [or where $w(i) = 1/T$], this "expected" ratio will be larger than 1 whenever $BSW(x) > \sigma^2(X) T/N$, which will usually be the case. Since $\sigma^2(X) = WS(X) + BSW(x)$, the relationships being presented here can be described in terms of relationships between $WS(X)$, $BSW(x)$, and $BS(x)$. These relationships indicate to a certain extent why

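The computation of the estimate $bV$ from state-level summaries, and its comparison with the usual ecological correlation, can be sketched as follows. The four states, their relative sizes, means, and variances are hypothetical numbers chosen for illustration, and the helper names are invented for this sketch; the formulas are the variance decomposition $\sigma^2(X) = WS(X) + BSW(x)$ and the two correlation expressions given in this section.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def decompose(weights, means, variances):
    """Return (WS, BSW, population variance) from state-level summaries."""
    grand_mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means))
    return within, between, within + between

def unweighted_variance(values):
    """Observed (unweighted) variance of the state averages, i.e. BS."""
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values) / len(values)

def slope(x, y):
    """Unweighted least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical summaries for four states.
w = [0.4, 0.3, 0.2, 0.1]            # relative population sizes w(i)
x_bar = [3.0, 4.0, 5.0, 6.0]        # average income in each state
x_var = [4.0, 5.0, 6.0, 5.0]        # income variance within each state
y_bar = [4.2, 3.9, 3.7, 3.4]        # average family size in each state
y_var = [2.0, 2.2, 1.9, 2.1]        # family-size variance within each state

ws_x, bsw_x, var_x = decompose(w, x_bar, x_var)
ws_y, bsw_y, var_y = decompose(w, y_bar, y_var)

b = slope(x_bar, y_bar)
V = sqrt(var_x / var_y)
individual_estimate = b * V

bs_x = unweighted_variance(x_bar)
bs_y = unweighted_variance(y_bar)
ecological = b * sqrt(bs_x / bs_y)
weighted_ecological = b * sqrt(bsw_x / bsw_y)
```

For these made-up numbers both ecological correlations are larger in absolute value than the estimate $bV$, which is the pattern the section describes; other data could of course behave differently.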

the (usual and the "weighted") ecological correlations are generally larger than the estimate of the individual correlation and thus why they cannot be used as estimates of the individual correlation.

The standard deviation of the estimate of the individual correlation and confidence intervals for it can be determined, when certain additional assumptions are made, by using methods similar to those developed earlier herein; this will not be discussed here. The statements and formulas presented here should be understood to hold when the amount of data is sufficiently large to permit sampling fluctuations to be neglected; i.e., we assume that the estimate $b$ of the slope $B$ is quite accurate and that $Y$, the average size of family in the population, is close to the numerical value $a + bX$. The comparison between $a + bX$ and $Y$ can be used as a partial check on the underlying assumptions made in this section (except if the average $\bar{\bar{x}}$ of the $\bar{x}(i)$ is close to $X$ and the average $\bar{\bar{y}}$ of the $\bar{y}(i)$ is close to $Y$) in the same way that the comparison of $\hat{Y}$ and $Y$ was used to check the assumptions made earlier in this article.

We shall now consider briefly what might happen if the method suggested here is applied where the underlying assumptions did not apply. Suppose that the simple linear relationship was not true but that a multilinear relation, $E(y \mid x, z) = A + Bx + Dz$, did hold true for the individuals, where $A$, $B$, and $D$ were constant for all states and where $z$ was some relevant variable. In this case the averages $\bar{y}(i)$, $\bar{x}(i)$, and $\bar{z}(i)$ for the $i$th state are related as follows: $E[\bar{y}(i) \mid \bar{x}(i), \bar{z}(i)] = A + B\bar{x}(i) + D\bar{z}(i)$. Then the standard methods of multiple linear regression can be applied to the ecological data, $\bar{y}(i)$, $\bar{x}(i)$, $\bar{z}(i)$, in order to obtain estimates of the constants $A$, $B$, $D$, in the multilinear equation for the individuals. However, if a simple linear relationship between $\bar{y}(i)$ and $\bar{x}(i)$ is incorrectly assumed and $\bar{z}(i)$ is neglected, then the standard estimates $a$ and $b$ for the regression line between $y$ and $x$ will have the following biases: $E\{b\} - B = D\,\sigma[\bar{x}(i), \bar{z}(i)]/V[\bar{x}(i)] = D\delta$, $E\{a\} - A = D[\bar{\bar{z}} - \bar{\bar{x}}\delta]$, where $\sigma[\bar{x}(i), \bar{z}(i)]$ is the covariance between $\bar{x}(i)$ and $\bar{z}(i)$ for the different states; $V[\bar{x}(i)]$ is the variance of $\bar{x}(i)$ for the states; $\bar{\bar{z}}$ is the average of the $\bar{z}(i)$; and $\delta = \sigma[\bar{x}(i), \bar{z}(i)]/V[\bar{x}(i)]$. Thus, if $D = 0$, the relation between $y$ and $x$ will be linear, and the standard estimates will be unbiased. However, if $D \neq 0$, then $b$ will be biased unless the covariance between $\bar{x}(i)$ and $\bar{z}(i)$ is 0. Even in the situation where $b$ is unbiased but $D \neq 0$, it will not be possible, except under special circumstances, to estimate the individual correlation between $y$ and $x$ from the ecological data $\bar{y}(i)$ and $\bar{x}(i)$, since the individual values of $z$ and their relation to $y$ and $x$ will play an important role in determining this correlation if $D \neq 0$.

Let us now consider the special circumstance in which the individual value $z$ measures a characteristic of the state in which the individual lives (e.g., its size), so that $z$ will be the same for all individuals living in it. The value of $D$ may be known, or it can be estimated, along with $A$ and $B$, by the usual methods of multiple linear regression applied to the ecological data concerning $\bar{y}(i)$, $\bar{x}(i)$, and $\bar{z}(i)$, thus obtaining the estimates $d$, $a$, $b$. In this case, there will be a simple linear relation between $y$ and $x$, for the individuals in a given state, but the y-intercept of the line may differ for the different states; i.e., $E(y \mid x) = (A + Dz) + Bx$, where $z = \bar{z}(i)$ may differ from state to state. Thus the regression line for each state can be estimated, and each line will have the same slope, $b$. If the variances of the $y$ and $x$ measurements, for a given state, are known, then it is possible to estimate the individual correlation coefficient for the population in that state. Furthermore, if these variances are known for each state, then they can be used, together with the values $\bar{y}(i)$, $\bar{x}(i)$, $b$ and the relative size of each state, to estimate the individual correlation coefficient for the total population. Hence it is possible to obtain an estimate of the individual correlation coefficient for a population from the ecological data, even if there is no constant linear relationship, $E(y \mid x) = A + Bx$, as long as the

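The bias formulas for a neglected variable $z$ can be checked numerically with a small sketch. The constants $A$, $B$, $D$ and the five state averages below are hypothetical, and the state averages of $y$ are generated without any sampling error, so the expectation statements $E\{b\} - B = D\delta$ and $E\{a\} - A = D[\bar{\bar{z}} - \bar{\bar{x}}\delta]$ hold here as exact identities rather than on average.

```python
def mean(values):
    return sum(values) / len(values)

def simple_regression(x, y):
    """Unweighted least-squares intercept and slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical constants of the multilinear relation E(y | x, z) = A + Bx + Dz.
A, B, D = 1.0, 0.5, 2.0

# Hypothetical state averages of x and z; y averages follow the relation exactly.
x_bar = [1.0, 2.0, 3.0, 4.0, 5.0]
z_bar = [0.2, 0.1, 0.4, 0.3, 0.6]
y_bar = [A + B * x + D * z for x, z in zip(x_bar, z_bar)]

# Regress y on x alone, neglecting z.
a, b = simple_regression(x_bar, y_bar)

# delta = cov(x, z) / var(x) across states; it is the slope of z on x.
_, delta = simple_regression(x_bar, z_bar)

slope_bias = b - B                       # should equal D * delta
intercept_bias = a - A                   # should equal D * (mean z - delta * mean x)
predicted_slope_bias = D * delta
predicted_intercept_bias = D * (mean(z_bar) - delta * mean(x_bar))
```

Setting `D = 0.0`, or choosing `z_bar` uncorrelated with `x_bar`, removes the slope bias, in line with the two cases distinguished in the text.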

situation is such that the slope $B$ remains the same in the different states, while the y-intercept may differ from state to state in a way that is linearly related to some measured characteristic, $z$, of the state.

Relation between two dichotomous variables.-The point of view described at the end of the preceding section can be applied to show that if the average $E(r \mid x)$ of the values of the proportion $r$ of whites who are illiterate, for states with the same proportion $x$ of Negroes, is a linear function of a measurable characteristic $z$ of each state [i.e., $E(r \mid x) = C + Fz$] and if the difference between the average $E(p \mid x)$ of the values of the proportion $p$ of Negroes who are illiterate (for states with the same proportion $x$ of Negroes) and $E(r \mid x)$ is constant [i.e., $E(p \mid x) - E(r \mid x) = B$], then the average of the values of the proportion $y$ of illiterates (for states with the same proportion $x$ of Negroes) is equal to $E(y \mid x) = C + Fz + Bx$. The special situation where $F = 0$ has been studied earlier in this article. By standard methods of multiple regression applied to the ecological data (i.e., to the proportions $y$ and $x$ and the value of $z$ for each state), estimates $c$, $f$, and $b$ of $C$, $F$, and $B$, respectively, can be obtained, which can then be used to obtain the estimates $\hat{r} = c + fz$ and $\hat{p} = b + c + fz$ of $E(r \mid x)$ and $E(p \mid x)$, respectively, for each state. These estimates $\hat{r}$ and $\hat{p}$ can be used along with the values of $x$ and the size of the population of each state to estimate the four entries in the $2 \times 2$ cross-classification table describing the relation between the two dichotomous variables, race and illiteracy, for each state. These tables for the separate states can then be combined to estimate the four table entries for the total population; thus an estimate of the individual correlation between race and illiteracy can be obtained for the total population.

The magnitude of the estimate $b$ of $B = E(p \mid x) - E(r \mid x)$, the average difference between the illiteracy rates for whites and the rates for Negroes for states having the same proportion $x$ of Negroes, might be interpreted as the "effect of race on illiteracy," while the magnitude of the estimate $f$ of $F$ might be interpreted as the "effect of $z$ on the illiteracy of whites" ($z$ might measure average income, average social status, per cent unemployed, etc., for each state). It should be noted that $z$ cannot be taken equal to $x$ (neither can $z$ be a linear function of $x$), unless some additional assumptions are made, because if $z = x$, then $E(y \mid x) = C + (B + F)x$, in which case $B + F$ and $C$ can be estimated by the methods of linear regression applied to $y$ and $x$, but it will not be possible to obtain separate estimates of $B$ and $F$ unless additional assumptions are made about their relative magnitudes. For example, if the additional assumption that $F = 0$ is made (i.e., that the "effect of $z$ on the illiteracy of whites" is zero), then the methods developed earlier can be utilized; but if the assumption that $B = 0$ is made (i.e., that the "effect of race on illiteracy" is zero), then the table entries for each state can be estimated as described in the preceding paragraph and these table entries for the states can then be combined to estimate the individual correlation for the total population. In this particular example in which $z = x$, if $F = 0$, then the effect of the percentage of population which is Negro in a state on the illiteracy rate for the whites there is zero, while if $B = 0$, then the average difference between the illiteracy rate for Negroes and the rate for whites is zero in states having the same proportion $x$ of Negroes. In this situation, where $B = 0$, the estimated individual correlation between race and illiteracy computed for each state will be zero, but the individual correlation estimated for the total population may not be zero unless $F = 0$ as well. Since it is possible to obtain an exact linear relationship between $y$ and $x$ when either $F = 0$ or $B = 0$ (or even when neither $F$ nor $B$ equals zero), it is not possible to decide on the basis of the ecological data concerning $y$ and $x$ whether it should be assumed that $F = 0$, that $B = 0$, or that the ratio $B/F$ is a known constant. The research worker will require additional data to help him choose between these models and the assumptions underlying them. This is an important choice, since they lead to different

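The estimation route for the model $E(y \mid x) = C + Fz + Bx$ (regress $y$ on $z$ and $x$, form $\hat{r}$ and $\hat{p}$ for each state, build the state tables, and combine them) can be sketched as below. The constants, the five states, and their populations are hypothetical, the data are generated without error so the fit is exact, and `solve` and `least_squares` are plain normal-equations helpers written for this sketch rather than anything specified in the article.

```python
from math import sqrt

def solve(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def least_squares(regressors, y):
    """Ordinary least squares with an intercept; returns [c, f, b, ...]."""
    design = [[1.0] + list(row) for row in regressors]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical constants and states: (population, proportion Negro x, characteristic z).
C, F, B = 0.02, 0.10, 0.15
states = [(1000, 0.1, 0.2), (2000, 0.3, 0.5), (1500, 0.5, 0.1),
          (500, 0.2, 0.8), (1000, 0.4, 0.6)]
y = [C + F * z + B * x for _, x, z in states]

# Multiple regression of y on z and x gives c, f, b.
c, f, b = least_squares([(z, x) for _, x, z in states], y)

# Build and pool the 2 x 2 tables (rows: Negro, white; columns: illiterate, literate).
table = [[0.0, 0.0], [0.0, 0.0]]
for n, x, z in states:
    r_hat = c + f * z            # estimated illiteracy proportion among whites
    p_hat = b + r_hat            # estimated illiteracy proportion among Negroes
    table[0][0] += n * x * p_hat
    table[0][1] += n * x * (1 - p_hat)
    table[1][0] += n * (1 - x) * r_hat
    table[1][1] += n * (1 - x) * (1 - r_hat)

# Fourfold-point correlation of the pooled table.
(n11, n10), (n01, n00) = table
phi = (n11 * n00 - n10 * n01) / sqrt(
    (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
total_illiterate = n11 + n01
```

The pooled table reproduces the observed number of illiterates, which is a useful sanity check; the correlation itself is only as trustworthy as the assumed model.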

methods of analysis of the data and also to different interpretations of the results. It was assumed earlier that $F = 0$, and the methods described in that case led to estimates of the individual correlations which were different from what they would have been if it had been assumed that $B = 0$ or if it had been assumed that the ratio $B/F$ was a known constant.

Let us now consider the situation where $E(r \mid x) = C + Fz$ and $E(p \mid x) = G + Hz$. [The preceding comments in this section dealt with the special situation where $H = F$, so that $E(p \mid x) - E(r \mid x) = G - C = B$.] In this case, $E(y \mid x) = C + Fz + [G - C + (H - F)z]x = C + Fz + (G - C)x + (H - F)zx$, and a multiple regression analysis of the ecological variable $y$ on the three variables $z$, $x$, and $zx$ will lead to estimates of $C$, $F$, $G - C$, and $H - F$. These estimates can be used to obtain estimates of $C$, $F$, $G$, and $H$, which in turn can be used along with the values of $z$ for each state to estimate $E(r \mid x)$ and $E(p \mid x)$ for each state. From these estimates and the values of $x$ and the size of the population of each state, the four entries in the $2 \times 2$ cross-classification table describing the relation between race and illiteracy in each state can be estimated, and the table entries for the states can be combined to estimate the table entries for the total population, thus providing an estimate of the individual correlation for the total population. If $z = x$ (or if $z$ is a linear function of $x$), then the methods described in this paragraph cannot be applied unless some specific additional assumptions about the relationships between the constants are made, similar to those mentioned in the preceding paragraph.31

Let us now consider the situation in which $E(p \mid x)/E(r \mid x) = J$ is constant for the dif-

and $F(J - 1)$.32 These estimates can then be used to estimate $C$, $F$, and $J$. With these estimates and the values of $x$ and the size of the population of each state, it is possible to estimate the entries in the cross-classification table describing the relation between race and illiteracy for each state and then to combine these tables for the separate states to obtain an estimate of the cross-classification table for the total population. It is possible to perform a rough test of whether $F(J - 1) = 0$ by applying the standard test that the regression of $y$ on $x$ is linear rather than quadratic.33 If $F = 0$, then the methods developed here earlier may be appropriate, while if $J - 1 = 0$ (i.e., the average illiteracy rate for Negroes equals the average rate for whites in states having the same proportion $x$ of Negroes), then the method described in this paragraph can be applied. If $F(J - 1) = 0$, the decision as to whether to assume $F = 0$ or $J - 1 = 0$ should depend on the research worker's available knowledge or on some additional data related, directly or indirectly, to the magnitudes of $F$ and $J - 1$. The magnitude of $J - 1$ may be interpreted, for the model under consideration, as the "effect of race on illiteracy." For this model, the scatter diagram of $y$ on $x$ can suggest whether (1) both $F$ and $J - 1$ are different from zero (if the relationship between $y$ and $x$ is not linear, but it can be fitted by a second-degree polynomial in $x$); (2) either $F$ or $J - 1$ is different from zero but not both (if the relationship is linear but the slope of the line is not zero); or (3) both $F$ and $J - 1$ are equal to zero (if the relationship is linear with a slope of zero). The extent to which the scatter diagram can be fitted by a first- or second-degree polynomial in $x$ can serve as a partial check on the assumptions underlying the

ferent values of x and E(r Ix) = C + Fx. In methods described here. This is no more

this case, E(y Ix) = E(r x) + E(r Ix) [J - than a partial check, since, as we had seen

l]x= [C + Fx][1 + (J - l)x]-C + [F +

32 See, e.g., George W. Snedecor, Statistical Meth-

C(J - l)]x + F(J - 1)x2, and a multiple

ods (4th ed., 4th printing; Ames, Iowa: Iowa State

regressionanalysis of y on the variablesx and College Press, 1950), Sec. 14.3, pp. 379-82, for a

x2will lead to estimates of C, F + C(J - 1), description of curvilinear regression methods for a

31See Duncan, Cuzzort, and Duncan, op. cit., for second-degree polynomial.

some related comments. 23

See ibid., pp. 381-84.

All use subject to University of Chicago Press Terms and Conditions (http://www.journals.uchicago.edu/t-and-c).

ALTERNATIVES TO ECOLOGICAL CORRELATION 625

els may lead to a specified relationship be- C(J - )x + F(J - 1)zx-a multilinearre-

tween y and x, and the methods applied to lation between y and the variables z, x, and

the ecological data will depend very much zx. As a partial check on this model, the rela-

on which model is chosen. tions between the four constants in the mul-

If E(p [x)/E(r Ix) = J and the relation tilinear relation can be examined to see

betweenE(rl x) and x is E(rl x) = C + Fx + whether they lead to a consistent set of esti-

Kx2 or some more complicated relation, it is mates of the three constants C, F, and J.

still possible to use a method similar to the The research worker who uses the meth-

one given in the precedingparagraph,in or- ods describedherein should be aware of the

der to estimate the constants C, F, K, J and underlyingassumptions of each method and

then to use these estimates to estimate the should take advantage of all possible partial

individual correlation between race and il- checks on them. The choice between the

literacy for each state separatelyand also for various models described here should be

the total population. If E(p Ix)/E(r Ix) = J made on the basis of the research worker's

and E(r x) = C + Fz, where z is some knowledge or on some additional data per-

measurablecharacteristicof each state, then taining to the underlyingassumptionsof the

it is also possible to estimate the constants models.

C, F, and J from the relation E(y x) = UNIVERSITYOF CHICAGO
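The regression of y on z, x, and zx discussed above, for the model E(r|x) = C + Fz and E(p|x) = G + Hz, can be illustrated with a short computation. What follows is only a sketch under stated assumptions: ordinary least squares is carried out with NumPy, the individual correlation is taken to be the fourfold-point (phi) coefficient of the combined 2 × 2 table, and the function names and all figures in the example are hypothetical illustrations, not data or procedures from this article.

```python
import numpy as np

def estimate_rate_constants(y, x, z):
    """Least-squares fit of y on 1, z, x, z*x; returns (C, F, G, H).

    The fitted coefficients estimate C, F, G - C, and H - F, in that order.
    (Hypothetical helper name, for illustration only.)
    """
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    design = np.column_stack([np.ones_like(x), z, x, z * x])
    b0, bz, bx, bzx = np.linalg.lstsq(design, y, rcond=None)[0]
    return b0, bz, b0 + bx, bz + bzx

def combined_table(x, z, pop, C, F, G, H):
    """Estimated 2 x 2 race-by-illiteracy table, summed over the states."""
    x, z, pop = (np.asarray(a, dtype=float) for a in (x, z, pop))
    r = C + F * z  # estimated illiteracy rate among whites in each state
    p = G + H * z  # estimated illiteracy rate among Negroes in each state
    return np.array([
        [np.sum(pop * x * p), np.sum(pop * x * (1 - p))],
        [np.sum(pop * (1 - x) * r), np.sum(pop * (1 - x) * (1 - r))],
    ])

def phi_coefficient(table):
    """Fourfold-point correlation of a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    return (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Made-up figures for six hypothetical states (not data from the article).
x = np.array([0.05, 0.10, 0.20, 0.30, 0.40, 0.25])  # proportion Negro
z = np.array([1.0, 2.0, 1.5, 3.0, 2.5, 0.5])        # some state characteristic
pop = np.array([1.0e6, 2.0e6, 1.5e6, 3.0e6, 2.5e6, 0.5e6])
y = 0.02 + 0.01 * z + 0.08 * x + 0.02 * z * x       # noiseless ecological rates

C, F, G, H = estimate_rate_constants(y, x, z)
table = combined_table(x, z, pop, C, F, G, H)
print(C, F, G, H)
print(phi_coefficient(table))
```

In the special case E(p|x)/E(r|x) = J, the same fit gives estimates of C, F, C(J - 1), and F(J - 1), so the ratio of the third coefficient to the first and the ratio of the fourth to the second should both be close to J - 1; a marked disagreement between the two ratios would count against that model.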

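For the model in which E(p|x)/E(r|x) = J and E(r|x) = C + Fx, the regression of y on x and x² gives three coefficients, say a0 = C, a1 = F + C(J - 1), and a2 = F(J - 1). The sketch below, again in Python with NumPy and with hypothetical function names and made-up figures, shows one way of passing from these coefficients to C, F, and J. It rests on an algebraic observation that is not spelled out in the text: F and C(J - 1) have sum a1 and product a0·a2, so they are the two roots of t² - a1·t + a0·a2 = 0, and the fitted curve alone does not say which root is F. The sketch therefore returns both candidate solutions, leaving the choice to outside knowledge, in the same spirit as the choice between F = 0 and J - 1 = 0 when F(J - 1) = 0. It also computes the usual extra-sum-of-squares F statistic for the x² term as a rough check of linearity.

```python
import numpy as np

def fit_quadratic(y, x):
    """Least-squares fit of y on 1, x, x^2; returns (a0, a1, a2)."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    design = np.column_stack([np.ones_like(x), x, x ** 2])
    return tuple(np.linalg.lstsq(design, y, rcond=None)[0])

def candidate_constants(a0, a1, a2):
    """Candidate (C, F, J) triples consistent with the fitted coefficients.

    F and C(J - 1) are the roots of t^2 - a1*t + a0*a2 = 0, in either order.
    Returns an empty list if the roots are not real or if a0 is zero.
    (Hypothetical helper name, for illustration only.)
    """
    disc = a1 ** 2 - 4.0 * a0 * a2
    if disc < 0 or a0 == 0:
        return []
    root_hi = (a1 + np.sqrt(disc)) / 2.0
    root_lo = (a1 - np.sqrt(disc)) / 2.0
    return [
        (a0, root_hi, 1.0 + root_lo / a0),  # F = larger root
        (a0, root_lo, 1.0 + root_hi / a0),  # F = smaller root
    ]

def curvature_f_statistic(y, x):
    """F statistic (1 and n - 3 degrees of freedom) for adding x^2."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    linear = np.column_stack([np.ones_like(x), x])
    quadratic = np.column_stack([linear, x ** 2])

    def sse(design):
        coef = np.linalg.lstsq(design, y, rcond=None)[0]
        return np.sum((y - design @ coef) ** 2)

    sse_lin, sse_quad = sse(linear), sse(quadratic)
    return (sse_lin - sse_quad) / (sse_quad / (len(y) - 3))

# Made-up coefficients (not from the article): C = 0.02, F = 0.10, J = 3
# give a1 = 0.14 and a2 = 0.20, but so do C = 0.02, F = 0.04, J = 6.
for C, F, J in candidate_constants(0.02, 0.14, 0.20):
    print(C, F, J)
```

Either candidate can then be carried through to the cross-classification table for each state, since the estimated rate for whites is C + Fx and the estimated rate for Negroes is J times that quantity.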
