
Sociological Methods & Research http://smr.sagepub.com/

Constrained Estimators and Age-Period-Cohort Models
Robert M. O'Brien Sociological Methods & Research 2011 40: 419 originally published online 29 July 2011 DOI: 10.1177/0049124111415367 The online version of this article can be found at: http://smr.sagepub.com/content/40/3/419

Published by:
http://www.sagepublications.com


Version of Record - Aug 15, 2011; Proof - Jul 29, 2011

Downloaded from smr.sagepub.com at William Paterson Univ of NJ on November 4, 2011

Article

Constrained Estimators and Age-Period-Cohort Models
Robert M. O’Brien1

Sociological Methods & Research 40(3) 419–452 © The Author(s) 2011 Reprints and permission: sagepub.com/journalsPermissions.nav DOI: 10.1177/0049124111415367 http://smr.sagepub.com

Abstract

If a researcher wants to estimate the individual age, period, and cohort coefficients in an age-period-cohort (APC) model, the method of choice is constrained regression, which includes the intrinsic estimator (IE) recently introduced by Yang and colleagues. To better understand these constrained models, the author shows algebraically how each constraint is associated with a specific generalized inverse, which in turn is associated with a particular solution vector that (when the model is just identified under the constraint) produces the least squares solution to the APC model. The author then discusses the geometry of constrained estimators in terms of solutions being orthogonal to constraints, solutions to various constraints all lying on a single line in multidimensional space, the distance on that line between various solutions, and the crucial role of the null vector. This provides insight into what characteristics all constrained estimators share and what is unique about the IE. The first part of the article focuses on constrained estimators in general (including the IE), and the latter part compares and contrasts the properties of traditionally constrained APC estimators and the IE. The author concludes with some cautions and suggestions for researchers using and interpreting constrained estimators.

Keywords

age-period-cohort models, constrained model estimation, intrinsic estimator

1

University of Oregon, Eugene, OR, USA

Corresponding Author: Robert M. O’Brien, 720 PLC University of Oregon, Eugene, OR 97403 Email: bobrien@uoregon.edu


Constrained estimators are the most popular approach to estimating the individual coefficients in age-period-cohort (APC) accounting models.1 Conventionally, a single constraint is placed on these models to make them just identified. Unfortunately, since each of the constrained models fits the data equally well, the validity of the chosen constraint cannot be judged from the model fit. Instead, in the conventionally constrained APC models, the researcher must decide on the basis of theory or past research the appropriateness of the constraint used to identify the model. Setting the constraint in a defensible manner, however, is the Achilles heel of this approach, and the constraint chosen can greatly affect the estimated coefficients.

Almost 30 years ago, Kupper et al. (1983) discussed a constrained estimator for age, period, and cohort effects that they called the principal components estimator. This estimator produces coefficients identical to those of the recently introduced intrinsic estimator (IE), which is also a constrained estimator. Kupper et al. (1983) note, as proved by Yang, Fu, and Land (2004) for the IE, that the principal components estimator has minimum variance. Beyond this, however, Kupper et al. (1983) did not appear to believe it had any special usefulness in the analysis of APC data, pointing out that using this estimator could lead to more bias (in the sense of differences between the expected value of the estimates and the true underlying generating parameters) than the use of some other constraint (Kupper et al. 1983).

Because of its minimum variance property and other properties, Yang and associates recommend using the IE in the analysis of APC data. Yang et al. (2004) correctly note the important contributions of Fu (Fu 2000; Fu, Hall, and Rohan 2004; Knight and Fu 2000) in the development of the IE and in investigating its properties. It is Fu who has most extensively published work on the IE. Yang et al. (2004) recently introduced the IE to sociologists. After the specific introduction of the IE to sociologists in Sociological Methodology in 2004, it was further described in an article in the American Journal of Sociology (Yang et al. 2008). Yang and associates (2008:1699) view the IE as “a general-purpose method of APC analysis with potentially wide applicability in the social sciences.” They review several properties of the IE (many of which, as I discuss in the following, are shared with other constrained estimators).

Given the widespread use of constrained APC models and the introduction of a new and arguably improved constraint into the sociological literature, this article explicates and clarifies the basic properties of constrained linear models as they are used in the APC context. This should help researchers in several ways: (1) understanding the process of choosing an appropriate


constraint, (2) comprehending the relationship of the results based on using different constraints, (3) untangling which characteristics are unique to the IE and which it shares with other constrained estimators, and (4) facilitating a comparison of the IE to other constrained estimators.

I begin with a brief discussion of the general APC accounting model and the identification problem and then move to the use of generalized inverses, which implement constrained estimation. These inverses provide a set of solutions even though the unconstrained model is not identified. I then show how these constraints can be implemented in different ways. Next I turn to the geometry of constrained estimation with the aim of developing a common framework for evaluating different constrained estimators. I then address, among other issues, the relationship of the constraint to the results it produces, the relationship of the IE to other constrained estimators, bias in constrained estimators, and some unique features produced by the constraint used in the IE approach. That discussion is followed by a simulation designed to show some of these properties and to demonstrate how a series of different constraints works in practice. I conclude with an evaluation of the potential advantages of the IE relative to other constrained estimators and one potentially helpful situation in which researchers might gain confidence from using different constrained estimators. In the next two sections I briefly describe the accounting (multiple classification) model utilized in this literature and the identification problem that analysts using this model face.

Multiple Classification Analysis for APC Models

Age groups, periods, and cohorts are often coded with dummy variables by APC analysts. In this article I will use a similar coding scheme: effect coding. I employ this coding because it is used in the IE approach (Kupper et al. 1983, 1985; Yang et al. 2004, 2008). This coding (as well as dummy variable coding) does not constrain the functional form of the relationship between the dependent variable and the age groups, periods, and cohorts. I use the multiple classification model represented in equation (1) throughout this article.2

$$Y_{ij} = \mu + \alpha_i + \pi_j + \chi_{a-i+j} + \epsilon_{ij}, \tag{1}$$

where $Y_{ij}$ is the age-period–specific value of the dependent variable, $\mu$ represents the intercept, $\alpha_i$ represents the effect of the $i$th age group, $\pi_j$ denotes the effects of the $j$th period, $\chi_{a-i+j}$ represents the effects of the $(a-i+j)$th cohort

(where $a$ equals the number of age groups), and $\epsilon_{ij}$ represents the error term [$E(\epsilon_{ij}) = 0$]. I use age as the row variable, period as the column variable, and cohorts are on the diagonals (see Table 1 for an example age-period data matrix).

Table 1. Age-Period Table With Data Generated as Described in the Text
(Rows: Age 1 to Age 5; columns: Period 1 to Period 5; the cell values are not recoverable from this copy.)

Rank Deficient Matrix

The identification problem in APC analysis arises because of the linear dependency between the age, period, and cohort variables. If we know a person's age group and the period, we can determine their birth cohort ($C = P - A$), if we know their age group and cohort, we can determine the period ($P = C + A$), and if we know their birth cohort and the period, we can determine their age ($A = P - C$). This is the linear dependency that prevents a unique solution to the regression of the dependent variable on the effect coded (or dummy coded) variables for age groups, periods, and cohorts (even when we have one reference category for each set of effect coded [dummy coded] variables).

In matrix notation, we can represent the multiple classification equation (1) as:

$$Y = Xb + e, \tag{2}$$

where $Y$ is an $ap \times 1$ vector of the age-period–specific rates (counts, or some other form of dependent variable). The X-matrix is an $ap \times [2(a + p) - 3]$ matrix containing ones in the first column and effect coded variables in the remaining columns. The order of the columns can be schematized as $[1, (a - 1), (p - 1), (a + p - 2)]$, where $a - 1$ represents the number of age effect coded variables, $p - 1$ represents the number of period effect coded variables, and $a + p - 2$ represents the number of cohort effect coded variables. Each of these effect coded variables has $ap$ entries (zeros and ones) in its column (and minus one in the row corresponding to the reference category) and is coded to correspond to the “cell” in which the age-period–specific value of $Y$

resides.3 The last categories of age, period, and cohort serve as the reference categories in our representation.

This linear dependency means that there is a set of coefficients that when multiplied times the columns of $X$ produces a column vector of zeros (with the restriction that not all of these multiplying coefficients are zero). This vector of coefficients is called the null vector. In the language of linear algebra, we say that there is a nontrivial solution to the $a \times p$ homogeneous equations, and we can write $Xv = 0$, where $X$ is the $ap \times [2(a + p) - 3]$ design matrix and $v$ is a $[2(a + p) - 3] \times 1$ vector (the null vector). This vector is said to be in the null space of $X$. If we label the number of columns in $X$ as $m$ [$= 2(a + p) - 3$], only $m - 1$ of these columns are linearly independent. That is, one and only one such vector of coefficients exists for the APC model (this vector is unique up to multiplication by a scalar). The existence of only one such vector indicates that the rank of $X$ is just one less than full column rank and that a single linear constraint should allow for a solution to the identification problem.

When a regular inverse exists, solving an equation of the form of (2) for a unique set of least squares regression coefficients is trivial:

$$\hat{b} = (X'X)^{-1} X'Y, \tag{3}$$

where $X'$ is the transpose of $X$ and the superscripted $-1$ indicates the inverse. The problem with this solution in the APC context occurs because of the linear dependency in the X-matrix, which means that the standard inverse of $X'X$ does not exist.

Constraints and Generalized Inverses

Even with a rank deficient matrix, it is still possible to find a least squares solution to (2) by using a generalized inverse (Searle 1971). Typically, the generalized inverse is denoted with a superscripted minus rather than a superscripted minus 1. In general, a particular set of solutions is associated with the particular generalized inverse used to solve the equation and that inverse is associated with a particular constraint on the solutions:

$$\hat{b}_c = (X'X)^{-}_{c} X'Y. \tag{4}$$

I have added the subscript $c$ to denote that generalized inverses differ depending upon the constraint with which they are associated: $(X'X)^{-}_{c}$. For the same reason I have subscripted the solution vector for $b$ with $c$. This allows us to compute solutions in the constrained APC model, and each solution is determined by the constraint.
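The rank deficiency and the null vector described in this section can be checked numerically. The following is a minimal sketch, assuming a five-by-five age-period layout, effect coding with the last category as the reference, and the cohort index $a - i + j$; the helper name `effect_codes` is hypothetical and introduced only for illustration.

```python
import numpy as np

def effect_codes(levels, n_levels):
    """Effect-code integer levels 0..n_levels-1; the last level is the reference (-1s)."""
    table = np.vstack([np.eye(n_levels - 1), -np.ones(n_levels - 1)])
    return table[levels]

a = p = 5                                   # assumed: five age groups, five periods
age, period = np.meshgrid(np.arange(a), np.arange(p), indexing="ij")
age, period = age.ravel(), period.ravel()
cohort = (a - 1 - age) + period             # 0-based version of the index a - i + j

X = np.hstack([
    np.ones((a * p, 1)),                    # intercept
    effect_codes(age, a),                   # a - 1 age columns
    effect_codes(period, p),                # p - 1 period columns
    effect_codes(cohort, a + p - 1),        # a + p - 2 cohort columns
])

m = X.shape[1]                              # number of columns, 2(a + p) - 3
rank = np.linalg.matrix_rank(X)             # one less than m if X is rank deficient by one

# The right singular vector paired with the (numerically) zero singular value
# spans the null space of X; it has unit length.
v = np.linalg.svd(X)[2][-1]
if v[1] > 0:                                # fix the sign so that the age1 entry is negative
    v = -v

print("columns:", m, "rank:", rank)
print("null vector:", np.round(v, 3))
```

The singular value decomposition is used here only as a convenient way to obtain a unit-length null vector; any scalar multiple of `v` satisfies $Xv = 0$ equally well.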

The solution produced is a least squares solution; that is, each solution ($b_c$) associated with the particular constraint produces the same predicted values of $Y$ and these values minimize the sum of the squared residuals. Thus, $\sum (Y - \hat{Y})^2$ is the same no matter which generalized inverse is used to solve the equation.4 It is well known in APC modeling that the fit cannot be used to differentiate the validity of different constraints (assuming the use of just one constraint so that the model is just identified).

The way that constrained estimation is typically employed in APC analysis is to place a single constraint on $X$ so that the model is just identified. The choice of the constraint then determines a generalized inverse used to solve for the age group, period, and cohort coefficients. Mazumdar, Li, and Bryce (1980) explicate this correspondence between a constraint, its generalized inverse, and the solution. They show how to derive a specific generalized inverse from the constraint. I will use the symbol $G_c$ to represent the generalized inverse associated with a particular constraint, rather than the more awkward notation $(X'X)^{-}_{c}$. With the generalized inverse associated with a particular constraint in hand, we multiply $X'Y$ in (3) by the generalized inverse to obtain a solution: $\hat{b}_c = G_c X'Y$. This solution is determined by the constraint.

The choice of a constraint is crucial, since it determines the set of estimated values ($b_c$) and whether those estimated values are unbiased. When a constraint is used in the APC context, the constraint conventionally involves setting two of the coefficients associated with age, period, or cohort to be equal. The assumption is that $c'\beta = 0$, where $c$ is the $m \times 1$ vector for the constraint and $\beta$ is the $m \times 1$ vector of population effect coefficients that generated the data.5 If $c'\beta \neq 0$, the estimate under that constraint will be biased in that its expected value will not equal the value of parameters that generated the data (for a further explication of the bias associated with violating this assumption, see Kupper et al. 1983). This is true for the conventionally used constraints and the constraint associated with the IE.

The constraint associated with the IE is that the solution must be perpendicular to the null vector (the vector that when pre-multiplied by $X$ produces an $ap \times 1$ zero vector). Specifically, if $v$ represents the null vector, the assumption is that $v'\beta = 0$. Since $v$ is a specific constraint, we can write $c'\beta = 0$, where it will be clear from the context that $c$ is the null vector or a different constraint. To the extent that this assumption is not true, the estimates associated with this constraint will be biased in the sense that they are biased estimates of the parameters that generated the age-period–specific rates. As I discuss in the following, when Yang et al. (2004, 2008) rightly assert that their estimate is unbiased, it is an unbiased estimate of the least squares

solution in the subspace that is perpendicular to the null vector. That does not mean that the IE is an unbiased estimate of the parameters that generated the age-period–specific rates, the generating parameters.

To make the discussion more concrete, Figure 1 presents three constraints. These constraints are based on a five-by-five age by period matrix, where the columns of the X matrix contain one intercept, four age group effect coded variables, four period effect coded variables, and eight cohort effect coded variables. The first two constraints fix two of the effect coefficients to be equal to each other and the third uses the null vector of $X$ as the constraint. The first two constraints are conventional constraints that might be used in APC analysis. The third constraint is the one associated with the IE. Each of these places a linear constraint on the solution. The question is which one of these constraints is best with respect to obtaining a substantively correct result (assuming that the researcher wants to determine the effects of age, period, and cohort coefficients in generating the outcomes: e.g., suicide rates, homicide rates, or marriage rates). As previously argued in the literature (Kupper et al. 1985; Mason et al. 1973; Smith 2004), the constraint should be based on theory and/or empirical evidence that the constraint applies to the population under consideration.

The assumption for unbiased estimates of the generating parameters is that the dot product of the constraint vector and the vector of the parameter values associated with the process that generated the Y-values is equal to zero (the reference categories are omitted). We can write the assumptions associated with each of these restrictions as $c'_{(age1 = age2)} \cdot \beta = 0$, $c'_{(coh7 = coh8)} \cdot \beta = 0$, and $c'_{(nullvector)} \cdot \beta = 0$, respectively. In the first two cases this will be true, for example, if the population parameter values for age1 and age2 are equal to each other or if the population parameter values for cohort7 equals the value for cohort8. The assumption when using the constraint associated with the null vector is more complicated (though the IE is easily solved for using an “add-on” program for STATA, cited in Yang et al. 2008); specifically, for this five-by-five age-period matrix, the assumption is that $0.000 \times b_{int} + (-0.267 \times b_{age1}) + (-0.134 \times b_{age2}) + \ldots + (0.401 \times b_{coh8}) = 0$.

Mazumdar et al. (1980) implement the constraints in the following manner: (1) Compute $X'X$, (2) replace the last row of $X'X$ with the constraint that we chose to use, (3) compute the inverse of this new matrix, and (4) replace the last column of this inverse with zeros. The result is the generalized inverse ($G_c$) associated with the particular constraint (e.g., $c'_{(age1 = age2)}$ or $c'_{(nullvector)}$). Using this process, we can find a set of least squares solutions under a particular constraint: $\hat{b}_c = G_c X'Y$.6
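Steps (1) through (4) attributed to Mazumdar et al. (1980) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names `effect_codes` and `constrained_ginv` are hypothetical, the five-by-five effect-coded design is an assumption, and the outcome values `y` are random numbers generated only so that the example runs.

```python
import numpy as np

def effect_codes(levels, n_levels):
    """Effect-code integer levels 0..n_levels-1; the last level is the reference (-1s)."""
    table = np.vstack([np.eye(n_levels - 1), -np.ones(n_levels - 1)])
    return table[levels]

def constrained_ginv(XtX, c):
    """Generalized inverse G_c for constraint vector c, following steps (1)-(4)."""
    A = np.array(XtX, dtype=float)      # (1) start from a copy of X'X
    A[-1, :] = c                        # (2) replace the last row with the constraint
    G = np.linalg.inv(A)                # (3) invert the modified matrix
    G[:, -1] = 0.0                      # (4) replace the last column with zeros
    return G

a = p = 5                               # assumed five-by-five age-period layout
age, period = np.meshgrid(np.arange(a), np.arange(p), indexing="ij")
age, period = age.ravel(), period.ravel()
cohort = (a - 1 - age) + period
X = np.hstack([np.ones((a * p, 1)), effect_codes(age, a),
               effect_codes(period, p), effect_codes(cohort, a + p - 1)])

rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, size=a * p)    # hypothetical outcomes, for illustration only

XtX, Xty = X.T @ X, X.T @ y
v = np.linalg.svd(X)[2][-1]             # null vector of X (unit length)

c_age = np.zeros(X.shape[1])
c_age[1], c_age[2] = 1.0, -1.0          # constraint age1 = age2

b_age = constrained_ginv(XtX, c_age) @ Xty   # solution under age1 = age2
b_null = constrained_ginv(XtX, v) @ Xty      # solution under the null vector constraint
b_mp = np.linalg.pinv(X, rcond=1e-10) @ y    # Moore-Penrose (minimum norm) solution

print(np.allclose(X @ b_age, X @ b_null))    # both constraints give the same fitted values
print(np.allclose(b_null, b_mp))             # null vector constraint matches Moore-Penrose
```

The explicit `rcond` in `np.linalg.pinv` is a defensive choice so that the numerically zero singular value is reliably treated as zero.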

When the null vector is substituted for the aforementioned last row of $X'X$, the generalized inverse that results is one that yields a solution for $b_c$ that is the same as if we had used the Moore-Penrose generalized inverse: the inverse used in the IE approach (Mazumdar et al. 1980; Yang et al. 2008). This result occurs because the Moore-Penrose generalized inverse institutes this constraint, although the Moore-Penrose generalized inverse has some other valuable properties for those using matrix algebra.

              c(age1=age2)   c(coh7=coh8)   c(null vector)
intercept         0.000          0.000           0.000
age1              1.000          0.000          -0.267
age2             -1.000          0.000          -0.134
age3              0.000          0.000           0.000
age4              0.000          0.000           0.134
period1           0.000          0.000           0.267
period2           0.000          0.000           0.134
period3           0.000          0.000           0.000
period4           0.000          0.000          -0.134
cohort1           0.000          0.000          -0.535
cohort2           0.000          0.000          -0.401
cohort3           0.000          0.000          -0.267
cohort4           0.000          0.000          -0.134
cohort5           0.000          0.000           0.000
cohort6           0.000          0.000           0.134
cohort7           0.000          1.000           0.267
cohort8           0.000         -1.000           0.401

Figure 1. Three different constraints on the X-matrix

Simple Implementations of Constrained Regression

Researchers wanting to implement the first two constraints in Figure 1 may choose to do so using a simple recoding of their data. To implement the first constraint based on age group 1 and age group 2 having the same effects, one can simply create a new variable (newage1_2) that is the sum of age1 and age2 and use it in the analysis rather than age1 and age2. This is equivalent to creating a single column in $X$ that is equal to the age1 column plus the age2 column and using this column in place of the columns for age1 and age2 in the regression analysis. The coefficient associated with newage1_2 is the coefficient for both age1 and age2. To implement the second constraint, one can add the variable for cohort7 and cohort8 to create the new variable (newcohort7_8) and use it in the analysis instead of cohort7 and cohort8.
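The recoding approach for the age1 = age2 constraint can be sketched as follows. This is a hedged illustration: the design matrix construction, the helper `effect_codes`, and the random outcome vector `y` are assumptions made for the example, and ordinary least squares is run with `np.linalg.lstsq`.

```python
import numpy as np

def effect_codes(levels, n_levels):
    """Effect-code integer levels 0..n_levels-1; the last level is the reference (-1s)."""
    table = np.vstack([np.eye(n_levels - 1), -np.ones(n_levels - 1)])
    return table[levels]

a = p = 5                                    # assumed five-by-five age-period layout
age, period = np.meshgrid(np.arange(a), np.arange(p), indexing="ij")
age, period = age.ravel(), period.ravel()
cohort = (a - 1 - age) + period
X = np.hstack([np.ones((a * p, 1)), effect_codes(age, a),
               effect_codes(period, p), effect_codes(cohort, a + p - 1)])

rng = np.random.default_rng(1)
y = rng.normal(loc=10.0, size=a * p)         # hypothetical outcomes, for illustration only

# Column 1 is age1 and column 2 is age2. Drop age2 and overwrite age1 with the sum,
# which plays the role of the new variable newage1_2.
X_rec = np.delete(X, 2, axis=1)
X_rec[:, 1] = X[:, 1] + X[:, 2]

rank_rec = np.linalg.matrix_rank(X_rec)      # the recoded design should have full column rank
b_rec = np.linalg.lstsq(X_rec, y, rcond=None)[0]

# Expand back to the original parameterization: age1 and age2 share one coefficient.
b_full = np.insert(b_rec, 2, b_rec[1])

print("recoded columns:", X_rec.shape[1], "rank:", rank_rec)
print("age1, age2 coefficients:", b_full[1], b_full[2])
```

Because the recoded design has full column rank, a regular regression suffices; no generalized inverse is needed once the constraint has been built into the columns.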

It is a straightforward extension to implement a constraint in which age1 = 0.5 × age2. One can construct this constraint by adding 0.5 × age1 to age2 (newage1_2 = age2 + 0.5 × age1). The new column generated by this transformation contains 0.5, 1, 0, 0, –1.5, 0.5, 1, 0, 0, –1.5, . . . (where –1.5 is for the reference group, the oldest age group in our example), and this pattern repeats until there are no more rows in this column for this newly created variable. Then we can run a regular regression using newage1_2, age3, . . . , coh7, coh8 as the regressors. The coefficient for newage1_2 is the coefficient for age2 and .5 times that coefficient is the coefficient for age1. Another approach uses constrained regression, which employs the constraint: age1 = 0.5 × age2. Constrained regression can also be used with the first two examples.

Although almost all researchers will want to use the convenient “add-on” program for STATA (cited by Yang et al. 2008) to calculate the IE, it is possible to implement the null vector constraint and produce the coefficient estimates from the IE using constrained regression or by transforming the columns of $X$ using the strategy described in the previous paragraph. The null vector in Figure 1 means that 0 = 0.00intercept – .267a1 – .134a2 + 0.00a3 + .134a4 + . . . + .267c7 + .401c8. Adding .267a1 to both sides of the equation and dividing both sides of the equation by .267 yields a1 = 0.00intercept – .50a2 + 0.00a3 + .50a4 + . . . + 1.00c7 + 1.50c8. In this particular representation, we see that a1 is a constrained linear function of the columns of the other independent variables in $X$. If we implement this constraint (that is, age1 = –.50age2 + 0.00age3 + .50age4 + . . . + 1.00coh7 + 1.50coh8) in a constrained regression program, we obtain the same estimates produced by the STATA-based IE program. A reviewer of this article suggested transforming the columns of independent variables in the following manner (although noting that the procedure is tedious): newage2 = age2 – 0.5 × age1, newage3 = age3, newage4 = age4 + 0.5 × age1, . . . , newcoh7 = coh7 + 1.0 × age1, newcoh8 = coh8 + 1.5 × age1. Using these values as regressors implements the “null vector constraint” and produces coefficient estimates equivalent to those from using the IE.

The Geometry of Constrained Regression

When $X'X$ is one less than full column rank, as it is in the APC situation, the solution to the $m \times 1$ $b$ vector of coefficients is in an $m$ dimensional solution space. Specifically, it lies in an $m - 1$ subspace of that solution space that is orthogonal to its constraint. We know from the previous sections how to estimate the $m$ elements of the $b$ vector by using a generalized inverse associated with a constraint, by using constrained regression, or by algebraically

applying the constraint using the columns of $X$.

To provide intuition about how constrained regression works geometrically, we use a simple illustration in which $X'X$ is a $3 \times 3$ matrix. Many essential properties of constrained regression can be illuminated using this approach. In the example I use the following specific data for $Xb = Y$:

$$\begin{bmatrix} 1 & 4 & 2 \\ 1 & 2 & 1 \\ 1 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} 19 \\ 11 \\ 19 \\ 11 \end{bmatrix}, \tag{5}$$

which produces the normal equations for $(X'X)b = (X'Y)$:

$$\begin{bmatrix} 4 & 12 & 6 \\ 12 & 40 & 20 \\ 6 & 20 & 10 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} 60 \\ 196 \\ 98 \end{bmatrix}. \tag{6}$$

Here $X'X$ is singular reflecting the linear dependency in $X$. The second column is twice the third and the null vector is $(0, 1, -2)$,7 since 0 times the first column of $X$ plus 1 times the second column of $X$ plus $-2$ times the third column of $X$ results in a $4 \times 1$ column vector of zeros.

Figure 2 has three axes $b_1$, $b_2$, and $b_3$ (the three-dimensional solution space) with the $b_1$ axis representing the intercept. The null vector $(0, 1, -2)$ is represented by the stippled line through the origin of $(b_1, b_2, b_3)$ that lies on the $b_2$-$b_3$ plane (since its $b_1$ value is zero). The slope of the null vector on the $b_2$-$b_3$ plane is $-2.0$ (if $b_2$ increases by 1.0, $b_3$ decreases by 2.0). The line of solutions, the line on which every constrained solution appears, is three units above the $b_2$-$b_3$ plane. Its slope with respect to the $b_2$-$b_3$ plane is the same as that of the null vector. The line of solutions and the null vector are parallel to each other. The line of solutions for our data is always three units above the $b_2$-$b_3$ plane; the value of the intercept remains constant at 3.0 for all solutions using the constrained regression approaches outlined earlier for the data in the example.

We can determine the line of solutions in this simple example by using (6), if we bear in mind that the intercept ($b_1$) = 3. If $b_1 = 3$ and we set $b_3$ to zero $b_2$ must be 4, and if we set $b_2$ to zero $b_3$ must be 8. If we connect these two points, $(3, 4, 0)$ and $(3, 0, 8)$, we have the line on which all of the constrained solutions in this example must fall. This line is labeled as the line of solutions in Figure 2. Another way to determine this line is to find the intersection of the first two planes represented by the (first two) normal equations in (6).
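The small example can be verified directly. The sketch below assumes that the data in (5) are the four rows (1, 4, 2), (1, 2, 1), (1, 4, 2), (1, 2, 1) with outcomes (19, 11, 19, 11); this reading is an assumption, but it reproduces the normal equations in (6) and the two points said to lie on the line of solutions.

```python
import numpy as np

# Assumed reading of the data in equation (5)
X = np.array([[1, 4, 2],
              [1, 2, 1],
              [1, 4, 2],
              [1, 2, 1]], dtype=float)
Y = np.array([19, 11, 19, 11], dtype=float)

XtX = X.T @ X          # left-hand side of the normal equations (6)
XtY = X.T @ Y          # right-hand side of the normal equations (6)

v = np.array([0.0, 1.0, -2.0])     # null vector: X @ v is a vector of zeros

# Two points on the line of solutions, and the line through them
p1 = np.array([3.0, 4.0, 0.0])
p2 = np.array([3.0, 0.0, 8.0])

print(XtX)
print(XtY)
print("X v =", X @ v)

# Every point p1 + k * v satisfies the normal equations, whatever the scalar k
for k in (-4.0, -1.0, 0.0, 2.5):
    b = p1 + k * v
    print(k, b, np.allclose(XtX @ b, XtY))
```

Setting `k = -4` moves from `p1` to `p2`, which illustrates that the line of solutions runs parallel to the null vector at a constant intercept of 3.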

The problem is that the remaining normal equation represents a plane on which this line lies. Since this plane does not intersect the line of solutions in a single point (because of the linear dependency), we do not know where on the line of solutions to choose a solution. We can force this plane to intersect the line of solutions at a single point by setting a constraint on the solution.

[Figure 2. Geometric view of constrained regression in three dimensions. Axes $b_1$, $b_2$, $b_3$; the null vector of $X'X$, $(0, 1, -2)$, passes through the origin $(0, 0, 0)$, and the parallel line of solutions $b_c = G_c X'Y$ lies at $b_1 = 3$, meeting $b_2 = 4$ and $b_3 = 8$ for the example values.]

We have seen earlier that we cannot solve this problem (equation 6) using a regular inverse. We can, however, use a generalized inverse to find a solution for the $b$s for our example data. Different generalized inverses are associated with different constraints and yield different solutions, all lying on the line of solutions that is parallel to the null vector. Using the system

described earlier (Mazumdar et al. 1980) to obtain the generalized inverses for the data in (6) and solving for the $b_c$ vectors under different constraints, we can confirm that all of the solutions fall on this line of solutions depicted in Figure 2. With the constraint $(0, 1, -.5)$ the estimated coefficients are $(3.0, 2.0, 4.0)$, and using the constraint $(0, 1, -.1)$ yields $(3.0, 0.6667, 6.6667)$. If we substitute these solutions ($b_c$) into (5), each one produces the same predicted values of $Y$. This is part of what it means to be on this line of solutions. The constraints do not allow just any solution to emerge, but only solutions that are on the single line (the line of solutions) that is parallel to the null vector. Note also that each solution is orthogonal to its constraint. For example, $(0, 1, -.5) \cdot (3.0, 2.0, 4.0)' = 0$ and $(0, 1, -.1) \cdot (3.0, 0.6667, 6.6667)' = 0$. We can, of course, obtain the same estimates using constrained regression or transformations of the columns of $X$ to implement the constraint.

For example, to obtain the solution for the constraint $(0, 1, -1)$, which implies that $b_2 = b_3$, we substitute this constraint vector for the last row of the $X'X$, find the inverse of this new matrix, and then substitute a column of zeros for the last column of this inverse matrix. Pre-multiplying $X'Y$ by this generalized inverse yields the solution vector $(3.0, 2.67, 2.67)'$. Figure 3 depicts this solution geometrically. The constraint passes through the origin $(0, 0, 0)$ and has a slope of $-1.0$ with the $b_2$-$b_3$ plane. The solution plane must be perpendicular to the constraint (it must have a slope of 1.0 with respect to the $b_2$-$b_3$ plane) and pass through the $b_1$ axes. The plane depicted in Figure 3 has just such a slope with respect to the $b_2$-$b_3$ plane, is perpendicular to the $b_2$-$b_3$ plane, and passes through the origin. Where this solution plane intersects the line of solutions $(3, 2.67, 2.67)$ is the solution under this constraint.

If we repeated this process to obtain the “Moore-Penrose solution,” the difference is that this solution is orthogonal to the null vector. We substitute the null vector for the last row of the $X'X$, find the inverse of this new matrix, and then substitute a column of zeros for the last column of this inverse matrix. Pre-multiplying $X'Y$ by this generalized inverse yields the solution vector $(3.0, 3.2, 1.6)'$. It is easy to check that this solution vector is perpendicular to the null vector: namely, $(0, 1, -2) \cdot (3.0, 3.2, 1.6)' = 0$. Geometrically, the slope of the null vector $(0, 1, -2)$ to the $b_2$-$b_3$ plane is $-2.0$, and the “slope” of the solution plane relative to the $b_2$-$b_3$ plane (which must be orthogonal to this constraint) is .5.8 This plane is orthogonal to the null vector (its slope with respect to the $b_2$-$b_3$ plane is 0.5), and it intersects the line of solutions at the point $(3.0, 3.2, 1.6)$. This solution is depicted in Figure 4 as occurring where the solution plane intersects the line of solutions.
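The worked solutions for the small example can be reproduced with the four-step generalized inverse procedure. The sketch below takes the normal equations from (6) as given; the function name `constrained_ginv` is hypothetical, and the comparison with `np.linalg.pinv` is included only as a numerical check of the Moore-Penrose claim.

```python
import numpy as np

def constrained_ginv(XtX, c):
    """Generalized inverse for constraint c: swap c into the last row of X'X,
    invert, then zero the last column of the inverse."""
    A = np.array(XtX, dtype=float)
    A[-1, :] = c
    G = np.linalg.inv(A)
    G[:, -1] = 0.0
    return G

XtX = np.array([[4, 12, 6],
                [12, 40, 20],
                [6, 20, 10]], dtype=float)    # X'X from equation (6)
XtY = np.array([60, 196, 98], dtype=float)    # X'Y from equation (6)

constraints = {
    "b2 = b3":     np.array([0.0, 1.0, -1.0]),
    "null vector": np.array([0.0, 1.0, -2.0]),
}

solutions = {}
for label, c in constraints.items():
    b = constrained_ginv(XtX, c) @ XtY
    solutions[label] = b
    # Each solution satisfies the normal equations and is orthogonal to its constraint
    print(label, np.round(b, 4), "c'b =", round(float(c @ b), 10))

# Numerical check against the Moore-Penrose inverse of X'X
b_mp = np.linalg.pinv(XtX, rcond=1e-10) @ XtY
print("Moore-Penrose:", np.round(b_mp, 4))
```

An explicit `rcond` is passed to `np.linalg.pinv` so that the singular value that is zero in exact arithmetic is discarded rather than inverted.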

[Figure 3. Geometric view of constrained regression in three dimensions: $b_2 = b_3$. Axes $b_1$, $b_2$, $b_3$; shows the null vector, the line of solutions $b_c = G_c X'Y$, the constraint $(0, 1, -1)$, and the solution $(b_1 = 3.0, b_2 = 2.67, b_3 = 2.67)$.]

If we were careful geometers, we could produce the constrained solutions for the data in (6) corresponding to any constraint by first constructing the line of solutions, then the solution plane that is orthogonal to the constraint, and then determining where this plane intersects the line of solutions. We would always find that the intercept is 3.0, since the line of solutions (for this data) is always 3.0 units above the $b_2$-$b_3$ plane.
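The geometric construction translates into a short calculation: start from any point on the line of solutions, move along the null vector, and stop where the plane orthogonal to the constraint is reached. The sketch below assumes the example values used in the text; the function name `solution_for_constraint` is hypothetical.

```python
import numpy as np

def solution_for_constraint(b_any, v, c):
    """Intersect the line b_any + k * v with the plane c'b = 0.

    b_any : any least squares solution (a point on the line of solutions)
    v     : null vector (direction of the line of solutions)
    c     : constraint vector; c'v must be nonzero for a unique intersection
    """
    c = np.asarray(c, dtype=float)
    k = -(c @ b_any) / (c @ v)      # solve c'(b_any + k v) = 0 for k
    return b_any + k * v

b_any = np.array([3.0, 4.0, 0.0])   # one point on the line of solutions
v = np.array([0.0, 1.0, -2.0])      # null vector of the example

b_equal = solution_for_constraint(b_any, v, [0.0, 1.0, -1.0])   # constraint b2 = b3
b_half = solution_for_constraint(b_any, v, [0.0, 1.0, -0.5])    # constraint b2 = 0.5 * b3
b_null = solution_for_constraint(b_any, v, v)                   # null vector constraint

print(np.round(b_equal, 4))
print(np.round(b_half, 4))
print(np.round(b_null, 4))
```

The first coordinate never changes because the null vector has a zero in the intercept position, which is why the intercept is 3.0 under every constraint.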

[Figure 4. Geometric view of constrained regression in three dimensions: $b_3 = \tfrac{1}{2} b_2$: The Moore-Penrose solution. Axes $b_1$, $b_2$, $b_3$; shows the null vector, the line of solutions $b_c = G_c X'Y$, and the Moore-Penrose solution $(b_1 = 3.0, b_2 = 3.20, b_3 = 1.60)$.]

We can characterize the differences between various constrained solutions by means of the factor $kv$, where $k$ is a scalar and $v$ is the null vector. For example, one of our solutions is $(3.0, 2.6667, 2.6667)$ and another is $(3.0, 0.6667, 6.6667)$. We can move from the first to the second solution by adding $-2 \times (0, 1, -2)$ to the first solution. This manipulation simply moves us along the line of potential solutions, here moving from $(3.0, 2.6667, 2.6667)$ to reach $(3.0, 0.6667, 6.6667)$.
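The scalar $k$ that links two constrained solutions can be recovered by projecting their difference onto the null vector. A minimal sketch, assuming the two example solutions quoted in the text (written here as exact fractions):

```python
import numpy as np

v = np.array([0.0, 1.0, -2.0])               # null vector of the example
b_first = np.array([3.0, 8 / 3, 8 / 3])      # (3.0, 2.6667, 2.6667)
b_second = np.array([3.0, 2 / 3, 20 / 3])    # (3.0, 0.6667, 6.6667)

diff = b_second - b_first
k = (diff @ v) / (v @ v)                     # coefficient of v in the difference

# The difference has no component off the null vector: diff equals k * v exactly
parallel = np.allclose(diff, k * v)

# Length travelled along the line of solutions, in ordinary Euclidean units
distance = abs(k) * np.linalg.norm(v)

print("k =", k)
print("difference is k * v:", parallel)
print("distance =", round(float(distance), 3))
```

The value of `k` depends on how the null vector is scaled, while the Euclidean distance does not.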

There are two ways to view this movement. The first is as a movement from one solution to the other by moving parallel to the axes. To move from the point (3.0, 2.6667, 2.6667) to (3.0, 0.6667, 6.6667), move 0.0 units on the b1 axis, –2 units parallel to the b2 axis, and up 4 units parallel to the b3 axis, where we intercept the line of solutions at (3.0, 0.6667, 6.6667). The second geometric interpretation is based on the length that must be traveled on the line of solutions from one solution to the other. The unit of measurement for k is the length of the null vector. The length of the null vector is the square root of the sum of its squared components: in our example, $\sqrt{0^2 + 1^2 + (-2)^2} = 2.236$. We must move 2 units [$2 \times \sqrt{v'v}$] along the line of solutions to reach (3.0, 0.6667, 6.6667) from (3.0, 2.6667, 2.6667). The distance along the line of solutions between (3.0, 2.6667, 2.6667) and (3.0, 0.6667, 6.6667) is twice this length: 4.472. Note that k depends on how we scale the null vector, since any scalar multiple of (0, 1, –2) is also the null vector.

When we move to a situation in which there are four or more dimensions, the solutions for any one set of data using different constraints still lie on a line that is parallel to the null vector. This (one-dimensional) line is then in an m-dimensional space rather than in a three-dimensional space. Each set of solutions is still perpendicular to the constraint. The line of solutions is intersected by an m – 1 dimensional hyperplane that is orthogonal to the constraint. The point where this hyperplane intersects the line of solutions provides the solution associated with the particular constraint. The scalar, k, still tells us how far we must travel on the line of solutions, with the unit of measurement for k being the length of the null vector (which now has m elements).

Relevance to APC Models

In the class of APC models that I examine, a single linear constraint is placed on the model that makes the model just identified. In this class of models, the solution is always orthogonal to the constraint. All of the solutions using a single linear constraint lie on a single line of solutions in multidimensional space. We can reach all of the solutions for the constrained estimators on the line by taking any solution (from one of the generalized inverses) and adding a scalar times the null vector to it. The difference between any two constrained estimates may be written as $b_c - b_{c^*} = kv$, where c and c* are two different constraints. The intercept is always the same no matter which linear constraint is chosen. All of these solutions fit the model equally well in terms of predicted values.
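The distance calculation and the dependence of k on the scaling of the null vector can be sketched as follows. This is a minimal illustration using the three-dimensional example; the helper name `length` and the variables `v_unit` and `k_unit` are introduced here for illustration only.

```python
import math


def length(x):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(x_i * x_i for x_i in x))


v = [0.0, 1.0, -2.0]   # null vector as scaled in the example
k = -2.0               # scalar linking the two solutions in the example

v_length = length(v)             # length of the null vector
distance = abs(k) * v_length     # distance travelled along the line of solutions

# Rescaling the null vector to unit length changes k but leaves k*v unchanged.
v_unit = [v_i / v_length for v_i in v]
k_unit = k * v_length

print(round(v_length, 3), round(distance, 3), round(k_unit, 3))
```

With a unit-length null vector, the absolute value of the rescaled scalar is itself the distance between the two solutions.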

The solution is determined by the intersection of the line of solutions with an m – 1 hyperplane, and the direction of that hyperplane is orthogonal to the constraint. All of the solutions are related to each other by kv.

Before leaving this introduction to the geometry of constrained regression, I should mention a unique property of using the null vector as a constraint. The solution that it produces is closer to its constraint than using any other constraint. The length of the solution vector is the distance from the b1, b2, b3 origin to the point of solution on the line of solutions. For example, the length of the vector from the origin to the solution on the line of solutions using the null vector is 4.669 ($= \sqrt{3.0^2 + 3.2^2 + 1.6^2}$). Using the constraint (0, 1, –1), the distance to the solution is 4.819 ($= \sqrt{3.0^2 + 2.667^2 + 2.667^2}$). In Figure 4 the solution plane perpendicular to the null vector has a slope of .50 with respect to the b2-b3 plane (a one-unit increase on b2 is associated with a ½-unit increase in b3). The minimum-length property is often cited as an advantage of using the Moore-Penrose solution: “If we want to single out one particular member of this solution set of vectors as a representative, we might want to pick the one with the smallest length” (Press et al. 1992:62). We can also see that it is representative in the sense that it is in the “middle” of a line of solutions that stretches out in either direction. It may be this property that inspired Smith’s (2004:116) comment: “There is also a sense in which the IE is an average of CGLIM [Constrained Generalized Linear Model] estimates.”

Some Shared Characteristics of Constrained APC Models

In this section I discuss some of the characteristics that are shared by all constrained APC models, including the IE. I also discuss the characteristics of the IE that are different from other constrained models. The aim is to provide researchers with a basis for understanding how constraints are related to solutions, comprehending how the estimated coefficients from the various constrained models are related, and judging how much confidence should be placed in the estimates.

A Solution for All of the Effect Coefficients

Because the APC model without an additional constraint is not of full column rank, it is underidentified and there is no solution that provides a unique vector of age, period, and cohort coefficients that corresponds to the generating parameters for the data.
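The vector lengths quoted earlier in this section for the three-dimensional example can be reproduced directly. The sketch below also steps along the line of solutions in both directions from the Moore-Penrose solution to illustrate that the length only grows as we move away from it; the step values in `steps` are arbitrary choices made for illustration.

```python
import math


def length(x):
    """Distance from the origin to the point x."""
    return math.sqrt(sum(x_i * x_i for x_i in x))


v = [0.0, 1.0, -2.0]               # null vector of the example
b_equal = [3.0, 2.6667, 2.6667]    # solution under the constraint (0, 1, -1)
b_mp = [3.0, 3.2, 1.6]             # Moore-Penrose solution (null vector as constraint)

print(round(length(b_mp), 3))      # 4.669
print(round(length(b_equal), 3))   # 4.819

# Lengths of other points on the line of solutions, b_mp + k*v, for a few k.
steps = [-1.0, -0.5, 0.0, 0.5, 1.0]
lengths = {k: length([b_i + k * v_i for b_i, v_i in zip(b_mp, v)]) for k in steps}
shortest_k = min(lengths, key=lengths.get)
print(shortest_k)                  # 0.0: the Moore-Penrose solution itself
```

Because the Moore-Penrose solution is orthogonal to the null vector, the squared length of any other point on the line is the squared length of that solution plus $k^2 v'v$.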

There are instead an infinite number of solutions for the age, period, and cohort coefficients that produce least squares estimates. Any constrained regression approach provides estimates of the unique effect coefficients for each age group, period, and cohort under the particular constraint. They are unbiased estimates in the sense that they are unbiased estimates under a particular constraint.

Solution Space Is Perpendicular to the Constraint

Yang et al. (2004:82) note that “the parameter space P can be decomposed as P = N ⊕ Θ, where ⊕ is the direct product of two linear spaces that are perpendicular to each other, N is the one-dimensional null space of X spanned by the vector {sB0}, with real number s, and Θ is the complement subspace orthogonal to N.” In our notation B0 is the null vector (v) and s is the scalar by which the null vector may be multiplied, since it is unique up to multiplication by a scalar. In general, I note that the parameter space can be decomposed as P = Bc ⊕ Πc, where Bc is a linear constraint spanned by the vector {sc} with real numbers s and constraint vector c, and Πc is a complement subspace orthogonal to Bc. The linear constraint, like the null vector, is unique up to multiplication by a scalar.

For each constrained solution, the constraint determines the direction of the solution plane/hyperplane. The solution vector must be orthogonal to the constraint, since the solution vector is determined by a point on the constrained hyperplane, and the hyperplane is orthogonal to the constraint. Specifically, this point is the intersection of this hyperplane (with its direction constrained to be orthogonal to the constraint) and the line of solutions. As noted earlier, all of the solutions for the different constraints (with a fixed set of data) must fall on a line parallel to the null vector. This restriction on the solutions occurs because of the linear dependency in the original data.

Bias

Arguably the most important consideration in setting a linear restriction is the degree of bias associated with that restriction. The constraint chosen by the researcher determines the amount of bias associated with the particular constrained estimate. The crucial question, from our perspective, is whether the parameter estimates are unbiased, in the sense that their expected values equal the values of the parameters that generated Y: Does, in fact, $c'b = 0$? Since the effect coefficients are estimated under this assumption, the estimated values will be orthogonal to the constraint: $c'\hat{b}_c = 0$.

Thus, we cannot use this observed orthogonality to judge the validity of the constraint. The crucial question is whether the constraint is orthogonal to b.

A general definition of bias involves the difference between the expected value of an estimator and the parameter that it estimates. In this article, I use bias to refer to the differences between expected values of the estimates and the values of the parameters (b) of the process that generated the outcome values. Specifically, it is the difference between the expected values of the estimates in the vector of constrained estimates, $E(\hat{b}_c)$, and b: bias $= E(\hat{b}_c) - b$. Using this conception of bias, Kupper et al. (1983) note that for APC models $kv = E(\hat{b}_c) - b$, where kv measures the distance between the expected value of the constrained solution and the population parameters that generated the data (b). This is identical to the distance between any two constrained estimates on the line of solutions, except that one of the solutions is the “true” solution (the one representing the process that generated the outcomes). Kupper et al. (1983) note that this value of k can be computed as $k = -c'b / v'c$ and that $c'b = 0$ when the expected value of the constrained estimate equals the parameter values that generated the outcome: $E(\hat{b}_c) = b$.

A different use of the term bias involves the difference between the expected values of the estimators and the values of the parameters under a particular constraint: $E(\hat{b}_c) = b_c$. This is an important property for any constrained estimator, but the focus of most of the literature has been on bias in terms of the expected value of the estimated parameters and the parameters that generated the data. A careful reading of Yang et al. (2004, 2008) indicates that the IE is an unbiased estimate of the solution to the APC model that lies in the subspace that is orthogonal to the null vector. The IE is an unbiased estimator of the $b_{ie}$ parameter values associated with the null vector constraint: $E(\hat{b}_{ie}) = b_{ie}$. The IE would provide an unbiased estimate of the parameters that generated the age-period-specific rates if and only if $v'b = 0$, just as other constrained estimates would provide unbiased estimates of the parameters that generated the age-period-specific rates if and only if $c'b = 0$.

Relationship of Constrained Solutions to Each Other

The problem confronting APC analysts who employ constrained regression (whether using traditional constraints or the IE) is that they cannot find the unique solution to the normal equations. This occurs because the X matrix is rank deficient by one. Therefore, there are an infinite number of possible solutions.
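The expression for k attributed to Kupper et al. (1983) in the discussion of bias follows in two steps from relations already stated there. The derivation below is a sketch that assumes $v'c \neq 0$ (otherwise the constraint would not pick out a single point on the line of solutions) and uses the fact that $c'\hat{b}_c = 0$ in every sample, so that its expectation is also zero.

```latex
\begin{align*}
E(\hat{b}_c) &= b + kv
  && \text{(the expected solution lies on the line through } b\text{)} \\
c'E(\hat{b}_c) &= 0
  && \text{(every constrained solution is orthogonal to } c\text{)} \\
c'b + k\,c'v &= 0
  && \text{(pre-multiply the first line by } c'\text{)} \\
k &= -\frac{c'b}{v'c}
  && \text{(since } c'v = v'c \neq 0\text{)}
\end{align*}
```

It follows that $k = 0$, and hence the bias $kv$ vanishes, exactly when $c'b = 0$.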

In a fundamental sense, however, not just any solution is possible, even though the X matrix is rank deficient by one. All constrained solutions are related to each other in that they lie on a single line of solutions in multidimensional space. Any solution to the normal equations must lie on this line and any of the solutions to the normal equations are ordinary least squares (OLS) solutions. Thus, if our standard assumptions (e.g., those underlying OLS multiple regression) are correct, we know that the solution corresponding to the parameter values that generated the outcome data must fall on this line. Thus, the line of solutions is the most tangible thing we know about the solution to this system of equations.

The difference between any two solution vectors in the class of constrained estimators examined in this article is $kv = (-c_1'\hat{b}_2 / v'c_1) \times v$, where $c_1$ is one constraint and $\hat{b}_2$ is the solution under a different constraint. Thus, kv indicates how much the coefficients based on the two different constraints vary: $\hat{b}_1 - \hat{b}_2 = kv$. Appropriately, the value of k = 0 when the constraint for a solution and the solution $\hat{b}_2$ are orthogonal: $c_1'\hat{b}_2 = 0$. The distance between any two solutions on the line of solutions is $k\sqrt{v'v}$, where $\sqrt{v'v}$ is the length of the null vector. When we have calculated a single solution, we could determine the difference between that solution and any other solution generated using a different constraint or, of course, determine any other solution.

Model Fit

The solutions for the normal equations associated with the APC model all lie on the line of solutions and any solution to these normal equations is a least squares solution. Thus, each of the constrained estimates generates a solution ($b_c$) that produces the same solutions for the values of the dependent variable. I showed this for the 3 × 3 case using the data in equation (5), and it extends to higher dimensions and accounts for the well-established finding that in APC analysis one cannot distinguish between models on the basis of fit when they are just identified. This is true for all of the constrained estimates that we examine.

Setting the Constraint and Zero Influence

Many sources (e.g., Kupper et al. 1983; Rodgers 1982; Smith 2004) caution against setting the constraint based on the values of the observed dependent variable. An advantage of the IE approach is that it assures that the researcher does not give in to this temptation: the constraint is based purely on the X-matrix, and in this sense, the analyst does not choose which constraint to use.
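The relationship between solutions under different constraints can be turned into a small routine: given any one solution on the line and the null vector, the solution under another constraint follows from k. The sketch below applies it to the three-dimensional example used earlier in the article; `switch_constraint` and `dot` are hypothetical helper names.

```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


def switch_constraint(b_other, c, v):
    """Return (k, b_c), where b_c = b_other + k*v satisfies c'b_c = 0.

    b_other is any solution on the line of solutions, c is the new constraint,
    and v is the null vector; k = -c'b_other / v'c.
    """
    k = -dot(c, b_other) / dot(v, c)
    return k, [b_i + k * v_i for b_i, v_i in zip(b_other, v)]


v = [0.0, 1.0, -2.0]
b_equal = [3.0, 2.6667, 2.6667]    # solution under the constraint (0, 1, -1)

# Using the null vector itself as the constraint gives the Moore-Penrose solution.
k, b_mp = switch_constraint(b_equal, v, v)
print(round(k, 4), [round(x, 3) for x in b_mp])   # approximately (3.0, 3.2, 1.6)

# Re-imposing the constraint the solution already satisfies gives k = 0.
k_same, _ = switch_constraint(b_equal, [0.0, 1.0, -1.0], v)
print(k_same == 0)
```

The routine fails with a division by zero when $v'c = 0$, which corresponds to a constraint that does not identify the model.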

Yang et al. (2008:1704) state: “The eigenvector B0 [the null vector] does not depend on the observed rates Y, only on the design matrix X, and thus is completely determined by the numbers of age groups and period groups, regardless of the event rates. In other words, B0 has a specific form that is a function of the design matrix.” This is correct, and a potential advantage of using this constraint.

I suggest two cautions. First, one could argue that in some situations with good theory or past research it would be better to set the constraint on the basis of theory and previous research than on the basis of the null vector of the X-matrix. Kupper et al. (1983:2803-804) conclude: “Given that the observed Yij’s are of no help in determining c, the only remaining option is to make use of any reasonably reliable a priori, data independent, knowledge about the underlying age, period, and cohort effect parameters under study.” The second caution is that having no choice in the constraint used does not mean that the constraint used is unimportant or has no effect on the outcome measures. The solution vector must be orthogonal to the null vector, as noted earlier, and therefore is affected by this constraint. But it is not clear what the following statements mean: “The fact that the fixed vector B0 is independent of the response variable Y suggests that it should not play any role in the estimation of effect coefficients” (Yang et al. 2008:1705) or “Specifically, the IE imposes the constraint that the direction in parameter space defined by the eigenvector B0 in the null space of the design matrix X have zero influence on the parameter vector b0 (i.e., on the specific parameterization of the vector b that is estimated by the IE)” (Yang et al. 2008:1706). If they mean that the Y does not affect the constraint that is chosen, then they are correct. If these statements are meant to convey that constraining the solution to be orthogonal to the vector B0 (the null vector) does not affect the estimates, then this is incorrect.

Variance of the Estimators

Yang et al. (2008:1709) note that “for a fixed number of time periods of data, the IE is more statistically efficient (has a smaller variance) than any CGLIM estimator that is obtained from a nontrivial equality constraint on the unconstrained regression coefficient estimator.” This is an important statistical advantage of using the null vector as a constraint and is in agreement with Kupper et al. (1983:2797), who note that the principal component estimator (a linear transformation of the IE) “deals with the exact linear dependency by involving only the non-zero eigenvalues associated with the eigenvectors.”

Because of this, it “should be preferred on variance grounds.” While noting that this is an important statistical property, Kupper et al. (1983:2797) note that to the extent that $v'b$ departs from zero, the principal component estimator is more biased and “could lead to more bias than the use of some other constraint . . . so that the optimal method for choosing c should probably take into account both bias and variance (i.e., mean square error) considerations. Of course, when the squared multiple correlation coefficient R2 is fairly close to 1, a result which seems to occur not infrequently in practice, the bias becomes the main area of concern.” I am in agreement with Kupper et al. (1983) that bias (and I and they are using bias in the sense of estimating the generating mechanism) is likely to be the more important factor.

Most Representative Solutions

The constraint used for the IE (the null vector) results in a vector from the origin to the solutions that is the shortest of all the constrained estimators. From the discussion of the geometry of constrained estimation, it is possible to view this solution as the balance point of a teeter-totter, with solutions extending in both directions along the line of solutions from this solution point. This property can be used to argue that this estimate is a sort of average of the estimates based on constrained estimation. In a similar vein we can view the Moore-Penrose solution (the IE solution) as a sort of “conventional solution.” Given that any of the constrained solutions provide the same fit to the data, is there one solution that could serve as the conventional solution so that different researchers would report the same solution in the absence of evidence of a “better” solution? Some might say yes and that the solution should be one based on the Moore-Penrose generalized inverse, because of its statistical properties: “closest to its solution,” being (in a sense) most representative, and its variance characteristics.

Convergence

I showed earlier that the scalar k times the null vector (kv) represents the difference between any two constrained estimators when using the same data set. This holds for the intrinsic estimator, $\hat{b} = b_{ie} + kv$, where any of the constrained estimates, $\hat{b}$, equals the estimates based on the intrinsic estimator, $b_{ie}$, plus k times the null vector. It also holds for the traditional constrained estimators, $\hat{b} = b_c + kv$, where any of the constrained estimators (including the intrinsic estimator) equals any specific constrained estimator, $b_c$, plus k times the null vector.

Each of these solution vectors is a biased estimate of the parameter vector that generated the outcome data to the extent that it differs from the parameter values (the age, period, and cohort parameters) that “nature” used to generate the age-period-specific results.

The null vector is unique up to multiplication by a scalar, and Yang et al. (2008:1709) choose to norm the length of the null vector to 1 in their presentation. They later transform their solution using “the orthonormal matrix of all eigenvectors to transform the coefficients of the principal components regression model to regression coefficients of the estimator B [the intrinsic estimator]” (Yang et al. 2008:1709). Their transformed parameter values are the same as those that result from our use of constrained regression (or using generalized inverses) to obtain the parameters of the IE. They use the formula $\hat{b} = b_{ie} + k^* v^*$ (in our notation) to represent the relationship of any constrained estimate ($\hat{b}$) to the values estimated by the intrinsic estimator ($b_{ie}$). I have used the asterisk on v* to emphasize that it is the normed null vector and on k* to emphasize that this scalar is appropriately scaled, given the normed null vector. Since they use this normed null vector that has a length of 1, they note that as the number of elements increases (the number of age groups and periods increase) the elements of the null vector become smaller and “converge elementwise to zero” (Yang et al. 2008:1705). Yang et al. (2008) argue that the expected value of any constrained estimator converges in value to the expected value of ($b_{ie}$) as the number of periods and/or age groups increases, since the expected value of v* goes elementwise to zero with such increases.

I am skeptical of this argument because as the normed v goes elementwise to zero there is no reason to assume that k* remains constant. I have found in simulations that k* does not remain fixed as the number of periods increases for the models I have investigated. I note that I can also write $\hat{b} = b_c + k^* v^*$. Given $\hat{b} = b_c + k^* v^*$ and that $b_{ie}$ is a constrained estimator, I could just as easily argue that the expected value of the intrinsic estimator converges toward the expected value of any of the constrained estimators as v* goes elementwise to zero.

On the other hand, convergence can occur for other reasons. For example, the unemployment rate across a 60-period span may fluctuate up and down over short periods of time, but with no apparent linear trend in the long term. It might be the case that there is a zero trend in the period effects in the long term. If we set a zero linear trend (ZLT) constraint for periods over a short time span, we may or may not get an accurate estimate of the data generating age effect coefficients.⁹

If we set a ZLT constraint for periods covering a longer span of time, say 30 years, we are likely to get a more accurate measure of the generating parameters for age, and a 60-year span will likely provide an even more accurate measure, assuming an up and down pattern of unemployment with very little or no overall trend. This is not surprising, because the ZLT constraint for periods is more likely to be correct (or closer to being correct) as the span of periods increases.¹⁰

Investigating the Effects of Different Constraints

It is straightforward to investigate with data what happens to the estimated age, period, and cohort coefficients as the constraints change. I report the results using two data sets (one with 5 periods and one with 20 periods). I begin the simulation by using the data for the 5 × 5 age-period matrix in Table 1. I generated the cell entries by setting the earliest cohort value (cohort 1) to 5 and keeping the cohort effect at 5 through the fourth cohort and then increasing the cohort effect by .50 for each cohort through the ninth and final cohort. The period effect is two for the first three periods and four for the next two periods. The age effect is three for the two youngest age groups, two for the next oldest, one for the next oldest, and zero for the oldest.

When we increase the number of periods to 20, the number of age groups remains at five and the number of cohorts increases to 24. When I extend our analysis to 20 periods, I use the data from the 5 × 5 age-period matrix in Table 1. The age effects remain the same as they are for the 5 × 5 table across all 20 periods. The period effects are two for the first 3 periods, four for the next 2 periods, two for the next 3 periods, and continue this up-and-down pattern until we reach the 20th period. I leave the cohort effects as they are for the 5 × 5 table and continue with no increase or decrease in the cohort effect for the newly added cohorts that correspond to the newly added periods.

Certainly, I do not claim that this data set is somehow representative of all of the possible generating mechanisms. The number of mechanisms is infinite, as are the number of possible constraints. The results point to many important relationships and illustrate many of the points that I discussed previously. I expect that if a constraint is consistent with the way the data were generated, we should obtain results that are consistent with the way the data were generated. If the constraint is not consistent with the way the data were generated, we will obtain some other set of results. But whatever the results, they will all lie on a line of solutions, they will fit the data equally well, they will be orthogonal to the constraint used, and the intercept will be the same no matter what constraint is used.
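The generating mechanism described above can be written out as a short sketch. Two things here are assumptions rather than statements from the text: that each cell entry is the simple sum of its age, period, and cohort effects, and that cohorts are indexed so that the oldest age group in the first period is cohort 1. The function name `cell_entries` is hypothetical, and the output is not checked against Table 1 itself.

```python
def cell_entries(age_eff, period_eff, cohort_eff):
    """Additive cell entries; rows are age groups (youngest first), columns are periods.

    Assumed cohort index (0-based): period - age + (number of age groups - 1),
    so the oldest age group in the first period belongs to the earliest cohort.
    """
    n_age = len(age_eff)
    return [[age_eff[a] + period_eff[p] + cohort_eff[p - a + n_age - 1]
             for p in range(len(period_eff))]
            for a in range(n_age)]


def mean(values):
    return sum(values) / len(values)


age_eff = [3, 3, 2, 1, 0]                       # youngest to oldest
period_5 = [2, 2, 2, 4, 4]
cohort_9 = [5, 5, 5, 5, 5.5, 6, 6.5, 7, 7.5]    # cohort 1 (earliest) to cohort 9

# Extension to 20 periods: repeat the period pattern, hold new cohorts at 7.5.
period_20 = period_5 * 4
cohort_24 = cohort_9 + [7.5] * 15

table_5 = cell_entries(age_eff, period_5, cohort_9)
table_20 = cell_entries(age_eff, period_20, cohort_24)

# Under effect coding the intercept is the sum of the mean effects.
intercept_5 = mean(age_eff) + mean(period_5) + mean(cohort_9)
intercept_20 = mean(age_eff) + mean(period_20) + mean(cohort_24)
print(round(intercept_5, 3), round(intercept_20, 3))
```

Under these assumptions the two intercepts come out at about 10.433 and 11.475, and the effect-coded age coefficients for the two youngest groups are both 3 − 1.8 = 1.2.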

The quandary in APC analysis is to discover a method that will yield results that are consistent with how “nature” generated the data.

The data have been coded using effect coding so that the sum of the effect coefficients is zero, and I have used the last category of age groups, periods, and cohorts as the reference category. We can therefore determine the coefficients for these reference categories and have reported this in Table 2. To compute the constrained estimates, I used constrained regression analysis in STATA. To calculate the IE, I used the add-on program in STATA (cited in Yang et al. 2008).

Table 2 presents the results. The first four columns show the results when there are five periods and five age groups (based on the data displayed in Table 1) for four different constraints: age1 = age2, period3 = period4, cohort7 = cohort8, and the constraint associated with the IE. The fifth column contains the null vector for the 5 × 5 age-period matrix. The next four columns show the results for the same four constraints, but for the case where there are five age groups and 20 periods. The final column of Table 2 contains the null vector for the 5 × 20 age-period matrix.¹¹

I first note that the analyses produce results consistent with several principles noted earlier. (1) For the same X and Y data the intercept is always the same no matter which linear constraint is used to identify the models: 10.433 for the data with 5 periods and five age groups and 11.475 for the data with five age groups and 20 periods. (2) Each of the solutions is perpendicular to its constraint. The solution vector is perpendicular to the constraint vector. This is easily verified by multiplying the transpose of the constraint vector times the solution vector. For example, setting age1 = age2 corresponds to a constraint vector of (1, –1, 0, 0, . . . , 0), which when multiplied times the appropriate elements of the solution vector results in zero (here I have placed the intercept as the final term to match the output). Note that the constraint vector ignores the reference categories (since they are “dropped” from the analysis); thus, we need to multiply only the constraint times the corresponding elements of the solution vector. In the 5 × 5 case, the age1 and age2 coefficients both equal 1.20, so the dot product [$c'$ · (solution vector)] is zero; this is also true in the 5 × 20 case. Similarly, the transpose of the null vector times the solution for the IE equals zero. For the 5 × 5 case: –.267 × age1 – .134 × age2 + 0.000 × age3 + . . . + .401 × cohort8 + 0.000 × intercept = 0. (3) All of the solutions lie on a single line of solutions and thus differ from one another by kv. In the 5 × 5 case, to move from the solution for age1 = age2 to the solution for period3 = period4, k = 14.967; to move from the solution for the age constraint to the solution for the cohort constraint, k = –3.742; to move from the age constraint to the solutions using the IE, k = –.713.
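The value of k that moves the age1 = age2 solution to the IE can be reproduced from the generating mechanism with $k = -v'\hat{b}_2 / v'v$, the general formula with the null vector as the constraint. The sketch below rests on two assumptions that are not spelled out in the text: the age1 = age2 solution equals the effect-coded generating values (the constraint is consistent with how the data were generated), and the null vector has the closed form coded in `unit_null_vector`, which reproduces the entries quoted above (–.267, –.134, .401). All function names are hypothetical.

```python
import math


def centered(effects):
    """Effect coding: deviations from the mean, last (reference) category dropped."""
    mean = sum(effects) / len(effects)
    return [e - mean for e in effects[:-1]]


def unit_null_vector(n_age, n_period):
    """Assumed closed form of the unit-length null vector.

    Order: age, period, cohort effects (reference categories dropped).
    The intercept element is zero and is omitted.
    """
    n_cohort = n_age + n_period - 1
    raw = ([a - (n_age + 1) / 2 for a in range(1, n_age)]
           + [(n_period + 1) / 2 - p for p in range(1, n_period)]
           + [c - (n_age + n_period) / 2 for c in range(1, n_cohort)])
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def k_age_to_ie(age_eff, period_eff, cohort_eff):
    """k that moves the age1 = age2 solution to the IE (v has unit length, so v'v = 1)."""
    b = centered(age_eff) + centered(period_eff) + centered(cohort_eff)
    v = unit_null_vector(len(age_eff), len(period_eff))
    return -sum(v_i * b_i for v_i, b_i in zip(v, b))


age_eff = [3, 3, 2, 1, 0]
period_5 = [2, 2, 2, 4, 4]
cohort_9 = [5, 5, 5, 5, 5.5, 6, 6.5, 7, 7.5]

k_5 = k_age_to_ie(age_eff, period_5, cohort_9)
k_20 = k_age_to_ie(age_eff, period_5 * 4, cohort_9 + [7.5] * 15)
print(round(k_5, 3), round(k_20, 3))   # -0.713 -2.709
```

Under these assumptions the 5 × 5 value matches the –.713 reported in the text, and the 20-period value differs from it even though the elements of the unit-length null vector are smaller in the larger design.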

Table 2. Analysis of the Data Generated in Table 1 for 5 Periods and Extended to 20 Periods as Described in the Text

[Table 2: rows are the effects age1 to age5, period1 to period20, cohort1 to cohort24, and the intercept. For each of the two designs (five age groups and 5 periods; five age groups and 20 periods) the columns are the estimates under age1 = age2, period3 = period4, cohort7 = cohort8, and the intrinsic estimator, followed by the null vector.]

To move from the IE constraint to the age constraint, k = .713; from the IE constraint to the period constraint, k = 15.679; and from the IE constraint to the cohort constraint, k = –3.029. (4) Each of the solutions fits the data equally well; this also occurs in unreported analyses when I added random error to the cell entries. As long as the same data are analyzed, the fit does not depend on the linear constraint used.

In terms of “substance,” I note that the different constraints produce quite different interpretations concerning the generating parameters. In Table 2, the only constraint to produce the putative data generating mechanism (age1 = age2) is the only constraint (among those used) that is consistent with the generating mechanism. It is consistent because for the data generating mechanism the two youngest age groups have the same effect. For this constraint, we see that each cohort is 0.50 greater than the earlier one from cohort5 up to cohort9, that the first three periods are two less than the next two, and that the age effect increases by one from the oldest to the next oldest and again by one to the next oldest and then increases by two for the two youngest age groups. This fits the putative data generating mechanism in both the 5 × 5 and 5 × 20 cases. For this particular generating mechanism (age1 = age2) the IE does better than the period and cohort constraints that we used (k = .713 times the null vector for the absolute difference between the IE estimates and the putative data generating parameters).

I use the term putative to emphasize that any of the mechanisms implied by any of the solutions could have generated the data. For example, in the case of the period constraint (period3 = period4), age group 1 (the youngest) is two units lower than age 2, which is one unit lower than age 3, which is one unit lower than age 4, which is one unit lower than the oldest group. This does not match the putative data generating mechanism, but I could have used these parameters to generate the data (and nature might have). If this were the generating mechanism for the data in Table 1, then the absolute value of k, which determines the distance between estimates, would have been 15.679 for the distance between the IE estimate and this generating mechanism. If the data were generated by “nature” the way I generated it, then only the age1 = age2 constraint (of the constraints used) would have shown us how “nature” generated these data; however, the data could have been generated by the coefficients implied by any of the particular constraints used in Table 2. This is the quandary associated with using these constrained regression approaches to APC analysis.

Comparing the results as we move from the analysis of data for a 5 × 5 age-period matrix to those based on a 5 × 20 age-period matrix does not allow us to examine patterns across the multitude of possible models

I can, however, note some interesting patterns, and as we will see, many of these patterns depend on the following relationship: $k v = (-c_1' \hat{b}_2 / v' c_1) \times v$, where $c_1$ is one constraint and $\hat{b}_2$ is the solution under a different constraint. Here $k v$ represents the difference between two solution vectors, one based on $c_1$ as a constraint and the other ($\hat{b}_2$) based on $c_2$.

I will use the phrase “convergence as the number of periods increases” in a specific manner. It is the extent to which the values in a solution vector based on a particular age-period matrix approach some set of fixed values as more periods of data are added. Specifically, for our data I ask whether the age effects in the 5 × 5 age-period matrix (which remain constant across all 20 periods) converge to some other values when there are 20 periods. Do the cohort effects that changed only for cohorts 5 through 9 have estimates that converge to some other value as we add periods? Do the period effect estimates based on the 5 × 5 age-period matrix converge to some new values as we add periods? Is there some sense in which we find that with more periods one of the constraints is more likely to tell us what nature was doing in the period we were studying?

This concern assumes that analysts are actually interested in the age effects, period effects, and cohort effects on rates during a particular era. For example, during any specific era, cohort effects or period effects may be highly associated with smoking behavior, yet these very patterns are not likely to be associated with smoking behavior in the same way in future eras (after the original cohorts are replaced).

What sorts of changes in the solutions do we find as we move from 5 to 20 periods? In both the 5 period and 20 period simulations we find that (1) when the constraint is consistent with the generating mechanism, the generating mechanism is estimated correctly. If I were to add error variance to these simulations, we would find that the expected value of the estimated parameters would remain unbiased estimates of the data generating parameters. (2) For each of the three traditional APC constraints, the estimated age effect coefficients are the same for both the 5 and 20 period cases. This means that they do not converge to some other values. (3) The values of k do not remain constant as we move from the 5 × 5 to the 5 × 20 situation. For example, the value of k needed to transform the effect coefficient estimates for the age1 = age2 constraint to those obtained using the IE constraint is –.713 in the 5 × 5 case. In the 5 × 20 case, this value of k is –2.709 (about 3.8 times greater). (4) The age effect estimates using the IE are not the same for the 5 period and the 20 period cases. This occurs because unlike the traditional constraints that remain the same in the 5 and 20 period cases, the null vector (and thus the constraint for the IE solution) changes as the number of periods changes. (5) Typically, I calculate k by taking the age constrained estimated coefficients minus the corresponding IE estimated coefficients and dividing each of the resulting elements by the corresponding null vector element. We can, however, use $k = (-c_1' \hat{b}_2 / v' c_1)$ to calculate k in each of these cases. For example, if $c_1$ is the constraint vector associated with the age constraint and $\hat{b}_2$ is the solution vector associated with the IE, we will obtain the same k values as we obtained before: –.713 and –2.709.

If we had reason to believe that the period effects in the long run had a zero linear trend, then we might well use a ZLT constraint for periods (see note 9). To the extent that this constraint is consistent with the data generating mechanism, it should make the estimated age effects converge on the data generating age effects. For the data in my simulation the data generating period effect is (2, 2, 2, 4, 4) for the 5 period case and is (2, 2, 2, 4, 4, 2, 2, 2, 4, 4, 2, 2, 2, 4, 4, 2, 2, 2, 4, 4) for the 20 period case. With these data the slope of the data generating period effects regressed on time is .60 for the 5 period case and only .04 for the 20 period case. Not surprisingly the ZLT period constraint is closer to the data generating mechanism in the 20 period case, and this is what we might expect in the long term for many sets of data. The age effects for the data generating mechanism are the same for the two youngest age groups and then are one lower for each of the succeeding age groups. Using the ZLT constraint for periods in the 5 period case, these age effects are estimated to increase by .60 between age groups 1 and 2 and then to drop by .4 for each of the succeeding age groups. Using the ZLT constraint for periods in the 20 period case, there is an increase of .036 between the first and second age group and then drops of .964 for each of the succeeding age groups. There is clear convergence to the data generating values for age.

This, however, depends on making a constraint that is consistent with the data generating mechanism. The data could have been generated by the mechanism implied by the constraint p3 = p4 (see Table 2). The linear trend for that potential generating mechanism is strongly negative. If that were the case, the assumption that there is no linear trend in periods would be incorrect. But theory and other data might convince us that the zero linear trend for periods or some other linear trend in periods is much more plausible. Note that we might well make similar arguments for a ZLT constraint for cohorts.

Conclusion

Constrained estimation is the most common method used by those seeking to estimate each of the age, period, and cohort coefficients in APC models.
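To make concrete what constrained estimation does in this setting, the following Python sketch (using NumPy) builds an effect-coded 5 × 5 APC design matrix, finds its null vector, and solves the model under four single constraints: age1 = age2, period3 = period4, a ZLT for periods, and the null vector itself (the IE). It is an illustration under stated assumptions, not the author's code. The helper names (`effect_codes`, `constrained_solution`, `full_effects`), the intercept of 10, the exact cohort effects, and the absence of random error are all hypothetical choices; the age and period effects follow the generating mechanism described in the text.

```python
import numpy as np


def effect_codes(levels, n_levels):
    """Effect coding: one column per non-reference level, last level coded -1."""
    codes = np.zeros((levels.size, n_levels - 1))
    for row, level in enumerate(levels):
        if level == n_levels - 1:
            codes[row, :] = -1.0
        else:
            codes[row, level] = 1.0
    return codes


def constrained_solution(X, y, c):
    """Least squares solution of y = Xb subject to the single constraint c'b = 0."""
    X_aug = np.vstack([X, c])          # the constraint enters as one extra row
    y_aug = np.append(y, 0.0)
    return np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]


def full_effects(coefs):
    """Append the reference-category effect (minus the sum of the others)."""
    return np.append(coefs, -coefs.sum())


n_age, n_period = 5, 5
n_cohort = n_age + n_period - 1

# One row per age-by-period cell; cohort index 0 is the earliest cohort.
age, period = np.meshgrid(np.arange(n_age), np.arange(n_period), indexing="ij")
age, period = age.ravel(), period.ravel()
cohort = period - age + (n_age - 1)

X = np.hstack([
    np.ones((age.size, 1)),
    effect_codes(age, n_age),
    effect_codes(period, n_period),
    effect_codes(cohort, n_cohort),
])

# Assumed generating effects (cohort values and intercept are hypothetical).
age_eff = np.array([0.0, 0.0, -1.0, -2.0, -3.0])     # two youngest equal
period_eff = np.array([2.0, 2.0, 2.0, 4.0, 4.0])
cohort_eff = np.array([0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = 10.0 + age_eff[age] + period_eff[period] + cohort_eff[cohort]

# Null vector: right singular vector belonging to the zero singular value.
v = np.linalg.svd(X)[2][-1]

first_age = 1
first_period = first_age + (n_age - 1)

c_age = np.zeros(X.shape[1])           # age1 = age2
c_age[first_age], c_age[first_age + 1] = 1.0, -1.0

c_per = np.zeros(X.shape[1])           # period3 = period4
c_per[first_period + 2], c_per[first_period + 3] = 1.0, -1.0

c_zlt = np.zeros(X.shape[1])           # (n-1)*period1 + ... + 1*period(n-1) = 0
c_zlt[first_period:first_period + n_period - 1] = np.arange(n_period - 1, 0, -1)

b_age = constrained_solution(X, y, c_age)
b_per = constrained_solution(X, y, c_per)
b_zlt = constrained_solution(X, y, c_zlt)
b_ie = constrained_solution(X, y, v)   # IE: the null vector is the constraint

# Scalar that moves the IE solution to the age-constrained solution.
k = -(c_age @ b_ie) / (v @ c_age)

age_slice = slice(first_age, first_age + n_age - 1)
print("k (IE to age1 = age2):", k)
print("age differences, age1 = age2:", np.diff(full_effects(b_age[age_slice])))
print("age differences, ZLT periods:", np.diff(full_effects(b_zlt[age_slice])))
```

Under these assumptions the age1 = age2 solution reproduces the generating age pattern, and the ZLT solution gives age differences of .60 followed by –.4, in line with the 5 period values reported in the text. Every solution yields identical fitted values, shares the same intercept, and differs from the others only by a scalar multiple of the null vector. The sign of `k` depends on the arbitrary sign of the null vector returned by the decomposition, so only its absolute value is comparable across runs or software.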

Traditionally, equality constraints on two of the effect coefficients have been used, but a different and more complex constraint is used for the IE. The use of this constraint in APC models and some of its advantages were examined by Kupper et al. (1983, 1985). Its development as a general method for estimating the effect coefficients in APC models can be traced to Wenjiang Fu in a series of articles (e.g., Fu 2000; Fu and Hall 2006; Fu et al. 2004; Knight and Fu 2000). This method has been highlighted for sociologists in two major articles by Yang and associates (2004, 2008).

This article’s goal is to offer a better understanding of the properties of constrained estimators in the APC context and to compare the conventional constrained estimator approach with the IE approach. I examined these estimators from several perspectives: (1) algebraically, as constraints directly associated with specific generalized inverses (each constraint has its own associated generalized inverse); (2) geometrically, showing many characteristics shared by all constrained estimators and some that are not shared; and (3) using two small sets of data (one with 5 periods and one with 20) to show how these relationships occur in practice and some patterns that can change and some that remain constant as the number of periods is increased (at least for the data I used).

Because the various traditional constraints, the ZLT constraint discussed in this article, and the constraint imposed by the IE are all linear constraints applied to the X-matrix, it is not surprising that the solutions resulting from these constraints share many characteristics in common. These common characteristics include the following: With the constraint in place we can estimate each of the age, period, and cohort effects separately. When analyzing the same data set: All of the just identified constrained linear solutions lie on a line that is parallel to the null vector, and all of these solutions are related to each other by a scalar times the null vector; each constrained solution fits the data equally well; each constrained solution has the same intercept; and the solutions are perpendicular to their constraints. Additionally, each set of estimates is subject to bias when we define bias as the extent to which the parameter values that generated the data are not the same as the expected value of the estimates under the particular constraint used in the estimation.

There are some special characteristics that the IE does not share with other constrained estimators. It uses the null vector as a constraint (and this eliminates the temptation to examine the Y variable in order to set the constraint). It is the solution with the shortest length from the constraint’s origin, and related to this, the variances of the estimated coefficients based on the IE are smaller than for other constrained solutions (see note 12). With this in mind, in some senses, the IE could serve as the representative solution. As I have emphasized, however, arguably the major consideration should be how well the data generating parameters are estimated, and using the “wrong” constraints can lead to highly misleading estimates.

It is tempting to think that in some situations we can rely on theory and previous literature to make an educated guess about the constraint, and this may be the case in some situations. Perhaps the most compelling suggestion is this from Kupper et al. (1983), who note that there may be situations in which theory and previous literature allow the researcher to contemplate more than one set of constraints. Then one can use these different constraints separately: “If the separate sets of estimates obtained by applying each of these various theoretically-based (a priori) choices for c are in close agreement, then one could justifiably have some confidence in the accuracy of this common set of estimated age, period, and cohort effects” (Kupper et al. 1983:2804). Of course, these constraints should not be obtained by searching the data for constraints that “work.” One possibility might be to impose a ZLT for periods and a ZLT for cohorts when there are a large number of periods and cohorts, and it is reasonable to assume that there are no particular trends for periods or for cohorts. Then the researcher would obtain estimates under these two different constraints and see if they agree.

If we move outside of the tradition of constrained estimators, there is a literature on methods that bypass the identification problem by not attempting to estimate the individual age, period, and cohort coefficients.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

1. The basic model can also be estimated as a Poisson regression or as a logistic regression in a straightforward manner with a bit of notational change and, of course, the appropriate software (see Yang et al. 2008). The substance of the discussion in this article applies to these maximum likelihood (ML) estimation techniques as well as to the ordinary least squares (OLS) approach to estimation.
2. Mason et al. (1973) introduced this technique to sociologists in 1973.

Yang et al. (2008:1706) certainly recognize that the intrinsic estimator (IE) is a constrained estimator: “Figure 1 also helps to illustrate geometrically that the IE may in fact also be viewed as a constrained estimator.”

The only difference between dummy variable coding and effect coding is the minus one coding for the reference category rather than a zero. With effect coding, the sum of the coefficients for age groups, periods, and cohorts each equal zero, so that it is easy to calculate the effect for the reference category.

I emphasize here that b represents the parameter vector associated with the process that generated the data.

The null vector of (0, ..., –2) is unique up to multiplication by a scalar.

I arbitrarily chose to place the constraint in the last row and then replace the last column of the resulting inverse with zeros. I could have placed the constraint in the fourth row and after obtaining the inverse replaced the fourth column with zeros (Mazumdar, Li, and Bryce 1980).

Interestingly, the Mazumdar et al. (1980) system does not yield the Moore-Penrose matrix, but since it implements the same constraint it yields the same set of solutions.

That is, the model fit is the same no matter which generalized inverse is used. Analogously, if using Poisson or logistic regression, −2 × log-likelihood is the same no matter which generalized inverse is used (here I assume that we are using just one constraint so that the model is just identified).

9. To produce a zero linear trend (ZLT) constraint for periods, we can use the following linear constraint: (n – 1) × period1 + (n – 2) × period2 + ... + 1 × period(n – 1) = 0, where n is the number of periods and the nth period is used as a reference category.

This is certainly the case for the situation in which the period effects across time are of the form of a sine wave.

References

Fu, Wenjiang J. 2000. “Ridge Estimator in Singular Design With Applications to Age-Period-Cohort Analysis of Disease Rates.” Communications in Statistics - Theory and Method 29:263-78.
Fu, Wenjiang J. and Peter Hall. 2006. “Asymptotic Properties of Estimators in Age-Period-Cohort Analysis.” Statistics and Probability Letters 76:1925-29.
Fu, Wenjiang J., Peter Hall, and Thomas E. Rohan. 2004. “Age-Period-Cohort Analysis: Structure of Estimators, Estimability, Sensitivity and Asymptotics.” Technical Report, Department of Epidemiology, Michigan State University.
Knight, Keith and Wenjiang Fu. 2000. “Asymptotics for Lasso-Type Estimators.” The Annals of Statistics 28:1356-78.
Kupper, Lawrence L., Joseph M. Janis, Azza Karmous, and Bernard G. Greenberg. 1985. “Statistical Age-Period-Cohort Analysis: A Review and Critique.” Journal of Chronic Disease 38:811-30.

Kupper, Lawrence L., Joseph M. Janis, Ibrahim A. Salama, Carl N. Yoshizawa, and Bernard G. Greenberg. 1983. “Age-Period-Cohort Analysis: An Illustration of the Problems in Assessing Interaction in One Observation Per Cell Data.” Communications in Statistics - Theory and Method 12:2779-807.
Mason, Karen Oppenheim, William M. Mason, H. H. Winsborough, and Kenneth W. Poole. 1973. “Some Methodological Issues in Cohort Analysis of Archival Data.” American Sociological Review 38:242-58.
Mazumdar, S., C. C. Li, and G. R. Bryce. 1980. “Correspondence Between a Linear Restriction and a Generalized Inverse in Linear Model Analysis.” The American Statistician 34:103-05.
Press, William H., Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. 1992. Numerical Recipes in C: The Art of Scientific Computing. New York: Cambridge University Press.
Rodgers, Willard L. 1982. “Reply to Comments by Smith, Mason and Fineberg.” American Sociological Review 47:793-96.
Searle, S. R. 1971. Linear Models. New York: John Wiley & Sons.
Smith, Herbert L. 2004. “Response: Cohort Analysis Redux.” Pp. 111-19 in Sociological Methodology, edited by R. M. Stolzenberg. Oxford, UK: Basil Blackwell.
Yang, Yang, Wenjiang J. Fu, and Kenneth C. Land. 2004. “A Methodological Comparison of Age-Period-Cohort Models: Intrinsic Estimator and Conventional Generalized Linear Models.” Pp. 75-110 in Sociological Methodology, edited by R. M. Stolzenberg. Oxford, UK: Basil Blackwell.
Yang, Yang, Sam Schulhofer-Wohl, Wenjiang J. Fu, and Kenneth C. Land. 2008. “The Intrinsic Estimator for Age-Period-Cohort Analysis: What It Is and How to Use It.” American Journal of Sociology 113:1697-736.

Bio

Robert M. O’Brien is a Professor of Sociology at the University of Oregon. He specializes in criminology and quantitative methods. He has published extensively in both areas. His recent publications include: “Can Cohort Replacement Explain Changes in the Relationship Between Age and Homicide Offending?” (Journal of Quantitative Criminology with Jean Stockard), “The Age-Period-Cohort Conundrum as Two Fundamental Problems” (Quality and Quantity), “A Mixed Model Estimation of Age, Period, and Cohort Effects” (Sociological Methods and Research with Ken Hudson and Jean Stockard), and “Still Separate and Unequal? A City Level Analysis of the Black-White Gap in Homicide Arrest Since 1960” (American Sociological Review with Gary LaFree and Eric Baumer).