COWLES FOUNDATION DISCUSSION PAPER NO. 3

Note: Cowles Foundation Discussion Papers are preliminary materials circulated privately to stimulate private discussion and critical comment. References in publications to Discussion Papers (other than mere acknowledgment by a writer that he has access to such unpublished material) should be cleared with the author to protect the tentative character of these papers.

Estimation of Relationships for Limited Dependent Variables

James Tobin

September 15, 1955
(As revised July 25, 1956)

"What do you mean, less than nothing?" replied Wilbur. "I don't think there is any such thing as less than nothing. Nothing is absolutely the limit of nothingness. It's the lowest you can go. It's the end of the line. How can something be less than nothing? If there were something that was less than nothing then nothing would not be nothing, it would be something--even though it's just a very little bit of something. But if nothing is nothing, then nothing has nothing that is less than it is."

E. B. White, Charlotte's Web (New York: Harper, 1952) p. 28.

In economic surveys of households, many variables have the following characteristics: The variable has a lower, or upper, limit and takes on the limiting value for a substantial number of respondents. For the remaining respondents, the variable takes on a wide range of values above, or below, the limit.

The phenomenon is quite familiar to students of Engel curves, relationships showing how household expenditures on various categories of goods vary with household income. For many categories--"luxuries"--zero expenditures are the rule at low income levels. A single straight line cannot, therefore, represent the Engel curve for both low and high incomes. If individual households were identical, except for income level, the Engel curve would be a broken line like OAB in Figure 1.
But if the critical income level OA were not the same for all households, the average Engel curve for groups of households would look like the curve OB. A similar kind of effect occurs under rationing of a consumers' good. The ration is an upper limit; many consumers choose to take their full ration, but some prefer to buy less.*

* For theoretical exposition of the effects on aggregate demand functions of lower or upper limits on individual expenditure in combination with differences in tastes among households, see [4] and [6] and the literature there cited.

[Figure 1. Household expenditure plotted against household income, with the points O, A and B marked.]

As a specific example, many--indeed, most--households would report zero expenditures on automobiles or major household durable goods during any given year. Among those households who made any such expenditure, there would be wide variability in amount.*

* For figures on frequency of purchases and on the distribution of amounts spent among purchasers, see [1], Part II, Supplementary Tables 1, 5, and 10.

In other cases, the lower limit is not necessarily zero, nor is it the same for all households. Consider the net change in a household's holding of liquid assets during a year. This variable can be either positive or negative, but it cannot be smaller than the negative of the household's holdings of liquid assets at the beginning of the year; one cannot liquidate more assets than he owns.

Account should be taken of the concentration of observations at the limiting value in statistical estimation of the relationship of a limited variable to other variables and in testing hypotheses about the relationship. An explanatory variable in such a relationship may be expected to influence both the probability of limit responses and the size of non-limit responses. If only the probability of limit and non-limit responses, without regard for the value of non-limit responses, were to be explained, probit analysis provides a suitable statistical model. (See [5].)
But it is inefficient to throw away information on the value of the dependent variable when it is available. If only the value of the variable were to be explained, if there were no concentration of observations at a limit, multiple regression would be an appropriate statistical technique. But when there is such concentration, the assumptions of the multiple regression model are not realized. According to that model, it should be possible to have values of the explanatory variables for which the expected value of the dependent variable is its limiting value; and from this expected value, as from other expected values, it should be possible to have negative as well as positive deviations. A hybrid of probit analysis and multiple regression seems to be called for, and it is the purpose of this paper to present such a model.

The Model

Let W be a limited dependent variable, with a lower limit of L. The limit may not be the same for all households in the population. Let Y be a linear combination of the independent variables $(X_1, X_2, \ldots, X_m)$, to which W is by hypothesis related.

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m \tag{1}$$

Households differ from each other in their behavior regarding W for reasons for which differences in the independent variables X and the lower limit L do not fully account. Those other differences are taken to be random and to be reflected in $\varepsilon$, a random variable with mean zero and standard deviation $\sigma$, distributed normally over the population of households. Household behavior is then assumed to be as follows:

$$W = \begin{cases} L & (Y - \varepsilon < L) \\ Y - \varepsilon & (Y - \varepsilon \ge L) \end{cases} \tag{2}$$

Let P(x) represent the value of the cumulative unit-normal distribution function at x; let Q(x) = 1 - P(x); let Z(x) be the value of the unit-normal probability density function at x. The distribution of W may be derived from the distribution of $\varepsilon$, as follows: For given values of the linear combination Y and the limit L,

$$\Pr(W = L \mid Y, L) = \Pr(\varepsilon > Y - L) = Q\!\left(\frac{Y - L}{\sigma}\right) \tag{3}$$
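$$\Pr(W > x \ge L \mid Y, L) = \Pr(Y - \varepsilon > x) = \Pr(\varepsilon < Y - x) = P\!\left(\frac{Y - x}{\sigma}\right) \tag{4}$$

Consequently the cumulative distribution function of W, for given Y and L, is:

$$F(x; Y, L) = \begin{cases} 0 & (x < L) \\ Q\!\left(\dfrac{Y - x}{\sigma}\right) & (x \ge L) \end{cases} \tag{5}$$

The corresponding probability density function is:

$$f(x; Y, L) = \frac{1}{\sigma}\, Z\!\left(\frac{Y - x}{\sigma}\right) \qquad (x > L) \tag{6}$$

The expected value of W for given values of Y and L is:

$$E(W; Y, L) = L\,Q\!\left(\frac{Y - L}{\sigma}\right) + \int_L^{\infty} \frac{x}{\sigma}\, Z\!\left(\frac{Y - x}{\sigma}\right) dx = L\,Q\!\left(\frac{Y - L}{\sigma}\right) + Y \int_{(L-Y)/\sigma}^{\infty} Z(x)\,dx + \sigma \int_{(L-Y)/\sigma}^{\infty} x\,Z(x)\,dx$$

Since $-xZ(x) = Z'(x) = dZ(x)/dx$, we have:

$$E(W; Y, L) = L\,Q\!\left(\frac{Y - L}{\sigma}\right) + Y\,P\!\left(\frac{Y - L}{\sigma}\right) + \sigma\,Z\!\left(\frac{Y - L}{\sigma}\right) \tag{7}$$

The Maximum Likelihood Solution

A sample includes q observations of households for whom W is at the limit L. Each observation consists of a limit $L'_i$, to which the dependent variable $W'_i$ is equal, and a set of values of the independent variables $(X'_{1i}, X'_{2i}, \ldots, X'_{mi})$, where i is a subscript to denote the observation and runs from 1 to q. A sample also includes r observations for which W is above the limit L; each one may be described as $(W_j, L_j, X_{1j}, X_{2j}, \ldots, X_{mj})$, where j runs from 1 to r.

Let $(a_0, a_1, \ldots, a_m, a)$ be estimates of $(\beta_0/\sigma, \beta_1/\sigma, \ldots, \beta_m/\sigma, 1/\sigma)$. Let $I'_i = a_0 + a_1 X'_{1i} + a_2 X'_{2i} + \cdots + a_m X'_{mi}$, and let $I_j = a_0 + a_1 X_{1j} + a_2 X_{2j} + \cdots + a_m X_{mj}$. The likelihood of a sample is:

$$\phi(a_0, a_1, \ldots, a_m, a) = \prod_{i=1}^{q} F(L'_i; Y'_i, L'_i) \prod_{j=1}^{r} f(W_j; Y_j, L_j) = \prod_{i=1}^{q} Q(I'_i - aL'_i) \prod_{j=1}^{r} a\,Z(I_j - aW_j) \tag{8}$$

The natural logarithm of $\phi$ is:

$$\phi^*(a_0, a_1, \ldots, a_m, a) = \sum_{i=1}^{q} \ln Q(I'_i - aL'_i) + r \ln a - \frac{r}{2} \ln 2\pi - \frac{1}{2} \sum_{j=1}^{r} (I_j - aW_j)^2 \tag{9}$$

Let $X'_{0i}$ and $X_{0j}$ be identically 1 for all i and j. Then setting the derivatives of $\phi^*$ equal to zero gives the following system of m + 2 equations:

$$\phi^*_k = -\sum_{i=1}^{q} \frac{Z(I'_i - aL'_i)}{Q(I'_i - aL'_i)}\, X'_{ki} - \sum_{j=1}^{r} (I_j - aW_j)\, X_{kj} = 0 \qquad (k = 0, 1, 2, \ldots, m)$$

$$\phi^*_{m+1} = \sum_{i=1}^{q} \frac{Z(I'_i - aL'_i)}{Q(I'_i - aL'_i)}\, L'_i + \frac{r}{a} + \sum_{j=1}^{r} (I_j - aW_j)\, W_j = 0 \tag{10}$$

These equations are non-linear. The quantity Z(x)/Q(x) may be obtained from the table in [2], pp. 185-86, where the argument for the table is x + 5.

The matrix of second derivatives, obtained by differentiating (10), is given by (11). Here $(Z/Q)'(x)$ is the derivative of Z(x)/Q(x), and may, like Z(x)/Q(x), be found by entering the tables of [2], pp. 185-86, with the argument x + 5.

$$\phi^*_{kl} = -\sum_{i=1}^{q} (Z/Q)'(I'_i - aL'_i)\, X'_{ki} X'_{li} - \sum_{j=1}^{r} X_{kj} X_{lj} \qquad (k, l = 0, 1, \ldots, m)$$

$$\phi^*_{k,m+1} = \sum_{i=1}^{q} (Z/Q)'(I'_i - aL'_i)\, L'_i X'_{ki} + \sum_{j=1}^{r} W_j X_{kj} \qquad (k = 0, 1, \ldots, m)$$

$$\phi^*_{m+1,m+1} = -\sum_{i=1}^{q} (Z/Q)'(I'_i - aL'_i)\, {L'_i}^2 - \frac{r}{a^2} - \sum_{j=1}^{r} W_j^2 \tag{11}$$

Newton's method (see [3]) for iterative solution of (10) may be applied as follows: Let $(a_0^{(0)}, a_1^{(0)}, \ldots, a_m^{(0)}, a_{m+1}^{(0)})$ be a trial solution, where, for notational convenience, $a_{m+1}$ represents what has previously been written as simply a. (The choice of an initial trial solution will be discussed below.) New estimates $(a_0^{(0)} + \Delta a_0, a_1^{(0)} + \Delta a_1, \ldots, a_{m+1}^{(0)} + \Delta a_{m+1})$ can be found by solving the set of m + 2 linear equations (12) for the $\Delta a_l$, where all the $\phi^*_k$ are assumed to be linear between the trial solution and the real solution.

$$\phi^*_k\big(a_0^{(0)}, \ldots, a_{m+1}^{(0)}\big) + \sum_{l=0}^{m+1} \Delta a_l\, \phi^*_{kl}\big(a_0^{(0)}, \ldots, a_{m+1}^{(0)}\big) = 0 \qquad (k = 0, 1, 2, \ldots, m+1) \tag{12}$$

The process may be repeated with the new estimates as provisional estimates until the $\Delta a_l$ are negligible. If the final estimates are used to evaluate the matrix of second derivatives (11) at the point of maximum likelihood, the negative inverse of that matrix gives large-sample estimates of the variances and covariances of the estimates $a_k$ around the corresponding population parameters.

Tests of Hypotheses

Hypotheses about the relationship of W to one or more of the independent variables X may be tested by the likelihood-ratio method. Consider, for example, the hypothesis that $\beta_1 = \beta_2 = \cdots = \beta_m = 0$. This is the hypothesis that neither the probability nor the size of non-zero responses depends on the X's. According to the hypothesis, there remain only two parameters, $\beta_0$ and $\sigma$, to be estimated so as to maximize (9), which now becomes:

$$\phi^*(a_0, 0, 0, \ldots, 0, a) = \sum_{i=1}^{q} \ln Q(a_0 - aL'_i) + r \ln a - \frac{r}{2} \ln 2\pi - \frac{1}{2} \sum_{j=1}^{r} (a_0 - aW_j)^2 \tag{13}$$

The values of $a_0$ and a that maximize (13) may be found from equations (10), similarly simplified by putting all of $a_1, a_2, \ldots, a_m$ equal to zero.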
If (13) is evaluated with these solutions, then the logarithm of the likelihood ratio $\lambda$ is the difference between (13) and the value of (9) when it is maximized without the constraint of the hypothesis. The statistic $-2 \ln \lambda$ is for large samples approximately distributed by chi-square with m degrees of freedom. In similar fashion other hypotheses about subsets of the $\beta$'s may be tested.

Initial trial estimates

The speed of convergence of iteration by Newton's method depends, of course, on the choice of the initial trial estimates. The following procedure for finding initial estimates relies on a linear approximation of the function $-Z(x)/Q(x)$, in other words on a quadratic approximation of $\ln Q(x)$. This approximation converts the first m + 1 equations of (10) into linear equations in the $a_k$ for given a. These equations may be solved to give the $a_k$ as linear functions of a. When these solutions are substituted in the final equation of (10), it becomes a quadratic equation in a.

Let $x_0$ be the unit normal deviate such that $Q(x_0) = q/(q + r)$, the proportion of cases in the sample for which the variable W takes on its limit value. A linear approximation of $-Z(x)/Q(x)$ in the neighborhood of $x_0$ is the tangent at $x_0$:

$$-\frac{Z(x)}{Q(x)} \approx -\frac{Z(x_0)}{Q(x_0)} + (x - x_0)\, \frac{d}{dx}\!\left[-\frac{Z(x)}{Q(x)}\right]_{x = x_0} = A + Bx \tag{14}$$

Substituting (14) in the first m + 1 equations of (10) gives:

$$\sum_{i=1}^{q} \Big[ A + B \Big( \sum_{l=0}^{m} a_l X'_{li} - aL'_i \Big) \Big] X'_{ki} - \sum_{j=1}^{r} \Big( \sum_{l=0}^{m} a_l X_{lj} - aW_j \Big) X_{kj} = 0 \qquad (k = 0, 1, 2, \ldots, m) \tag{15}$$

Solving (15) gives pairs of numbers, written here as $c_k$ and $d_k$, such that:

$$a_k = c_k + d_k\, a \qquad (k = 0, 1, 2, \ldots, m) \tag{16}$$

The final equation of (10) is, after using the approximation of (14):

$$-\sum_{i=1}^{q} \Big[ A + B \Big( \sum_{l=0}^{m} a_l X'_{li} - aL'_i \Big) \Big] L'_i + \frac{r}{a} + \sum_{j=1}^{r} \Big( \sum_{l=0}^{m} a_l X_{lj} - aW_j \Big) W_j = 0 \tag{17}$$

When (16) is substituted in (17), it becomes a quadratic equation in a. The solution of (17) may then be used in (15) to obtain initial trial estimates of all the coefficients.
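The log-likelihood (9) and its first derivatives (10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the layout of each observation as a (limit or value, list of X's) pair with the first X equal to 1, and the toy data are all hypothetical and do not come from the paper. The sketch compares the analytic derivatives with central finite differences of the log-likelihood, which is one simple way to check a hand-derived gradient before using it in a Newton iteration.

```python
import math

def Z(x):
    """Unit-normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Q(x):
    """Upper tail of the unit normal, Q(x) = 1 - P(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def index(coef, xs):
    """I = a_0 X_0 + a_1 X_1 + ... + a_m X_m, with X_0 = 1."""
    return sum(c * x for c, x in zip(coef, xs))

def log_likelihood(coef, a, limit_obs, nonlimit_obs):
    """phi* of equation (9); limit_obs holds (L, xs), nonlimit_obs holds (W, xs)."""
    r = len(nonlimit_obs)
    total = r * math.log(a) - 0.5 * r * math.log(2.0 * math.pi)
    for L, xs in limit_obs:
        total += math.log(Q(index(coef, xs) - a * L))
    for W, xs in nonlimit_obs:
        total -= 0.5 * (index(coef, xs) - a * W) ** 2
    return total

def first_derivatives(coef, a, limit_obs, nonlimit_obs):
    """Equations (10): derivatives with respect to a_0..a_m, then a."""
    g = [0.0] * len(coef)
    g_a = len(nonlimit_obs) / a
    for L, xs in limit_obs:
        t = index(coef, xs) - a * L
        ratio = Z(t) / Q(t)
        for k, x in enumerate(xs):
            g[k] -= ratio * x
        g_a += ratio * L
    for W, xs in nonlimit_obs:
        resid = index(coef, xs) - a * W
        for k, x in enumerate(xs):
            g[k] -= resid * x
        g_a += resid * W
    return g + [g_a]

def numeric_derivatives(coef, a, limit_obs, nonlimit_obs, h=1e-6):
    """Central finite differences of phi*, used only as a cross-check."""
    params = list(coef) + [a]
    out = []
    for k in range(len(params)):
        up, dn = params[:], params[:]
        up[k] += h
        dn[k] -= h
        f_up = log_likelihood(up[:-1], up[-1], limit_obs, nonlimit_obs)
        f_dn = log_likelihood(dn[:-1], dn[-1], limit_obs, nonlimit_obs)
        out.append((f_up - f_dn) / (2.0 * h))
    return out

# Hypothetical toy data, invented for this sketch only.
limit_obs = [(0.05, [1.0, 5.0]), (0.0, [1.0, 6.0])]
nonlimit_obs = [(0.30, [1.0, 1.0]), (0.20, [1.0, 2.0]), (0.10, [1.0, 3.0])]
coef, a = [1.0, -0.2], 5.0

analytic = first_derivatives(coef, a, limit_obs, nonlimit_obs)
numeric = numeric_derivatives(coef, a, limit_obs, nonlimit_obs)
for g_analytic, g_numeric in zip(analytic, numeric):
    print(round(g_analytic, 6), round(g_numeric, 6))
```

The two printed columns should agree to several decimals. A Newton step as in (12) would additionally need the second derivatives (11) and a linear solver, which are left out here to keep the sketch short.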
An Example

For purposes of illustration, an example has been worked out using data from the reinterview portion of the 1952 and 1953 Surveys of Consumer Finances conducted by the Survey Research Center of the University of Michigan for the Board of Governors of the Federal Reserve System.*

* A brief general description of the concepts and methods of the annual Surveys of Consumer Finances is given in Board of Governors of the Federal Reserve System, "Methods of the Survey of Consumer Finances," Federal Reserve Bulletin, July, 1950. For a more complete treatment, see also Klein, L. R., editor, Contributions of Survey Methods to Economics, New York: Columbia University Press, 1954. Reports of the 1952 and 1953 Surveys are given in Board of Governors of the Federal Reserve System, 1952 Survey of Consumer Finances, reprinted with supplementary tables from Federal Reserve Bulletin, April, July, August, and September 1952, and Board of Governors of the Federal Reserve System, 1953 Survey of Consumer Finances, reprinted with supplementary tables from Federal Reserve Bulletin, March, June, July, August, and September 1953. I am grateful to the Survey Research Center and to the Board of Governors of the Federal Reserve System for the unpublished data used below. The paper, as well as the illustration, owes its inspiration to a semester I was enabled to spend at the Survey Research Center in 1953-1954 by the hospitality of the Center and its program of post-doctoral fellowships financed by the Carnegie Foundation.

The data refer to 735 primary non-farm spending units* who were interviewed twice, once in early 1952 and once in early 1953.

* Of the 1036 spending units in the reinterview sample, these 735 have been the subject of calculations for other purposes and are therefore a convenient group to use in this analysis. Excluded are all spending units who had one or more of the following characteristics: (a) farm; (b) secondary, i.e. not the owner or principal tenant of the dwelling; (c) total income for the two years 1951-52 zero or negative; (d) not ascertained as to age of head of spending unit, amount of expenditure on durable goods during 1951-52, or amount of liquid asset holdings in early 1951. In addition, one extreme observation was excluded, where the spending unit had such a low positive two-year income that the ratios of durable goods expenditure and, especially, liquid asset holdings to income were very high.

The frequencies, averages and other statistics for the reinterview sample should not be taken as representative of the population of the United States. The Surveys of Consumer Finances do collect data on distributions of income, liquid assets, and durable goods purchases that are representative of that population; tables on these distributions may be found in [1]. But the reinterview sample, on which the calculations of this paper are based, fails to be representative insofar as it omits spending units who moved between the two surveys. Moreover, the calculations are based on counts of sampled spending units, without allowance for the fact that the sampling design gave some spending units greater probabilities of being included in the sample than others. The purpose of this example is not to estimate population frequency distributions, but only to examine the relationship of durable goods expenditure to age and liquid asset holdings within this sample. It is not necessary to consider here how the relationship exhibited in this sample differs from the one that would be exhibited in a complete enumeration. But it may well be that the sample gives unbiased estimates of the parameters of the relationship even though it gives biased estimates of the separate frequency distributions of the variables.

The variables are as follows:

W: Ratio of 1951-52 total durable goods expenditure to 1951-52 total disposable income. Durable goods expenditure is the two-year sum of outlays, net of trade-ins or sales, for cars and major household appliances and furniture.
Two-year disposable income is the sum of the two annual incomes reported by the spending unit less estimated federal income tax liabilities. Both expenditure and income were reported for 1951 in the interview in early 1952, and for 1952 in the second interview, in early 1953. Since expenditure is necessarily zero or positive, and since zero and negative incomes have been excluded, the ratio is necessarily zero or positive.

X1: Age of the head of the spending unit, as reported in 1953, on a scale running from 1 for the youngest age group to 6 for heads aged 65 or more years.

X2: Ratio of liquid asset holdings in early 1951 to 1951-52 total disposable income. Liquid asset holdings include bank deposits, savings and loan association shares, postal savings and government savings bonds.

In this example, the lower limit L is zero for all cases. Table 1 shows the basic data.

Table 1
Sums of Squares and Cross Products

183 limit observations

          X'0        X'1        X'2          W'
X'0       183
X'1       824        4056
X'2       102.15     552.03     402.3333
W'        0          0          0            0

552 non-limit observations

          X0         X1         X2           W
X0        552
X1        196[?]     8060
X2        168.06     [illegible] 255.6740
W         61.449     207.598    20.559       13.115087

Table 2 presents the estimates of the parameters obtained by the initial approximation and reports the successive iterations leading to the maximum likelihood estimates. Estimates are shown also in Table 3, on the assumption that there is no relation between W and liquid asset holdings X2.

In the approximation used to obtain initial trial values, the function $-Z(x)/Q(x)$ was approximated linearly about the point $x_0 = .67$, so that $Q(x_0) = .25$, the proportion of zero cases in the sample.
Thus the constants A and B in (15) and (17) were equal to -.76003 and -.75771 respectively.

Table 2
Iterative Estimation of Parameters

                              a0           a1             a2           a
Initial trial values          1.326        -.2200         .0330        7.984
First derivatives             -4.398       -21.759        [?]2.812     .898
Second derivatives:    a0     -680.557
                       a1     -2542.416    -10,805.485
                       a2     -238.41      -1128.653      -535.225
                       a      61.449       207.598        20.559       -21.772
Indicated changes             .0152        -.00507        .00199       .0376
Second trial values           1.3407       -.2251         .0350        8.022
First derivatives             -.047        .292           .064         .002
Second derivatives:    a0     -680.260
                       a1     -2549.666    -10,795.522
                       a2     -238.26      -1,127.900     -535.895
                       a      61.449       207.598        20.559       -21.688
Indicated changes             -.0015       .00037         .00001       [?].00064
Final estimates               1.3392       -.2247         .0350        8.028
Standard errors               (.118)       (.0295)        (.0495)      (.252)

Table 3
Iterative Estimation of Parameters Assuming that β2 = 0

                              a0           a1             a
Initial trial values          1.337        -.219          8.040
First derivatives             2.841        16.124         -.001
Second derivatives:    a0     -680.19
                       a1     -2541.428    -10,799.12
                       a      61.449       207.598        -21.652
Indicated changes             .010         -.008          -.010
Second trial values           1.347        -.225          8.030
First derivatives             [?].456      2.017          .116
Second derivatives:    a0     -680.179
                       a1     -25[?]9.988  -10,790.472
                       a      61.449       207.598        -21.674
Indicated changes             .001         .0003          -.005
Final estimates               1.3[?]7      -.225          8.030
Standard errors               (.117)       (.028)         (.252)

Estimates of the variances and covariances of the parameter estimates can be obtained from the negative of the inverse of the final matrix of second derivatives. These are shown in Table 4. The corresponding standard errors of the coefficients are given in the final rows of Tables 2 and 3.

Table 4
Estimated Variances and Covariances of Parameter Estimates

          a0           a1            a2           a
a0        +.0139
a1        -.00318      +.000867
a2        +.000880     -.00085[?]    +.00245
a         +.00987      -.00115       +.000470     +.0635

On assumption that β2 = 0:

          a0           a1            a
a0        +.0136
a1        -.00302      +.000784
a         +.00970      -.00106       +.0635

The size of the standard error of $a_2$ indicates that the hypothesis that $\beta_2 = 0$, that there is no net relationship between expenditure and liquid asset holding, cannot be rejected.
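The constants A and B quoted at the start of this part of the example can be checked numerically. The sketch below is a hedged illustration, not the author's procedure: it assumes that the approximation is the tangent line to $-Z(x)/Q(x)$ at $x_0 = .67$ (the deviate for which $Q(x_0)$ is about .25), and it uses the identity that the derivative of Z(x)/Q(x) equals [Z(x)/Q(x)][Z(x)/Q(x) - x], which follows from $Z'(x) = -xZ(x)$ and $Q'(x) = -Z(x)$. The function name `tangent_constants` is hypothetical.

```python
import math

def Z(x):
    """Unit-normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Q(x):
    """Upper tail of the unit normal, Q(x) = 1 - P(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tangent_constants(x0):
    """Return (A, B) such that -Z(x)/Q(x) is approximately A + B*x near x0."""
    ratio = Z(x0) / Q(x0)
    slope = -ratio * (ratio - x0)      # derivative of -Z/Q at x0
    intercept = -ratio - slope * x0    # tangent line written as A + B*x
    return intercept, slope

x0 = 0.67                              # assumed point of approximation
A, B = tangent_constants(x0)
print("Q(x0) =", round(Q(x0), 4))
print("A =", round(A, 5), " B =", round(B, 5))
```

Under these assumptions the slope comes out close to the quoted value of B, and the intercept is about -.76. The standard errors in the final rows of Tables 2 and 3 can be checked in the same spirit, as square roots of the diagonal entries of Table 4.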
This hypothesis can also be tested, with the same conclusion, by the likelihood-ratio method. At the point of maximum likelihood, unrestricted by this hypothesis, $\phi^*$ in (9) has the value $722.5 - \frac{r}{2} \ln 2\pi$. The final estimates in Table 3 correspond to the point of maximum likelihood restricted by the hypothesis that $\beta_2 = 0$. At this point $\phi^*$ has the value $721.8 - \frac{r}{2} \ln 2\pi$. The statistic $-2 \ln \lambda$ is thus equal to 1.4, which is not a significant value of chi-square with one degree of freedom.

A test of the hypothesis that neither age nor liquid asset holding has an effect on expenditure on durable goods may also be made by the likelihood-ratio method. Assuming, in accordance with the hypothesis, that $\beta_1 = \beta_2 = 0$, the values of $a_0$ and a that maximize (13) are found to be .4839 and 7.720. For these values, $\phi^* + \frac{r}{2} \ln 2\pi$ is equal to 692.7. Hence $-2 \ln \lambda$ is equal to 59.6, a significant chi-square for two degrees of freedom. The hypothesis must be rejected. Thus this test, as well as the size of the estimated standard error of $a_1$, indicates a significant relationship of durable goods expenditure to age.

The relationship of W to X1 and X2, as estimated in Table 2, is shown in Figure 2, as the broken line ABC. The expected value of W implied by this relationship may be computed from (7) in the manner illustrated in Table 5. These points are also shown in Figure 2. For comparison, the least squares multiple regression of W on X1 and X2 has also been plotted. The estimated effect of liquid asset holding X2 has been illustrated by drawing two graphs relating W to X1, the first, Figure 2-a, on the assumption that X2 = 0 and the second, Figure 2-b, on the assumption that X2 = 2.
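The computation of expected values from (7) can be sketched as follows. With L = 0 and $I = Y/\sigma$, equation (7) reduces to $E(W) = [I\,P(I) + Z(I)]/a$. The coefficient values in the sketch are the final estimates as they read in Table 2; the function names are hypothetical, and the output is offered as an illustration rather than as a reproduction of the paper's own table.

```python
import math

def Z(x):
    """Unit-normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def P(x):
    """Cumulative unit-normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

# Final estimates as read from Table 2 (assumed legible values).
A0, A1, A2, A = 1.3392, -0.2247, 0.0350, 8.028

def normalized_index(x1, x2):
    """I = a_0 + a_1 X1 + a_2 X2, an estimate of Y / sigma."""
    return A0 + A1 * x1 + A2 * x2

def expected_w(x1, x2):
    """Equation (7) with L = 0: E(W) = [I P(I) + Z(I)] / a."""
    i = normalized_index(x1, x2)
    return (i * P(i) + Z(i)) / A

def broken_line_w(x1, x2):
    """The kinked relationship max(0, Y), with Y = I / a."""
    return max(0.0, normalized_index(x1, x2) / A)

for x1 in range(1, 7):
    print(x1, round(expected_w(x1, 0.0), 3), round(broken_line_w(x1, 0.0), 3))
```

In every row the expected value lies above the broken line, since the difference is Z(I) - I Q(I) for positive I and I P(I) + Z(I) for negative I, both of which are positive.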
The expected value locus, estimated by the method of this paper, is nonlinear. It is always above the broken line ABC, asymptotic to AB at the left where the probability of not buying (W = 0) approaches zero, and asymptotic to BC at the right where the probability of buying (W > 0) approaches zero. Multiple regression approximates this non-linear locus with a linear relationship. As Figure 2 shows, the approximation is fairly close for the central range of values of the sample. But outside the central range there can be large discrepancies. There are indeed conceivable values of the independent variables for which multiple regression would give negative estimates of the expected value of W. It is true that the absence of negative observations in the sample tends to keep the regression above the axis until extreme values of the independent variables are reached. But this protection is purchased at the cost of making the regression line so flat that expenditure is under-estimated at the opposite end. These discrepancies could be important in predicting expenditure for extreme cases or for aggregates which include extreme cases.

Table 5
Calculation of Expected Values

[Table entries illegible. For each value of X1 the table gives, once for X2 = 0 with I = 1.3392 - .2247 X1 and once for X2 = 2 with I = 1.4092 - .2247 X1, the following columns: I; calculated probability of buying, P(I); Z(I); calculated expected value, E(W) = [I P(I) + Z(I)]/8.028.]

[Figure 2-a. W plotted against X1 on the assumption that X2 = 0, with the least squares regression line marked "Regression".]

References

[1] Board of Governors of the Federal Reserve System, 1953 Survey of Consumer Finances, reprinted with supplementary tables from Federal Reserve Bulletin, March, June, July, August and September 1953.

[2] Cornfield, J. and Mantel, N., "Some New Aspects of the Application of Maximum Likelihood to the Calculation of the Dosage Response Curve," Journal of the American Statistical Association, 45 (1950), 181-210.

[3] Crockett, J. B. and Chernoff, H., "Gradient Methods of Maximization," Pacific Journal of Mathematics, 5 (1955), 33-50. (Reprinted as Cowles Commission New Series Paper No. 92.)

[4] Farrell, M. J., "Some Aggregation Problems in Demand Analysis," Review of Economic Studies, 21 (1954), 193-205.

[5] Tobin, J., "The Application of Multivariate Probit Analysis to Economic Survey Data," CFDP No. 1.

[6] Tobin, J., "A Survey of the Theory of Rationing," Econometrica, 20 (1952), 521-553.
