
Maximum Likelihood Estimation in Random Coefficient Models

Author(s): Warren T. Dent and Clifford Hildreth


Source: Journal of the American Statistical Association, Vol. 72, No. 357 (Mar., 1977), pp. 69-72
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2286907
Accessed: 20/05/2014 07:37


This content downloaded from 91.105.96.157 on Tue, 20 May 2014 07:37:54 AM


All use subject to JSTOR Terms and Conditions
Maximum Likelihood Estimation in Random Coefficient Models

WARREN T. DENT and CLIFFORD HILDRETH*

Previous Monte Carlo studies examining properties of estimators in random coefficient models have been hindered in part by computational difficulties. In particular, determination of maximum likelihood estimators appears sensitive to the computational algorithm used. In a small Monte Carlo experiment, several distinctly motivated algorithms are examined with respect to accuracy and cost in searching for global and local maximum likelihood parameter estimates. A noncalculus oriented approach offers promise. When compared with other estimators, maximum likelihood estimators, so determined, appear to be statistically relatively efficient.

KEY WORDS: Random coefficient models; Maximum likelihood estimation; Nonlinear optimization; Numerical accuracy.

1. INTRODUCTION

Variable parameter econometric models are experiencing theoretical development and diverse application [5, 11, 17, 19, 20, 22, 24, 26, 27]. For the random coefficient model developed in [9], a variety of estimation techniques now exists, none of which, however, yields convenient test statistics for specification analysis. Most estimators developed to date are known to be consistent, and some are unbiased. Asymptotic distributions generally are not available, although the maximum likelihood estimator (MLE) is known to be asymptotically normal [23].

Thus, if MLEs are computationally feasible and reasonably accurate in finite samples, they are attractive for these models. Froehlich [6] experimented with several estimators applied to artificial data. His maximum likelihood estimates were obtained using a Newton-Raphson search procedure. In nearly 20 percent of the cases, the procedure failed to converge and, when estimates were obtained, mean square errors appeared large. Since the likelihood function in this case may have regions of different concavity and more than one local maximum, it seemed possible that a more discriminating computational procedure might lead to different results. Three alternative algorithms were applied to data similar to Froehlich's, with the result that none of the procedures failed to converge and that estimates generally closer to true parameter values were obtained. On the whole, the MLEs had smaller sample mean square errors than other proposed estimators. These results are given in Section 4.

2. THE MODEL

Consider the modified linear model

    y_t = Σ_{j=1}^{k} x_tj (β_j + v_tj) ,  t = 1, ..., n ,   (2.1)

where y_t and x_tj, j = 1, ..., k represent the tth observations on a dependent and k independent variables; β_j, j = 1, ..., k are unknown coefficient parameters; and v_tj, j = 1, ..., k are unobservable random normal disturbances with mean zero and variances α_j, j = 1, ..., k, distributed identically and independently over the index t (time or cross section, t = 1, ..., n).

In obvious matrix notation the model may be represented as

    y = Xβ + ε ,

where E(ε) = 0; E(εε') = Φ = (φ_st); φ_st = 0, s ≠ t; φ_tt = Σ_{j=1}^{k} x_tj² α_j. The likelihood associated with the observations y_1, ..., y_n is given as

    L(β, α) = (2π)^{-n/2} |Φ|^{-1/2} exp{ -(1/2)(y - Xβ)'Φ^{-1}(y - Xβ) } ,

where α' = (α_1, ..., α_k). At the optimum

    β̂ = (X'Φ^{-1}X)^{-1} X'Φ^{-1} y ,   (2.2)

so that the relevant portion of the log of the concentrated likelihood is

    L(α) = -(1/2) Σ_{t=1}^{n} log (Σ_{j=1}^{k} x_tj² α_j) - (1/2) Σ_{t=1}^{n} s_t² / (Σ_{j=1}^{k} x_tj² α_j) ,   (2.3)

where s_t = y_t - x_(t)'β̂ and x_(t)' is the tth row of X.

3. LIKELIHOOD OPTIMIZATION

The domain of the parameters in (2.3) is the nonnegative orthant of k-dimensional Euclidean space, so that constrained optimization procedures would be appropriate. Box [1, p. 72] has shown that, under a square root transformation, the problem may be converted to a format which involves unconstrained optimization where no additional local optima can be introduced. Accordingly, by setting α_j = q_j², j = 1, ..., k in (2.3) we were able to apply three unconstrained methods of optimization. These methods are chosen to incorporate varying degrees of information about the likelihood surface.

Fisher's method of scoring [15, p. 366; 24, p. 527] utilizes the Hessian of second partial derivatives and the vector of scores (first partial derivatives) to aid in curvature determination. A double precision FORTRAN code which evaluates the inverse of the Hessian at each iteration and which utilizes the step size and stopping rules suggested by Chow [4, p. 111] was developed.

Among methods that use only gradients (first partial derivatives), Himmelblau [10, Ch. 5] has shown that the Davidon-Fletcher-Powell (DFP), or Fletcher-Powell, algorithm is superior in performance over a wide variety of selected functions. The version of this algorithm (converted to double precision) presented in [12, pp. 355-366] is employed in this study. Both of these methods are guaranteed to converge to a local optimum.

Brent [2, Ch. 7] develops an optimization procedure, PRAXIS, which is noncalculus oriented in the sense of requiring no information on derivatives. The algorithm has performed very well in comparison with other methods over ill-conditioned test functions [2, pp. 148-155]. The PRAXIS procedure is guaranteed to converge to a local optimum but, due to its particular construction, has greater chances of selecting a global optimum than the aforementioned algorithms. A double precision FORTRAN version of the ALGOL routine presented in [2, §7.9] was used in this study.

The three algorithms examined require (i) a starting guess for the optimal solution α̂_ML, and (ii) efficient computation of L given a vector of values α. With respect to the second requirement we utilized accuracy-preserving and numerically stable Householder transformations [3, 7] to compute the Aitken estimator (2.2). The transformation applies directly to the transformed design matrix Φ^{-1/2}X and avoids possible numerical rounding errors associated with sums of squares and cross products, as for example in X'Φ^{-1}X (Φ^{-1/2}X, Φ^{-1/2}y). For each of the three search procedures, we utilize Froehlich's restricted generalized least-squares estimator â(2) [6, §3.4, p. 330] as the starting point. In lone research investigations, the ordinary restricted least-squares estimator would provide a reasonably cheap and almost as efficient starting point.

Let Ȧ denote the Hadamard product¹ A*A [21] of A with itself. The original Hildreth-Houck approach to estimating α considers the relation

    ṙ = Ġα + w ,   (3.1)

where r = My = [I - X(X'X)^{-1}X']y is the vector of least squares residuals in (2.1), Ġ = ṀẊ, and w is an unobservable error component for which E(w) = 0, E(ww') = Ψ = Ψ(G, α). A restricted least-squares estimator â of α may be determined as

    min_α (ṙ - Ġα)'(ṙ - Ġα) ,  α ≥ 0 .

Efficient numerical computation of this estimator is available via Lemke's method [13], and we find the published FORTRAN algorithm of Ravindran [16] most suitable for this purpose. Utilizing this particular coding seems only slightly more difficult than determining the ordinary truncated least-squares estimator of â in (3.1).

Froehlich's Monte Carlo study [6] indicates the value of carrying this process one step further, as suggested by Theil and Mennes [25], and estimating the diagonal elements of Ψ, the covariance matrix of w in (3.1), from

    ŵ = ṙ - Ġâ .

The diagonal estimate of Ψ is given as the Hadamard product

    Ψ̂ = I*(ŵŵ') ,

and the restricted generalized least-squares estimator â(2) is then defined as the solution for α in

    min_α (ṙ - Ġα)'Ψ̂^{-1}(ṙ - Ġα) ,  α ≥ 0 .

4. MONTE CARLO STUDY

In his analysis Froehlich [6] examined seven different estimators of α on five distinct X matrix structures for sample sizes of n = 25 and 75 with k = 3. Froehlich experienced difficulty in applying a Newton-Raphson search procedure to optimization of the likelihood function, noting failure to converge in approximately 20 percent of the simulation trials. Among the remaining six estimators, â(2), just described, appeared to be the most efficient, followed by â, also described in Section 3. The maximum likelihood estimates appeared to be sensitive to the size of individual true α_j values. Since unmodified Newton-Raphson procedures are unlikely to be successful when applied to highly nonlinear functions such as the likelihood under examination, we suspect that some of Froehlich's findings stem from inadequacies of the search technique, rather than from undesirable characteristics of the maximum likelihood estimator per se. These feelings are reinforced by the simulation results presented here.

To facilitate comparisons of estimator performance we follow as closely as possible the design of Froehlich's study. Of the five structures of sample size 75 used by Froehlich, the last one displayed the worst convergence record in maximum likelihood estimation [6, p. 333, footnote 9]. For this structure the first design vector is a column of units, the second vector is harmonic, the third random, as defined in [6, §4.3 and Appendix A]. Uniform and normal random deviates used in the formation of our corresponding structure were drawn via the McGill Super Duper random number generator [14] accessed on an IBM 360/65 computer at The University of Iowa.² This particular package has had wide acceptance and now is in use at some 89 installations nationally and internationally. Normally distributed disturbances v_tj in (2.1) used to form y samples were also drawn from this generator.

* Warren T. Dent is Visiting Associate Professor, Department of Economics, University of Wisconsin, Madison, WI 53706. Clifford Hildreth is Professor, Department of Economics, University of Minnesota, Minneapolis, MN 55455. The general approach of this note was proposed in an unpublished paper [8] by one of the authors and is similar to a procedure advanced earlier by Rosenberg [18]. Useful discussions with J. Frank O'Connor are appreciated.

© Journal of the American Statistical Association, March 1977, Volume 72, Number 357, Applications Section.

¹ The Hadamard product C of two m × n matrices A, B has typical element c_ij = a_ij b_ij.

² Developed by Professor George Marsaglia of McGill University, and first publicly introduced at the Symposium on the INTERFACE: Computer Science and Statistics, Berkeley, 1972.
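The core computation of the experiment — drawing samples from (2.1), applying the square-root transform α_j = q_j², and maximizing the concentrated likelihood (2.3) — can be sketched in modern terms as follows. This is an illustrative reconstruction, not the authors' FORTRAN/ALGOL code: SciPy's derivative-free Powell method stands in for Brent's PRAXIS, and the fixed starting guess below replaces the paper's use of â(2) as a starting point.

```python
import numpy as np
from scipy.optimize import minimize

def neg_concentrated_loglik(q, y, X):
    """Negative of (2.3) under the square-root transform alpha_j = q_j**2,
    which keeps the variance components nonnegative without constraints."""
    alpha = q ** 2
    phi = (X ** 2) @ alpha                     # phi_tt = sum_j x_tj^2 alpha_j
    if np.any(phi <= 0):
        return np.inf
    W = X / phi[:, None]                       # rows of X weighted by 1/phi_tt
    beta = np.linalg.solve(X.T @ W, W.T @ y)   # Aitken estimator (2.2)
    s = y - X @ beta                           # concentrated residuals s_t
    return 0.5 * np.sum(np.log(phi)) + 0.5 * np.sum(s ** 2 / phi)

# One simulated sample in the spirit of the study's design:
# a column of units plus two random columns, beta = (1,1,1)',
# alpha = (1.0, 0.2, 0.5)'.
rng = np.random.default_rng(0)
n, k = 75, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
alpha_true = np.array([1.0, 0.2, 0.5])
V = rng.normal(size=(n, k)) * np.sqrt(alpha_true)   # v_tj with variance alpha_j
y = np.sum(X * (1.0 + V), axis=1)                   # model (2.1) with beta_j = 1

# Powell's method uses no derivative information, loosely analogous to PRAXIS.
q0 = np.sqrt(np.full(k, 0.5))                       # starting guess for q
res = minimize(neg_concentrated_loglik, q0, args=(y, X), method="Powell")
alpha_ml = res.x ** 2                               # back-transform to alpha
```

In the paper the achieved optimum would then be compared across algorithms, and the back-transformed α̂_ML substituted into (2.2) to obtain the coefficient estimate.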


1. Performance of PRAXIS, Davidon-Fletcher-Powell (DFP), and Fisher's Method of Scoring (FMS) Algorithms: Number of Cases in 200 Simulation Trials

                PRAXIS/DFP comparison         PRAXIS/FMS comparison         DFP/FMS comparison
                Identical  Likelihood        Identical  Likelihood        Identical  Likelihood
  Sample        optima^a   greater for       optima^a   greater for       optima^a   greater for
  size                     PRAXIS    DFP                PRAXIS    FMS                DFP    FMS
  75            184        16        0       166        34        0       182        18     0
  25            179        19        2       159        40        1       176        24     0

^a All solutions were clearly identified as being identical or not. In each identical solution, corresponding α_j, j = 1, ..., k estimates each agreed to at least 5 significant figures, and the calculated likelihoods (excluding constant terms) to at least 6 significant figures. In nonidentical solutions, corresponding parameter estimates and the computed likelihood always differed at least in the second decimal place. All achieved first partial derivatives were zero at least to 4 decimal places.

The complete study involved 200 simulation trials with sample size n = 75, and a further 200 trials at sample size n = 25 in a corresponding structure. The vectors β and α were fixed at (1.0, 1.0, 1.0)' and (1.0, 0.2, 0.5)', respectively.

A list of pertinent characteristics of the two design matrices used in the study is given in the following tabulation.

                Second design variable    Third design variable    Correlation between second
  Sample size   mean (std. dev.)          mean (std. dev.)         and third design variables
  n = 75        -0.0353 (1.0135)          -0.0772 (1.0135)         -0.1036
  n = 25         0.0002 (1.0206)          -0.1359 (1.0206)          0.1354

For a given set of α_j estimates (for â, â(2), or α̂_ML), (2.2) was applied to determine the corresponding estimate vector. Table 1 summarizes the performance of the three algorithms in pairwise comparisons. From this table it is seen that the PRAXIS algorithm appears to be between 99.5 percent and 100 percent effective, while the reliability of the Davidon-Fletcher-Powell algorithm is between 90.5 percent and 92 percent. Under the step-size and stopping rules of Chow [4], the Newton method (Fisher's method of scoring, FMS) converged to a local optimum in all cases. Reliability in identifying highest local maxima appears to be between 79.5 percent and 83 percent. The DFP algorithm is certainly the fastest computationally, but the cost of greater precision using PRAXIS appears worthwhile.

Comparisons of results for the PRAXIS maximum likelihood estimator and other estimators are made in Table 2, where for a given parameter the minimum sample absolute bias, variance, and mean square error across estimators are underlined. Even at sample size n = 25, the MLE performs well.
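The pairwise bookkeeping behind Table 1 — run two differently motivated optimizers on the same likelihood from a common start and record whether they reach the same optimum — can be imitated with off-the-shelf routines. In this hypothetical sketch, Powell (derivative-free) plays the PRAXIS role and BFGS (a quasi-Newton method with numerical gradients) plays the DFP role; neither is the paper's actual code, and the 5-to-6-significant-figure agreement rule is approximated by a relative tolerance.

```python
import numpy as np
from scipy.optimize import minimize

def make_negloglik(y, X):
    """Concentrated negative log likelihood (2.3) as a function of q,
    where alpha_j = q_j**2 (the square-root transform of Section 3)."""
    def nll(q):
        alpha = q ** 2
        phi = (X ** 2) @ alpha
        if np.any(phi <= 1e-12):
            return 1e10                        # large finite penalty keeps quasi-Newton steps stable
        W = X / phi[:, None]
        beta = np.linalg.solve(X.T @ W, W.T @ y)
        s = y - X @ beta
        return 0.5 * (np.sum(np.log(phi)) + np.sum(s ** 2 / phi))
    return nll

rng = np.random.default_rng(1)
n, k = 75, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
V = rng.normal(size=(n, k)) * np.sqrt([1.0, 0.2, 0.5])
y = np.sum(X * (1.0 + V), axis=1)

nll = make_negloglik(y, X)
q0 = np.full(k, 0.7)                            # common starting guess
sol_df = minimize(nll, q0, method="Powell")     # derivative-free (PRAXIS-like role)
sol_qn = minimize(nll, q0, method="BFGS")       # quasi-Newton (DFP-like role)

# Classify the pair as identical optima or not, as in Table 1's columns.
identical = bool(np.isclose(sol_df.fun, sol_qn.fun, rtol=1e-6))
better = "tie" if identical else ("Powell" if sol_df.fun < sol_qn.fun else "BFGS")
```

Tallying `identical` and `better` over repeated trials reproduces the structure, though of course not the specific counts, of Table 1's pairwise comparisons.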

2. Monte Carlo Sample Mean, Bias, Variance, and Mean Square Error of Variance and Coefficient Estimation: Results of 200 Trials on Specified Design Matrices of Sizes 75 × 3 and 25 × 3

a. Variance estimation

                             â                                α̂_ML                              â(2)
  Sample  Param-  True
  size    eter    value  Mean    Bias    Var    MSE      Mean    Bias    Var    MSE      Mean    Bias    Var    MSE
  75      α₁      1.0    .9774   -.0226  .1817  .1822    .9495   -.0505  .1348  .1373    .9262   -.0738  .1650  .1704
          α₂      0.2    .2692    .0692  .0927  .0975    .2355    .0355  .0746  .0759    .2670    .0670  .0827  .0872
          α₃      0.5    .4411   -.0589  .1096  .1131    .4412   -.0588  .1200  .1234    .4187   -.0813  .1051  .1117
  25      α₁      1.0    .7978   -.2022  .3367  .3776    .6652   -.3348  .2795  .3916    .6729   -.3271  .2845  .3915
          α₂      0.2    .3893    .1893  .2352  .2710    .4134    .2134  .3275  .3730    .3940    .1940  .2303  .2679
          α₃      0.5    .5069    .0069  .3023  .3023    .4423   -.0577  .2858  .2891    .4901   -.0099  .3018  .3019

b. Coefficient estimation

                             β̂                                β̂_ML                              β̂(2)
  75      β₁      1.0    .9960   -.0040  .0250  .0250    .9977   -.0023  .0208  .0208    .9937   -.0063  .0238  .0238
          β₂      1.0    .9814   -.0186  .0226  .0230    .9821   -.0179  .0225  .0228    .9806   -.0194  .0227  .0231
          β₃      1.0   1.0066    .0066  .0334  .0335   1.0081    .0081  .0325  .0325   1.0061    .0061  .0328  .0329
  25      β₁      1.0   1.0270    .0270  .0834  .0841   1.0364    .0364  .0683  .0696   1.0146    .0146  .0881  .0883
          β₂      1.0    .9812   -.0188  .0732  .0736    .9922   -.0078  .0685  .0686    .9801   -.0199  .0716  .0720
          β₃      1.0    .9809   -.0191  .1168  .1171    .9733   -.0267  .0983  .0990    .9727   -.0273  .1105  .1113
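Summaries of the kind reported in Table 2 come from straightforward bookkeeping over the simulation trials: for each estimator, the sample mean, bias, variance, and MSE of the estimates across trials. The sketch below illustrates this using only the closed-form Hildreth-Houck restricted least-squares estimator â of Section 3, solved here by SciPy's `nnls` rather than the Lemke/Ravindran routine the paper uses; the full comparison would repeat the same bookkeeping for α̂_ML and â(2).

```python
import numpy as np
from scipy.optimize import nnls

def hildreth_houck(y, X):
    """Restricted least-squares estimate of alpha from (3.1):
    regress the squared residuals on G-dot = (M*M)(X*X), with alpha >= 0."""
    n = len(y)
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # residual maker M
    r = M @ y                                  # least squares residuals
    G_dot = (M ** 2) @ (X ** 2)                # Hadamard-squared system matrix
    alpha_hat, _ = nnls(G_dot, r ** 2)         # min ||r-dot - G-dot alpha||, alpha >= 0
    return alpha_hat

rng = np.random.default_rng(2)
n, k, trials = 25, 3, 200
alpha_true = np.array([1.0, 0.2, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # fixed design across trials

estimates = np.empty((trials, k))
for i in range(trials):
    V = rng.normal(size=(n, k)) * np.sqrt(alpha_true)
    y = np.sum(X * (1.0 + V), axis=1)          # model (2.1), beta = (1, 1, 1)'
    estimates[i] = hildreth_houck(y, X)

mean = estimates.mean(axis=0)
bias = mean - alpha_true
var = estimates.var(axis=0)
mse = bias ** 2 + var                          # MSE = bias^2 + variance
```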


While one must interpret these limited results with caution, the (PRAXIS) MLE seems relatively efficient, even at smaller sample sizes. At small sample sizes the restricted least-squares estimator is least biased, although this characteristic applies to the MLE at large sample sizes. As anticipated, sample absolute bias, variance, and mean square error decrease for all estimators as sample size increases.

The results confirm those of Froehlich [6] concerning the gain in efficiency of his estimator â(2) relative to that of â at both sample sizes. It appears, however, that the MLE may well be worth the extra computational effort involved, certainly for larger sample sizes, but possibly also for smaller samples. Results for β coefficient estimation confirm these findings.

[Received April 1976. Revised September 1976.]

REFERENCES

[1] Box, M.J., "A Comparison of Several Current Optimization Methods, and the Use of Transformations in Constrained Problems," Computer Journal, 9 (1966), 67-77.
[2] Brent, Richard P., Algorithms for Minimization without Derivatives, Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1973.
[3] Businger, P., and Golub, G.H., "Linear Least Squares Solutions by Householder Transformations," Numerische Mathematik, 7 (May 1965), 269-76.
[4] Chow, Gregory C., "Two Methods of Computing Full-Information Maximum Likelihood Estimates in Simultaneous Stochastic Equations," International Economic Review, 9, 1 (1968), 100-12.
[5] Cooley, Thomas F., and Prescott, Edward C., "Estimation in the Presence of Stochastic Parameter Variation," Econometrica, 44 (January 1976), 167-84.
[6] Froehlich, B.R., "Some Estimators for a Random Coefficient Regression Model," Journal of the American Statistical Association, 68 (June 1973), 329-35.
[7] Golub, G.H., and Styan, G.P.H., "Numerical Computations for Univariate Linear Models," Journal of Statistical Computation and Simulation, 2 (July 1973), 253-74.
[8] Hildreth, Clifford, "A Possible Maximum Likelihood Iteration for a Random Coefficients Model," unpublished manuscript, University of Minnesota (January 23, 1973).
[9] Hildreth, Clifford, and Houck, James, "Some Estimators for a Linear Model with Random Coefficients," Journal of the American Statistical Association, 63 (1968), 584-95.
[10] Himmelblau, David M., Applied Nonlinear Programming, New York: McGraw-Hill Book Co., 1972.
[11] Hogg, Robert V., and Randles, Ronald H., "Adaptive Distribution-Free Regression Methods and Their Applications," Technometrics, 17 (November 1975), 399-407.
[12] Kuester, James L., and Mize, Joe H., Optimization Techniques with Fortran, New York: McGraw-Hill Book Co., 1973.
[13] Lemke, C.E., "Bimatrix Equilibrium Points and Mathematical Programming," Management Science, 11 (1965), 681-9.
[14] Marsaglia, G., Anathanarayanan, K., and Paul, N., "Improvements on Fast Methods for Generating Normal Random Variables," Information Processing Letters, 5 (1976), in press.
[15] Rao, C.R., Linear Statistical Inference and Its Applications, 2nd ed., New York: John Wiley & Sons, Inc., 1973.
[16] Ravindran, Arunachalam, "Algorithm 431: A Computer Routine for Quadratic and Linear Programming Problems [H]," Communications of the ACM, 15 (September 1972), 818-20.
[17] Rosenberg, B., "Varying Parameter Estimation," unpublished Ph.D. thesis, Department of Statistics, Harvard University, 1968.
[18] ———, "Estimation in the General Linear Stochastic Parameter Model," Research Report, National Bureau of Economic Research, Computer Research Center, Cambridge, Mass., 1973.
[19] ———, "A Survey of Stochastic Parameter Regression," Annals of Economic and Social Measurement, 2 (October 1973), 381-97.
[20] ———, "The Analysis of a Cross Section of Time Series by Stochastically Convergent Parameter Regression," Annals of Economic and Social Measurement, 2 (October 1973), 399-428.
[21] Styan, George P.H., "Hadamard Products and Multivariate Statistical Analysis," Linear Algebra and Its Applications, 6 (1973), 217-40.
[22] Swamy, P.A.V.B., "Efficient Inference in a Random Coefficient Regression Model," Econometrica, 38 (1970), 311-23.
[23] ———, "Criteria, Constraints and Multicollinearity in Random Coefficient Regression Models," Annals of Economic and Social Measurement, 2 (October 1973), 429-50.
[24] Theil, Henri, Principles of Econometrics, New York: John Wiley & Sons, Inc., 1971.
[25] ———, and Mennes, L.B.M., "Conception Stochastique de Coefficients Multiplicateurs dans l'Ajustement Linéaire des Séries Temporelles," Publications de l'Institut de Statistique de l'Université de Paris, 8 (1959), 211-27.
[26] Zellner, Arnold, "Time Series Analysis and Econometric Model Construction," in R.P. Gupta, ed., Applied Statistics, New York: North-Holland Publishing Co., 1975.
[27] ———, and Palm, Franz, "Time Series and Structural Analysis of Monetary Models of the U.S. Economy," unpublished manuscript, H.G.B. Alexander Research Foundation, Graduate School of Business, University of Chicago, 1974.
