MEMORANDUM RM-4879-PR
JULY 1967

PREPARED FOR: UNITED STATES AIR FORCE PROJECT RAND

THE RAND CORPORATION, SANTA MONICA, CALIFORNIA
This research is supported by the United States Air Force under Project RAND (Contract No. F44620-67-C-0045), monitored by the Directorate of Operational Requirements and Development Plans, Deputy Chief of Staff, Research and Development, Hq USAF. Views or conclusions contained in this Memorandum should not be interpreted as representing the official opinion or policy of the United States Air Force. DISTRIBUTION STATEMENT: Distribution of this document is unlimited.
PREFACE
This Memorandum is primarily the result of research being conducted in the Cost Analysis Department and the Mathematics Department of The RAND Corporation on the general use of statistics in the field of cost analysis. Cost analysis work involves many types of estimating relationships that are derived from historical data and that functionally relate resources to one or more system parameters. However, a relationship is not very meaningful unless one can indicate how well it will predict future resource requirements. To accomplish this, appropriate statistical measures must be determined, and the statistical assumptions implicit in the derivation of the estimating relationship must be clearly understood and satisfied.

The authors wish to acknowledge the generous help of Dr. Albert Madansky (now of Market Planning Corporation, New York), whose overall guidance and careful review, comments, and additions to the manuscript contributed greatly to this study. Dr. Madansky also assisted in the
development of the computer program by devising special techniques that eliminated problems inherent in some of the matrix operations. Although the mathematical models contained in this Memorandum were completed in January 1965, this publication was delayed because of several programming problems. The most serious of these involved a RAND library subroutine: some of the matrices used in one of the models discussed in this Memorandum were too degenerate for that subroutine to handle. As a result, Mr. John Derr, of the RAND Computer Sciences Department, modified the subroutine so that it would generate satisfactory eigenvalues and eigenvectors. Mr. Derr's efforts contributed greatly to the development of the computer program, and thanks are due to him and to the Computer Sciences Department.
SUMMARY

This Memorandum presents a statistical study of a regression function of the form

    Y = e^{μ_0} X_1^{μ_1} X_2^{μ_2} ⋯ X_p^{μ_p}.

The study is concerned primarily with (1) how to pick the "best" values for the unknown parameters (μ_0, μ_1, μ_2, ..., μ_p), given a set of historical data (Y_i, X_1i, X_2i, ..., X_pi), i = 1, 2, ..., n, and (2) how to describe the predictive capability of the resulting estimating relationship. To study problems (1) and (2) above, the error assumption on Y must be described explicitly. For this study, two basic types of error assumptions are considered: multiplicative and additive. In the multiplicative model, the error term is assumed to have a log-normal distribution and acts as a multiplier to the hypothetical regression function. This assumption is shown to lead to the usual treatment of problem (1): taking natural logarithms of the regression relationship and then using linear least-squares estimators for estimates of the parameters.

Two forms of the multiplicative model are treated. In one form, the hypothetical regression function is assumed to be equal to the median of Y, whereas in the other form, the function is assumed to be equal to the expected value (mean) of Y. It is shown that the choice between these forms of the multiplicative model affects only the estimate of μ_0 and its distribution, and does not affect the estimates of the other parameters (μ_1, μ_2, ..., μ_p), their distributions, or the prediction interval of Y.
For the additive model, the error is assumed to be normally distributed and additive to the hypothetical regression function. This assumption leads to the problem of least-squares estimation of a nonlinear form, and the estimators must be obtained iteratively. It is shown, however, that for the type of regression function under consideration in this study, the iterative solution yields an absolute minimum of

    Q = Σ_{i=1}^{n} (Y_i − f_i)²,

where

    f_i = e^{μ_0} X_1i^{μ_1} ⋯ X_pi^{μ_p},

over a region of the parameter space described in the text.
For both models, methods of obtaining the estimators, distributions of the estimators, and prediction intervals for the dependent variable, Y, are given. Also, methods of comparing the two models are discussed.
A computer program was developed to calculate the estimators, distributions of the estimators, and prediction intervals, as described in this Memorandum. A discussion of the computer program, covering its restrictions, sequence of operations, input procedures, and outputs, is also presented.
One of the significant findings of this study is that care must always be taken in comparing different types of estimating relationships. Since the derivations of these estimating relationships may be based on different principles or error assumptions, the same statistical measures may not be applicable in all cases, and, if used, may lead to misleading conclusions.
CONTENTS

PREFACE

SUMMARY

Section
    I.  INTRODUCTION
   II.  THE MULTIPLICATIVE MODEL
          Assumptions and Implications of the Cases
            Median Case
            Mean Case
          Estimators of the Parameters
          Distribution of the Estimators
          Prediction Intervals
  III.  THE ADDITIVE MODEL
          Estimation
          Distribution of the Estimators
          Prediction Interval
   IV.  COMPARISON OF THE MODELS
    V.  COMPUTER PROGRAM
          Introduction
          Restrictions
          Sequence of Operations
          Input Procedures
            Title Card
            Order Card
            Format Card
            Data Cards
            Blank Card
            Summary of Input Cards
          Outputs
          Future Studies

Appendix
    A.  MATRIX DEFINITIONS AND OPERATIONS
          Matrix
          Equality of Matrices
          Matrix Operations
            Addition
            Scalar Multiplication
            Matrix Multiplication
          Special Matrices
            Square Matrix
            Identity Matrix
            Transpose Matrix
            Symmetric Matrix
            Inverse Matrix
            Positive Definite Matrix
    B.  STATISTICAL FACTS AND RELATIONSHIPS
          Measures of Central Tendency
            The Expected Value or Mean of X
            Median of X
          Linear Combinations of Random Variables
          Distributions Related to the Normal
            Chi-square Distribution
            The t-distribution
    C.  CONVERSION OF e^{μ_0} TO μ_0
    D.  ABSOLUTE MINIMUM AND POSITIVE DEFINITENESS OF THE MATRICES OF SECOND PARTIALS
          General Discussion
          Specific Case
    E.  VARIANCE-COVARIANCE MATRIX
    F.

REFERENCES
I. INTRODUCTION

Given a set of observations on a dependent variable and one or more independent variables, an analyst is confronted with two problems:

1. Picking the "best" values for the unknown parameters of a chosen form of estimating relationship;
2. Describing the predictive capability of the resulting relationship.

In this Memorandum these problems are analyzed for estimating relationships of the form

    Y = e^{μ_0} X_1^{μ_1} X_2^{μ_2} ⋯ X_p^{μ_p} δ.

The expression e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p} is called the hypothetical regression function. It consists of independent variables X_1, ..., X_p and parameters μ_0, μ_1, ..., μ_p, all of which parameters are unknown. The symbol Y is the dependent variable of interest, and the symbol δ represents the error associated with the observation of Y. If there were no error associated with the observation of Y, the symbol δ would be replaced by an exact equality between Y and the regression function. If the assumption of error associated with the observation of Y is dropped, there would be no need for problems 1 and 2: the analyst would need only p + 1 sets of observations (Y_i, X_1i, X_2i, ..., X_pi), i = 1, ..., p + 1, to determine the p + 1 parameters exactly. Then, of course, the analyst could calculate the Y value associated with any given values of X_1, ..., X_p; the correct estimates of the parameters would obviously have been made and the predictive capability would be perfect. However, with the error assumption on Y included
in the model, the analyst must not only determine the "best" estimates of the unknown parameters (problem 1); even if he were given the true values of these parameters, his predicted value of Y would in general still be different from the observed value of Y, and hence problem 2 would still exist. It is therefore apparent that the error assumption on Y is of great importance. In order to study problems 1 and 2, the analyst must explicitly describe the error assumption on Y. Two forms of the error assumption will be discussed in this study.† In the first, entitled "The Multiplicative Model," the error is assumed to have a log-normal distribution and acts as a multiplier to the hypothetical regression function. This
assumption will lead to the usual treatment of problem 1, i.e., taking natural logarithms of the regression relationship and then using the linear least-squares estimators for estimates of the parameters. Two explicit types of this model are discussed. In one it is assumed that the hypothetical regression function is equal to the expected value of Y, and in the other it is assumed that the hypothetical regression function is equal to the median of Y. In the next section, entitled
"The Additive Model," it is assumed that the error is normally distributed and is added to the hypothetical regression function. This leads to a nonlinear least-squares estimation problem. In
†In this Memorandum, the two models are also referred to as "logarithmic" (multiplicative error assumed) and "exponential" (additive error assumed).
these two sections, methods of obtaining the estimators, distributions of the estimators, and prediction intervals for the dependent variable, Y, are given. Section IV consists of a practical comparison of the two models, and Section V presents a discussion of the computer program relating to the two models. Considerable use of matrix notation has been made in the following sections; hence, familiarity with the ordinary operations of matrix algebra is assumed. A short discussion of these operations has been included in Appendix A for those who are not familiar with them. Similarly, some of the statistical facts and relationships that have been assumed in this study are recorded in Appendix B. No attempt
has been made to cover all of the statistical ideas assumed in the study, but an attempt has been made to include those necessary ideas that are most easily forgotten. It is suggested that the reader glance
through this appendix before proceeding to the main part of the text. Finally, it should be noticed that e^{μ_0} has been used for the constant multiplier of the regression function instead of an ordinary multiplier, say, μ_0. This form has been used purely for mathematical convenience. It should be noted, however, that this imposes the restriction that the constant multiplier be positive. It has been anticipated that this Memorandum will be used mainly by individuals interested in positive dependent variables such as weight or cost; hence it is felt that this restriction is of no practical importance. For those who would like a constant (positive) multiplier μ_0 in the regression function instead of e^{μ_0}, a conversion procedure is given in Appendix C.
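The two error assumptions can be illustrated numerically. The sketch below (in Python, a modern stand-in for the Memorandum's FORTRAN-era program; all parameter values are illustrative only) generates one observation under each model and checks that, under the multiplicative assumption, ln Y is linear in the parameters:

```python
import math
import random

random.seed(7)

# Hypothetical regression function with p = 2, mu0 = 0.5, mu = (1.2, 0.3).
def f(x1, x2):
    return math.exp(0.5) * x1**1.2 * x2**0.3

x1, x2 = 4.0, 9.0

# Multiplicative model: Y = f(X) * delta, delta = exp(eps), eps ~ N(nu, sigma_M^2).
sigma_m = 0.2
eps = random.gauss(0.0, sigma_m)          # median case: nu = 0
y_mult = f(x1, x2) * math.exp(eps)

# Additive model: Y = f(X) + eta, eta ~ N(0, sigma_A^2).
sigma_a = 1.5
y_add = f(x1, x2) + random.gauss(0.0, sigma_a)

# Under the multiplicative (median) case, ln Y is linear in the parameters:
# ln Y = mu0 + mu1 ln X1 + mu2 ln X2 + eps.
lhs = math.log(y_mult)
rhs = 0.5 + 1.2 * math.log(x1) + 0.3 * math.log(x2) + eps
```

Taking logarithms linearizes the multiplicative model exactly, which is why that model reduces to ordinary linear least squares; no such transformation linearizes the additive model.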
II. THE MULTIPLICATIVE MODEL

In the multiplicative model it is assumed that for any values of the variables X_1, X_2, ..., X_p, the associated value of Y is given by

    Y = e^{μ_0} X_1^{μ_1} X_2^{μ_2} ⋯ X_p^{μ_p} δ,

where δ = e^{ε} and ε is a normal random variable with mean and variance given by†

    E ε = Med ε = ν  and  Var ε = σ_M².

The mean, variance, and median of δ are then given by††

    E δ = e^{ν + σ_M²/2},

    Var δ = e^{2ν + σ_M²}(e^{σ_M²} − 1),

    Med δ = e^{ν}.

Notice that the median of δ is not equal to the mean of δ, even though the mean and median of ε are identical (since ε is a normal random variable and the normal distribution is symmetric).

†The expression "ln" in what follows stands for logarithm to the base e; all logarithms in this Memorandum are taken to the base e.

††See [5, p. 89].
Since e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p} is a constant for given values of X_1, ..., X_p, it follows that Y has a log-normal distribution with

    EY = e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p} e^{ν + σ_M²/2},

    Var Y = (e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p})² e^{2ν + σ_M²}(e^{σ_M²} − 1),

and

    Med Y = e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p} e^{ν}.

It also follows that

    E[ln Y] = μ_0 + ν + μ_1 ln X_1 + ⋯ + μ_p ln X_p

and

    Var[ln Y] = σ_M².

Two forms of this model will be studied. The first of these forms assumes that the hypothetical regression function is the median of Y; the second assumes that the function is the mean of Y. It will be shown that the choice of form affects only the estimate of μ_0 and the distribution of this estimate, and does not affect the other estimates, the distributions of those estimates, or the prediction interval for Y.
ASSUMPTIONS AND IMPLICATIONS OF THE CASES

Median Case

In this case it is assumed that for any given values of the variables X_1, X_2, ..., X_p, the median of the associated Y is given by

    Med Y = e^{μ_0} X_1^{μ_1} X_2^{μ_2} ⋯ X_p^{μ_p}.

This implies that Med δ = 1, i.e., e^{ν} = 1, or ν = 0. Hence

    E δ = e^{σ_M²/2},

    Var δ = e^{σ_M²}(e^{σ_M²} − 1),

and

    EY = e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p} e^{σ_M²/2}.
It also follows that ln Y is given by

    ln Y = μ_0 + μ_1 ln X_1 + ⋯ + μ_p ln X_p + ε,

where ε is normally distributed with mean zero and variance σ_M². This case has thus reduced to an ordinary linear regression model for ln Y.

Mean Case

In this case it is assumed that for any given values of the variables X_1, X_2, ..., X_p, the expected value of the associated Y is given by

    EY = e^{μ_0} X_1^{μ_1} X_2^{μ_2} ⋯ X_p^{μ_p}.

This implies that E δ = 1, i.e., e^{ν + σ_M²/2} = 1, or ν = −σ_M²/2. Notice that ν could be zero only if σ_M² were zero; since, as pointed out in the Introduction, a model with no error assumption is unrealistic, one cannot assume that ν = 0.
Now under the assumptions of this model it follows that

    Med Y = e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p} e^{ν},

    Var δ = e^{2ν + σ_M²}(e^{σ_M²} − 1) = e^{σ_M²} − 1,

and

    Var Y = (e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p})² (e^{σ_M²} − 1).

It also follows that

    ln Y = μ_0 + μ_1 ln X_1 + ⋯ + μ_p ln X_p + ε,

where ε is normally distributed with expected value equal to ν and variance σ_M². Since ν is not zero, this does not reduce directly to an ordinary linear regression model; the expected value of ln Y is no longer the logarithm of the regression function, but is given by

    E[ln Y] = μ_0 + ν + μ_1 ln X_1 + ⋯ + μ_p ln X_p.

This is remedied, without changing the model, by defining

    μ_0* = μ_0 + ν  and  ε* = ε − ν.

Then

    ln Y = μ_0* + μ_1 ln X_1 + ⋯ + μ_p ln X_p + ε*,

where ε* is normally distributed with mean zero and variance σ_M², which is the ordinary linear regression model for ln Y.
Since both cases reduce to essentially the same model, it is clear that problems 1 and 2 posed in the Introduction are essentially the same for both. These cases really differ only in their treatment of the constant term.

ESTIMATORS OF THE PARAMETERS

In both cases the model may be written

    ln Y = μ_0* + μ_1 ln X_1 + ⋯ + μ_p ln X_p + ε*,

where ε* is normally distributed with mean zero and variance σ_M², and where

    μ_0* = μ_0 and ε* = ε  (median case),
    μ_0* = μ_0 + ν and ε* = ε − ν  (mean case).

The historical data consist of n sets of observations (Y_i, X_1i, X_2i, ..., X_pi), i = 1, 2, ..., n. For these observations,

    ln Y_i = μ_0* + μ_1 ln X_1i + ⋯ + μ_p ln X_pi + ε_i*,  i = 1, 2, ..., n,

where the ε_i* are independent normal random variables with zero mean and variance equal to σ_M². Hence the random vector ln Y = (ln Y_1, ..., ln Y_n) has a multivariate normal distribution. For this
model, it turns out that the maximum likelihood estimators of the parameters are equivalent to the least-squares estimators of the parameters and that among the class of unbiased, linear estimates they have the property that their variances are simultaneously smallest. Hence
the least-squares estimators seem to be reasonable candidates for the definition of "best" estimators, and this definition will be adopted in the present section. Before solving for the least-squares estimators, it is desirable to change the form of the equations slightly. The reason for this will
become apparent when the distribution of the estimators is discussed in the next section. The change consists of measuring each independent variable about the mean of its logarithms. So for i = 1, 2, ..., n, write

    ln Y_i = μ_0′ + μ_1(ln X_1i − ln X̄_1) + ⋯ + μ_p(ln X_pi − ln X̄_p) + ε_i*,

where

    ln X̄_j = (1/n) Σ_{k=1}^{n} ln X_jk  for j = 1, 2, ..., p.

The least-squares estimators of μ_1, μ_2, ..., μ_p for this form will be the same as those for the previous form, and

    μ_0* = μ_0′ − μ_1 ln X̄_1 − μ_2 ln X̄_2 − ⋯ − μ_p ln X̄_p.
Hence no restrictions have been imposed by this change. The following notation will also prove helpful. Let S be a p × p symmetric matrix whose elements are given by

    s_ij = Σ_{k=1}^{n} (ln X_ik − ln X̄_i)²  if i = j,

    s_ij = Σ_{k=1}^{n} (ln X_ik − ln X̄_i)(ln X_jk − ln X̄_j)  if i ≠ j,

for i, j = 1, 2, ..., p. Also let T be a column vector whose elements are given by

    t_i = Σ_{k=1}^{n} (ln X_ik − ln X̄_i)(ln Y_k − ln Ȳ)  for i = 1, 2, ..., p.

The least-squares estimators of μ_0′, μ_1, ..., μ_p are those values of these parameters that minimize

    Q = Σ_{k=1}^{n} [ln Y_k − μ_0′ − μ_1(ln X_1k − ln X̄_1) − ⋯ − μ_p(ln X_pk − ln X̄_p)]².
By taking the partial derivatives ∂Q/∂μ_0′ and ∂Q/∂μ_i, i = 1, 2, ..., p, and setting them equal to zero, the following p + 1 equations are obtained:

    μ_0′ = (1/n) Σ_{k=1}^{n} ln Y_k

and

    s_i1 μ_1 + s_i2 μ_2 + ⋯ + s_ip μ_p = t_i  for i = 1, 2, ..., p.

The least-squares estimators of the parameters are then given by the values of the parameters that satisfy the above equations. Hence these estimators are given by

    μ̂_0′ = (1/n) Σ_{k=1}^{n} ln Y_k = ln Ȳ

and

    (μ̂_1, μ̂_2, ..., μ̂_p)′ = S⁻¹ T.
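For p = 1 the estimator equations reduce to μ̂_0′ = ln Ȳ and μ̂_1 = t_1/s_11, and the computation can be sketched in a few lines of Python (a modern stand-in for the Memorandum's program; the data below are illustrative and noiseless, so the estimates recover the generating parameters exactly):

```python
import math

# Illustrative noiseless data generated from Y = e^2 * X^1.5 (median case, delta = 1).
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [math.exp(2.0) * x**1.5 for x in xs]
n = len(xs)

ln_x = [math.log(x) for x in xs]
ln_y = [math.log(y) for y in ys]
ln_x_bar = sum(ln_x) / n
ln_y_bar = sum(ln_y) / n

# With p = 1 the S matrix and T vector are scalars.
s11 = sum((lx - ln_x_bar) ** 2 for lx in ln_x)
t1 = sum((lx - ln_x_bar) * (ly - ln_y_bar) for lx, ly in zip(ln_x, ln_y))

mu1_hat = t1 / s11                      # (mu1_hat, ...) = S^{-1} T
mu0_prime_hat = ln_y_bar                # constant of the centered form
mu0_star_hat = mu0_prime_hat - mu1_hat * ln_x_bar

# Residual variance estimate with n - (p + 1) degrees of freedom;
# essentially zero here because the data are noiseless.
fits = [mu0_prime_hat + mu1_hat * (lx - ln_x_bar) for lx in ln_x]
s2_m = sum((ly - f) ** 2 for ly, f in zip(ln_y, fits)) / (n - 2)
```

Note that the estimates are computed entirely from the logarithms of the data, as the text indicates: fitting the multiplicative model is ordinary linear least squares on ln Y.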
The least-squares estimator of μ_0* is then given by

    μ̂_0* = μ̂_0′ − μ̂_1 ln X̄_1 − ⋯ − μ̂_p ln X̄_p.

The estimators for the median case have thus been obtained, since in that case μ_0 = μ_0*. To obtain the estimators for the mean case, recall that μ_0 = μ_0* − ν and ν = −σ_M²/2; an estimator of ν, and hence of σ_M², must first be obtained. The estimator of σ_M² that will be used is given by

    σ̂_M² = Σ_{k=1}^{n} (ln Y_k − ln Ŷ_k)² / [n − (p + 1)],

where

    ln Ŷ_k = μ̂_0′ + μ̂_1 (ln X_1k − ln X̄_1) + ⋯ + μ̂_p (ln X_pk − ln X̄_p)  for k = 1, 2, ..., n.
There are two reasons for choosing σ̂_M² rather than the maximum likelihood estimate of σ_M². First, σ̂_M² is unbiased, i.e., E σ̂_M² = σ_M². Second, σ̂_M² is independent of the estimators of the μ's previously obtained. Now ν̂ = −σ̂_M²/2, and the estimate of μ_0 is μ̂_0 = μ̂_0* in the median case and μ̂_0 = μ̂_0* − ν̂ in the mean case. It follows that by using these estimators the estimate of E ln Y_k is given by

    ln Ŷ_k = μ̂_0 + μ̂_1 ln X_1k + ⋯ + μ̂_p ln X_pk

in the median case, and by

    ln Ŷ_k = μ̂_0 + ν̂ + μ̂_1 ln X_1k + ⋯ + μ̂_p ln X_pk

in the mean case; in both cases this equals μ̂_0* + μ̂_1 ln X_1k + ⋯ + μ̂_p ln X_pk.
Two things should be noted in concluding this section. First, it is obvious that the median case of the multiplicative model corresponds to fitting an ordinary linear regression model to the logarithms of a set of historical data. That is, when one takes logarithms of the data and then applies linear least squares, one is implicitly estimating the median of the multiplicative model. Second, the form of σ̂_M² indicates one reason why the requirement n > p + 1 must be satisfied: if it were not, σ̂_M² would be negative or undefined, which is clearly not desirable. Another reason can be found by examining the S matrix; if the requirement is not satisfied, S will be singular and S⁻¹ will not exist. The requirement n > p + 1 will also be necessary in the additive model.
DISTRIBUTION OF THE ESTIMATORS

Before problem 2 of the Introduction can be treated, the expected values, variances, covariances, and distributions of each of the estimators must be found. Looking first at the estimators

    (μ̂_1, μ̂_2, ..., μ̂_p)′ = S⁻¹ T,

it is clear that each is a linear combination of the normal random variables ln Y_i. Hence each is normally distributed. It has already been noted that these estimators are unbiased, i.e.,

    E μ̂_i = μ_i  for i = 1, 2, ..., p.

Finally, the variance-covariance matrix of these estimators is given by σ_M² S⁻¹. Therefore we have that

    Cov(μ̂_i, μ̂_j) = σ_M² s^{ij},  i ≠ j;  i, j = 1, 2, ..., p,

and

    Var μ̂_i = σ_M² s^{ii},  i = 1, 2, ..., p,

where s^{ij} is the element in the i-th row and j-th column of S⁻¹. Of course, σ_M² is unknown, so in practice the variances and covariances are estimated by replacing σ_M² with σ̂_M².
It should be noted that the simple form of these variances and covariances is a consequence of the change from μ_0* to μ_0′; the reason for the change was precisely to facilitate the derivation of the variance-covariance matrix. Next consider the distribution of μ̂_0′. Since μ̂_0′ is a linear combination of the normal random variables ln Y_i, it is normally distributed, with

    E μ̂_0′ = μ_0′  and  Var μ̂_0′ = σ_M²/n.

Furthermore, μ̂_0′ is independent of the other estimators, so that

    Cov(μ̂_0′, μ̂_i) = 0  for i = 1, 2, ..., p.

As mentioned earlier, σ̂_M² is independent of all of the estimators of the μ's mentioned so far in this section. It can also be shown that

    [n − (p + 1)] σ̂_M² / σ_M²

has a chi-square distribution with n − (p + 1) degrees of freedom; the p + 1 degrees of freedom are lost through the p + 1 linear relationships already established between the random variables ln Y_i by the least-squares equations for μ̂_0′ and μ̂_1, ..., μ̂_p. Now since the expected value of a chi-square random variable is equal to its degrees of freedom, and its variance is equal to twice its degrees of freedom, it is easy to see that

    E σ̂_M² = σ_M²  and  Var σ̂_M² = 2σ_M⁴ / [n − (p + 1)].

The distribution of μ̂_0* is, of course, normal, as μ̂_0* is a linear combination of normally distributed variables, namely,
    μ̂_0* = μ̂_0′ − ln X̄_1 μ̂_1 − ⋯ − ln X̄_p μ̂_p.

Its expected value is

    E μ̂_0* = E μ̂_0′ − ln X̄_1 E μ̂_1 − ⋯ − ln X̄_p E μ̂_p = μ_0′ − μ_1 ln X̄_1 − ⋯ − μ_p ln X̄_p = μ_0*.

Hence μ̂_0* is also unbiased. We also have the following:

    Var μ̂_0* = Var μ̂_0′ − 2 Σ_{i=1}^{p} ln X̄_i Cov(μ̂_0′, μ̂_i) + Σ_{i=1}^{p} (ln X̄_i)² Var μ̂_i
              + Σ_{i=1}^{p} Σ_{j=1, j≠i}^{p} ln X̄_i ln X̄_j Cov(μ̂_i, μ̂_j)

            = σ_M² [1/n + Σ_{i=1}^{p} Σ_{j=1}^{p} ln X̄_i ln X̄_j s^{ij}],

since Cov(μ̂_0′, μ̂_i) = 0. For i = 1, 2, ..., p,

    Cov(μ̂_0*, μ̂_i) = E μ̂_0* μ̂_i − E μ̂_0* E μ̂_i
                    = E[(μ̂_0′ − μ̂_1 ln X̄_1 − ⋯ − μ̂_p ln X̄_p) μ̂_i] − E μ̂_0* E μ̂_i
                    = −Σ_{j=1}^{p} ln X̄_j Cov(μ̂_j, μ̂_i)
                    = −σ_M² Σ_{j=1}^{p} ln X̄_j s^{ij},

or, in matrix form,

    (Cov(μ̂_0*, μ̂_1), Cov(μ̂_0*, μ̂_2), ..., Cov(μ̂_0*, μ̂_p)) = −σ_M² (ln X̄_1, ln X̄_2, ..., ln X̄_p) S⁻¹.
The distributions of all the estimators for the median case have now been obtained, since in that case μ̂_0 = μ̂_0*. For the mean case,

    μ̂_0 = μ̂_0* + σ̂_M²/2.

The exact distribution of μ̂_0 is not really obtainable, since it is a linear combination of a normal and (essentially) a chi-square random variable. For large n, however, it can be assumed that μ̂_0 is approximately normal. An easier problem is that of determining the mean and variance of μ̂_0 and the covariances of μ̂_0 and μ̂_i for i = 1, 2, ..., p. Since μ̂_0* and σ̂_M² are independent,

    E μ̂_0 = E μ̂_0* + (1/2) E σ̂_M² = μ_0* + σ_M²/2 = μ_0,

    Var μ̂_0 = Var μ̂_0* + (1/4) Var σ̂_M² = Var μ̂_0* + σ_M⁴ / (2[n − (p + 1)]),

and

    Cov(μ̂_0, μ̂_i) = Cov(μ̂_0*, μ̂_i)  for i = 1, 2, ..., p.

The distributions of all of the estimators of the mean case have thus been obtained. This completes the treatment of problem 1 posed in the Introduction.
PREDICTION INTERVALS

One of the problems posed in the Introduction is that of providing a measure of how well Y can be predicted by the estimated regression function. A prediction interval will be used for this purpose. This takes the form of

    Pr{Y_ℓ < Y < Y_u} = γ,

where 0 < γ < 1, and Y_ℓ and Y_u are the lower and upper bounds of the interval. The prediction interval takes into account the variability of the predicted Y due to the random nature of the estimates of the parameters of the regression function as well as that due to the error assumption in the hypothetical regression function. For a given γ, the predictive ability is, of course, best when the interval is narrow. Notice that the prediction interval also gives high and low estimates of Y for each γ. To obtain the bounds, again examine the predicted value of E ln Y as expressed by the following:
    ln Ŷ = μ̂_0′ + μ̂_1(ln X_1 − ln X̄_1) + ⋯ + μ̂_p(ln X_p − ln X̄_p)    (1)

         = μ̂_0* + μ̂_1 ln X_1 + ⋯ + μ̂_p ln X_p    (2)

         = μ̂_0 + ν̂ + μ̂_1 ln X_1 + ⋯ + μ̂_p ln X_p.    (3)

In these equations, ln X_1, ..., ln X_p are evaluated at the values of the independent variables associated with the Y that is being predicted. From Eq. (1) or (2) it is easy to see that ln Ŷ is normally distributed, being a linear combination of normals, with

    E ln Ŷ = μ_0* + μ_1 ln X_1 + ⋯ + μ_p ln X_p

and

    Var ln Ŷ = σ_M² [1/n + Σ_{i=1}^{p} Σ_{j=1}^{p} (ln X_i − ln X̄_i)(ln X_j − ln X̄_j) s^{ij}].

From the assumptions of this model, it was seen that ln Y is normally distributed with

    E ln Y = μ_0* + μ_1 ln X_1 + ⋯ + μ_p ln X_p  and  Var ln Y = σ_M².

Furthermore, ln Y is independent of ln Ŷ, since all the historical observations are independent of the Y being predicted. Hence the random variable D, defined by

    D = ln Ŷ − ln Y,

is normally distributed with

    E D = 0

and

    Var D = σ_M² [1 + 1/n + Σ_{i=1}^{p} Σ_{j=1}^{p} (ln X_i − ln X̄_i)(ln X_j − ln X̄_j) s^{ij}] = σ_M² V,

where V denotes the bracketed factor. A prediction interval could now be set up using the standard normal random variable D/√(σ_M² V) if σ_M² were known; in practice σ_M² must be estimated by σ̂_M².
A more accurate prediction interval, which takes into account this estimation, can be set up as follows. It has previously been shown that

    [n − (p + 1)] σ̂_M² / σ_M²

has a chi-square distribution with n − (p + 1) degrees of freedom. Furthermore, it is independent of ln Y and ln Ŷ (seen from Eq. (2)) and D, and therefore it is independent of D/√(σ_M² V). Hence,

    D/√(σ_M² V) ÷ √{[n − (p + 1)] σ̂_M²/σ_M² ÷ [n − (p + 1)]} = D/√(σ̂_M² V)

has a t-distribution with n − (p + 1) degrees of freedom.
To obtain the prediction interval, then, choose the probability, γ, that is to be covered by the interval. Let α = (1 + γ)/2. Then find the α(100)-th percentile of the t-distribution with n − (p + 1) degrees of freedom. Call this t_α. Since the t-distribution is symmetric about zero, we have t_{1−α} = −t_α. Hence

    Pr{−t_α < D/√(σ̂_M² V) < t_α} = γ,

or

    Pr{ln Ŷ − t_α √(V σ̂_M²) < ln Y < ln Ŷ + t_α √(V σ̂_M²)} = γ.

Since exponentiation is a monotone function, it can be performed inside the brackets without altering the probability content of the interval. Therefore the prediction interval for Y is given by

    Pr{e^{ln Ŷ − t_α √(V σ̂_M²)} < Y < e^{ln Ŷ + t_α √(V σ̂_M²)}} = γ.

To describe the interval in terms of the estimated regression function of the cases, note from Eq. (3) that ln Ŷ = μ̂_0 + ν̂ + μ̂_1 ln X_1 + ⋯ + μ̂_p ln X_p, so that, with Ŷ = e^{μ̂_0} X_1^{μ̂_1} ⋯ X_p^{μ̂_p}, the interval in the median case (where ν̂ = 0) is

    Ŷ e^{−t_α √(V σ̂_M²)} < Y < Ŷ e^{t_α √(V σ̂_M²)},

while in the mean case it is

    Ŷ e^{ν̂ − t_α √(V σ̂_M²)} < Y < Ŷ e^{ν̂ + t_α √(V σ̂_M²)}.

Again it should be noted that the prediction interval does not depend on the particular case being used. The endpoints of the interval look different because μ̂_0, and therefore Ŷ, is different for the two cases (i.e., μ̂_0 (mean case) = μ̂_0 (median case) + σ̂_M²/2). It is the positioning of the estimated regression function in the prediction interval that is different, and not the prediction interval itself. This concludes the discussion of the multiplicative model.
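The median-case prediction interval can be sketched for p = 1. In this sketch the t-percentile is taken as a table value rather than computed (t_{0.975} with 4 degrees of freedom is approximately 2.776), and the data and the prediction point are illustrative only:

```python
import math

# Illustrative data: ln Y = 2 + 1.5 ln X plus a small alternating error.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
errs = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
ln_x = [math.log(x) for x in xs]
ln_y = [2.0 + 1.5 * lx + e for lx, e in zip(ln_x, errs)]
n = len(xs)

# Centered least-squares fit (p = 1).
ln_x_bar = sum(ln_x) / n
mu0_prime = sum(ln_y) / n
s11 = sum((lx - ln_x_bar) ** 2 for lx in ln_x)
mu1 = sum((lx - ln_x_bar) * (ly - mu0_prime) for lx, ly in zip(ln_x, ln_y)) / s11

# sigma_M^2 estimate with n - (p + 1) = 4 degrees of freedom.
fits = [mu0_prime + mu1 * (lx - ln_x_bar) for lx in ln_x]
s2 = sum((ly - f) ** 2 for ly, f in zip(ln_y, fits)) / (n - 2)

# Predict at X = 3.5: for p = 1, V = 1 + 1/n + (ln X - ln X_bar)^2 / s11.
lx0 = math.log(3.5)
v = 1.0 + 1.0 / n + (lx0 - ln_x_bar) ** 2 / s11
ln_y_hat = mu0_prime + mu1 * (lx0 - ln_x_bar)
y_hat = math.exp(ln_y_hat)              # median-case point estimate

t_alpha = 2.776                          # t_{0.975}, 4 d.f., from a standard table
half = t_alpha * math.sqrt(s2 * v)
y_low, y_up = y_hat * math.exp(-half), y_hat * math.exp(half)
```

Note that the interval is multiplicative around Ŷ (equal ratios, not equal distances, on each side), which is characteristic of the log-normal error assumption.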
III. THE ADDITIVE MODEL

In the additive model it is assumed that for any values of the variables X_1, X_2, ..., X_p, the associated value of Y is given by

    Y = e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p} + η,

where η is a normal random variable. Since it is considered desirable to have the distribution of Y centered on the regression function, it is assumed that the mean of η is zero. The variance of η will be denoted by σ_A² and is assumed to be unknown. Hence Y itself is normally distributed with

    EY = Med Y = e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p}  and  Var Y = σ_A².

Notice that the variance of Y is independent of the values of X_1, ..., X_p. Notice also that the median of Y is the same as the mean of Y; hence in this model we need consider only the one case. The distribution of the historical data, which consists of n sets of the form (Y_i, X_1i, ..., X_pi), i = 1, 2, ..., n, with n > p + 1, is easily obtained: the random vector Y = (Y_1, ..., Y_n) has a multivariate normal distribution with mean vector

    (e^{μ_0} X_11^{μ_1} ⋯ X_p1^{μ_p}, ..., e^{μ_0} X_1n^{μ_1} ⋯ X_pn^{μ_p})

and covariance matrix given by σ_A² · I, where I is the n-dimensional identity matrix.
ESTIMATION

To tackle problem 1 of the Introduction, it is again necessary to determine what is meant by "best" estimators. Of course "best" may be defined in many ways. One way that has some nice properties is to define the least-squares estimators as the "best" estimators, that is, the values of μ_0, μ_1, ..., μ_p that minimize

    Q = Σ_{i=1}^{n} (Y_i − e^{μ_0} X_1i^{μ_1} ⋯ X_pi^{μ_p})².

Besides being the least-squares estimators, these estimators are approximately unbiased. With "best" defined as least-squares in this section, the problem of finding the estimators will be much more difficult in this case than it was in the multiplicative case (where linear least-squares regression on ln Y was assumed), since the least-squares equations are nonlinear.
To see why problems arise, let us first attack the solution of the least-squares equations. The usual way to minimize Q would be to take the partial derivatives of Q with respect to μ_0, μ_1, ..., μ_p and set them equal to zero. After doing this, the following least-squares equations are obtained:

    ∂Q/∂μ_0 = −2 Σ_{i=1}^{n} (Y_i − e^{μ_0} X_1i^{μ_1} ⋯ X_pi^{μ_p}) e^{μ_0} X_1i^{μ_1} ⋯ X_pi^{μ_p} = 0,

    ∂Q/∂μ_j = −2 Σ_{i=1}^{n} (Y_i − e^{μ_0} X_1i^{μ_1} ⋯ X_pi^{μ_p}) e^{μ_0} X_1i^{μ_1} ⋯ X_pi^{μ_p} ln X_ji = 0,  j = 1, 2, ..., p.

Having obtained these equations, one would try to solve them for μ_0, μ_1, ..., μ_p. Three difficulties arise:

(i) solving p + 1 nonlinear equations for the p + 1 unknowns;
(ii) the equations may have more than one solution;
(iii) a solution need not yield an absolute minimum of Q.

Problems (ii) and (iii) arise from a generalization of the problem of solving the equation df(X)/dX = 0. It is not true for every function f that there exists just one value of X for which the derivative, df(X)/dX, is equal to zero. Any such value is called a critical point and can represent a relative or absolute maximum or minimum, or an inflection point of the function f(X). In like manner, the set of least-squares equations can have more than one solution.
Each solution is a critical point of the equations and may represent a relative minimum or maximum, an absolute minimum, or a saddle point in the surface described by Q(μ_0, μ_1, ..., μ_p). (From the form of Q itself it is fairly clear that no finite critical point will represent an absolute maximum.) The problem is to determine which critical point is the absolute minimum of Q. It should be noted that these problems do not arise in the multiplicative model, since in that case we have p + 1 linear equations in p + 1 unknowns and therefore there is a unique solution for the parameters that minimizes Q (assuming that the equations are linearly independent, i.e., that the determinant of S is not zero).

Hartley [3] has studied the problems posed in (i), (ii), and (iii) for general nonlinear regression functions. The solutions given below make use of his results. The regression function under consideration was not explicitly treated there, however, so the following is not a direct specialization. To find a solution of the least-squares equations, first make an initial choice for the values of the parameters, say, (μ_0⁰, μ_1⁰, ..., μ_p⁰), and calculate

    Q⁰ = Σ_{i=1}^{n} (Y_i − e^{μ_0⁰} X_1i^{μ_1⁰} ⋯ X_pi^{μ_p⁰})².
Next, expand the regression function in a first-order Taylor series about (μ_0⁰, μ_1⁰, ..., μ_p⁰) and substitute the expansion into Q. Taking partial derivatives of the result with respect to the corrections D_0, D_1, ..., D_p and setting them equal to zero, notice that we now have p + 1 linear equations in the p + 1 unknowns D_0, D_1, ..., D_p, one for each k = 0, 1, ..., p. These equations correspond to the equations given in (10) of Hartley's paper. Solve the equations for D_0, D_1, ..., D_p. Next, a new set of parameter values, say, (μ_0¹, μ_1¹, ..., μ_p¹), must be selected. Hartley gives a method of doing this for which the following relation holds:

    Q(μ_0¹, μ_1¹, ..., μ_p¹) ≤ Q(μ_0⁰, μ_1⁰, ..., μ_p⁰).
For computing purposes, Hartley's method has been altered, but the above relationship still holds. In the computer method, the function Q is evaluated along the correction direction at the points

    (μ_0⁰ + ℓD_0, μ_1⁰ + ℓD_1, ..., μ_p⁰ + ℓD_p)  for ℓ = 0, 0.1, 0.2, ..., 1.0.

Denote the corresponding values of Q by Q_ℓ, and let ℓ* be the value for which Q_ℓ is minimized among ℓ = 0, 0.1, ..., 1.0. If ℓ* = 0, recalculate Q_ℓ for ℓ = 0, 0.01, ..., 0.1 and minimize again. The new parameter values are then given by

    μ_k¹ = μ_k⁰ + ℓ* D_k  for k = 0, 1, ..., p.

Using (μ_0¹, μ_1¹, ..., μ_p¹) in place of (μ_0⁰, μ_1⁰, ..., μ_p⁰), calculate a new set (μ_0², ..., μ_p²) as above, and continue this process until the μ_k^t's differ from the μ_k^{t−1}'s for all k by less than some specified value. Let the resulting values of the set (μ_0^t, ..., μ_p^t) be denoted by (μ̂_0, μ̂_1, ..., μ̂_p) and end the iterative process. The resulting values of the parameters are (approximate) solutions of the least-squares equations because of the procedure used for picking the new values of μ_0, μ_1, ..., μ_p: as Q(μ_0^t, ..., μ_p^t) converges, (μ_0^t, ..., μ_p^t) converges to a solution.
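The iterative procedure just described (a Gauss-Newton step followed by a fractional-step search, in the spirit of Hartley's modification) can be sketched for p = 1. This is an illustrative reimplementation, not the Memorandum's program; the data are noiseless, generated from μ_0 = 1.0 and μ_1 = 0.8, so the iteration should drive Q essentially to zero:

```python
import math

# Noiseless illustrative data: Y = e^{1.0} * X^{0.8}.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [math.exp(1.0) * x**0.8 for x in xs]

def q(mu0, mu1):
    return sum((y - math.exp(mu0) * x**mu1) ** 2 for x, y in zip(xs, ys))

mu0, mu1 = 0.9, 0.7                       # initial choice of parameter values
for _ in range(100):
    f = [math.exp(mu0) * x**mu1 for x in xs]
    r = [y - fi for y, fi in zip(ys, f)]
    # Linearized (Gauss-Newton) normal equations for the corrections D0, D1;
    # the Jacobian columns are df/dmu0 = f_i and df/dmu1 = f_i ln X_i.
    a11 = sum(fi * fi for fi in f)
    a12 = sum(fi * fi * math.log(x) for fi, x in zip(f, xs))
    a22 = sum((fi * math.log(x)) ** 2 for fi, x in zip(f, xs))
    b1 = sum(fi * ri for fi, ri in zip(f, r))
    b2 = sum(fi * math.log(x) * ri for fi, ri, x in zip(f, r, xs))
    det = a11 * a22 - a12 * a12
    d0 = (b1 * a22 - b2 * a12) / det
    d1 = (a11 * b2 - a12 * b1) / det
    # Fractional step: pick l in {0, 0.1, ..., 1.0} minimizing Q,
    # refining to {0, 0.01, ..., 0.1} if l = 0 wins, as in the text.
    grid = [i / 10 for i in range(11)]
    best = min(grid, key=lambda l: q(mu0 + l * d0, mu1 + l * d1))
    if best == 0.0:
        grid = [i / 100 for i in range(11)]
        best = min(grid, key=lambda l: q(mu0 + l * d0, mu1 + l * d1))
        if best == 0.0:
            break                         # no further improvement: converged
    mu0, mu1 = mu0 + best * d0, mu1 + best * d1
    if best * max(abs(d0), abs(d1)) < 1e-12:
        break                             # parameter change below tolerance
```

The step search guarantees that Q never increases from one iteration to the next, which is the property the text attributes to Hartley's selection rule.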
The assertions that the solution for the least-squares equations so obtained is unique and is an absolute minimum for Q are harder to establish. In his paper, Hartley discusses a sufficient condition for these assertions. It is that the matrix of second partial derivatives of Q, with respect to μ_0, ..., μ_p, be positive definite for all values of the parameters. In general this will not be true. In fact, for the regression function discussed in this study, it has not been proved; but it can be asserted that Q is minimized over the largest connected
region of the parameter space containing (μ̂_0, ..., μ̂_p) and having the property that the second partial derivative matrix is positive definite when evaluated at any (μ_0, ..., μ_p) in the region. It will be positive definite when evaluated at (μ̂_0, ..., μ̂_p), since this point represents at least a relative minimum. Furthermore, (μ̂_0, ..., μ̂_p) will be the
only solution to the least-squares equations in this region. For the case under consideration, it can be shown that if

    2 e^{μ_0} X_1i^{μ_1} ⋯ X_pi^{μ_p} − Y_i > 0  for i = 1, 2, ..., n,

then the matrix of second partial derivatives of Q is positive definite. Hence if this holds for (μ̂_0, ..., μ̂_p), that point is an absolute minimum of Q and the only critical point for Q in the region of parameter values (μ_0, ..., μ_p) that satisfy the above inequalities. If the inequalities do not hold for (μ̂_0, ..., μ̂_p), one can try new initial estimates of the parameters from the above region and obtain the solution in the region. For solutions outside the region, no general argument can be given at this time for proving uniqueness or absolute-minimum properties on some other region. (For further discussion of the sufficiency of a positive definite matrix of second partials and for a proof of positive definiteness over the above region, see Appendix D.) It should be noted that it seems to be a reasonable assumption that the hypothetical regression function lies in the region described above. For it seems that if the relationship between Y and X_1, ..., X_p being studied here is valid, then the historical data should satisfy 2 e^{μ_0} X_1i^{μ_1} ⋯ X_pi^{μ_p} − Y_i > 0 for i = 1, 2, ..., n.
Hence it seems reasonable to impose

    2 e^{μ̂_0} X_1i^{μ̂_1} ⋯ X_pi^{μ̂_p} − Y_i > 0  for i = 1, 2, ..., n

as a necessary condition on the solution.
DISTRIBUTION OF THE ESTIMATORS

To treat problem 2 of the Introduction, the distributions of the estimators must be found. Since the estimators are defined only implicitly, as the solution of the nonlinear least-squares equations, the finding of their distributions will be more difficult than in the multiplicative model. Results of Hartley and of Madansky on least-squares estimators for nonlinear regression functions are drawn upon, although, because of the form of the regression function considered here, what follows is not a direct specialization of their results. Because of the nature of the least-squares procedure, the estimators are approximately unbiased and approximately normally distributed, i.e.,

    E μ̂_i ≈ μ_i  for i = 0, 1, 2, ..., p.

By using a linear approximation to the model, estimates of the variance-covariance matrix of (μ̂_0, μ̂_1, ..., μ̂_p) can be obtained. Since the derivation involves a great deal of matrix algebra, the discussion of it is given in Appendix E. The matrix derived there, denoted by R, is (p + 1) × (p + 1); the estimated variance-covariance matrix is found by multiplying R by σ̂_A². To obtain actual numbers for the variances or covariances, one must replace σ_A² by its estimate, which is given by

    σ̂_A² = Σ_{i=1}^{n} (Y_i − Ŷ_i)² / [n − (p + 1)],

where

    Ŷ_i = e^{μ̂_0} X_1i^{μ̂_1} ⋯ X_pi^{μ̂_p}  for i = 1, 2, ..., n.
PREDICTION INTERVAL

As in the multiplicative case, a prediction interval for Y will be used as a measure of the predictive ability of the estimated regression function. The predicted value of Y is

    Ŷ = e^{μ̂_0} X_1^{μ̂_1} X_2^{μ̂_2} ⋯ X_p^{μ̂_p},

where (X_1, X_2, ..., X_p) are the values of the independent variables associated with the Y being predicted. Let

    D = Y − Ŷ.

Then D is approximately a normally distributed random variable. The expected value and variance of D are given by

    E D = EY − E Ŷ ≈ 0

and

    Var D = Var Y + Var Ŷ = σ_A² + Var Ŷ,

since the Y being predicted is independent of Ŷ.
~a'
... , ~p
and hence of Y.
" from the first order Taylor series expansion of Y around the theoretical
regression line. This is given by X pel + (A
p
i-L
i-La
- i-L)
+ In X (~
I
- i-L ) + ...
I
+ In X
Then
Pp
(U - i-LP )J.
+ ... + In X
since for Also, Var y~(/ax~l
k
E (~
- i-L )]
P
u X p,
P
a,
1, ... , p.
••• x~p)2[var
Ua
p
+2
p
I
P
k=;
Cov (~a'
~k)
lnXk
I
k=l
Var Uk In
Xk +
II
Cov (~,
~j)
In ~
In Xj]
-33-
where the variances and covariances are those given by the matrix σ_A² R of the preceding subsection. Writing r_kj for the element of R corresponding to μ̂_k and μ̂_j, and approximating e^{μ_0} X_1^{μ_1} ⋯ X_p^{μ_p} by Ŷ, let

    W = r_00 + 2 Σ_{k=1}^{p} r_0k ln X_k + Σ_{k=1}^{p} Σ_{j=1}^{p} r_kj ln X_k ln X_j,

so that Var Ŷ ≈ σ_A² Ŷ² W and

    Var D ≈ σ_A² (1 + Ŷ² W).

Then

    D / √(σ_A²(1 + Ŷ² W))

is (approximately) a standard normal random variable. A prediction interval can now be set up by estimating σ_A² by σ̂_A². However, as in the multiplicative case, this estimation has an effect that can be controlled by the construction of an appropriate t-distributed random variable. This is accomplished by noting again that

    [n − (p + 1)] σ̂_A² / σ_A²

has (approximately) a chi-square distribution with n − (p + 1) degrees of freedom. Hence

    D / √(σ̂_A²(1 + Ŷ² W))

has (approximately) a t-distribution with n − (p + 1) degrees of freedom. Now to set up the prediction interval, choose the probability, γ, that is to be covered by the interval. Let α = (1 + γ)/2 and t_α be the α(100)-th percentile of the t-distribution with n − (p + 1) degrees of freedom. As in the multiplicative case, t_{1−α} = −t_α; hence

    Pr{−t_α < D/√(σ̂_A²(1 + Ŷ² W)) < t_α} = γ,

or

    Pr{Ŷ − t_α √(σ̂_A²(1 + Ŷ² W)) < Y < Ŷ + t_α √(σ̂_A²(1 + Ŷ² W))} = γ.

Thus a measure of the "goodness" of the predictive ability of Ŷ has been provided for this model.
IV. COMPARISON OF THE MODELS

In the previous sections, problems 1 and 2 were discussed in relation to each of the two models considered. For each model, "best" estimators were defined, a method for finding them was given, and a prediction interval for Y was found. Consider now the problem of deciding which model is best. If the analyst has some idea of how the error occurs in observing Y, i.e., whether it is additive or multiplicative, the choice appears easy: he chooses the model that corresponds to how the error occurs. But what if he has no idea of how the error occurs? The choice of a criterion of what is best then affects the choice of models. One way of handling this problem is to look at how the "best" estimators were obtained for the two models.
In the additive model the estimators were chosen to minimize

$$\sum_{i=1}^{n} (Y_i - f_i)^2, \tag{2}$$

while in the multiplicative model they were chosen to minimize

$$\sum_{i=1}^{n} (\ln Y_i - \ln f_i)^2 = \sum_{i=1}^{n} \left[ \ln \frac{Y_i}{f_i} \right]^2. \tag{3}$$
So in the additive model one is interested in controlling the size of the actual deviations of $Y_i$ from $f_i$, while in the multiplicative model one is essentially interested in controlling the relative deviations of $Y_i$ from $f_i$. The analyst therefore may wish to choose the model (define "best") depending on what type of error he is most interested in controlling. For observations and predictions in a small interval of Y values (Y values of the same magnitude), the additive model might be preferred. For observations and predictions in a wider interval of Y values, the multiplicative model might be preferred.

Other criteria are available for the choice of "best," but their meaning as far as which model is more correct is hard to interpret. For instance, one might use the model that gives the narrowest prediction interval for the given values of $X_1, X_2, \ldots, X_p$. Another possibility is to choose the model that has the smaller estimated variance for Y. This is essentially the same as comparing the standard errors of the estimates. The theoretical variance and the estimated variance are given in
the table below.

                             Theoretical Var Y                 Estimated Var Y
  Additive                   σ²_A                              σ̂²_A                          (a)
  Multiplicative:
    Median case              Y² e^{σ²_M} (e^{σ²_M} - 1)        Ŷ² e^{σ̂²_M} (e^{σ̂²_M} - 1)    (b)
    Mean case                Y² (e^{σ²_M} - 1)                 Ŷ² (e^{σ̂²_M} - 1)             (c)

In the table, Ŷ is equal to the estimated regression function, and the
estimated variances are given by

$$\hat\sigma_A^2 = \frac{\sum_{i=1}^{n} (Y_i - \hat Y_i)^2}{n - (p+1)} \quad \text{in (a)}$$

and

$$\hat\sigma_M^2 = \frac{\sum_{i=1}^{n} (\ln Y_i - \ln \hat Y_i)^2}{n - (p+1)} \quad \text{in (b) and (c)},$$

where the summations are over the historical data used for the solution of the estimators, and $\ln \hat Y_i$ is defined as in the multiplicative model, i.e.,

$$\ln \hat Y_i = \hat\mu_0 + \hat\mu_1 \ln X_{1i} + \cdots + \hat\mu_p \ln X_{pi}.$$

Note that (a) does not depend on the particular values of $X_1, \ldots, X_p$ at which a prediction is made (other than through the historical data), while (b) and (c) depend on the values of these variables through Ŷ. So the choice of the model with the smaller estimated variance will in general depend on the values of the independent variables at which the comparison is made.
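A short numerical check (Python; the helper names are mine, not the memorandum's) of the two multiplicative-model variance formulas in the table:

```python
import math

def var_median_case(y_hat, s2_m):
    """Estimated Var Y, median case: Yhat^2 * e^{s2} * (e^{s2} - 1)."""
    return y_hat ** 2 * math.exp(s2_m) * (math.exp(s2_m) - 1.0)

def var_mean_case(y_hat, s2_m):
    """Estimated Var Y, mean case: Yhat^2 * (e^{s2} - 1)."""
    return y_hat ** 2 * (math.exp(s2_m) - 1.0)

# With s2_m = 0 there is no multiplicative error, so both variances vanish.
print(var_median_case(5.0, 0.0), var_mean_case(5.0, 0.0))  # -> 0.0 0.0
```

As a consistency check, if the median-case Ŷ is $e^{\hat A}$ and the mean-case Ŷ is $e^{\hat A + \hat\sigma_M^2/2}$, the two rows give the same number, as they should, since both describe the variance of the same fitted lognormal distribution.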
It should be noted that only one possible criterion of "best" was examined in each model. Many other possible criteria are available, for instance, minimizing the sum of the absolute deviations, $\sum_{i=1}^{n} |Y_i - f_i|$. Considering different criteria of "best" might lead to new comparisons for the two models. It should be clear that the choice of which model is best is still up to the analyst, and it is hoped that this section has helped to clarify the issues involved in making that choice.
V. COMPUTER PROGRAM

INTRODUCTION
The computer program discussed in this section was developed explicitly to reflect the basic concepts presented in Sections I through IV of this Memorandum. Hence, a detailed description of the
underlying principles of the program will not be given here, since such a discussion would essentially duplicate the preceding presentation. This section will consider mainly the program restrictions, the
sequence of operations in the program, the input procedures for the program, and the outputs that result. However, for those readers who
may be concerned with the operating details of the program, a listing of the FORTRAN-IV program is presented in Appendix F. The reader will recall that the discussion in this Memorandum has been concerned with a regression function f consisting of p independent variables of the form
$$f = e^{\mu_0} X_1^{\mu_1} X_2^{\mu_2} \cdots X_p^{\mu_p},$$

where e is the base of the natural logarithms. However, because of storage considerations, the regression function in the computer program is limited to a maximum of three independent variables:

$$f = e^{\mu_0} X_1^{\mu_1} X_2^{\mu_2} X_3^{\mu_3}.$$

The reason for this restriction is that the machine has a limitation of approximately $10^{38}$ for floating-point numbers. As was shown earlier, the calculations involve products and powers of the data values, and these quantities grow rapidly with the number of variables; for more than three independent variables this limit could be exceeded. Hence the program is restricted to a maximum of three independent variables.
If three independent variables are entered, the program is capable of considering all combinations of the variables, taken either one, two, or three at a time, in the following order:

  First pass:    Y = e^{μ₀} X1^{μ₁}
  Second pass:   Y = e^{μ₀} X2^{μ₂}
  Third pass:    Y = e^{μ₀} X3^{μ₃}
  Fourth pass:   Y = e^{μ₀} X1^{μ₁} X2^{μ₂}
  Fifth pass:    Y = e^{μ₀} X1^{μ₁} X3^{μ₃}
  Sixth pass:    Y = e^{μ₀} X2^{μ₂} X3^{μ₃}
  Seventh pass:  Y = e^{μ₀} X1^{μ₁} X2^{μ₂} X3^{μ₃}

Similarly, if two independent variables are entered, the program considers the two taken one at a time and then both together:

  First pass:    Y = e^{μ₀} X1^{μ₁}
  Second pass:   Y = e^{μ₀} X2^{μ₂}
  Third pass:    Y = e^{μ₀} X1^{μ₁} X2^{μ₂}

Lastly, if only one independent variable is used, then, of course, only the one combination is considered by the program, i.e.,

  Y = e^{μ₀} X1^{μ₁}
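The passes are just the nonempty subsets of the entered variables, singles first, then pairs, then the full set. A sketch (Python) of the enumeration the program works through:

```python
from itertools import combinations

def passes(names):
    """List the variable combinations in pass order:
    singles, then pairs, then all together."""
    order = []
    for r in range(1, len(names) + 1):
        order.extend(combinations(names, r))
    return order

print(passes(["X1", "X2", "X3"]))
# -> [('X1',), ('X2',), ('X3',), ('X1', 'X2'), ('X1', 'X3'),
#     ('X2', 'X3'), ('X1', 'X2', 'X3')]
```

For three variables this yields the seven passes listed above; for two variables, three passes; for one variable, a single pass.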
For any run in which more than one independent variable is entered in the input data, the program automatically considers all combinations of the independent variables, taken one combination at a time, unless a "flag" to the contrary is entered as an input on the title card. In that case, only the combination of all independent variables is considered.

With regard to notation of the variables and exponents, the following is used within the program:

  Y = e^A · W^B · X^C · Z^D

However, since the W, X, Z notation might be somewhat confusing, or difficult to remember, the following has been adopted for output purposes:

  Y = e^A · X1^B · X2^C · X3^D
In the above notation, B represents the exponent of the first independent variable actually being used during a pass (regardless of whether it was entered as the first independent variable), C represents the exponent of the second independent variable actually being used, and D represents the exponent of the third. Suppose that only the second independent variable (X2) of three is being considered during the second pass through the program. Then, internally, the X variables are changed to W variables, and the program uses the relationship

  Y = e^A · W^B,

even though in this case W actually represents the second independent variable entered on the input cards.

The method by which the independent variables are manipulated in this way within the program is as follows. The values of the first independent variable are entered as W1_i, the second as X1_i, and the third as Z1_i, where i represents each data point. The values W_i, X_i, Z_i used in the computations are generated during each pass from the input values (W1_i, X1_i, Z1_i), depending on which combination of independent variables is being considered. Suppose, for example, that Y is being considered as a function of the second and third independent variables (entered as X1_i and Z1_i). The program then performs the following operations for each data point:

  W_i = X1_i,   X_i = Z1_i,   Z_i = 1,

and it then uses the following relationship for that pass through the program:

  Y_i = e^A · W_i^B · X_i^C · Z_i^D,

or, actually,

  Y_i = e^A · W_i^B · X_i^C

(since all Z_i = 1.0).
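The reassignment just described amounts to packing the selected input variables into the leading working slots and padding the rest with 1's so they drop out of the product. A sketch (Python; the function name and the boolean-mask argument are illustrative, not from the listing):

```python
def shuffle(w1, x1, z1, use):
    """Pack the selected input variables into the working slots
    (W, X, Z); unused slots are set to 1 so they drop out of
    e^A * W^B * X^C * Z^D."""
    selected = [v for v, used in zip((w1, x1, z1), use) if used]
    selected += [1.0] * (3 - len(selected))   # pad with 1's
    return tuple(selected)

# Pass using only the second and third independent variables:
print(shuffle(7.0, 3.0, 5.0, (False, True, True)))  # -> (3.0, 5.0, 1.0)
```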
Table 1 summarizes the operations for three independent variables taken in all combinations.

RESTRICTIONS

As indicated previously, the computer program makes use of several auxiliary subroutines, one of which is the matrix inversion subroutine. For the matrix inversion subroutine, the time required to invert an n × n matrix grows roughly as the cube of n. Therefore it can be seen that the time required for inversion increases very rapidly as the size of the matrix is increased. For a 50 × 50 matrix, the time required for inversion is about 1-1/3 minutes, and for a 75 × 75 matrix, about 4-1/2 minutes. Since the program requires the inversion of an [N - (p + 1)] × [N - (p + 1)] matrix, where N is the number of data points and p is the number of independent variables, the decision was made to limit the maximum allowable number of data points in one set of data to 50 for the program. (A similar situation also exists for the eigenvector and eigenvalue routine.) The minimum number of data points entered must be two more than the maximum number of independent variables being used. This
Table 1 -- COMBINATIONS OF INDEPENDENT VARIABLES

Input: Y_i, W1_i, X1_i, Z1_i for each data point.

  Pass   Internal operations                        Output relationship
   1     W_i = W1_i,  X_i = 1,  Z_i = 1             Y_i = e^A · X1_i^B
   2     W_i = X1_i,  X_i = 1,  Z_i = 1             Y_i = e^A · X2_i^B
   3     W_i = Z1_i,  X_i = 1,  Z_i = 1             Y_i = e^A · X3_i^B
   4     W_i = W1_i,  X_i = X1_i,  Z_i = 1          Y_i = e^A · X1_i^B · X2_i^C
   5     W_i = W1_i,  X_i = Z1_i,  Z_i = 1          Y_i = e^A · X1_i^B · X3_i^C
   6     W_i = X1_i,  X_i = Z1_i,  Z_i = 1          Y_i = e^A · X2_i^B · X3_i^C
   7     W_i = W1_i,  X_i = X1_i,  Z_i = Z1_i       Y_i = e^A · X1_i^B · X2_i^C · X3_i^D

In every pass the program internally fits Y_i = e^A · W_i^B · X_i^C · Z_i^D; slots set to 1 drop out of the product.

NOTE: i = 1, ..., N; N ≤ 50.
is because the degrees of freedom used in some of the statistical calculations are N - (number of parameters), or N - (number of independent variables + 1). Therefore, in order for the degrees of freedom to be greater than zero, N must be at least two greater than the number of independent variables being used. The program is also restricted to values greater than zero for all dependent and independent variable data points being considered, since the logarithm of each such point is calculated. If any of these data
points were to have a zero or negative value, its logarithm would be meaningless (i.e., could not be calculated), and the run would stop.

SEQUENCE OF OPERATIONS

As many runs as are desired, or as time permits, may be made by the program. The machine reads in one set of data at a time, performs all of the calculations and resulting printouts on that set of data (for up to seven passes through the program), and then reads in a new set of data. When there are no more sets of data to be read in, the run is terminated automatically.

The program consists principally of a main routine, which in turn calls most of the subroutines as required. The program is so
arranged that the logarithmic model (multiplicative error assumed) is treated first.† The solutions for the parameters (A, B, C, D) in this case are then used as the starting points (initial guesses) for the exponential model (additive error assumed). It may be recalled that the logarithmic model gives exact solutions, whereas the exponential model is based on an iterative, correction-type operation.‡ In this operation, initial guesses are made for the parameters, and then corrections, guaranteed to produce convergence to a solution, are made to these initial guesses. This process is repeated over and over again until the change in the value of each parameter from one iteration to the next becomes equal to or less than some predetermined value (10⁻⁸ in the program). At that point a solution is assumed.††
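The stopping rule described here, iterate until every parameter moves by no more than the tolerance, can be sketched independently of the particular correction formula. In this sketch (Python) the correction step is a stand-in (a simple averaging update toward a fixed point), not the program's actual least-squares correction:

```python
def iterate(start, correct, tol=1e-8, max_iter=50):
    """Repeat corrections until the change in every parameter is
    <= tol, or give up after max_iter iterations (the program
    allows 50)."""
    params = list(start)
    for _ in range(max_iter):
        new = correct(params)
        if all(abs(n - o) <= tol for n, o in zip(new, params)):
            return new
        params = new
    return None  # no solution within the allowed iterations

# Stand-in correction: average each parameter with a target value,
# which halves the distance to the target on every step.
target = [1.0, 2.0]
step = lambda p: [(a + t) / 2.0 for a, t in zip(p, target)]
print([round(v, 6) for v in iterate([0.0, 0.0], step)])  # -> [1.0, 2.0]
```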
At the beginning of the main program, an auxiliary subroutine SETSW (RAND W037) is called on to set the four control switches and trap limit counter. This is done so that control of the machine will be retained by the program if excessive floating-point traps are encountered. Next, the program calls on subroutine READ for reading in one set of data. In subroutine ARRAY the data are ordered with respect to the dependent variable (Y_i) and are then transformed as required, depending on which independent variables are being considered during that pass through the program.
†Throughout this section and also in Appendix F, the expressions "model" and "case" are used somewhat interchangeably. It may also be noted that the expression "logarithmic model" refers to the model in which a multiplicative error is assumed (Sec. II), and "exponential model" refers to the model in which an additive error is assumed (Sec. III).

‡

††See [3].
-48-
SUBROUTINE SETSW (SETS SWITCHES SO THA T EXCESSIVE FLOATING POINT TRAPS ARE IGNORED)
SUBROUTI NE ARRA Y (ORDERS INPUT DATA, SELECTS APPROPRIATE ! INDEPENDENT VARIABLES FOR CONSIDERATION WITH Y)
:~
I I I I
SUBROUTINE READ (READS IN INPUT DATA, SCALES DATA I F REQUIRED, CHECKS DATA FOR ERRORS)
**
SUBROUTINE SEARCH (USED IN VARIABLE FORMAT OPERATION)
Yes
I I I I I I I II II II II II II II II II II II II II II
II II II II II II 1'- ____ I
_~
~----
!
SUBROUTINE AUX (SUPPLI ES FUNCTION FOR SUBROUTINE GRT)
L ':..-=. -:..
-=--...:-_-_-_
* This
subroutine is used in conjunction with auxiliary subroutine FPTCTL (see listing in Appendix F). If overflow occurs, control of the machine is retained by this program. Underflows are ignored.
• ·If there are no more sets of input data to be read, run is terminated automatically at this point.
Fig.I-Flow
of operations
-49-
--~
Yes
-:~
-:-=:===-::.~
II 3 ~----..,
II II
SUBROUTINE EIGEN (OBTAINS EIGEN VALUES AND EIGEN VECTORS FOR USE IN DETERMINING VARIANCE-COVARIANCE MATRIX)
L-----II~
Fig. l-Flow
of operations (cont'd l
Subroutine CALC is next called on to obtain solutions for the parameters (A, B, C, D) in the logarithmic model. However, it first
calls on subroutine PRINT to print a heading stating which independent variables are being considered. After solutions are
obtained for the parameters, subroutine CALC then initiates the determination of the Student's statistical t-value by calling on the auxiliary, general-root-finder (GRT) subroutine (RAND W008).† Following this are
all of the statistical calculations relating to the logarithmic model. Lastly in this subroutine, another subroutine, OUTPUT, is called on to print, according to prescribed formats, all of the pertinent data relating to the logarithmic model. Using the solutions from the logarithmic model as the starting points, subroutine EXPO is called on to obtain least-squares solutions of the parameters for the exponential model. If for some reason no
solutions are obtained within the 50 iterations allowed, a message to this effect is printed, and the machine continues on to the next pass. Subroutine STATIS calculates the statistical measures required for the exponential model and then calls on subroutine OUTPUT to print the pertinent data relating to the exponential model. The above sequence of operations may be repeated using another combination of independent variables for the same set of data. When all of the operations relating to the one set of data have been completed, another set of data is read in, and the sequence of operations starts over again.
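The overall control flow of a run, the exact logarithmic fit seeding the iterative exponential fit, repeated for each variable combination and each set of data, can be sketched as follows (Python; `fit_log` and `fit_exp` are placeholders standing in for subroutines CALC and EXPO, not their actual logic):

```python
def run(datasets, combos, fit_log, fit_exp):
    """For each data set and each variable combination: solve the
    logarithmic model exactly, then use that solution as the
    starting guess for the iterative exponential fit."""
    results = []
    for data in datasets:
        for combo in combos:
            start = fit_log(data, combo)            # exact solution (CALC)
            solution = fit_exp(data, combo, start)  # iterative fit (EXPO)
            results.append((combo, start, solution))
    return results

# Placeholder fits for illustration only:
fit_log = lambda data, combo: ("log-fit", combo)
fit_exp = lambda data, combo, start: ("exp-fit", start)
out = run([["set1"]], [("X1",), ("X2",)], fit_log, fit_exp)
print(len(out))  # -> 2
```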
†The determination of the t-value is based on an approximation method described in [6]. (It should be noted that the GRT subroutine also calls on subroutine AUX to supply the approximation equation.)
Tables 2 and 3 summarize the functional operations of the program subroutines and the auxiliary subroutines required for the program.

INPUT PROCEDURES

The input cards consist of the following, in the order in which they must be entered:

1. Title card
2. Order card (need only be entered the first time if data for all runs are to be read in the same format)
3. Format card (need only be entered the first time if data for all runs are to be read in the same format)
4. Data cards
5. Blank card
Title Card

The title card, which must always be entered for each set of data, performs two basic functions: it gives some initial instructions to the program, and it carries a title identifying the run. If the variable format cards (the next two cards) are to be read in for this run, a "1" is entered in Col. 1; otherwise this column is left blank. Column 2 is used to indicate whether all possible combinations of the independent variables are to be considered (Col. 2 left blank) or whether all of the independent variables are to be considered together ("1" in Col. 2). For example, suppose that data for three independent variables are entered. Then a "1" in Col. 2 will cause only the one combination of all three variables to be considered.
Table 2 -- PROGRAM SUBROUTINES

  MAIN ROUTINE -- Controls entire program.
  READ -- Reads in data.
  ARRAY -- Orders data and selects independent variable(s) to be used for each pass.
  CALC (logarithmic model; multiplicative error assumed) -- Obtains solutions of parameters A, B, C, D and calculates associated statistics. Calls subroutine OUTPUT for printouts.
  PRINT -- Prints headings stating which independent variables are being used.
  AUX -- Supplies approximation equation to general-root-finder auxiliary subroutine for determining statistical t-value.
  EXPO (exponential model; additive error assumed) -- Obtains solutions of parameters A, B, C, D by iterative method.
  STATIS -- Calculates statistics for exponential case. Calls subroutine OUTPUT for printouts.
  OUTPUT -- Prints out results for each case.

Table 3 -- AUXILIARY SUBROUTINES

  General root finder (GRT) (RAND W008) -- Determines the Student's statistical t-value from the approximation equation supplied by subroutine AUX.
  Matrix inversion with accompanying solution of linear equations (MATINV) (modified RAND W019) -- Solves the matrix equation AX = B, where A is a square matrix and B is a matrix of constant vectors; A⁻¹ is also obtained.
  Eigenvalues and eigenvectors of a real symmetric matrix (EIGEN) (modified RAND W021) -- Determines all scalar solutions λ_i (including proper multiplicity), and optionally the associated unit vectors X_i, to the matrix equation AX = λX, where A is a real, symmetric matrix.
  SETSW (RAND W037) -- Provides a means of setting the four control switches and the trap limit counter in FPT. For this program, it is used to prevent stoppage of the run if excessive floating-point traps are encountered.
  IMAGE/SEARCH (RAND W062) -- Image picks up a word from a Hollerith list and stores it in a specified location. Search matches a word with the words in a Hollerith list and stores the number of the word from the list; generally used in a computed GO TO statement. W062 is used in reading in the order of the data from the order card.
  FPTCTL -- In conjunction with SETSW, provides a way of retaining control of the machine if overflows occur.
If Col. 2 is left blank, all seven possible combinations of the independent variables, taken one combination at a time, will be considered for this set of data. If data for only one independent variable are entered, Col. 2 may be left blank.

Column 3 is used to indicate the level of significance to be used in the program. Four choices are available, as follows:

  Value in Col. 3    Level of Significance
        1                  0.01
        2                  0.05
        3                  0.1
        4                  0.2

If Col. 3 is left blank, a level of significance of 0.05 is used automatically.

Column 4 is not used. The next four sets of two columns each are used for scaling the input dependent and independent variable data when required. This is done as follows:

  Column Location of Scale Designator    Variable Scaled
        5-6                              Y  (dependent variable)
        7-8                              X1 (first independent variable)
        9-10                             X2 (second independent variable)
        11-12                            X3 (third independent variable)

A fixed-point number is used to indicate the number of places that the decimal is to be moved. A positive number indicates that the decimal is to be moved to the right, and a negative number indicates that the decimal is to be moved to the left. For example, suppose that a "3" is
entered in Col. 6. Then each input Y value will have its decimal point moved three places to the right (e.g., an input value of 50.123 would be treated as 50123.). Care must be taken to enter all positive scale factors in the right-hand column of each field; thus, in the above example the "3" must be entered in Col. 6, not Col. 5. It should be noted that the scale factor so entered applies to the entire set of the corresponding variable.†

Columns 13-14 are also not used. Columns 15-72 are reserved for the title or other identifying remarks; any keypunch symbols (alphanumeric) may be entered in these columns. An example of a title card is shown in Fig. 2, and a summary of the above information is given in Table 4.

Order Card

The next card (when used, as indicated by a "1" in Col. 1 of the title card) indicates the order (from left to right) in which the data sets are located on the data cards. The order (alphanumeric) is entered in Cols. 1-2, 4-5, 7-8, 10-11, and 13-14, using the following designators:

  Designator    Type of Data
  ID            Identifier (alphanumeric) (if used)
  Y1            Dependent variable (required)
  X1            First independent variable (required)
  X2            Second independent variable (if used)
  X3            Third independent variable (if used)
†Preferably, the range of values for each variable should be kept approximately the same and reasonably low so that excessive overflows are not encountered. Scaling does not affect the values of the exponents of the independent variables.
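The scaling rule amounts to multiplying every value of a variable by a power of ten given by the signed scale factor. A minimal sketch (Python):

```python
def scale(values, places):
    """Shift the decimal point of every value: a positive count moves
    it right (multiply by 10**places), a negative count moves it left."""
    return [v * 10.0 ** places for v in values]

print(scale([5.0, 0.25], 3))   # -> [5000.0, 250.0]
print(scale([6217.0], -4))     # decimal moved four places left
```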
[Fig. 2 -- Example of a title card (punch-card image)]
Table 4 -- TITLE CARD SUMMARY

  Columns    Remarks
  1          A "1" indicates that the order card and format card are to be read following this card. If blank, no such cards are to be read.
  2          A "1" indicates that all independent variables taken together are to be considered. If blank, all combinations taken one at a time are to be considered.
  3          Level of significance (LOS) designator. "1" -- LOS of 0.01; "2" -- LOS of 0.05; "3" -- LOS of 0.1; "4" -- LOS of 0.2. If blank, LOS of 0.05 is used automatically.
  4          Not used.
  5-6        Scale factor for dependent variable (Y). If positive, must be entered in Col. 6.
  7-8        Scale factor for first independent variable (X1). If positive, must be entered in Col. 8.
  9-10       Scale factor for second independent variable (X2). If positive, must be entered in Col. 10.
  11-12      Scale factor for third independent variable (X3). If positive, must be entered in Col. 12.
  13-14      Not used.
  15-72      Title for run. May consist of any keypunch symbols (alphanumeric).
Suppose that a set of data is to be entered in which the identifier, say, the date, is in Cols. 1-6; the dependent variable is in Cols. 10-19; and three independent variables are in Cols. 20-29, 30-39, and 40-49, respectively. Then "ID" would be entered in Cols. 1-2 (since the identifier would be the first item of data to be read on the card); "Y1" would be entered in Cols. 4-5; and "X1," "X2," and "X3" would be entered, respectively, in Cols. 7-8, 10-11, and 13-14. This is shown in Fig. 3.

Format Card

The format card indicates where the data are located on the data cards. Again, this card is used only if a "1" is entered in Col. 1 of the title card. The format must begin with a left-hand parenthesis in Col. 1 and end with a right-hand parenthesis before Col. 73. For the above example, the format card would read as shown in Fig. 4.

Data Cards

The data cards (≤ 50) contain the input data relating to the dependent and independent variables and one identifier (if used) for each data point. The location of the data on these cards must be in exact agreement with the information entered on the format cards, or else they will not be read in properly. The numerical data (dependent and independent variables) must be entered as floating-point numbers, and the identifiers (if used) are entered as alphanumeric data. As stated previously, any set of numerical data being used must not contain any zeros or negative values. An example of a data card for the above example, containing an identifier, a dependent variable, and the three corresponding independent variables, is shown in Fig. 5.
[Fig. 3 -- Example of an order card (punch-card image)]

[Fig. 4 -- Example of a format card (punch-card image)]

[Fig. 5 -- Example of a data card (punch-card image)]
Blank Card

A blank card is used to signify the end of each data set. Consequently, this card must always be present at the end of each set of data cards.

Summary of Input Cards

Figure 6 shows the order of data cards for two runs in which the second set is to be read in the same format as the first set.

OUTPUTS

Examples of the outputs that result from this program are shown in Figs. 7-11. The particular data shown have no meaning whatsoever and are presented only for the purpose of displaying the outputs. Since most of the headings are quite self-explanatory, no detailed explanation will be given.

Figures 7-9 show the outputs relating to the logarithmic case. The scale factors (if used) are printed beneath the main title. For this problem the first set of independent variables entered (X1) was scaled up by a factor of 10 (i.e., 10¹), and the second set (X2) was scaled down by a factor of 10 (i.e., 10⁻¹). The scale factors are printed only on the first page of the output (logarithmic case), but, of course, they apply to the data for the exponential case as well.

The values under the last two headings on the third line,

  EXP(A) (MEDIAN)    and    EXP(A) (MEAN),

are, respectively, $e^{\hat A}$ and $e^{\hat A + \hat\sigma_M^2/2}$, the estimated median and mean scale factors for the multiplicative model.
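Assuming the usual lognormal relationships for the multiplicative model (the program's own computation appears in the Appendix F listing; the forms below are the standard lognormal ones), the two printed quantities can be reproduced as:

```python
import math

def exp_a_median(a_hat):
    """Median-case scale factor: e^A."""
    return math.exp(a_hat)

def exp_a_mean(a_hat, s2_m):
    """Mean-case scale factor: e^(A + s2_M / 2), the lognormal mean
    correction (assumed form, not taken from the listing)."""
    return math.exp(a_hat + s2_m / 2.0)

print(exp_a_median(0.0))       # -> 1.0
print(exp_a_mean(0.0, 0.0))    # -> 1.0
```

The mean-case value always exceeds the median-case value whenever the error variance is positive, which is why the two columns differ in the sample output.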
[Fig. 6 -- Arrangement of input cards for two runs; the second data set is read in the same order and format as the first, so its order and format cards are omitted]
[Figs. 7-11 -- Examples of program outputs (computer-printout reproductions); Figs. 7-9 show the logarithmic case]
• • • • • • •
...
•
""'_NNNNNNt"\,ir\l
...J
W
Cf
_'N~~~~~m~
I I I I I
O_'N~~~ I I I I I I I
--------------I
............ "''''
I I
-68-
Each estimated parameter represents either an exponent of an independent variable or the intercept of the regression function (the point at which the values of the independent variables are 1). The input data listed in Fig. 8 represent the actual data that are being used for that pass through the program. If more than one
pass is to occur, then this listing will change each time with regard to the independent variable. For this problem, this output represents
the seventh pass through the program. For the convenience of the reader, the calculated data shown opposite the input data are listed for both the median and mean cases of the logarithmic model (Figs. 8 and 9). The only differences in these listings are the calculated Y values and the residual values (Y_input - Y_calc).
As explained in Sec. IV on the comparison of the models, values of the standard error of the estimate (and hence the coefficient of variation) may be used to compare the additive and exponential models.† For example, a comparison of these values in Fig. 8 with the standard error of the estimate and the coefficient of variation in the exponential model (Fig. 10) shows that for most of the points, the exponential model gives lower values for this particular run. Figures 10 and 11 show the outputs for the exponential case. In Fig. 10, a "COEFF. OF CORRELATION" does not appear at the end of the top line, since it would have no real meaning in this case. In Fig. 11 there is a heading labeled "2YCALC. - Y," the significance of which is as follows: if all the values under this

†It may be noted that, as explained in Sec. IV, the variance of Y_calc in the logarithmic case varies with each point, whereas it is constant over the range of points in the exponential case.
-69-
heading are positive for a given set of data, then the solutions of the parameters for the exponential case represent solutions that give
an absolute minimum for the sum of the squares of the Y deviations in the region of (A, B, C, D) defined by 2Y_calc - Y > 0. If the values are not all positive, a message noting this fact is printed. The identifiers are listed in the left-hand column under the heading "LABEL." As alphanumeric input data they are used to identify each data point. However, as stated on page 58, they may be
The only other major difference in the output formats for the two cases is that a percent Y deviation is not given for the logarithmic case. Because of space limitations, it was decided to print
the Y residuals for both the mean and median cases and to omit the percent deviations. Table 5 summarizes the statistical calculations used in the two models.
FUTURE STUDIES
At the present time, attention is being given to one problem inherent in the exponential model (additive case): how to search a
region of data points for the solution that gives the absolute minimum for the sum of squares of Y deviations when a relative solution has been reached. This problem would exist if any of the values under the
heading "2YCALC. - Y" (Fig. 11) were to be negative. An example of such a problem is depicted in Fig. 12 for a one-independent-variable case. Because of the grouping of the data, a fit could be a straight line (μ1 = 1),
-70-
Table 5

SUMMARY OF STATISTICAL CALCULATIONS USED IN COMPUTER PROGRAM

Logarithmic Model^a (Multiplicative Error Assumed) | Exponential Model^b (Additive Error Assumed)

Standard error of the estimate, exponential model:

    SEY = [ Σ_{i=1}^{N} (Y_i,input - Y_i,calc)² / (N - (p+1)) ]^(1/2),

where p = number of parameters.

Standard error of the estimate, logarithmic model:

    SEY_i = Y_i,calc [ e^(σ̂²)(e^(σ̂²) - 1) ]^(1/2),

where

    σ̂² = Σ_{i=1}^{N} (ln Y_i,input - ln Y_i,calc)² / (N - (p+1)).

Coefficient of variation:

    CV = (SEY / Ȳ)·100,   where Ȳ = Σ_{i=1}^{N} Y_i,input / N.

[The table entries for R, SDEVY, and the Student's t-value are illegible in this copy.]

^a For the logarithmic model, SEY and CV vary with each data point i.
^b For the exponential model, SEY and CV are constant over the range of data points.
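To make the exponential-model formulas in Table 5 concrete, the following sketch computes SEY and CV for a small set of invented observed and calculated Y values (the numbers and variable names are illustrative only, not RAND data):

```python
import math

# Invented data: observed Y values and fitted Y values from a model with
# p parameters (all numbers are illustrative, not from the memo).
y_input = [10.0, 14.0, 21.0, 30.0, 42.0]
y_calc = [9.5, 15.0, 20.0, 31.0, 41.0]
p = 2  # number of parameters; the divisor N - (p + 1) matches Table 5
n = len(y_input)

# SEY = sqrt( sum of (Y_input - Y_calc)^2 / (N - (p + 1)) )
sse = sum((yi - yc) ** 2 for yi, yc in zip(y_input, y_calc))
sey = math.sqrt(sse / (n - (p + 1)))

# CV = (SEY / mean of Y_input) * 100
mean_y = sum(y_input) / n
cv = sey / mean_y * 100.0
```

With these invented numbers SEY comes out near 1.46 and CV near 6.2 percent.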
-71-

Fig. 12 - Example of three fits through a set of data using a curve of the form Y = e^(μ0) X^(μ1)
-72-
a curved line bowed upward (μ1 < 1), or a curved line bowed downward (μ1 > 1), only one of which would be the absolute solution to the problem, i.e., the one with the minimum sum of squares of Y deviations.

As more independent variables are considered, the problem becomes more complex, particularly for negative slopes.
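One crude safeguard against stopping at a relative minimum, sketched here for the one-variable curve Y = e^(μ0) X^(μ1) with invented data and an invented parameter lattice (this is not the memo's actual search routine), is to evaluate the sum of squares over a grid of parameter values and keep the global best:

```python
import math

# Invented one-independent-variable data generated from Y = e^0.5 * X^1.2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.exp(0.5) * x ** 1.2 for x in xs]

def sse(mu0, mu1):
    """Sum of squared Y deviations for the curve Y = e**mu0 * X**mu1."""
    return sum((math.exp(mu0) * x ** mu1 - y) ** 2 for x, y in zip(xs, ys))

# Crude grid search: scan a lattice of (mu0, mu1) values and keep the
# smallest SSE, rather than trusting a single relative minimum.
best_sse, best_mu0, best_mu1 = min(
    (sse(m0 / 100, m1 / 100), m0 / 100, m1 / 100)
    for m0 in range(0, 101) for m1 in range(0, 201)
)
```

Since the grid happens to contain the generating parameters (0.5, 1.2), the global best recovers them exactly; on real data the grid would only bracket the absolute minimum, which a local refinement could then sharpen.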
The above problem illustrates the type of studies currently under consideration. Hopefully, as such studies are completed, their results will be incorporated into the computer program. In this way, it is believed that this computer program may continue to reflect the latest methodologies relating to the general problem discussed in this Memorandum.
Appendix A

MATRIX DEFINITIONS AND OPERATIONS

This appendix gives definitions and operations of matrices that have been assumed in the body of this Memorandum.

MATRIX

An array of elements (usually real numbers) with m rows and n columns is called a matrix of order (size) m × n. The matrix is denoted by A, and the element in the i-th row and j-th column is denoted by a_ij, where i refers to the row and j to the column. Thus

    A = [ a_11  a_12  ...  a_1n
          a_21  a_22  ...  a_2n
          ...
          a_m1  a_m2  ...  a_mn ].
EQUALITY OF MATRICES

Equality of matrices is defined only for matrices of the same size. So consider two m × n matrices A and B with elements a_ij and b_ij, respectively. Then A = B if and only if

    a_ij = b_ij   for i = 1, 2, ..., m; j = 1, 2, ..., n.
-74-
MATRIX OPERATIONS

Addition

This operation is defined only for matrices of the same size. The sum of two m × n matrices A and B is written C = A + B, where C is an m × n matrix of elements

    c_ij = a_ij + b_ij   for i = 1, 2, ..., m; j = 1, 2, ..., n.

Hence

    C = [ a_11 + b_11   a_12 + b_12   ...   a_1n + b_1n
          a_21 + b_21   a_22 + b_22   ...   a_2n + b_2n
          ...
          a_m1 + b_m1   a_m2 + b_m2   ...   a_mn + b_mn ].
Scalar Multiplication

The product of a scalar α and a matrix A, written αA, is given by multiplying each element of A by α. Hence

    αA = [ αa_11  αa_12  ...  αa_1n
           αa_21  αa_22  ...  αa_2n
           ...
           αa_m1  αa_m2  ...  αa_mn ].

Matrix Multiplication

Let A be an m × n matrix with elements a_ij, let B be an n × p matrix with elements b_ij, and let C be an m × p matrix with elements c_ij. Then C = AB if

    c_ij = Σ_{k=1}^{n} a_ik b_kj   for i = 1, 2, ..., m; j = 1, 2, ..., p.
Hence to obtain c_ij, the i-th row of A and the j-th column of B are multiplied together term by term and then added. Notice that the term-by-term multiplication leaves elements left over unless the number of columns of A is equal to the number of rows of B. If this is not the case, then the multiplication is not defined. Notice also that the resulting matrix, C, has the same number of rows as A and the same number of columns as B, a consequence of the definition of multiplication. Finally, it should be noted that AB ≠ BA in general. In fact, if p ≠ m, then BA is not even defined. But even if p = m, in general AB ≠ BA.
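The row-by-column rule can be written out directly; this is an illustrative sketch in a modern language, not part of the memo's program:

```python
def matmul(a, b):
    """Multiply an m x n matrix a by an n x p matrix b (lists of rows)."""
    m, n, p = len(a), len(b), len(b[0])
    if any(len(row) != n for row in a):
        raise ValueError("columns of A must equal rows of B")
    # c[i][j] = sum over k of a[i][k] * b[k][j]: i-th row times j-th column
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
AB = matmul(A, B)
BA = matmul(B, A)
```

Even for these two square matrices, AB and BA differ, illustrating that matrix multiplication is not commutative.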
SPECIAL MATRICES

Square Matrix

A square matrix has the same number of rows as columns.

Identity Matrix

An identity matrix is a square matrix whose diagonal elements are 1 and whose off-diagonal elements are 0. It is usually denoted by I. So, for example, the identity matrix of order 3 is

    I = [ 1  0  0
          0  1  0
          0  0  1 ].

-76-

Let I be the identity matrix of order n. Then I is the matrix that has the following properties:

    AI = A        for all m × n matrices A,
    IB = B        for all n × p matrices B,
    CI = IC = C   for all n × n matrices C.
Hence I is the matrix that corresponds to the number 1 in the real numbers, i.e., I has the properties in matrix multiplication that 1 has in real number multiplication.

Transpose Matrix

Let A be an m × n matrix with elements a_ij. The transpose of A, denoted by A′, is the n × m matrix whose elements a′_ij are given by

    a′_ij = a_ji   for i = 1, ..., n; j = 1, ..., m.

Hence the j-th row of A becomes the j-th column of A′.

Symmetric Matrix

A symmetric matrix is a square matrix that has the property that it is equal to its transpose. Hence if A is a square matrix of order n,
    A = A′, i.e., a_ij = a_ji   for i = 1, 2, ..., n; j = 1, 2, ..., n.
Inverse Matrix

An inverse matrix is defined only for square matrices, so let A be a square matrix of order n.

-77-

The inverse of A, denoted by A^(-1), is the unique square matrix of order n that has the properties that A A^(-1) = A^(-1) A = I, where I is the identity matrix of order n. Hence a square matrix has at most one inverse. However, not every square matrix has an inverse. If any row (column) of A is a linear combination of the remaining rows (columns), then the matrix is said to be singular, and its determinant is zero. In this case the inverse is not defined.

The method for obtaining the inverse will not be discussed here, as computer routines are usually available for this purpose. The explicit methods can be found in any book on matrix algebra, such as [4], for those who are interested.

Positive Definite Matrix

Let A be a square matrix of order n with real numbers for elements. Let α be a row vector with n elements (real numbers), i.e.,

    α = (a_1, a_2, ..., a_n).

Hence α is a 1 × n matrix, and α′ is then defined as the n × 1 matrix (column vector) given by

    α′ = [ a_1
           a_2
           ...
           a_n ].

-78-

Then A is said to be positive definite if

    α A α′ > 0

for every nondegenerate α, i.e., every α ≠ (0, 0, ..., 0).

This concludes the matrix definitions and operations needed for this Memorandum.
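The quadratic-form condition α A α′ > 0 can be checked numerically for a small invented matrix; sampling many nondegenerate vectors is only suggestive, not a proof, but it illustrates the definition:

```python
# Invented 2 x 2 symmetric matrix; for alpha = (a1, a2) the quadratic form
# alpha * A * alpha' expands to 2*a1**2 + 2*a1*a2 + 3*a2**2.
A = [[2.0, 1.0],
     [1.0, 3.0]]

def quad_form(alpha, a):
    """Compute alpha * A * alpha' for a row vector alpha."""
    n = len(alpha)
    return sum(alpha[i] * a[i][j] * alpha[j]
               for i in range(n) for j in range(n))

# Sample many nondegenerate vectors on a grid; for a positive definite
# matrix the quadratic form must be positive for every one of them.
samples = [(x / 10.0, y / 10.0)
           for x in range(-10, 11) for y in range(-10, 11)
           if (x, y) != (0, 0)]
all_positive = all(quad_form(v, A) > 0 for v in samples)
```

For this matrix the expanded form 2a1² + 2a1a2 + 3a2² is positive for every nonzero (a1, a2), so every sampled value comes out positive.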
-79-

Appendix B

This appendix is not meant to be a course in statistics but merely a statement of some of the statistical facts and relationships that have been used in this study and to which the reader might wish to refer. For clarification of terms used but not defined in this appendix, or for further discussion of the ideas presented here, it is suggested that the reader refer to any good elementary book on mathematical statistics, such as Lindgren [5].

MEASURES OF CENTRAL TENDENCY
Two measures of central tendency have been used in this study. They are the mean and the median. Since the median is difficult to
define uniquely for discrete distributions, a continuous random variable X with density f_X and cumulative distribution function F_X will be used to discuss the relationship between these two measures.

The Expected Value or Mean of X

The mean of X, denoted by E X, is defined by

    E X = ∫_{-∞}^{∞} X f_X(X) dX.

Hence it represents a weighted average of the values of the random variable, weighted by the probability of those values (i.e., f_X(X) dX).

Median of X

The median is obtained by finding the value, X_0.5, of the random
-80-
variable, X, that satisfies

    ∫_{-∞}^{X_0.5} f_X(X) dX = ∫_{X_0.5}^{∞} f_X(X) dX = 0.5.

Hence it represents the middle of the distribution, in the sense that the probability that X < X_0.5 is equal to 0.5 and the probability that X > X_0.5 is equal to 0.5.

That the two measures are not in general equal should be clear from the definitions: the mean weights each value of the random variable by its probability, while the median depends on these values only through their probabilities. It should be noted, however, that the two are equal for symmetric distributions.
PERCENTILES OF A CONTINUOUS RANDOM VARIABLE

The median is just a special case of the concept of percentiles of a distribution. Using the notation of the previous section, the percentiles of the distribution of X are defined as follows: for 0 < γ < 1, the γ(100)-th percentile of X is that value X_γ satisfying

    ∫_{-∞}^{X_γ} f_X(X) dX = γ.

The significance of the γ(100)-th percentile is that the probability that X < X_γ is equal to γ, while the probability that X > X_γ is equal to 1 - γ.
-81-
MEASURES OF VARIABILITY

Variance

The variance of a random variable X is denoted by Var X and is given by

    Var X = E[(X - E X)²].

Hence it is a measure of the dispersion of X about its expected value. A useful computational relationship is

    Var X = E X² - (E X)².
Covariance

The covariance of two random variables X and Y is a measure of how they vary together. It is denoted by Cov(X, Y) and is given by

    Cov(X, Y) = E[(X - E X)(Y - E Y)].

If X - E X is large when Y - E Y is large, and small when Y - E Y is small, then X and Y will have a large positive covariance; if X - E X is large when Y - E Y is small, the covariance will be a large negative number. A useful computational relationship is given by

    Cov(X, Y) = E XY - E X E Y.

From this it follows that if X and Y are independent, then Cov(X, Y) = 0, since independence implies that E XY = E X E Y. This implication does not hold in the other direction, however.
-82-
Covariance Matrix

The multivariate random variables X_1, ..., X_p have a covariance matrix whose (i, j)-th element is given by Cov(X_i, X_j); in particular, the i-th diagonal element is Var X_i.
LINEAR COMBINATIONS

The means, variances, and covariances of a linear combination of random variables follow directly from the definitions. For constants a and b and random variables X, Y, and Z,

    E(aX + b) = a E X + b,
    E(X + Y) = E X + E Y;

    Var(aX + b) = a² Var X,
    Var(X + Y) = Var X + Var Y + 2 Cov(X, Y);

and

    Cov(aX + b, Z) = a Cov(X, Z),
    Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z).

Furthermore, if the random variables in question are independent, then Var(X + Y) = Var X + Var Y.
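Because these identities are exact, they can be verified on any small discrete distribution; the joint values below are invented:

```python
# Joint distribution of (X, Y) over four equally likely outcomes (invented).
pairs = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0)]

def mean(vals):
    return sum(vals) / len(vals)

def var(vals):
    m = mean(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def cov(us, vs):
    mu, mv = mean(us), mean(vs)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / len(us)

xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
a, b = 2.0, 1.0

# E(aX + b) = a E X + b
lhs_mean = mean([a * x + b for x in xs])
rhs_mean = a * mean(xs) + b

# Var(X + Y) = Var X + Var Y + 2 Cov(X, Y)
lhs_var = var([x + y for x, y in pairs])
rhs_var = var(xs) + var(ys) + 2 * cov(xs, ys)
```

Both sides of each identity agree exactly on this distribution, as they must for any choice of values.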
DISTRIBUTIONS RELATED TO THE NORMAL

Chi-square Distribution

Let X_1, ..., X_k be independent standard normal random variables (i.e., E X_i = 0 and Var X_i = 1). The random variable Z defined by

    Z = Σ_{i=1}^{k} X_i²

has a chi-square distribution with k degrees of freedom. Then

    E Z = k

-83-

and

    Var Z = 2k.
The t-distribution

Let X be a standard normal random variable and Z be a chi-square random variable with k degrees of freedom. Assume that X and Z are independent. Then

    T = X / (Z/k)^(1/2)

has a t-distribution with k degrees of freedom. It should be noted that the t-distribution is symmetric about the origin. Hence for any γ, 0 < γ < 1, it follows that

    t_γ = -t_{1-γ},

where t_γ and t_{1-γ} are the γ(100)-th and (1-γ)(100)-th percentiles of the t-distribution.

-84-
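The construction of T can be mirrored by simulation; the seed, sample size, and degrees of freedom below are arbitrary choices for this sketch:

```python
import math
import random

random.seed(12345)
k = 5            # degrees of freedom (arbitrary choice)
n_draws = 20000  # number of simulated T values

def t_draw():
    """One draw of T = X / sqrt(Z/k): X standard normal, Z chi-square(k)."""
    x = random.gauss(0.0, 1.0)
    z = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))
    return x / math.sqrt(z / k)

draws = [t_draw() for _ in range(n_draws)]
# Symmetry about the origin implies about half the draws are positive.
frac_positive = sum(1 for t in draws if t > 0) / n_draws
```

With this many draws the positive fraction lands very close to one half, consistent with the symmetry of the t-distribution.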
Appendix C

CONVERSION OF e^(μ0) TO μ̃0

As mentioned in the body of this Memorandum, it is often desirable to have a regression function of the form

    Y = μ̃0 X_1^(μ1) X_2^(μ2) ... X_p^(μp),   (μ̃0 > 0),          (1)

instead of a regression function of the form studied in this Memorandum, namely,

    Y = e^(μ0) X_1^(μ1) X_2^(μ2) ... X_p^(μp).                    (2)

This involves nothing more than making the transformation μ̃0 = e^(μ0) in the model, and then defining the estimate of μ̃0 as e^(μ̂0). If this definition is adopted, then the values of these estimators and the confidence interval for the new form of the regression function will remain valid. So in adopting this definition, the only problem that remains in the conversion is to determine the distribution of e^(μ̂0).

-85-

Expanding about μ0 gives the linear approximation

    e^(μ̂0) ≈ e^(μ0) (1 + μ̂0 - μ0),

which is a linear function of the random variable μ̂0, since μ0 and e^(μ0) are constants. But

    E μ̂0 = μ0,

and hence

    E e^(μ̂0) ≈ e^(μ0) = μ̃0.

Similarly,

    Var e^(μ̂0) ≈ (e^(μ0))² Var μ̂0 = μ̃0² Var μ̂0,

and

    Cov(e^(μ̂0), μ̂i) ≈ e^(μ0) Cov(μ̂0, μ̂i)   for i = 1, 2, ..., p.

Using these approximations for the distribution of e^(μ̂0), the conversion is complete.
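The variance approximation above can be checked by simulation: draw μ̂0 from a normal distribution with a small variance (the numerical values below are invented) and compare the sample variance of e^(μ̂0) with μ̃0² Var μ̂0:

```python
import math
import random

random.seed(7)
mu0 = 0.8          # true parameter (invented)
var_mu0 = 1e-4     # small sampling variance of the estimator (invented)

# Simulate many realizations of e**mu0_hat with mu0_hat ~ N(mu0, var_mu0).
draws = [math.exp(random.gauss(mu0, math.sqrt(var_mu0)))
         for _ in range(200000)]

m = sum(draws) / len(draws)
sample_var = sum((d - m) ** 2 for d in draws) / len(draws)

# Linearization: Var(e**mu0_hat) is approximately (e**mu0)**2 * Var(mu0_hat).
approx_var = math.exp(mu0) ** 2 * var_mu0
```

Because the variance of μ̂0 is small, the linear approximation is accurate and the sample variance agrees with μ̃0² Var μ̂0 to within sampling error.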
-86-
Appendix D

ABSOLUTE MINIMUM AND POSITIVE DEFINITENESS OF THE MATRICES OF SECOND PARTIALS

GENERAL DISCUSSION

Suppose we have a function, f, of one variable, Z, and that we are interested in finding the value of Z that minimizes f(Z). With the usual continuity assumptions, a minimum would satisfy the equation

    f′(Z) = 0,

where f′ is the derivative of f with respect to Z. A point satisfying this equation is called a critical point. As mentioned in the body of the Memorandum, a critical point may be a relative or absolute maximum or minimum or an inflection point. If the second derivative of f with respect to Z, evaluated at the point in question, is positive, the critical point is a relative minimum.

Take a relative minimum critical point and denote it by Z_0. Then

    f′(Z_0) = 0   and   f″(Z_0) > 0.

Now if f″(Z) > 0 for all Z, then Z_0 is the only critical point for f, and Z_0 is therefore the absolute minimum of f. Similarly, if f″(Z) > 0 for all Z in an interval around Z_0, say (Z_1, Z_2), then Z_0 is the absolute minimum of f in this interval.

-87-

The matrix of second partial derivatives is a generalization of the second derivative to functions of several variables, and positive definiteness of this matrix is the corresponding generalization of a positive second derivative. If the matrix of second partials is positive definite at every point in a region including a relative minimum, then that minimum is an absolute minimum in that region, in that there is no other critical point in that region.
SPECIFIC CASE

The matrix of second partials for Q is given below (all sums run over i = 1, 2, ..., n):

    [ Σ f_i(2f_i - Y_i)            Σ f_i ln X_1i (2f_i - Y_i)          ...  Σ f_i ln X_pi (2f_i - Y_i)
      Σ f_i ln X_1i (2f_i - Y_i)   Σ f_i ln² X_1i (2f_i - Y_i)         ...  Σ f_i ln X_1i ln X_pi (2f_i - Y_i)
      Σ f_i ln X_2i (2f_i - Y_i)   Σ f_i ln X_2i ln X_1i (2f_i - Y_i)  ...  Σ f_i ln X_2i ln X_pi (2f_i - Y_i)
      ...
      Σ f_i ln X_pi (2f_i - Y_i)   Σ f_i ln X_pi ln X_1i (2f_i - Y_i)  ...  Σ f_i ln² X_pi (2f_i - Y_i) ]

In this matrix,

    f_i = e^(μ0) X_1i^(μ1) X_2i^(μ2) ... X_pi^(μp)   for i = 1, 2, ..., n.

It is assumed that this matrix is nonsingular.† Now assume that

†For definition, see Appendix A.

-88-
2f_i - Y_i > 0 for all i; then it is possible to make the following substitutions. For i = 1, 2, ..., n, define

    h_0i² = f_i(2f_i - Y_i),

and

    h_ji = h_0i ln X_ji   for j = 1, 2, ..., p.

With these substitutions the matrix of second partials becomes

    C = [ Σ h_0i²       Σ h_0i h_1i   ...  Σ h_0i h_pi
          Σ h_1i h_0i   Σ h_1i²       ...  Σ h_1i h_pi
          ...
          Σ h_pi h_0i   Σ h_pi h_1i   ...  Σ h_pi² ].

Define the vectors

    h_i = (h_0i, h_1i, ..., h_pi)   for i = 1, 2, ..., n.

Then C can be expressed as

    C = Σ_{i=1}^{n} h_i′ h_i.

-89-
Then for any row vector α = (a_0, a_1, ..., a_p),

    α C α′ = Σ_{i=1}^{n} α (h_i′ h_i) α′ = Σ_{i=1}^{n} (α h_i′)(h_i α′) = Σ_{i=1}^{n} b_i²,

where

    b_i = α h_i′ = Σ_{j=0}^{p} a_j h_ji.

Therefore,

    α C α′ ≥ 0.

Furthermore, equality holds only if b_i = 0 for i = 1, 2, ..., n. This can happen only if

    (i)  α is degenerate, i.e., α = (0, ..., 0), or
    (ii) there is a nondegenerate α such that α h_i′ = 0 for i = 1, 2, ..., n.

But (ii) cannot happen, since this would imply that the matrix C is singular, which in turn would imply that the matrix of second partial derivatives is singular. Therefore for any nondegenerate α we have that

    α C α′ > 0,

i.e., C is positive definite. So if 2f_i - Y_i > 0 for all i, then the matrix of second partials is positive definite.
-90-
If 2f_i - Y_i > 0 for all i on a certain region R of the parameters (μ0, ..., μp), then the matrix of second partials is positive definite on R. This region depends on the historical data. If the critical point (μ̂0, ..., μ̂p) is in this region, then it corresponds to the critical point Z_0 in the one-variable model, and the region R corresponds to the interval given by (Z_1, Z_2). Therefore (μ̂0, ..., μ̂p) represents the only critical point in this region, and it is the absolute minimum for Q in this region.
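The sum-of-outer-products argument can be mirrored numerically with a few invented vectors h_i: build C = Σ h_i′ h_i and check that the quadratic form α C α′ equals the sum of the squared terms b_i = α h_i′, and is therefore nonnegative:

```python
# Invented vectors h_i (as rows); C = sum of outer products h_i' h_i.
hs = [[1.0, 0.5, 2.0],
      [0.3, 1.0, 0.0],
      [2.0, 0.0, 1.0],
      [0.0, 1.5, 0.5]]
dim = len(hs[0])

# C[i][j] = sum over the vectors of h[i] * h[j]
C = [[sum(h[i] * h[j] for h in hs) for j in range(dim)] for i in range(dim)]

def quad_form(alpha):
    """Compute alpha * C * alpha' for a row vector alpha."""
    return sum(alpha[i] * C[i][j] * alpha[j]
               for i in range(dim) for j in range(dim))

alpha = [0.7, -1.2, 0.4]              # an arbitrary nondegenerate vector
direct = quad_form(alpha)
# Same quantity as a sum of squares: sum over i of (alpha . h_i)**2
via_b = sum(sum(a * hi for a, hi in zip(alpha, h)) ** 2 for h in hs)
```

The two computations agree, and since the second is a sum of squares, the quadratic form is manifestly nonnegative; it is strictly positive here because no nondegenerate α is orthogonal to all four vectors.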