
Two-point estimates in probabilities

Emilio Rosenblueth
Universidad Nacional Autónoma de México 20, D.F., Mexico
(Received April 1981)
Introduction
In a multitude of problems, uncertainty in data and in theories is so significant that a probabilistic treatment ought to be mandatory. Frequently, though, a deterministic treatment is preferred so as to avoid the complications of a rigorous probabilistic analysis. Random variables are then replaced with point estimates. Each variable is replaced either with a central value (expectation, median, or mode) or with a value consciously biased so that any errors have the less unfavourable sign, and the estimates are treated as deterministic. Results are also expressed as point estimates, without any indication of how far they lie from the corresponding central values or of the magnitude of the resulting bias or dispersion.

In decision making, a trustworthy calculation of the first moment (the expectation) of functions of the random variables would often suffice. This paper develops a simple procedure for computing the first three moments. The procedure overcomes the deficiencies of a deterministic treatment at the cost of some of the accuracy of a rigorous probabilistic analysis. It also furnishes an approximate and equally simple approach to Bayesian statistics. For functions of a single variable we propose to estimate the variable and its functions at two points, rather than at a single point as in the deterministic approach. For functions of n variables we arrive at estimates at 2^n points, although the number of estimates can be reduced to 2n in certain cases of great practical interest. Exceptionally, we suggest the use of estimates at a larger number of points to improve accuracy.

In short, the method presented here usually gives results almost as satisfactory as those of a rigorous probabilistic treatment, provided the coefficients of variation of the independent variables do not exceed moderate limits. Yet it implies no more than a modest increase in numerical complexity over a purely deterministic analysis.
0307-904X/81/050329-07/$02.00 IPC Business Press

The method has been presented elsewhere (references 1 and 2), but in an incomplete and excessively succinct manner; the present paper overcomes this deficiency.

Let Y = Y(X), where X is a random variable. Suppose we are not interested in the distribution of Y but only in an approximation to its first few moments. Then we can ignore the probability density function of X and use no more than its corresponding moments, and the solution will be independent of the distribution we assign to X. Any distribution having the same first moments as the given one will furnish the exact solution when Y is a linear function of X. We can also choose the distribution of X so that the solution is sufficiently accurate if Y(X) is sufficiently smooth in the neighbourhood of the expectation of X, provided the dispersion of X is not too large. We will choose the fictitious distribution of X following this criterion and that of simplicity.

If we are interested only in the expectation of Y, we can assume that the density of X is completely concentrated at the expected value of X. The analysis then corresponds to a point estimate of Y at X equal to the expectation of X. It is as simple as a deterministic analysis, with the advantage that the meaning of the values computed is explicit. We obtain a first-order approximation to the expectation of Y which, however, will often be excessively crude.

Suppose instead that we use two concentrations, each equal to 0.5 and placed symmetrically with respect to the expectation of X. Then we can take into account the first two moments of X and estimate those of Y, with a second-order approximation to its expected value. If we drop the condition of symmetry, we have enough parameters to take into account the first three moments of X. We then obtain a third-order approximation to the expectation of Y, without complicating the analysis beyond a two-point estimate of Y.

In the text we adopt the latter approach and derive the results for concentrations symmetric with respect to the expectation of X as a particular case.
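To make the contrast between one and two concentrations concrete, the following Python sketch compares the one-point estimate Y(X̄) with the symmetric two-point estimate. The choice Y = X³, X̄ = 1, σ = 0.5 and the function names are our own illustrative assumptions, not taken from the paper. The exact value uses the standard identity E[X³] = X̄³ + 3X̄σ² for a distribution with zero skewness.

```python
def one_point(y, mean):
    """Deterministic point estimate: all probability concentrated at the expectation."""
    return y(mean)


def two_point_symmetric(y, mean, sd):
    """Two concentrations of 0.5 each, placed at mean + sd and mean - sd."""
    return 0.5 * (y(mean + sd) + y(mean - sd))


if __name__ == "__main__":
    cube = lambda x: x ** 3                   # illustrative choice of Y(X)
    mean, sd = 1.0, 0.5                       # illustrative expectation and standard deviation
    exact = mean ** 3 + 3 * mean * sd ** 2    # E[X^3] for any X with zero skewness
    print(one_point(cube, mean), two_point_symmetric(cube, mean, sd), exact)
```

The script prints `1.0 1.75 1.75`. The one-point estimate misses the term 3X̄σ² entirely, whereas the symmetric two-point estimate recovers it.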

Function of one variable
Consider a real function Y = Y(X) of the real random variable X. (Capitals are used for random variables; the corresponding lower-case letters denote specific values of those variables.) We know the first three moments of X. The ith moment is defined as:

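$$M_i(X) = \int_{-\infty}^{\infty} x^i\, p_X(x)\, dx, \qquad i = 0, 1, 2, \ldots$$

where p_X(x) is the probability density function of X at X = x.

Alternatively, we can write the central moments:

$$\bar{M}_i(X) = \int_{-\infty}^{\infty} (x - \bar{X})^i\, p_X(x)\, dx, \qquad i = 0, 1, 2, \ldots \qquad (1)$$

where X̄ denotes the expectation of X, M_1(X). It follows from equation (1) that the zero-order moment is

$$\bar{M}_0(X) = \int_{-\infty}^{\infty} p_X(x)\, dx$$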

Appl. Math. Modelling, 1981, Vol. 5, October 1981


This is always 1. The first central moment, M̄_1(X), is always nil. M̄_2(X) is known as the variance of X and is ordinarily designated σ²; its square root, σ, is the standard deviation of X. M̄_3(X) is known as the skewness of X; if we denote it as νσ³, then ν is the skewness coefficient of X. For symmetric distributions the skewness is nil, and the same holds for all other central moments of odd order.

We are interested in obtaining expressions for the expectation, standard deviation, and coefficient of skewness of Y, respectively Ȳ, σ_Y, and ν_Y, and in doing so independently of the distribution of X. To this end, let us assign X an arbitrary distribution having four parameters, chosen so as to comply with the expressions for the moment of order zero and for the first three moments of X. A particularly simple function satisfying this requirement consists of two concentrations, P1 and P2, of the probability density function p_X(x), located respectively at X = x1 and x2:
$$p_X(x) = P_1\,\delta(x - x_1) + P_2\,\delta(x - x_2)$$
δ(·) is Dirac's delta function of the variable (·). The distribution is depicted schematically in Figure 1. Let ξi = |xi − X̄|/σ, i = 1, 2, with x1 = X̄ + ξ1σ above the expectation and x2 = X̄ − ξ2σ below it. The four expressions of interest then become:
$$1 = P_1 + P_2$$

$$0 = \xi_1 P_1 - \xi_2 P_2$$

$$1 = \xi_1^2 P_1 + \xi_2^2 P_2$$

$$\nu = \xi_1^3 P_1 - \xi_2^3 P_2$$

Their solution is:

$$\xi_1 = \frac{\nu}{2} + \sqrt{1 + \left(\frac{\nu}{2}\right)^2} \qquad (2)$$

$$\xi_2 = \xi_1 - \nu \qquad (3)$$

$$P_1 = \frac{\xi_2}{\xi_1 + \xi_2} \qquad (4)$$

$$P_2 = 1 - P_1 \qquad (5)$$

With y_i = Y(x_i), the first three central moments of Y are:

$$0 = (y_1 - \bar{Y})P_1 + (y_2 - \bar{Y})P_2 \qquad (6)$$

$$\sigma_Y^2 = (y_1 - \bar{Y})^2 P_1 + (y_2 - \bar{Y})^2 P_2 \qquad (7)$$

$$\nu_Y \sigma_Y^3 = (y_1 - \bar{Y})^3 P_1 + (y_2 - \bar{Y})^3 P_2 \qquad (8)$$

From equations (5)-(8) we find:

$$\bar{Y} = P_1 y_1 + P_2 y_2 \qquad (9)$$

$$\sigma_Y = \sqrt{P_1 P_2}\,\left|y_1 - y_2\right| \qquad (10)$$

$$\nu_Y \sigma_Y = (P_2 - P_1)(y_1 - y_2) \qquad (11)$$

Figure 1 Concentrations of probability density function
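The closed-form solution translates directly into a few lines of code. The following Python sketch evaluates equations (2)-(5) and (9)-(11) for a user-supplied function Y(X). The names `two_point_estimate` and `TwoPointResult` are our own, hypothetical choices. Placing the concentrations at x1 = X̄ + ξ1σ and x2 = X̄ − ξ2σ follows from the definition of ξi. The sketch assumes σ > 0 and returns NaN for ν_Y when y1 = y2, where equation (11) leaves ν_Y undetermined.

```python
import math
from typing import Callable, NamedTuple


class TwoPointResult(NamedTuple):
    mean: float   # Y-bar, equation (9)
    sd: float     # sigma_Y, equation (10)
    skew: float   # nu_Y, equation (11); nan when sigma_Y == 0


def two_point_estimate(y: Callable[[float], float],
                       mean: float, sd: float, nu: float = 0.0) -> TwoPointResult:
    """Two-point estimate of the first three moments of Y = y(X)."""
    xi1 = nu / 2 + math.sqrt(1 + (nu / 2) ** 2)   # equation (2)
    xi2 = xi1 - nu                               # equation (3)
    p1 = xi2 / (xi1 + xi2)                       # equation (4)
    p2 = 1 - p1                                  # equation (5)
    x1, x2 = mean + xi1 * sd, mean - xi2 * sd    # points where the concentrations sit
    y1, y2 = y(x1), y(x2)
    y_mean = p1 * y1 + p2 * y2                   # equation (9)
    y_sd = math.sqrt(p1 * p2) * abs(y1 - y2)     # equation (10)
    y_skew = (p2 - p1) * (y1 - y2) / y_sd if y_sd > 0 else math.nan  # equation (11)
    return TwoPointResult(y_mean, y_sd, y_skew)


if __name__ == "__main__":
    # Worked example of the Examples section: Y = X^3, X-bar = 1, sigma = 0.5, nu = 0.4
    print(two_point_estimate(lambda x: x ** 3, mean=1.0, sd=0.5, nu=0.4))
    # Symmetric case nu = 0: xi1 = xi2 = 1 and P1 = P2 = 1/2
    print(two_point_estimate(lambda x: x ** 3, mean=1.0, sd=0.5))
```

The first call should reproduce, to the four decimals quoted, the worked example in the Examples section (Ȳ = 1.8000, σ_Y = 1.9450, ν_Y = 0.400). The second call is the symmetric case, for which Ȳ = 1.75 and σ_Y = 1.625.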

Results for Ȳ are not very sensitive to ν, and computing this parameter can be awkward. Representative curves are given in Figures 2-4 so that ν, as well as the coefficient of variation V = σ/X̄, can be estimated by mere inspection.


Figure 4 (a) Gamma and (b) lognormal distributions

When one of the central moments of X is unknown, the number of simultaneous equations is reduced to three, so we can arbitrarily assign the value of one of the four parameters of the distribution of X. On the other hand, when ν = 0 the expressions obtained become simpler:

$$\xi_1 = \xi_2 = 1 \qquad (12)$$

$$P_1 = P_2 = \tfrac{1}{2} \qquad (13)$$

$$\bar{Y} = \tfrac{1}{2}(y_1 + y_2) \qquad (14)$$

$$\sigma_Y = \tfrac{1}{2}\left|y_1 - y_2\right| \qquad (15)$$

whence:

$$V_Y = \frac{\left|y_1 - y_2\right|}{y_1 + y_2} \qquad (16)$$

These equations are valid whether ν is unknown, zero, or regarded as negligible. One can compare them with the following procedure: expand y^i, i = 1, 2, ..., in a Taylor series about X̄, multiply both members by p_X(x), and integrate, which furnishes the moment M_i(Y). This comparison shows that, when the third derivative of Y(X) exists, equation (9) constitutes a third-order approximation (with relative errors of the order of V^4). Equations (10) and (14) are of second order (with relative errors of the order of V^3), and equations (11), (15), and (16) are of first order (with relative errors of the order of V^2). However, the expressions obtained do not presuppose continuity of Y(X) or the existence of its derivatives, although the accuracy of the approximations can deteriorate seriously when Y(X) is not sufficiently smooth.

Suppose Y(X) or its first derivative has no more than a finite number of discontinuities, and these are finite. Then we can divide the range of X into segments within each of which the appropriate smoothness conditions are met, and apply the proposed method to each segment. This is especially useful for computing Ȳ. Computation of the global σ_Y and ν_Y is awkward, and it may be preferable to obtain them through analytical or numerical integration.

Function of several variables
Generalization of the foregoing method to functions of several variables requires the solution of large numbers of simultaneous equations, many of which are generally nonlinear (see Appendix). For example, if we know the moments of the first three orders of the random variables and there are two such variables, we must solve 10 simultaneous equations: some linear, others quadratic, and the rest cubic. As unknowns we may choose the coordinates of four points in the plane of the random variables X1, X2. We may also choose the magnitudes of the concentrations of the joint probability density function p_{X1X2}(x1, x2) at these points. This yields 12 unknowns. This number exceeds that of the equations, so we may arbitrarily choose the values of two variables or of two relations between them. With three random variables the number of simultaneous equations rises to 20, and we may take as unknowns the coordinates of five points and the corresponding concentrations.

This approach is awkward. We prefer to concentrate the density function at a superabundant number of points and impose conditions on their coordinates. Suppose the number of random variables is n and we take 2^n points. We leave as unknowns the concentrations at all points and the coordinates of two of them that have no coordinates in common (opposite vertices), and distribute the rest so as to form a rectangle, prism, or hyperprism. We then obtain an adequate number of unknowns to satisfy the moments of orders zero, one, and two, as well as the third-order moments of the form

$$\int_{-\infty}^{\infty} (x_i - \bar{X}_i)^3\, p_{X_i}(x_i)\, dx_i = \nu_i \sigma_i^3, \qquad i = 1, \ldots, n$$

where p_{Xi}(xi) is the marginal probability density function of Xi. We thus force the other third-order moments without necessarily satisfying the corresponding conditions, but the sacrifice implies a significant simplification. The resulting equations are simple and can indeed be solved almost by inspection.
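For two variables, the rectangle scheme can be sketched in Python. Each variable's two points and weights come from equations (2)-(5). The four corner concentrations are then taken as P1j·P2k + a on the diagonal joining (x11, x21) and (x12, x22) and P1j·P2k − a on the other diagonal, with a = ρ/[(ξ11 + ξ12)(ξ21 + ξ22)]. This choice leaves the marginal moments of each variable unchanged and reproduces the correlation coefficient ρ. It is our own construction, labelled here as an assumption rather than a transcription of the paper's Figure 5. With ν1 = ν2 = 0 it reduces to corner weights (1 ± ρ)/4. The function names are hypothetical.

```python
import math
from itertools import product


def marginal_points(mean, sd, nu):
    """Equations (2)-(5) for one variable: points, weights, and xi1 + xi2."""
    xi1 = nu / 2 + math.sqrt(1 + (nu / 2) ** 2)
    xi2 = xi1 - nu
    p1 = xi2 / (xi1 + xi2)
    return (mean + xi1 * sd, mean - xi2 * sd), (p1, 1 - p1), xi1 + xi2


def rectangle_estimate(y, m1, s1, nu1, m2, s2, nu2, rho):
    """Mean and standard deviation of Y = y(X1, X2) from four concentrations.

    ASSUMPTION (not transcribed from Figure 5): corner weights P1j*P2k + a on
    the (x11, x21)-(x12, x22) diagonal and P1j*P2k - a on the other one, with
    a = rho / ((xi11 + xi12) * (xi21 + xi22)) so that the covariance is rho*s1*s2.
    """
    xs1, ps1, sum1 = marginal_points(m1, s1, nu1)
    xs2, ps2, sum2 = marginal_points(m2, s2, nu2)
    a = rho / (sum1 * sum2)
    first = second = 0.0
    for j, k in product(range(2), range(2)):
        w = ps1[j] * ps2[k] + (a if j == k else -a)
        if w < 0:
            raise ValueError("negative concentration: |rho| too large for this scheme")
        value = y(xs1[j], xs2[k])
        first += w * value
        second += w * value ** 2
    return first, math.sqrt(second - first ** 2)


if __name__ == "__main__":
    cube_product = lambda u, v: (u * v) ** 3
    print(rectangle_estimate(cube_product, 1.0, 0.2, 0.2, 2.0, 0.5, 0.4, 0.4))
    print(rectangle_estimate(cube_product, 1.0, 0.2, 0.0, 2.0, 0.5, 0.0, 0.4))
```

Applied to the data of the two-variable example in the Examples section, this construction gives Ȳ ≈ 12.417 and σ_Y ≈ 12.67. With ν1 = ν2 = 0 it gives Ȳ = 12.1296 and σ_Y ≈ 11.125, in line with the values quoted there. For large |ρ| combined with strongly skewed marginals some corner weights can turn negative, which the sketch reports as an error.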

Figure 6 Special case ν1 = ν2 = 0

Thus, for the case Y = Y(X1, X2) we obtain the rectangle in Figure 5, where ρ is the coefficient of correlation of X1 and X2. In the symbols P_ij and ξ_ij, the first subscript indicates the variable and the second identifies each of the values that the variable can assume. P_ij and ξ_ij are computed as for functions of a single variable. When X1 and X2 are not correlated, ρ = 0. If the variables are statistically independent, the solution in Figure 5 is correct even for all the third-order moments, since:

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_i - \bar{X}_i)^2 (x_j - \bar{X}_j)\, p_{X_1X_2}(x_1, x_2)\, dx_i\, dx_j = 0, \qquad i \neq j;\; i, j = 1, 2$$

In the special case ν1 = ν2 = 0, Figure 5 becomes Figure 6.

A particularly interesting case arises when Y is a product of functions, Y = Y1 Y2 ··· Yn, each a function of one of the random variables, and the latter are statistically independent. The following relations are then exact:

$$\bar{Y} = \bar{Y}_1 \bar{Y}_2 \cdots \bar{Y}_n \qquad (17)$$

$$1 + V_Y^2 = (1 + V_1^2)(1 + V_2^2)\cdots(1 + V_n^2), \qquad V_i = V_{Y_i} \qquad (18)$$

$$1 + 3V_Y^2 + \nu_Y V_Y^3 = (1 + 3V_1^2 + \nu_1 V_1^3)(1 + 3V_2^2 + \nu_2 V_2^3)\cdots(1 + 3V_n^2 + \nu_n V_n^3), \qquad \nu_i = \nu_{Y_i} \qquad (19)$$

Each function Yi may consequently be treated separately, and the results combined in accordance with equations (17)-(19).

Bayesian statistics
Let X be a real random variable whose probability density function we replace with concentrations P1 and P2, placed respectively at the points whose ordinates are x1 and x2 (Figure 1). Suppose that an experiment (or observation) is performed whose result we designate by A. This result modifies the probabilities P1 and P2 as follows, according to Bayes' theorem, or formula of the probabilities of hypotheses:

$$P_i' = \frac{P(A \mid X = x_i)\, P_i}{P(A)}, \qquad i = 1, 2 \qquad (20)$$

where P_i' is the value of P_i in the light of result A, and P(A | X = x_i) is the probability that we obtain result A given that X equalled x_i. Further, P(A) = P(A | X = x1) P1 + P(A | X = x2) P2 is the prior probability of A. Calculation of x1, x2, P1, and P2 proceeds as for functions of a single random variable. Once P1' and P2' have been obtained, we can compute the probability of other experimental results. Generalization to more than one random variable presents no difficulties.

Occasionally the 'experiment' consists of a long chain of elementary experiments, all yielding the same result in an almost systematic way. As the chain grows, p_X(x) tends to concentrate at values outside the interval between x1 and x2. To cover this possibility it may be advisable to concentrate the probabilities at more than two points. The end points should be placed sufficiently far from X̄ that the values toward which p_X(x) can tend will fall, with near certainty, between them.

Examples

Smooth function of a single random variable
Consider the following cases: Y = X³; X̄ = 1; σ = 0.1, 0.2, 0.5; ν = 0, 0.2, 0.4, 0.8. Results are given in Table 1. We will illustrate their computation for the case σ = 0.5, ν = 0.4. From equation (2):

$$\xi_1 = 0.2 + \sqrt{1 + 0.2^2} = 1.2198, \qquad x_1 = 1 + 1.2198 \times 0.5 = 1.6099$$

From equation (3):

$$\xi_2 = 1.2198 - 0.4 = 0.8198, \qquad x_2 = 1 - 0.8198 \times 0.5 = 0.5901$$

According to equation (4):

$$P_1 = \frac{0.8198}{1.2198 + 0.8198} = 0.4019$$

and, following equation (5), P2 = 1 − 0.4019 = 0.5981. Applying equations (9)-(11) we obtain:

$$\bar{Y} = 0.4019 \times 1.6099^3 + 0.5981 \times 0.5901^3 = 1.8000$$

$$\sigma_Y = \sqrt{0.4019 \times 0.5981}\,\left|1.6099^3 - 0.5901^3\right| = 1.9450$$

$$\nu_Y = \frac{1}{1.9450}(0.5981 - 0.4019)(1.6099^3 - 0.5901^3) = 0.400$$

Table 1 Statistics of Y, for a smooth function of a single random variable
(Y = X³, X̄ = 1; columns σ = 0.1, 0.2, 0.5; rows ν = 0, 0.2, 0.4, 0.8; entries Ȳ, σ_Y, ν_Y. For ν = 0: Ȳ = 1.0300, 1.1200, 1.7500 and σ_Y = 0.3010, 0.6080, 1.6250 for σ = 0.1, 0.2, 0.5 respectively. For σ = 0.5, ν = 0.4: Ȳ = 1.8000, σ_Y = 1.9450, ν_Y = 0.4000.)
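The two concentrations reproduce the first three moments of X exactly. For a cubic Y = X³, therefore, the two-point Ȳ of equation (9) should coincide with the exact value E[X³] = X̄³ + 3X̄σ² + νσ³, whereas σ_Y remains approximate. The Python sketch below checks this over the grid of σ and ν used for Table 1. The helper names are our own, and the exact-moment identity is a standard result rather than one quoted in the paper.

```python
import math


def two_point(y, mean, sd, nu):
    """Y-bar and sigma_Y from equations (2)-(5), (9), and (10)."""
    xi1 = nu / 2 + math.sqrt(1 + (nu / 2) ** 2)
    xi2 = xi1 - nu
    p1 = xi2 / (xi1 + xi2)
    p2 = 1 - p1
    y1, y2 = y(mean + xi1 * sd), y(mean - xi2 * sd)
    return p1 * y1 + p2 * y2, math.sqrt(p1 * p2) * abs(y1 - y2)


def exact_third_moment(mean, sd, nu):
    """E[X^3] written in terms of X-bar, sigma, and the skewness coefficient nu."""
    return mean ** 3 + 3 * mean * sd ** 2 + nu * sd ** 3


if __name__ == "__main__":
    cube = lambda x: x ** 3
    for nu in (0.0, 0.2, 0.4, 0.8):
        for sd in (0.1, 0.2, 0.5):
            y_bar, y_sd = two_point(cube, 1.0, sd, nu)
            error = y_bar - exact_third_moment(1.0, sd, nu)
            print(f"nu={nu:.1f} sigma={sd:.1f} Ybar={y_bar:.4f} sigmaY={y_sd:.4f} error={error:.1e}")
```

The error column stays at the level of floating-point rounding for every cell. The ν = 0 rows also reproduce the legible σ_Y entries of Table 1 (0.3010, 0.6080, 1.6250).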

We notice in Table 1 that Ȳ is only slightly sensitive to ν and σ_Y is more so. Computation of ν_Y is pointless if ν is erroneously taken as zero.

Discontinuity in the first derivative
Let y = 0 for x ≤ 1, y = x − 1 for 1 < x ≤ 2, and p_X(x) = (3/4)(2 − x)x (Figure 7). We will be content to compute Ȳ. We directly obtain X̄ = 1, σ = 0.4472, ν = 0.

If we ignored the break at x = 1 we would find ξ1 = ξ2 = 1, x1 = 1.4472, x2 = 0.5528, P1 = P2 = 0.5, y1 = 0.4472, y2 = 0, and Ȳ = 0.2236.

In contrast, we can proceed by segments. For x ≥ 1 we obtain X̄ = 1.375, σ = 0.6778, ν = 0.4015, ξ1 = 1.2207, ξ2 = 0.8192, P1 = 0.2008, P2 = 0.2992. Notice that P1 and P2 are half the values yielded by equations (4) and (5), because M0 is 0.5 for the segment considered. The points are x1 = 2.2024 and x2 = 0.8198, and, extrapolating the function y = x − 1, we find y1 = 1.2024, y2 = −0.1802. In the segment x ≤ 1, y1 and y2 are nil. According to equation (9), Ȳ = 0.1875. This result coincides with the exact answer because in each segment Y(X) is linear and p_X(x) is quadratic.

Figure 7 Data for second problem. (a) Function of x with discontinuous derivative; (b) probability density of x

Function of two variables
Consider the example Y = X1³X2³, with X̄1 = 1, X̄2 = 2, σ1 = 0.2, σ2 = 0.5, ν1 = 0.2, ν2 = 0.4, ρ = 0.4. Using equations (2)-(5) we obtain:

- for X1: ξ11 = 1.1050, ξ12 = 0.9050, x11 = 1.2210, x12 = 0.8190, P11 = 0.4502, P12 = 0.5498;
- for X2: ξ21 = 1.2198, ξ22 = 0.8198, x21 = 2.6099, x22 = 1.5901, P21 = 0.4019, P22 = 0.5981.

According to Figure 5 we compute the concentrations shown in Figure 8. Thence Ȳ = 12.4173 and E(Y²) = 314.7407, so that σ_Y² = E(Y²) − Ȳ² = 160.5520 and σ_Y = 12.6709. Compare with the result of assuming ν1 = ν2 = 0: using the results of example 1 we get Ȳ = 12.1296, σ_Y = 11.1252.

Figure 8 Concentrations for third example

Table 2 Probability that in toss n + 1 we obtain a head when the first n have come out tails

n      0       1       2       3       4       5       ∞
(1)    0.5000  0.3333  0.2500  0.2222  0.2143  0.2121  0.2113
(3)    0.5000  0.3333  0.2500  0.2000  0.1667  0.1429  0.0000

(1) with two concentrations of p_X(x); (2) with four concentrations of p_X(x); (3) exact answer

Bayesian statistics
An irregularly shaped object has two faces, which we will designate respectively 'head' and 'tail'. A priori we do not know the probability X that, on tossing the object, the head will be uppermost. In view of this ignorance it is decided to adopt p_X(x) = 1, 0 ≤ x ≤ 1. The object is tossed n times, n = 0, 1, 2, ..., and in all of them the face exposed is a tail. It is desired to know the posterior distribution of X in the light of this eventuality.

For the prior distribution of X (n = 0) we compute X̄ = 0.5, σ = 1/√12 = 0.2887, ν = 0, ξ1 = ξ2 = 1, x1 = 0.7887, x2 = 0.2113, P1 = P2 = 0.5. After one toss we find P1' = 0.2113, P2' = 0.7887, and X̄ = 2 × 0.2113 × 0.7887 = 0.3333, and so on. Results are given in Table 2, where they are compared with the exact solution. The approximation is satisfactory for small n.

A very long series of tails is a priori unlikely. However, the ordinates x1 and x2 make the Bayesian treatment incapable of adequately representing the results of a long series of tosses giving heads and tails in such proportion that X tends asymptotically to any value appreciably higher than x1 or smaller than x2, which would not be too strange. To remedy this situation we choose four points at which to concentrate the probability density function of X: x = 1, 0.75, 0.25, and 0. From symmetry, P1 = P4 and P2 = P3. For the prior distribution, P1 + P2 = 0.5 and 0.5²P1 + 0.25²P2 = σ²/2 = 1/24. Hence P1 = 1/18 and P2 = 4/9. For n ≥ 1, equation

(20) gives:

$$P_i' = \frac{P_i\,(1 - x_i)^n}{\sum_j P_j\,(1 - x_j)^n}$$

whence:

$$\bar{X} = \frac{\sum_i P_i\, x_i\,(1 - x_i)^n}{\sum_i P_i\,(1 - x_i)^n}$$

Substituting numerical values we obtain:

$$\bar{X} = \frac{(3/16)\,(1 + 3^{n-1})/4^{n-1}}{1/8 + (1 + 3^n)/4^n}$$

Results are included in Table 2, where it will be noted that errors for moderate and large n decrease perceptibly.

Summary and conclusions
Many practical problems would require a probabilistic treatment that is generally not carried out because it would be too time-consuming if done rigorously. In this paper an approximate method is proposed which is very simple and sacrifices accuracy only slightly, provided that the dispersions of the variables are not too large. The method allows estimating the first three moments of a function of random variables whose first three moments are known. This is especially useful in decision theory, in which it is enough to estimate the first moment, or expectation, of the dependent variable.

For functions Y of a single random variable X, the probability density function of X is replaced by two concentrations. Expressions are available which furnish the magnitudes and ordinates of the concentrations in terms of the first three moments of X. From them it is straightforward to compute approximations to the corresponding moments of Y. When Y is sufficiently smooth, the approximation to the expectation of Y is of third order. Those to its standard deviation and coefficient of variation are of second order, and those to its skewness and skewness coefficient are of first order. When Y or its first derivative has finite discontinuities, the same expressions can still be used, though possibly with too great a loss of accuracy. This situation is overcome by applying the proposed method to each of a series of segments.

If Y is a function of two or more variables, a larger number of concentrations is necessary. Their magnitudes and coordinates can be computed by solving certain simultaneous equations so as to satisfy the first moments of the variables. However, the number of such equations increases rapidly with the number of variables, especially when the latter are not statistically independent. It is found preferable to resort to a superabundant number of concentrations located at the vertices of a rectangle, prism, or hyperprism. Solution of the simultaneous equations can then be obtained almost by inspection. With this artifice the solution is particularly simple when the third moments are zero or are assumed to be zero. In a frequent practical case Y is the product of functions, each of a single random variable, and these variables are statistically independent. Two concentrations per variable then suffice, and expressions are available for the first three moments of Y which are exact in terms of the corresponding moments of the functions.

The method propounded lends itself to application in Bayesian statistics. It suffices to replace the prior probability density function of the variables of interest with concentrations in the usual way. The magnitudes of the concentrations are then modified as a function of the statistical information, applying Bayes' theorem to compute their posterior values. In some cases it is advisable to introduce a larger number of concentrations. This covers the possibility that statistical data make the probability density function evolve markedly toward values outside the range defined by the usual concentrations.

Acknowledgement
The author thanks L. Esteva for his valuable suggestions and constructive criticism of this paper.

References
1 Rosenblueth, E. 'Aproximaciones de segundos momentos en probabilidades', Boletín del Instituto Mexicano de Planeación y Operación de Sistemas 1974, 26, 1
2 Rosenblueth, E. Proc. Nat. Acad. Sci. USA 1975, 72, (10), 3812
3 Benjamin, J. R. and Cornell, C. A. 'Probability, statistics, and decision for civil engineers', McGraw-Hill, New York, 1970

Appendix
Conditions to satisfy, concentrations, and number of redundant parameters
Let n be the number of random variables, i the order of the moments of p, and N_i the number of moments of order i to be satisfied. The number of conditions to be met, and hence the number of imposed equations, covers all moments of p from the one of order 0 to those of order k. That number is

$$\sum_{i=0}^{k} N_i = \frac{(k+n)!}{k!\,n!}$$

Table A1 contains the values of this quantity for n between 1 and 7 and k between 0 and 3. For every point where the probability density function p is assumed concentrated, we have as many unknowns as the point has coordinates, that is n, plus one, the latter coming from the magnitude of the concentration. The number of parameters to determine is thus n + 1 times the number of concentrations. Hence the number of points where p is to be concentrated should not be smaller than

$$\frac{1}{n+1}\sum_{i=0}^{k} N_i$$

and the minimum number of concentrations is the first integer not smaller than this quantity. This number, times n + 1, minus the number of equations imposed, gives the number of redundant parameters resulting from taking the smallest possible number of concentrations of p. On the other hand, if we use 2^n concentrations of the probability density function we obtain

$$(n + 1)\,2^n - \sum_{i=0}^{k} N_i$$

redundant parameters. The results of these conditions for k = 3 are given in Table A2.

Table A1 Number of conditions to be met

n    N0    N0+N1    N0+N1+N2    N0+N1+N2+N3
1    1     2        3           4
2    1     3        6           10
3    1     4        10          20
4    1     5        15          35
5    1     6        21          56
6    1     7        28          84
7    1     8        36          120

Table A2 Number of concentrations and number of redundant parameters

(1)    1    2     3     4     5     6     7
(2)    4    10    20    35    56    84    120
(3)    2    3.3   5     7     9.3   12    15
(4)    2    4     5     7     10    12    15
(5)    0    2     0     0     4     0     0
(6)    0    2     12    45    136   364   904

(1) n; (2) N0 + N1 + N2 + N3 = number of equations imposed by the moments of p; (3) (N0 + N1 + N2 + N3)/(n + 1) = lower bound to the number of points where p is to be concentrated; (4) smallest number of concentrations of p; (5) number of redundant parameters; (6) ditto when using 2^n concentrations of p
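The entries of Tables A1 and A2 follow mechanically from the formulas above, so they can be regenerated in a few lines. The Python sketch below uses `math.comb` for (k + n)!/(k! n!) and integer ceiling division for the smallest number of concentrations. The function names are hypothetical.

```python
from math import comb


def conditions(n, k):
    """Sum of N_i for i = 0..k, i.e. (k + n)! / (k! n!), for n random variables."""
    return comb(k + n, n)


def table_a2_column(n, k=3):
    """Rows (2)-(6) of Table A2 for n random variables."""
    eqs = conditions(n, k)                      # row (2): equations imposed
    lower = eqs / (n + 1)                       # row (3): lower bound on number of points
    smallest = -(-eqs // (n + 1))               # row (4): integer ceiling of row (3)
    redundant = smallest * (n + 1) - eqs        # row (5)
    redundant_2n = (n + 1) * 2 ** n - eqs       # row (6): using 2^n concentrations
    return eqs, round(lower, 1), smallest, redundant, redundant_2n


if __name__ == "__main__":
    for n in range(1, 8):
        table_a1_row = [conditions(n, k) for k in range(4)]
        print(n, table_a1_row, table_a2_column(n))
```

Row (6) grows much faster than row (5). This is the price of the superabundant 2^n concentrations, paid in exchange for equations that can be solved almost by inspection.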