Some basic results in probability and statistics

This chapter contains some basic results in probability and statistics. It is intended as a reference chapter to which you may refer as you read this book. Sometimes, specific references to results in this chapter are made in the text. At other times, you may wish to refer on your own to particular results in this chapter as you feel the need. You may prefer to scan the results on probability and statistical inference in this chapter before reading Chapter 2, or you may proceed directly to the next chapter.

1.1 SUMMATION AND PRODUCT OPERATORS

Summation operator

The summation operator $\sum$ is defined as follows:

(1.1)  $\sum_{i=1}^{n} Y_i = Y_1 + Y_2 + \cdots + Y_n$

Some important properties of this operator are:

(1.2a)  $\sum_{i=1}^{n} k = nk$   where $k$ is a constant

(1.2b)  $\sum_{i=1}^{n} (Y_i + Z_i) = \sum_{i=1}^{n} Y_i + \sum_{i=1}^{n} Z_i$

(1.2c)  $\sum_{i=1}^{n} (a + cY_i) = na + c\sum_{i=1}^{n} Y_i$   where $a$ and $c$ are constants

The double summation operator $\sum\sum$ is defined as follows:

(1.3)  $\sum_{i=1}^{n}\sum_{j=1}^{m} Y_{ij} = \sum_{i=1}^{n}(Y_{i1} + \cdots + Y_{im}) = Y_{11} + \cdots + Y_{1m} + Y_{21} + \cdots + Y_{2m} + \cdots + Y_{n1} + \cdots + Y_{nm}$

An important property of the double summation operator is:

(1.4)  $\sum_{i=1}^{n}\sum_{j=1}^{m} Y_{ij} = \sum_{j=1}^{m}\sum_{i=1}^{n} Y_{ij}$

Product operator

The product operator $\prod$ is defined as follows:

(1.5)  $\prod_{i=1}^{n} Y_i = Y_1 \cdot Y_2 \cdot Y_3 \cdots Y_n$

1.2 PROBABILITY

Addition theorem

Let $A_i$ and $A_j$ be two events defined on a sample space. Then:

(1.6)  $P(A_i \cup A_j) = P(A_i) + P(A_j) - P(A_i \cap A_j)$

where $P(A_i \cup A_j)$ denotes the probability of either $A_i$ or $A_j$ or both occurring; $P(A_i)$ and $P(A_j)$ denote, respectively, the probability of $A_i$ and the probability of $A_j$; and $P(A_i \cap A_j)$ denotes the probability of both $A_i$ and $A_j$ occurring.

Multiplication theorem

Let $P(A_i \mid A_j)$ denote the conditional probability of $A_i$ occurring, given that $A_j$ has occurred. This conditional probability is defined as follows:

(1.7)  $P(A_i \mid A_j) = \dfrac{P(A_i \cap A_j)}{P(A_j)}$,   $P(A_j) \ne 0$

The multiplication theorem states:

(1.8)  $P(A_i \cap A_j) = P(A_i)P(A_j \mid A_i) = P(A_j)P(A_i \mid A_j)$

Complementary events

The complementary event of $A_i$ is denoted by $\bar{A}_i$. The following results for complementary events are useful:

(1.9)  $P(\bar{A}_i) = 1 - P(A_i)$

(1.10)  $P(\overline{A_i \cup A_j}) = P(\bar{A}_i \cap \bar{A}_j)$

1.3 RANDOM VARIABLES

Throughout this section, we assume that the random variable $Y$ assumes a finite number of outcomes. (If $Y$ is a continuous random variable, the summation process is replaced by integration.)

Expected value

Let the random variable $Y$ assume the outcomes $Y_1, \ldots, Y_k$ with probabilities given by the probability function:

(1.11)  $f(Y_s) = P(Y = Y_s)$   $s = 1, \ldots, k$

The expected value of $Y$ is defined:

(1.12)  $E(Y) = \sum_{s=1}^{k} Y_s f(Y_s)$

An important property of the expectation operator $E$ is:

(1.13)  $E(a + cY) = a + cE(Y)$   where $a$ and $c$ are constants

Special cases of this are:

(1.13a)  $E(a) = a$

(1.13b)  $E(cY) = cE(Y)$

(1.13c)  $E(a + Y) = a + E(Y)$

Variance

The variance of the random variable $Y$ is denoted by $\sigma^2(Y)$ and is defined as follows:

(1.14)  $\sigma^2(Y) = E\{[Y - E(Y)]^2\}$

An equivalent expression is:

(1.14a)  $\sigma^2(Y) = E(Y^2) - [E(Y)]^2$
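As a small numerical check of (1.12), (1.13), and (1.14a), the sketch below evaluates the expected value and variance of a discrete random variable. The outcomes and probabilities are illustrative only (not from the text), and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical discrete distribution: outcomes Y_s with probabilities f(Y_s)
outcomes = np.array([1.0, 2.0, 4.0])
probs = np.array([0.2, 0.5, 0.3])            # probabilities sum to 1

# Expected value (1.12): E(Y) = sum of Y_s * f(Y_s)
expected = np.sum(outcomes * probs)          # 2.4

# Variance by the definition (1.14) and by the equivalent form (1.14a)
var_def = np.sum((outcomes - expected) ** 2 * probs)
var_alt = np.sum(outcomes ** 2 * probs) - expected ** 2
print(var_def, var_alt)                      # both 1.24

# Property (1.13): E(a + cY) = a + c E(Y), checked with a = 3, c = 2
a, c = 3.0, 2.0
print(np.sum((a + c * outcomes) * probs), a + c * expected)   # both 7.8
```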
The variance of a linear function of $Y$ is frequently encountered. We denote the variance of $a + cY$ by $\sigma^2(a + cY)$ and have:

(1.15)  $\sigma^2(a + cY) = c^2\sigma^2(Y)$   where $a$ and $c$ are constants

Special cases of this result are:

(1.15a)  $\sigma^2(a + Y) = \sigma^2(Y)$

(1.15b)  $\sigma^2(cY) = c^2\sigma^2(Y)$

Joint, marginal, and conditional probability distributions

Let the joint probability function for the two random variables $Y$ and $Z$ be denoted by $g(Y, Z)$:

(1.16)  $g(Y_s, Z_t) = P(Y = Y_s \cap Z = Z_t)$   $s = 1, \ldots, k$;  $t = 1, \ldots, m$

The marginal probability function of $Y$, denoted by $f(Y)$, is:

(1.17a)  $f(Y_s) = \sum_{t=1}^{m} g(Y_s, Z_t)$

and the marginal probability function of $Z$, denoted by $h(Z)$, is:

(1.17b)  $h(Z_t) = \sum_{s=1}^{k} g(Y_s, Z_t)$

The conditional probability function of $Y$, given $Z = Z_t$, is:

(1.18a)  $f(Y_s \mid Z_t) = \dfrac{g(Y_s, Z_t)}{h(Z_t)}$,   $h(Z_t) \ne 0$;  $s = 1, \ldots, k$

and the conditional probability function of $Z$, given $Y = Y_s$, is:

(1.18b)  $h(Z_t \mid Y_s) = \dfrac{g(Y_s, Z_t)}{f(Y_s)}$,   $f(Y_s) \ne 0$;  $t = 1, \ldots, m$

Covariance

The covariance of $Y$ and $Z$ is denoted by $\sigma(Y, Z)$ and is defined:

(1.19)  $\sigma(Y, Z) = E\{[Y - E(Y)][Z - E(Z)]\}$

An equivalent expression is:

(1.19a)  $\sigma(Y, Z) = E(YZ) - E(Y)E(Z)$

The covariance of $a_1 + c_1Y$ and $a_2 + c_2Z$ is denoted by $\sigma(a_1 + c_1Y,\, a_2 + c_2Z)$, and we have:

(1.20)  $\sigma(a_1 + c_1Y,\, a_2 + c_2Z) = c_1c_2\,\sigma(Y, Z)$   where $a_1$, $a_2$, $c_1$, $c_2$ are constants

Special cases of this are:

(1.20a)  $\sigma(c_1Y, c_2Z) = c_1c_2\,\sigma(Y, Z)$

(1.20b)  $\sigma(a_1 + Y,\, a_2 + Z) = \sigma(Y, Z)$

By definition, we have:

(1.21)  $\sigma(Y, Y) = \sigma^2(Y)$

where $\sigma^2(Y)$ is the variance of $Y$.

Independent random variables

(1.22)  Random variables $Y$ and $Z$ are independent if and only if:
        $g(Y_s, Z_t) = f(Y_s)h(Z_t)$   $s = 1, \ldots, k$;  $t = 1, \ldots, m$

If $Y$ and $Z$ are independent random variables:

(1.23)  $\sigma(Y, Z) = 0$   when $Y$, $Z$ are independent

(In the special case where $Y$ and $Z$ are jointly normally distributed, $\sigma(Y, Z) = 0$ implies that $Y$ and $Z$ are independent.)

Functions of random variables

Let $Y_1, \ldots, Y_n$ be $n$ random variables. Consider the function $\sum a_iY_i$, where the $a_i$ are constants. We then have:

(1.24a)  $E\left(\sum_{i=1}^{n} a_iY_i\right) = \sum_{i=1}^{n} a_iE(Y_i)$   where the $a_i$ are constants

(1.24b)  $\sigma^2\left(\sum_{i=1}^{n} a_iY_i\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_ia_j\,\sigma(Y_i, Y_j)$   where the $a_i$ are constants

Specifically, we have for $n = 2$:

(1.25a)  $E(a_1Y_1 + a_2Y_2) = a_1E(Y_1) + a_2E(Y_2)$

(1.25b)  $\sigma^2(a_1Y_1 + a_2Y_2) = a_1^2\sigma^2(Y_1) + a_2^2\sigma^2(Y_2) + 2a_1a_2\,\sigma(Y_1, Y_2)$

If the random variables $Y_i$ are independent, we have:

(1.26)  $\sigma^2\left(\sum_{i=1}^{n} a_iY_i\right) = \sum_{i=1}^{n} a_i^2\sigma^2(Y_i)$   when the $Y_i$ are independent

Special cases of this are:

(1.26a)  $\sigma^2(Y_1 + Y_2) = \sigma^2(Y_1) + \sigma^2(Y_2)$   when $Y_1$, $Y_2$ are independent

(1.26b)  $\sigma^2(Y_1 - Y_2) = \sigma^2(Y_1) + \sigma^2(Y_2)$   when $Y_1$, $Y_2$ are independent

When the $Y_i$ are independent random variables, the covariance of two linear functions $\sum a_iY_i$ and $\sum c_iY_i$ is:

(1.27)  $\sigma\left(\sum_{i=1}^{n} a_iY_i,\, \sum_{i=1}^{n} c_iY_i\right) = \sum_{i=1}^{n} a_ic_i\,\sigma^2(Y_i)$   when the $Y_i$ are independent

Central limit theorem

(1.28)  If $Y_1, \ldots, Y_n$ are independent random observations from a population with probability function $f(Y)$ for which $\sigma^2(Y)$ is finite, the sample mean $\bar{Y} = \sum Y_i/n$ is approximately normally distributed when the sample size $n$ is reasonably large, with mean $E(Y)$ and variance $\sigma^2(Y)/n$.
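The central limit theorem (1.28) can be illustrated by simulation. The sketch below, which assumes NumPy is available, draws repeated samples of size $n = 50$ from a hypothetical skewed population (an exponential distribution with mean 2, so $\sigma^2(Y) = 4$) and checks that the sample means behave approximately as stated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parent population: exponential with E(Y) = 2 and sigma^2(Y) = 4
n, n_samples = 50, 10_000
samples = rng.exponential(scale=2.0, size=(n_samples, n))

means = samples.mean(axis=1)      # 10,000 sample means, each based on n = 50

print(means.mean())               # close to E(Y) = 2
print(means.var())                # close to sigma^2(Y)/n = 4/50 = 0.08

# Under the normal approximation, about 95% of the sample means should fall
# within E(Y) +/- 1.96 * sigma(Y)/sqrt(n).
half_width = 1.96 * 2.0 / np.sqrt(n)
print(np.mean(np.abs(means - 2.0) <= half_width))   # roughly 0.95
```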
1.4 NORMAL PROBABILITY DISTRIBUTION AND RELATED DISTRIBUTIONS

Normal probability distribution

The density function for a normal random variable $Y$ is:

(1.29)  $f(Y) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\dfrac{(Y - \mu)^2}{2\sigma^2}\right]$

where $\mu$ and $\sigma$ are the two parameters of the normal distribution and $\exp(a)$ denotes $e^a$. The mean and variance of a normal random variable $Y$ are:

(1.30a)  $E(Y) = \mu$

(1.30b)  $\sigma^2(Y) = \sigma^2$

Function of normal random variable. A linear function of a normal random variable $Y$ has the following property:

(1.31)  If $Y$ is a normal random variable, the transformed variable $Y' = a + cY$ ($a$ and $c$ are constants) is normally distributed, with mean $a + cE(Y)$ and variance $c^2\sigma^2(Y)$.

Standard normal variable. The standard normal variable $z$:

(1.32)  $z = \dfrac{Y - \mu}{\sigma}$   where $Y$ is a normal random variable

is normally distributed, with mean 0 and variance 1. We denote this as follows:

(1.33)  $z$ is $N(0, 1)$

Table A-1 in the Appendix contains the cumulative probabilities $A$ for percentiles $z(A)$ where:

(1.34)  $P\{z \le z(A)\} = A$

For instance, when $z(A) = 2.00$, $A = .9772$. Because the normal distribution is symmetrical about 0, when $z(A) = -2.00$, $A = 1 - .9772 = .0228$.

Function of independent normal random variables. Let $Y_1, \ldots, Y_n$ be independent normal random variables. We then have:

(1.35)  When $Y_1, \ldots, Y_n$ are independent normal random variables, the linear combination $a_1Y_1 + a_2Y_2 + \cdots + a_nY_n$ is normally distributed, with mean $\sum a_iE(Y_i)$ and variance $\sum a_i^2\sigma^2(Y_i)$.

$\chi^2$ distribution

Let $z_1, \ldots, z_\nu$ be $\nu$ independent standard normal variables. We then define:

(1.36)  $\chi^2(\nu) = z_1^2 + z_2^2 + \cdots + z_\nu^2$   where the $z_i$ are independent

The $\chi^2$ distribution has one parameter, $\nu$, which is called the degrees of freedom (df). The mean of the $\chi^2$ distribution with $\nu$ degrees of freedom is:

(1.37)  $E[\chi^2(\nu)] = \nu$

Table A-3 in the Appendix contains percentiles of various $\chi^2$ distributions. We define $\chi^2(A; \nu)$ as follows:

(1.38)  $P\{\chi^2(\nu) \le \chi^2(A; \nu)\} = A$

Suppose $\nu = 5$. The 90th percentile of the $\chi^2$ distribution with 5 degrees of freedom is $\chi^2(.90; 5) = 9.24$.

$t$ distribution

Let $z$ and $\chi^2(\nu)$ be independent random variables (standard normal and $\chi^2$, respectively). We then define:

(1.39)  $t(\nu) = \dfrac{z}{\sqrt{\chi^2(\nu)/\nu}}$   where $z$ and $\chi^2(\nu)$ are independent

The $t$ distribution has one parameter, the degrees of freedom $\nu$. The mean of the $t$ distribution with $\nu$ degrees of freedom is:

(1.40)  $E[t(\nu)] = 0$

Table A-2 in the Appendix contains percentiles of various $t$ distributions. We define $t(A; \nu)$ as follows:

(1.41)  $P\{t(\nu) \le t(A; \nu)\} = A$

Suppose $\nu = 10$. The 90th percentile of the $t$ distribution with 10 degrees of freedom is $t(.90; 10) = 1.372$. Because the $t$ distribution is symmetrical about 0, we have $t(.10; 10) = -1.372$.

$F$ distribution

Let $\chi^2(\nu_1)$ and $\chi^2(\nu_2)$ be two independent $\chi^2$ random variables. We then define:

(1.42)  $F(\nu_1, \nu_2) = \dfrac{\chi^2(\nu_1)/\nu_1}{\chi^2(\nu_2)/\nu_2}$   where $\chi^2(\nu_1)$ and $\chi^2(\nu_2)$ are independent

The $F$ distribution has two parameters, the numerator degrees of freedom and the denominator degrees of freedom, here $\nu_1$ and $\nu_2$, respectively. Table A-4 in the Appendix contains percentiles of various $F$ distributions. We define $F(A; \nu_1, \nu_2)$ as follows:

(1.43)  $P\{F(\nu_1, \nu_2) \le F(A; \nu_1, \nu_2)\} = A$

Suppose $\nu_1 = 2$, $\nu_2 = 3$. The 90th percentile of the $F$ distribution with 2 and 3 degrees of freedom, respectively, in the numerator and denominator is $F(.90; 2, 3) = 5.46$.

Percentiles below 50 percent can be obtained by utilizing the relation:

(1.44)  $F(A; \nu_1, \nu_2) = \dfrac{1}{F(1 - A; \nu_2, \nu_1)}$

Thus, $F(.10; 3, 2) = 1/F(.90; 2, 3) = 1/5.46 = .183$.

The following relation exists between the $t$ and $F$ random variables:

(1.45a)  $[t(\nu)]^2 = F(1, \nu)$

and the percentiles of the $t$ and $F$ distributions are related as follows:

(1.45b)  $[t(.5 + A/2;\, \nu)]^2 = F(A; 1, \nu)$
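The percentiles quoted above from Tables A-1 through A-4 can be reproduced with a statistics library. The following sketch uses scipy.stats (an assumption; the text itself relies only on the appendix tables):

```python
from scipy import stats

print(stats.norm.cdf(2.00))                     # A = .9772 when z(A) = 2.00, cf. (1.34)
print(stats.chi2.ppf(0.90, df=5))               # chi-square(.90; 5) = 9.24, cf. (1.38)
print(stats.t.ppf(0.90, df=10))                 # t(.90; 10) = 1.372, cf. (1.41)
print(stats.f.ppf(0.90, dfn=2, dfd=3))          # F(.90; 2, 3) = 5.46, cf. (1.43)
print(1 / stats.f.ppf(0.90, dfn=2, dfd=3))      # F(.10; 3, 2) = .183, cf. (1.44)

# Relation (1.45b): [t(.5 + A/2; v)]^2 = F(A; 1, v), checked for A = .90, v = 10
A, v = 0.90, 10
print(stats.t.ppf(0.5 + A / 2, df=v) ** 2)      # equals ...
print(stats.f.ppf(A, dfn=1, dfd=v))             # ... this value
```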
1.5 STATISTICAL ESTIMATION

Properties of estimators

(1.46)  An estimator $\hat{\theta}$ of the parameter $\theta$ is unbiased if:
        $E(\hat{\theta}) = \theta$

(1.47)  An estimator $\hat{\theta}$ is a consistent estimator of $\theta$ if:
        $\lim_{n \to \infty} P(|\hat{\theta} - \theta| \ge \varepsilon) = 0$   for any $\varepsilon > 0$

(1.48)  An estimator $\hat{\theta}$ is a sufficient estimator of $\theta$ if the conditional joint probability function of the sample observations, given $\hat{\theta}$, does not depend on the parameter $\theta$.

(1.49)  An estimator $\hat{\theta}$ is a minimum variance estimator of $\theta$ if for any other estimator $\theta^*$:
        $\sigma^2(\hat{\theta}) \le \sigma^2(\theta^*)$   for all $\theta^*$

Maximum likelihood estimators

The method of maximum likelihood is a general method of finding estimators. Suppose we are sampling a population whose probability function $f(Y; \theta)$ involves one parameter, $\theta$. Given independent observations $Y_1, \ldots, Y_n$, the joint probability function of the sample observations is:

(1.50a)  $g(Y_1, \ldots, Y_n) = \prod_{i=1}^{n} f(Y_i; \theta)$

When this joint probability function is viewed as a function of $\theta$, with the observations given, it is called the likelihood function $L(\theta)$:

(1.50b)  $L(\theta) = \prod_{i=1}^{n} f(Y_i; \theta)$

Maximizing $L(\theta)$ with respect to $\theta$ yields the maximum likelihood estimator of $\theta$. Under quite general conditions, maximum likelihood estimators are consistent and sufficient.

Least squares estimators

The method of least squares is another general method of finding estimators. The sample observations are assumed to be of the form (for the case of a single parameter $\theta$):

(1.51)  $Y_i = f_i(\theta) + \varepsilon_i$

where $f_i(\theta)$ is a known function of the parameter $\theta$ and the $\varepsilon_i$ are random variables, usually assumed to have expectation $E(\varepsilon_i) = 0$. With the method of least squares, for the given sample observations, the sum of squares:

(1.52)  $Q = \sum_{i=1}^{n} [Y_i - f_i(\theta)]^2$

is considered as a function of $\theta$. The least squares estimator of $\theta$ is obtained by minimizing $Q$ with respect to $\theta$. In many instances, least squares estimators are unbiased and consistent.

1.6 INFERENCES ABOUT POPULATION MEAN - NORMAL POPULATION

We have a random sample of $n$ observations $Y_1, \ldots, Y_n$ from a normal population with mean $\mu$ and standard deviation $\sigma$. The sample mean and sample standard deviation are:

(1.53a)  $\bar{Y} = \dfrac{\sum_{i=1}^{n} Y_i}{n}$

(1.53b)  $s = \left[\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}\right]^{1/2}$

and the estimated standard deviation of the sampling distribution of $\bar{Y}$ is:

(1.53c)  $s(\bar{Y}) = \dfrac{s}{\sqrt{n}}$

We then have:

(1.54)  $\dfrac{\bar{Y} - \mu}{s(\bar{Y})}$ is distributed as $t$ with $n - 1$ degrees of freedom when the random sample is from a normal population.

Interval estimation

The confidence limits for $\mu$ with a confidence coefficient of $1 - \alpha$ are obtained by means of (1.54):

(1.55)  $\bar{Y} \pm t(1 - \alpha/2;\, n - 1)\, s(\bar{Y})$

Example 1. Obtain a 95 percent confidence interval for $\mu$ when:

  $n = 10$   $\bar{Y} = 20$   $s = 4$

We require:

  $s(\bar{Y}) = \dfrac{4}{\sqrt{10}} = 1.265$   $t(.975; 9) = 2.262$

so that the confidence limits are $20 \pm 2.262(1.265)$. Hence, the 95 percent confidence interval for $\mu$ is:

  $17.1 \le \mu \le 22.9$

Tests

One-sided and two-sided tests concerning the population mean $\mu$ are constructed by means of (1.54), based on the test statistic:

(1.56)  $t^* = \dfrac{\bar{Y} - \mu_0}{s(\bar{Y})}$

Table 1.1 contains the decision rules for each of three possible cases, with the risk of making a Type I error controlled at $\alpha$.

TABLE 1.1  Decision rules for tests concerning mean $\mu$ of normal population

(a)  $H_0$: $\mu = \mu_0$   $H_a$: $\mu \ne \mu_0$
     If $|t^*| \le t(1 - \alpha/2;\, n - 1)$, conclude $H_0$
     If $|t^*| > t(1 - \alpha/2;\, n - 1)$, conclude $H_a$
     where $t^* = \dfrac{\bar{Y} - \mu_0}{s(\bar{Y})}$

(b)  $H_0$: $\mu \ge \mu_0$   $H_a$: $\mu < \mu_0$
     If $t^* \ge t(\alpha;\, n - 1)$, conclude $H_0$
     If $t^* < t(\alpha;\, n - 1)$, conclude $H_a$

(c)  $H_0$: $\mu \le \mu_0$   $H_a$: $\mu > \mu_0$
     If $t^* \le t(1 - \alpha;\, n - 1)$, conclude $H_0$
     If $t^* > t(1 - \alpha;\, n - 1)$, conclude $H_a$

Example 2. Choose between the alternatives:

  $H_0$: $\mu \le 20$   $H_a$: $\mu > 20$

when $\alpha$ is to be controlled at .05 and:

  $n = 15$   $\bar{Y} = 24$   $s = 6$

We require:

  $s(\bar{Y}) = \dfrac{6}{\sqrt{15}} = 1.549$   $t(.95; 14) = 1.761$

so that the decision rule is:

  If $t^* \le 1.761$, conclude $H_0$
  If $t^* > 1.761$, conclude $H_a$

Since $t^* = (24 - 20)/1.549 = 2.58 > 1.761$, we conclude $H_a$.
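The computations in Examples 1 and 2 can be reproduced from the summary statistics alone. The sketch below assumes NumPy and scipy.stats are available:

```python
import numpy as np
from scipy import stats

# Example 1: 95 percent confidence interval for mu (n = 10, Ybar = 20, s = 4)
n, ybar, s = 10, 20.0, 4.0
se = s / np.sqrt(n)                              # s(Ybar) = 1.265
t_crit = stats.t.ppf(0.975, df=n - 1)            # t(.975; 9) = 2.262
print(ybar - t_crit * se, ybar + t_crit * se)    # about 17.1 and 22.9

# Example 2: test of H0: mu <= 20 versus Ha: mu > 20 at alpha = .05
# (n = 15, Ybar = 24, s = 6)
n, ybar, s, mu0, alpha = 15, 24.0, 6.0, 20.0, 0.05
t_star = (ybar - mu0) / (s / np.sqrt(n))         # 2.58
print(t_star > stats.t.ppf(1 - alpha, df=n - 1)) # True, so conclude Ha
```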
Example 3. Choose between the alternatives:

  $H_0$: $\mu = 10$   $H_a$: $\mu \ne 10$

when $\alpha$ is to be controlled at .02 and:

  $n = 25$   $\bar{Y} = 5.7$   $s = 8$

We require:

  $s(\bar{Y}) = \dfrac{8}{\sqrt{25}} = 1.6$   $t(.99; 24) = 2.492$

so that the decision rule is:

  If $|t^*| \le 2.492$, conclude $H_0$
  If $|t^*| > 2.492$, conclude $H_a$

where the symbol $|\ |$ stands for the absolute value. Since $|t^*| = |(5.7 - 10)/1.6| = |-2.69| = 2.69 > 2.492$, we conclude $H_a$.

P-value for sample outcome. The P-value for a sample outcome is the probability that the test statistic could have been as extreme as, or more extreme than, the observed outcome when $\mu = \mu_0$. Large P-values support $H_0$ while small P-values support $H_a$. A test can be carried out by comparing the P-value with the specified $\alpha$ risk. If the P-value equals or is greater than the specified $\alpha$, $H_0$ is concluded. If the P-value is less than $\alpha$, $H_a$ is concluded.

Example 4. In Example 2, $t^* = 2.58$. The P-value for this sample outcome is the probability $P[t(14) > 2.58]$. From Table A-2, we find $t(.985; 14) = 2.415$ and $t(.990; 14) = 2.624$. Hence, the P-value is between .010 and .015. In fact, it can be shown to be .011. Thus, for $\alpha = .05$, $H_a$ is concluded.

Example 5. In Example 3, $t^* = -2.69$. We find from Table A-2 that $P[t(24) < -2.69]$ is between .005 and .0075. In fact, it can be shown to be .0064. Because the test is two-sided and the $t$ distribution is symmetrical, the two-sided P-value is twice the one-sided value, or $2(.0064) = .013$. Hence, for $\alpha = .02$, we conclude $H_a$.

Relation between tests and confidence intervals. There is a direct relation between tests and confidence intervals. For example, the two-sided confidence limits (1.55) can be used for testing:

  $H_0$: $\mu = \mu_0$   $H_a$: $\mu \ne \mu_0$

If $\mu_0$ is contained within the $1 - \alpha$ confidence interval, then the two-sided decision rule in Table 1.1a, with level of significance $\alpha$, will lead to conclusion $H_0$, and vice versa. If $\mu_0$ is not contained within the confidence interval, the decision rule will lead to $H_a$, and vice versa. There are similar correspondences between one-sided confidence intervals and one-sided decision rules.

1.7 COMPARISONS OF TWO POPULATION MEANS - NORMAL POPULATIONS

Independent samples

There are two normal populations, with means $\mu_1$ and $\mu_2$, respectively, and with the same standard deviation $\sigma$. The means $\mu_1$ and $\mu_2$ are to be compared on the basis of independent samples from each of the two populations:

  Sample 1: $Y_1, \ldots, Y_{n_1}$
  Sample 2: $Z_1, \ldots, Z_{n_2}$

Estimators of the two population means are the sample means:

(1.57a)  $\bar{Y} = \dfrac{\sum_{i=1}^{n_1} Y_i}{n_1}$

(1.57b)  $\bar{Z} = \dfrac{\sum_{i=1}^{n_2} Z_i}{n_2}$

and an estimator of $\mu_1 - \mu_2$ is $\bar{Y} - \bar{Z}$.

An estimator of the common variance $\sigma^2$ is:

(1.58)  $s^2 = \dfrac{\sum_{i=1}^{n_1} (Y_i - \bar{Y})^2 + \sum_{i=1}^{n_2} (Z_i - \bar{Z})^2}{n_1 + n_2 - 2}$

and an estimator of $\sigma^2(\bar{Y} - \bar{Z})$, the variance of the sampling distribution of $\bar{Y} - \bar{Z}$, is:

(1.59)  $s^2(\bar{Y} - \bar{Z}) = s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$

We have:

(1.60)  $\dfrac{(\bar{Y} - \bar{Z}) - (\mu_1 - \mu_2)}{s(\bar{Y} - \bar{Z})}$ is distributed as $t$ with $n_1 + n_2 - 2$ degrees of freedom when the two independent samples come from normal populations with the same standard deviation.

Interval estimation. The confidence limits for $\mu_1 - \mu_2$ with confidence coefficient $1 - \alpha$ are obtained by means of (1.60):

(1.61)  $(\bar{Y} - \bar{Z}) \pm t(1 - \alpha/2;\, n_1 + n_2 - 2)\, s(\bar{Y} - \bar{Z})$

Example 6. Obtain a 95 percent confidence interval for $\mu_1 - \mu_2$ when:

  $n_1 = 10$   $\bar{Y} = 14$   $\sum(Y_i - \bar{Y})^2 = 105$
  $n_2 = 20$   $\bar{Z} = 8$    $\sum(Z_i - \bar{Z})^2 = 224$

We require:

  $s^2 = \dfrac{105 + 224}{10 + 20 - 2} = 11.75$

  $s^2(\bar{Y} - \bar{Z}) = 11.75\left(\dfrac{1}{10} + \dfrac{1}{20}\right) = 1.7625$   $s(\bar{Y} - \bar{Z}) = 1.328$

  $t(.975; 28) = 2.048$

so that the 95 percent confidence interval for $\mu_1 - \mu_2$ is:

  $3.3 = (14 - 8) - 2.048(1.328) \le \mu_1 - \mu_2 \le (14 - 8) + 2.048(1.328) = 8.7$
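The pooled calculations of Example 6 can be verified directly from the summary figures. A sketch, assuming NumPy and scipy.stats are available:

```python
import numpy as np
from scipy import stats

# Example 6 summary data for two independent samples (common sigma assumed)
n1, ybar, ss_y = 10, 14.0, 105.0      # ss_y = sum of squared deviations, sample 1
n2, zbar, ss_z = 20, 8.0, 224.0       # ss_z = sum of squared deviations, sample 2

s2 = (ss_y + ss_z) / (n1 + n2 - 2)               # pooled variance (1.58): 11.75
se_diff = np.sqrt(s2 * (1 / n1 + 1 / n2))        # s(Ybar - Zbar) from (1.59): 1.328
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)      # t(.975; 28) = 2.048

diff = ybar - zbar
print(diff - t_crit * se_diff, diff + t_crit * se_diff)   # about 3.3 and 8.7
```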
Tests. One-sided and two-sided tests concerning $\mu_1 - \mu_2$ are constructed by means of (1.60). Table 1.2 contains the decision rules for each of three possible cases, based on the test statistic:

(1.62)  $t^* = \dfrac{\bar{Y} - \bar{Z}}{s(\bar{Y} - \bar{Z})}$

with the risk of making a Type I error controlled at $\alpha$.

TABLE 1.2  Decision rules for tests concerning means $\mu_1$ and $\mu_2$ of two normal populations ($\sigma_1 = \sigma_2 = \sigma$)

(a)  $H_0$: $\mu_1 = \mu_2$   $H_a$: $\mu_1 \ne \mu_2$
     If $|t^*| \le t(1 - \alpha/2;\, n_1 + n_2 - 2)$, conclude $H_0$
     If $|t^*| > t(1 - \alpha/2;\, n_1 + n_2 - 2)$, conclude $H_a$
     where $t^* = \dfrac{\bar{Y} - \bar{Z}}{s(\bar{Y} - \bar{Z})}$

(b)  $H_0$: $\mu_1 \ge \mu_2$   $H_a$: $\mu_1 < \mu_2$
     If $t^* \ge t(\alpha;\, n_1 + n_2 - 2)$, conclude $H_0$
     If $t^* < t(\alpha;\, n_1 + n_2 - 2)$, conclude $H_a$

(c)  $H_0$: $\mu_1 \le \mu_2$   $H_a$: $\mu_1 > \mu_2$
     If $t^* \le t(1 - \alpha;\, n_1 + n_2 - 2)$, conclude $H_0$
     If $t^* > t(1 - \alpha;\, n_1 + n_2 - 2)$, conclude $H_a$

Example 7. Choose between the alternatives:

  $H_0$: $\mu_1 = \mu_2$   $H_a$: $\mu_1 \ne \mu_2$

when $\alpha$ is to be controlled at .10 and the data are those of Example 6. We require $t(.95; 28) = 1.701$, so that the decision rule is:

  If $|t^*| \le 1.701$, conclude $H_0$
  If $|t^*| > 1.701$, conclude $H_a$

Since $|t^*| = |(14 - 8)/1.328| = |4.52| = 4.52 > 1.701$, we conclude $H_a$. The one-sided P-value here is the probability $P[t(28) > 4.52]$. We see from Table A-2 that this P-value is less than .0005. In fact, it can be shown to be .00005. Hence, the two-sided P-value is .0001. For $\alpha = .10$, the appropriate conclusion therefore is $H_a$.

Paired observations

When the observations in the two samples are paired (e.g., attitude scores $Y_i$ and $Z_i$ for the $i$th sample employee before and after a year's experience on the job), we use the differences:

(1.63)  $W_i = Y_i - Z_i$   $i = 1, \ldots, n$

in the fashion of a sample from a single population. Thus, when the $W_i$ can be treated as observations from a normal population, we have:

(1.64)  $\dfrac{\bar{W} - (\mu_1 - \mu_2)}{s(\bar{W})}$ is distributed as $t$ with $n - 1$ degrees of freedom when the differences $W_i$ can be considered to be observations from a normal population, where:

  $\bar{W} = \dfrac{\sum_{i=1}^{n} W_i}{n}$   $s^2(\bar{W}) = \dfrac{\sum_{i=1}^{n} (W_i - \bar{W})^2}{n(n - 1)}$

1.8 INFERENCES ABOUT POPULATION VARIANCE - NORMAL POPULATION

When sampling from a normal population, the following holds for the sample variance $s^2$, where $s$ is defined in (1.53b):

(1.65)  $\dfrac{(n - 1)s^2}{\sigma^2}$ is distributed as $\chi^2$ with $n - 1$ degrees of freedom when the random sample is from a normal population.

Interval estimation

The lower confidence limit $L$ and the upper confidence limit $U$ in a confidence interval for the population variance $\sigma^2$ with confidence coefficient $1 - \alpha$ are obtained by means of (1.65):

(1.66)  $L = \dfrac{(n - 1)s^2}{\chi^2(1 - \alpha/2;\, n - 1)}$   $U = \dfrac{(n - 1)s^2}{\chi^2(\alpha/2;\, n - 1)}$

Example 8. Obtain a 98 percent confidence interval for $\sigma^2$, using the data of Example 1 ($n = 10$, $s = 4$). We require:

  $(n - 1)s^2 = 9(16) = 144$   $\chi^2(.01; 9) = 2.09$   $\chi^2(.99; 9) = 21.67$

Hence, the 98 percent confidence interval for $\sigma^2$ is:

  $6.6 = \dfrac{144}{21.67} \le \sigma^2 \le \dfrac{144}{2.09} = 68.9$

Tests

One-sided and two-sided tests concerning the population variance $\sigma^2$ are constructed by means of (1.65). Table 1.3 contains the decision rule for each of three possible cases, with the risk of making a Type I error controlled at $\alpha$.

TABLE 1.3  Decision rules for tests concerning variance $\sigma^2$ of normal population

(a)  $H_0$: $\sigma^2 = \sigma_0^2$   $H_a$: $\sigma^2 \ne \sigma_0^2$
     If $\chi^2(\alpha/2;\, n - 1) \le \dfrac{(n - 1)s^2}{\sigma_0^2} \le \chi^2(1 - \alpha/2;\, n - 1)$, conclude $H_0$
     Otherwise conclude $H_a$

(b)  $H_0$: $\sigma^2 \ge \sigma_0^2$   $H_a$: $\sigma^2 < \sigma_0^2$
     If $\dfrac{(n - 1)s^2}{\sigma_0^2} \ge \chi^2(\alpha;\, n - 1)$, conclude $H_0$
     If $\dfrac{(n - 1)s^2}{\sigma_0^2} < \chi^2(\alpha;\, n - 1)$, conclude $H_a$

(c)  $H_0$: $\sigma^2 \le \sigma_0^2$   $H_a$: $\sigma^2 > \sigma_0^2$
     If $\dfrac{(n - 1)s^2}{\sigma_0^2} \le \chi^2(1 - \alpha;\, n - 1)$, conclude $H_0$
     If $\dfrac{(n - 1)s^2}{\sigma_0^2} > \chi^2(1 - \alpha;\, n - 1)$, conclude $H_a$
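The limits of Example 8 follow directly from (1.66); a sketch using scipy.stats chi-square percentiles (library use is an assumption, not part of the text):

```python
from scipy import stats

# Example 8: 98 percent confidence interval for sigma^2 (n = 10, s = 4)
n, s, alpha = 10, 4.0, 0.02
ss = (n - 1) * s ** 2                                   # (n - 1) s^2 = 144

lower = ss / stats.chi2.ppf(1 - alpha / 2, df=n - 1)    # 144 / 21.67 = 6.6
upper = ss / stats.chi2.ppf(alpha / 2, df=n - 1)        # 144 / 2.09 = 68.9
print(lower, upper)
```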
1.9 COMPARISONS OF TWO POPULATION VARIANCES - NORMAL POPULATIONS

Independent samples are selected from two normal populations, with means and variances $\mu_1$ and $\sigma_1^2$ and $\mu_2$ and $\sigma_2^2$, respectively. Using the notation of Section 1.7, the two sample variances are:

(1.67a)  $s_1^2 = \dfrac{\sum_{i=1}^{n_1} (Y_i - \bar{Y})^2}{n_1 - 1}$

(1.67b)  $s_2^2 = \dfrac{\sum_{i=1}^{n_2} (Z_i - \bar{Z})^2}{n_2 - 1}$

We have:

(1.68)  $\dfrac{s_1^2}{\sigma_1^2} \div \dfrac{s_2^2}{\sigma_2^2}$ is distributed as $F(n_1 - 1,\, n_2 - 1)$ when the two independent samples come from normal populations.

Interval estimation

The lower and upper confidence limits $L$ and $U$ for $\sigma_1^2/\sigma_2^2$ with confidence coefficient $1 - \alpha$ are obtained by means of (1.68):

(1.69)  $L = \dfrac{s_1^2}{s_2^2} \cdot \dfrac{1}{F(1 - \alpha/2;\, n_1 - 1,\, n_2 - 1)}$   $U = \dfrac{s_1^2}{s_2^2} \cdot \dfrac{1}{F(\alpha/2;\, n_1 - 1,\, n_2 - 1)}$

Example 9. Obtain a 90 percent confidence interval for $\sigma_1^2/\sigma_2^2$ when the data are:

  $n_1 = 16$   $n_2 = 21$   $s_1^2 = 54.2$   $s_2^2 = 17.8$

We require:

  $F(.05; 15, 20) = 1/F(.95; 20, 15) = 1/2.33 = .429$   $F(.95; 15, 20) = 2.20$

Hence, the 90 percent confidence interval for $\sigma_1^2/\sigma_2^2$ is:

  $1.4 = \dfrac{54.2}{17.8} \cdot \dfrac{1}{2.20} \le \dfrac{\sigma_1^2}{\sigma_2^2} \le \dfrac{54.2}{17.8} \cdot \dfrac{1}{.429} = 7.1$

Tests

One-sided and two-sided tests concerning $\sigma_1^2/\sigma_2^2$ are constructed by means of (1.68). Table 1.4 contains the decision rules for each of three possible cases, with the risk of making a Type I error controlled at $\alpha$.

TABLE 1.4  Decision rules for tests concerning variances $\sigma_1^2$ and $\sigma_2^2$ of two normal populations

(a)  $H_0$: $\sigma_1^2 = \sigma_2^2$   $H_a$: $\sigma_1^2 \ne \sigma_2^2$
     If $F(\alpha/2;\, n_1 - 1,\, n_2 - 1) \le \dfrac{s_1^2}{s_2^2} \le F(1 - \alpha/2;\, n_1 - 1,\, n_2 - 1)$, conclude $H_0$
     Otherwise conclude $H_a$

(b)  $H_0$: $\sigma_1^2 \ge \sigma_2^2$   $H_a$: $\sigma_1^2 < \sigma_2^2$
     If $\dfrac{s_1^2}{s_2^2} \ge F(\alpha;\, n_1 - 1,\, n_2 - 1)$, conclude $H_0$
     If $\dfrac{s_1^2}{s_2^2} < F(\alpha;\, n_1 - 1,\, n_2 - 1)$, conclude $H_a$

(c)  $H_0$: $\sigma_1^2 \le \sigma_2^2$   $H_a$: $\sigma_1^2 > \sigma_2^2$
     If $\dfrac{s_1^2}{s_2^2} \le F(1 - \alpha;\, n_1 - 1,\, n_2 - 1)$, conclude $H_0$
     If $\dfrac{s_1^2}{s_2^2} > F(1 - \alpha;\, n_1 - 1,\, n_2 - 1)$, conclude $H_a$

Example 10. Choose between the alternatives:

  $H_0$: $\sigma_1^2 = \sigma_2^2$   $H_a$: $\sigma_1^2 \ne \sigma_2^2$

when $\alpha$ is to be controlled at .02 and the data are those of Example 9. We require:

  $F(.01; 15, 20) = 1/F(.99; 20, 15) = 1/3.37 = .297$   $F(.99; 15, 20) = 3.09$

so that the decision rule is:

  If $.297 \le \dfrac{s_1^2}{s_2^2} \le 3.09$, conclude $H_0$
  Otherwise conclude $H_a$

Since $s_1^2/s_2^2 = 54.2/17.8 = 3.04$, we conclude $H_0$.
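Examples 9 and 10 can likewise be reproduced from the summary data; a sketch using scipy.stats F percentiles (library use is an assumption, not part of the text):

```python
from scipy import stats

# Summary data for Examples 9 and 10
n1, s1_sq = 16, 54.2
n2, s2_sq = 21, 17.8
ratio = s1_sq / s2_sq                                        # 3.04

# Example 9: 90 percent confidence interval for sigma1^2 / sigma2^2, cf. (1.69)
lower = ratio / stats.f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1)    # about 1.4
upper = ratio / stats.f.ppf(0.05, dfn=n1 - 1, dfd=n2 - 1)    # about 7.1
print(lower, upper)

# Example 10: two-sided test of H0: sigma1^2 = sigma2^2 at alpha = .02, cf. Table 1.4(a)
alpha = 0.02
lo_crit = stats.f.ppf(alpha / 2, dfn=n1 - 1, dfd=n2 - 1)     # about .297
hi_crit = stats.f.ppf(1 - alpha / 2, dfn=n1 - 1, dfd=n2 - 1) # about 3.09
print(lo_crit <= ratio <= hi_crit)                           # True, so conclude H0
```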
